Talk:Semantic Web: Difference between revisions

From Citizendium
imported>Tom Morris
m (Talk:Semantic web moved to Talk:Semantic Web over redirect)
imported>Pat Palmer
(→‎References: adding a grouse)
 
(8 intermediate revisions by 4 users not shown)
{{subpages}}
== Excellent Eduzendium work ==
I'm enjoying the added content in this article, and learning from it.
A mechanical observation: starting from an existing stub that has all the CZ metadata and overhead in place, a standard lede, etc., may be a very promising way to do Eduzendium.
If you can split the code lines so they don't have to scroll horizontally, that would be desirable, but I don't know if the example is allowed to split to multiple lines. [[User:Howard C. Berkowitz|Howard C. Berkowitz]] 18:28, 10 August 2010 (UTC)
: Thanks! I followed your advice on line-splitting (the syntax is okay) and tried to simplify some of the language. There's still some cleaning up to do before tomorrow evening (due date). I can understand if CZ editors may wish to move off some of the technical stuff to separate pages, we just wanted to cover a bit of everything.
: I'm not sure what your mechanical observation referred to, and if we did it correctly or not. [[User:Blake Willmarth|Blake Willmarth]] 04:14, 13 August 2010 (UTC)
::My observation meant you did something wise: starting with an article that had the proper CZ formatting rather than fighting the learning curve. Your efforts, which are appreciated, went into adding meaningful content. Thanks! 04:47, 13 August 2010 (UTC)
== 'Triplestore' and other observations ==
In §2.1, the article states that "triplestore" is the "data convention" for the Semantic Web. This is kind of a clumsy way of putting it. You can have triples that aren't in a triplestore: the triples exist in a conceptual space before actually being codified into a representation or put into a triplestore.
Rather, I'd say that the concept of triples underlies RDF - although a few other things underlie RDF, and to understand RDF you need to understand the constraints placed on the triple: namely, the data types that can be put into each of the three positions in the triple (subjects containing URIs or BNodes, predicates containing URIs and objects being URIs, BNodes, untyped literals, untyped literals with language tags or typed literals). The set of those triples put together can produce a graph, and you can store one or more triples in a triplestore - although the triplestore may add further constraints on what gets put into it, such as requiring that each triple be affiliated with a graph. Again, you can have RDF that isn't in a triplestore - in fact, RDF that is stored in a file (say, XML or RDFa) is not necessarily in a triplestore, just as data in a CSV file or a spreadsheet needn't be loaded into a traditional database. To use that analogy, you can have a relational model without it existing in an actual database - it may exist on a scrap of paper or just inside the mind of the programmer.
"Although using RDF is compact, it is not easily human readable." This is not true because it conflates RDF with one particular '''serialization''' of RDF - namely RDF/XML. RDF is just the data-model. It is true that RDF/XML isn't particularly human readable, but to say that RDF isn't easily readable by humans is mistaken. I know people who can read RDF/XML perfectly well! (I may be the rare exception there.) But rather the point is that there are other RDF serializations - Notation3, Turtle, TriX and so on. One could even produce a graphical RDF serialization that used basically vectors and a graphical format to structure triples - and that would be very easy-to-read.
A few other issues:
Facebook's Open Graph Protocol is a proper name and ought to be capitalised.
I'm also a bit unsure about the section on microformats. It says that the Semantic Web is closely tied to microformats. I don't buy that: opinions about the Semantic Web differ within the microformats community (and vice versa) - many see the Semantic Web as being antithetical to the approach of microformats - they say it is too academic, not driven by the practical realities of the web and so on, and reject a number of the philosophical and practical ideals of the "upper-case Semantic Web" community (as they say) - namely, they reject things like namespaces and the separation of syntax from semantics. They also see that many implementations of RDF on the web fail the "don't repeat yourself" principle that they hold dear. Indeed, microformats are put forward by some as an ''alternative'' to RDF-based data on the web. So, to say they are "closely tied" seems to be wrong in my experience. –[[User:Tom Morris|Tom Morris]] 16:46, 13 August 2010 (UTC)
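: To make the point about triples above concrete, here is a minimal sketch in Python of the positional constraints described in that comment, together with one plain-text rendering of the resulting graph. Everything in it is an illustrative assumption rather than anything taken from the article: the class names (<code>URI</code>, <code>BNode</code>, <code>Literal</code>, <code>Triple</code>), the <code>example.org</code> identifiers and the helper functions are hypothetical, and the string escaping is deliberately simplified compared with what the N-Triples specification requires. Note that the graph is just an in-memory set of triples; no triplestore is involved.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    language: Optional[str] = None  # language tag, e.g. "en"
    datatype: Optional[URI] = None  # datatype URI for a typed literal

    def __post_init__(self):
        # In this sketch a literal carries a language tag or a datatype, not both.
        if self.language is not None and self.datatype is not None:
            raise ValueError("literal cannot have both a language tag and a datatype")


Subject = Union[URI, BNode]
Object = Union[URI, BNode, Literal]


@dataclass(frozen=True)
class Triple:
    subject: Subject
    predicate: URI
    obj: Object

    def __post_init__(self):
        # Enforce which kinds of term may appear in each position.
        if not isinstance(self.subject, (URI, BNode)):
            raise TypeError("subject must be a URI or a blank node")
        if not isinstance(self.predicate, URI):
            raise TypeError("predicate must be a URI")
        if not isinstance(self.obj, (URI, BNode, Literal)):
            raise TypeError("object must be a URI, blank node, or literal")


def term_to_text(term: Object) -> str:
    """Render one term in an N-Triples-like syntax (simplified escaping)."""
    if isinstance(term, URI):
        return f"<{term.value}>"
    if isinstance(term, BNode):
        return f"_:{term.label}"
    escaped = term.lexical.replace("\\", "\\\\").replace('"', '\\"')
    if term.language is not None:
        return f'"{escaped}"@{term.language}'
    if term.datatype is not None:
        return f'"{escaped}"^^<{term.datatype.value}>'
    return f'"{escaped}"'


def triple_to_text(triple: Triple) -> str:
    """Render a whole triple as one line terminated by ' .'."""
    terms = (triple.subject, triple.predicate, triple.obj)
    return " ".join(term_to_text(t) for t in terms) + " ."


# A graph is simply a set of triples held in memory (hypothetical data).
graph = {
    Triple(
        URI("http://example.org/article"),
        URI("http://example.org/title"),
        Literal("Semantic Web", language="en"),
    ),
    Triple(
        URI("http://example.org/article"),
        URI("http://example.org/author"),
        BNode("author1"),
    ),
}

for line in sorted(triple_to_text(t) for t in graph):
    print(line)
```

: The same set of triples could equally be written out as RDF/XML, Turtle or any other serialization; only <code>triple_to_text</code> would change, not the data model, which is the distinction the comment above draws between RDF and RDF/XML.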
==Pat's review of this article==
I appreciate this detailed and thoughtful beginning (though it is still unpolished and possibly would benefit from some restructuring). In the following subsections, I am recording detailed notes about things I noticed in the version available at the end of the course: [[User:Pat Palmer|Pat Palmer]] 19:16, 18 August 2010 (UTC)
===Introductory section===
The intro is simply too long, too stringy, and it's not easy to determine what the focus of the article will be. IMO, in addition to somehow briefly defining the notion of SW, the intro needs to clearly state WHEN the idea was proposed, that there has been widespread skepticism about the idea, and yet state that SW is no longer a "pipe dream", but rather that its proponents have recently forged ahead (in the teeth of nay-sayers) so that tools and standards are already being used to practical effect. All in just a couple of paragraphs, too. I would place, in sections directly below the intro, the parts (now at the end of the article) explaining where SW is being used, and supplement these paragraphs with screen shots of information displayable as a result of SW, to add flesh to the bones, so to speak. The first part of the article is for the intelligent lay person who wants to get a sense of what it is. [[User:Pat Palmer|Pat Palmer]] 20:05, 18 August 2010 (UTC)
===Technology sections===
I kept a list of jargon and terminology mentioned, if not always explained, as part of the tools and standards for SW.  The list is very long (couple of dozen words, not to mention some fairly obscure product names).  The technology explanations ("here's how it's being done") need to be grouped together into a single uber-section, following after the opening sections that just say (from a lay person's point of view) "here's what it is, and where it is being used". [[User:Pat Palmer|Pat Palmer]] 20:05, 18 August 2010 (UTC)
===Tech writing style===
In terms of actual writing style, there were a number of phrases that I recommend we edit out. The frequency of these types of phrases makes them seem worth mentioning here:
* it is probably not good to use "I" in wiki articles, since they are supposed to be collaborative
* "rather simple" (or the word "simple" anywhere in tech. writing) should probably be omitted; what is "simple" for one person may not be simple to another, and saying a thing is simple can lead to bad feelings from some readers
* along the same lines as above, "easy to" might better be reworded as "possible to"
* "is perfect for" could be reworded to something like "may be useful for" (as this is a matter of opinion, thus it's tactful to be less forceful about it)
* "will mean" could be reworded to something like "may result in" (as the "meaning of things" is often subject to interpretation)
Finally, in terms of writing style, I recommend against generalizations and theory, if a good example can be found instead.  I have marked these issues on my paper copy and will copy-edit some of them if I get time, but I wanted to explain this for future development reasons.
===Balance and emphasis===
Although I'm not sure exactly how to do it, I think the article would benefit from more clarity about the underlying assumption that assembling data for the Semantic Web is collaborative and may only succeed in situations where a certain trust or oversight is present (to prevent gimmicking the system to drive traffic to sites for bad reasons).  There are hints of this here and there, yet I don't think an intelligent lay reader would come to understand this matter without further study.  This is the kind of realization that is poorly elucidated in most of the writings I've read on SW (or maybe it's just kind of mentioned "in passing").  I think CZ's article could differentiate itself by bringing more clarity to this, somewhere in the early sections of the article (before diving into the dirty technical details where not all readers will want to follow).[[User:Pat Palmer|Pat Palmer]] 20:05, 18 August 2010 (UTC)
===Important insights and concepts buried here and there===
Sometimes while reading, I happened upon important, even startling, concepts buried among a pile of details, and I would like to see these concepts highlighted a little more. They may even be original to these student authors. For example, in the "Triplestore" section, there is the fact that RDF/XML N-Triples (because they are XML and thus plain text) travel through firewalls. This has important implications for the practicality of SW (and it would not have been possible until after maybe 2000 when XML tools became widely available in programming languages). IMO this lends weight to the possible usefulness of SW tools. The dependence of SW growth on altruism is another such insight. [[User:Pat Palmer|Pat Palmer]] 20:05, 18 August 2010 (UTC)
===Standards effort===
I wonder if there has been any overlap between the Semantic Web standards effort and the efforts to grow HTML5 (and maybe also XHTML). Possibly there has been little, if any, in which case the accomplishments to date of the determined and altruistic developers (who forged ahead despite the nay-sayers) are all the more impressive.
===Drupal===
Halt the presses!  IMO, the push to incorporate some "automatic" SW functionality in Drupal 7 deserves to be highlighted (in brief) near the very beginning of the article.[[User:Pat Palmer|Pat Palmer]] 20:05, 18 August 2010 (UTC)
===Ontology section===
I couldn't tell if this is a tool, a standard, or just a definition.  The section got gutted and didn't quite survive in a state of repair.[[User:Pat Palmer|Pat Palmer]] 20:05, 18 August 2010 (UTC)
===Issues and criticism===
I like this section. Its existence needs (IMO) to be noted somewhere near the beginning of the article. Maybe it would be helpful just to note, in the introductory definition of Semantic Web, that it is a controversial technology (foreshadowing this later section). [[User:Pat Palmer|Pat Palmer]] 20:05, 18 August 2010 (UTC)
===References===
I'd like to see DOIs on the "deep web" references wherever possible. Otherwise, the links may die and become useless. Also, I feel that this article would benefit from taking advantage of the many refereed journals (particularly in library science) which have articles regarding the topic. I did one search in ABI/Inform and got 44 hits, several of which looked promising but did not appear to have been mined for this article. Here is one example: ''Wimalasuriya, D., & Dou, D. (2010). Ontology-based information extraction: An introduction and a survey of current approaches. Journal of Information Science, 36(3), 306. Retrieved August 19, 2010, from ABI/INFORM Global. (Document ID: 2048138451)'', and here is its abstract:
:"Information extraction (IE) aims to retrieve certain types of information from natural language text by processing them automatically. For example, an IE system might retrieve information about geopolitical indicators of countries from a set of web pages while ignoring other types of information. Ontology-based information extraction (OBIE) has recently emerged as a subfield of information extraction. Here, ontologies - which provide formal and explicit specifications of conceptualizations - play a crucial role in the IE process. Because of the use of ontologies, this field is related to knowledge representation and has the potential to assist the development of the Semantic Web. In this paper, we provide an introduction to ontology-based information extraction and review the details of different OBIE systems developed so far. We attempt to identify a common architecture among these systems and classify them based on different factors, which leads to a better understanding on their operation. We also discuss the implementation details of these systems including the tools used by them and the metrics used to measure their performance. In addition, we attempt to identify the possible future directions for this field".[[User:Pat Palmer|Pat Palmer]] 20:05, 18 August 2010 (UTC)

Latest revision as of 19:53, 19 August 2010

This article is a stub and thus not approved.
Definition: Tim Berners-Lee's concept of a "web of knowledge", whereby web-based document contents would be annotated and classified so that computers can parse the classifications and provide search results based on the semantic information (what the content means), rather than simply on matching of text strings.
Workgroup categories: Computers; Library and Information Science
English language variant: American English
