Sunday, May 14, 2006

The Semantic Web

I know that this is an important topic. I don't fully understand it yet, but I intend to learn more about it. It is, or will be, a significant part of the Google algorithm.

The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. The first steps in weaving the Semantic Web into the structure of the existing Web are already under way. In the near future, these developments will usher in significant new functionality as machines become much better able to process and "understand" the data that they merely display at present.

The essential property of the World Wide Web is its universality. The power of a hypertext link is that "anything can link to anything." Web technology, therefore, must not discriminate between the scribbled draft and the polished performance, between commercial and academic information, or among cultures, languages, media and so on. Information varies along many axes. One of these is the difference between information produced primarily for human consumption and that produced mainly for machines. At one end of the scale we have everything from the five-second TV commercial to poetry. At the other end we have databases, programs and sensor output. To date, the Web has developed most rapidly as a medium of documents for people rather than for data and information that can be processed automatically. The Semantic Web aims to make up for this.

Like the Internet, the Semantic Web will be as decentralized as possible. Such Web-like systems generate a lot of excitement at every level, from major corporation to individual user, and provide benefits that are hard or impossible to predict in advance. Decentralization requires compromises: the Web had to throw away the ideal of total consistency of all of its interconnections, ushering in the infamous message "Error 404: Not Found" but allowing unchecked exponential growth.

What is the Semantic Web?

The semantic web is an ongoing project, currently headed by the creator of the World Wide Web, Tim Berners-Lee. It attempts to address some of the semantic shortcomings of the current web, by introducing a much more versatile and dynamic markup to the documents that comprise the network.

HTML, the markup language used for the bulk of the World Wide Web, has a very limited set of tags for telling a parser what the text within those tags is meant to be. For the most part, HTML tags carry layout and stylistic instructions, although there is a move towards having HTML focus exclusively on structural markup while Cascading Style Sheets handle style. There are a few exceptions to this, such as the tags in the document head (title, meta, etc.) and embedded object tags.

The semantic web, however, uses XML, along with other technologies such as OWL and RDF, to describe information in essentially unlimited detail. Some of Berners-Lee's ideas for this include the ability to embed information about links within the links themselves: adding metadata to each link indicating the title of the webpage, perhaps a rating, the nature of the relationship between the two people doing the linking, and an assortment of other informational tidbits.
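To make the idea of link metadata a little more concrete, here is a minimal sketch of how such statements could be written as subject-predicate-object triples, which is the data model RDF is built on. It uses plain Python tuples rather than a real RDF library, and the example.org URLs, the rating and linksTo terms, and the describe helper are all made up for illustration. Only the Dublin Core title property is an existing vocabulary term.

```python
# Hypothetical sketch: link metadata as subject-predicate-object triples,
# the data model RDF uses. Page URLs and values below are invented.

# Dublin Core "title" property (an existing, widely used vocabulary term).
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
# Hypothetical vocabulary terms, invented for this example only.
EX_RATING = "http://example.org/terms/rating"
EX_LINKS_TO = "http://example.org/terms/linksTo"

triples = [
    ("http://example.org/my-blog", EX_LINKS_TO, "http://example.org/article"),
    ("http://example.org/article", DC_TITLE, "The Semantic Web"),
    ("http://example.org/article", EX_RATING, "5"),
]


def describe(subject, triples):
    """Collect every predicate/object pair recorded about one subject."""
    return {p: o for s, p, o in triples if s == subject}


info = describe("http://example.org/article", triples)
print(info[DC_TITLE])
```

Because every name is a full URL, two sites that use the same property URL are talking about the same property, which is what lets a program combine statements from different pages.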

A fundamental aspect of the semantic web is the linking of data to centralized data-points, rather than independently defining it. For example, currently if I want all of my text to appear red, I would indicate: font color=red, or perhaps use a style sheet to specify: p { color: red; }. With the semantic web I would indicate something akin to: font, and the color would then be retrieved from a central point.

The semantic web is being built, and is spreading, beneath the structure of the existing web. Anyone who has seen a Creative Commons license has seen the semantic web in action: the licenses themselves are stored in a central database, and websites using a Creative Commons license link to that database entry rather than reproducing the license text themselves. If you examine the markup of a webpage using one of these licenses, you will see a number of tags hidden from normal sight, intended to let search-engine crawlers, news feeders, and other automated tools actually understand the license you have placed on your site.
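As a rough illustration of how an automated tool might read that hidden license information, the sketch below scans a page for links marked rel="license", one common convention for pointing at a license. It uses Python's standard html.parser module. The LicenseLinkFinder class and the sample page fragment are hypothetical, and a real crawler would need to handle other ways of embedding the license as well.

```python
from html.parser import HTMLParser


class LicenseLinkFinder(HTMLParser):
    """Collect href values of <a> and <link> tags marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "license" in rel_values and attributes.get("href"):
            self.licenses.append(attributes["href"])


# Hypothetical page fragment; the license URL is only an example.
page = """
<p>Some post text.</p>
<a href="http://example.org/about">About me</a>
<a rel="license" href="http://creativecommons.org/licenses/by/2.5/">
  Creative Commons Attribution 2.5
</a>
"""

finder = LicenseLinkFinder()
finder.feed(page)
print(finder.licenses)
```

The tool never has to read or interpret the license text on the page. It only needs the link, and everything else can be looked up at the address the link points to.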

Ultimately, the future of the semantic web looks bright. With more and more sites integrating small semantic components, such as copyright information, meta keywords, creator names, and global coordinates, it simply appears to be too good an idea to fail.

