Dave's Diary: Semantic Tangle

A couple of years ago an on-line acquaintance put me on to something called the Semantic Web. Tim Berners-Lee, the father of the existing World Wide Web, believes it to be the future of the Internet.
Currently information on the web is easy for human beings to understand, but much more difficult for computers to understand. The Semantic Web seeks to wrap information in ordinary web pages inside semantic tags, so that computers can understand the information as easily as humans. Theoretically this allows the web to become a massive database that can be mined for information, and allows people and organisations to exchange information more easily.

Semantics is the study of meaning. For a computer to understand a word on a web page, it must have some way of knowing what the word means. Humans can usually guess the meaning of a word from its context, but computers currently cannot do this easily, so we have to provide hints. We can do this by wrapping significant words in semantic tags.

Having decided it's a good idea to wrap important words in semantic tags, we face another problem: standards. How do we develop a set of standard semantic tags that can be readily understood by everyone and, even more importantly, every machine?

The answer so far is to allow people to develop an ontology that describes their own area of interest, and to share it with others in a standard way. At present people do that using the Resource Description Framework (RDF).
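RDF expresses everything as simple three-part statements: a subject, a predicate and an object. As a rough illustration only, here is a minimal Python sketch of that idea. The `Triple` type, the `match` helper and every `example.org` address are hypothetical names I made up for the example; only the `rdf:type` address is a real RDF term.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# The real RDF term for "is an instance of".
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical ontology and resource addresses, invented for this sketch.
VOCAB = "http://example.org/vocab/"
ORG = "http://example.org/orgs/amnesty"

GRAPH = [
    Triple(ORG, RDF_TYPE, VOCAB + "Organisation"),
    Triple(ORG, VOCAB + "name", "Amnesty International"),
    Triple(ORG, VOCAB + "website", "http://www.amnesty.org/"),
]


def match(graph: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every field that was supplied."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


names = [t.obj for t in match(GRAPH, predicate=VOCAB + "name")]
```

Because each predicate is a full web address, two people can only combine their data if they agree on what those addresses mean, which is exactly the job an ontology does.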

This is where the Semantic Web starts to get tangled. Anyone can develop an ontology to describe something, and that ontology can drag in ontologies developed by other people. My own experience over the past few weeks is that no existing ontology seems to describe what I want to record, although several describe parts of it. So it looks like I'm going to have to roll my own, yet another highly specific ontology. Odds are it will meet my needs, but not the needs of others.

Another tangled strand of the web concerns how people embed semantic information in web pages. It sounds simple, but it doesn't work out that way in practice. There are various competing "standards" out there, each with its own advantages and disadvantages.

The ideal semantic tagging system would allow semantic information to be included transparently in web pages. It could be added easily to existing pages, and the semantically tagged pages would look exactly the same in web browsers after tagging as they did before. The only difference is that a computer could easily parse and extract significant data from the page, discarding all the irrelevant padding that humans find so impressive.
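To give a feel for what "easily parse and extract" could mean, here is a minimal Python sketch. It assumes an RDFa-style convention in which a `property` attribute labels the meaning of an element. The sample page, the `PropertyExtractor` class and the `extract_properties` helper are all made up for illustration and are not taken from any real site or standard.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect values from elements that carry a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.found = {}       # property name -> extracted value
        self._current = None  # property whose text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "href" in attrs:
            # For links, take the address rather than the link text.
            self.found[prop] = attrs["href"]
        else:
            self._current = prop
            self.found[prop] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.found[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: assumes tagged elements are never nested.
        self._current = None


# Hypothetical page fragment: the visible text is unchanged by the tagging.
PAGE = """
<p>Welcome! <span property="foaf:name">Amnesty International</span>
is a <b>worldwide</b> movement. Visit
<a property="foaf:homepage" href="http://www.amnesty.org/">their site</a>.</p>
"""


def extract_properties(html):
    """Return a dict of the tagged values found in an HTML string."""
    parser = PropertyExtractor()
    parser.feed(html)
    return parser.found


data = extract_properties(PAGE)
```

Here `data` ends up holding just the organisation's name and website address, with all the surrounding padding thrown away.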

This ideal tagging system doesn't exist. Some tagging systems change the presentation of the page. Others introduce elements that upset some web browser programs. Most require a perfectly formed XHTML web page or the machine trying to read them spits the dummy and crashes. Most existing web pages are far from perfectly formed XHTML, and to make things even worse, many older web browsers spit the dummy when they encounter XHTML!

So, at the state of the art, even if we write web pages for the Semantic Web according to the best standards available, they may crash existing web browsers and hardly anyone will be able to make use of them.

But they say a journey of a thousand miles begins with a single step, so I took the step.

My Australian Politics Resource has always contained a section that describes political organisations. For the past few months the pages in the organisations section have been dynamically created from information stored in a database. That made it easy for me to wrap the name, description and website details of each organisation in semantic tags and make my resource part of the Semantic Web. Each page describing an organisation now contains an icon letting people know semantic information is available within the source.

As I said above, picking an ontology to use is a tangled web in itself. The one I picked to take my first step on the journey was FOAF, which stands for Friend of a Friend. Not surprisingly it was first developed for social networking. FOAF primarily describes people, but it can also be used to describe organisations. The listing for Amnesty International on my website now looks like this:

Amnesty International

Amnesty International (AI) is a worldwide movement of people who campaign for internationally recognized human rights.

Website: http://www.amnesty.org/

To see what this looks like to a machine capable of extracting the semantically tagged information, click here. Firefox users can see it directly and view the source; IE and other users may have to save the file locally and open it in a text editor.
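For anyone curious how a description like this can be produced from database fields, here is a minimal Python sketch that builds an RDF/XML document using FOAF. It is not the code or the exact output my site uses. The `organisation_to_rdf` function is a hypothetical helper, and using the Dublin Core `description` term for the blurb is my assumption for the example. The `foaf:Organization`, `foaf:name` and `foaf:homepage` terms are real FOAF vocabulary.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"  # assumption: used for the description

# Ask ElementTree to write readable prefixes instead of ns0, ns1, ...
for prefix, uri in (("rdf", RDF), ("foaf", FOAF), ("dc", DC)):
    ET.register_namespace(prefix, uri)


def organisation_to_rdf(name, description, homepage):
    """Return an RDF/XML string describing one organisation."""
    root = ET.Element(f"{{{RDF}}}RDF")
    org = ET.SubElement(root, f"{{{FOAF}}}Organization")
    ET.SubElement(org, f"{{{FOAF}}}name").text = name
    ET.SubElement(org, f"{{{DC}}}description").text = description
    # The homepage is a resource (a web address), not a piece of text.
    ET.SubElement(org, f"{{{FOAF}}}homepage", {f"{{{RDF}}}resource": homepage})
    return ET.tostring(root, encoding="unicode")


rdf_xml = organisation_to_rdf(
    "Amnesty International",
    "Amnesty International (AI) is a worldwide movement of people who "
    "campaign for internationally recognized human rights.",
    "http://www.amnesty.org/",
)
print(rdf_xml)
```

The same three database fields that drive the human-readable listing feed the machine-readable one, which is why generating pages dynamically made this step easy.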

See what I mean about how tangled the Semantic Web is at present?


Saturday, January 26, 2008
