Wednesday, October 24, 2007

Semantic Wishful Thinking? Or Semantics for turning lead into gold?

I'm seeing quite the meme these days on the 'Semantic Web' as a way to build the next big thing (See Twine, AdaptiveBlue, more). The essence of the Semantic Web is the markup of knowledge in such a way as to enable machines to reason about it.

The idea is to have every HTML page you download contain markup that enables a smart web browser or search engine to know whether you are looking for (or browsing about) Anthrax the UK punk band, the US heavy metal band, the fly, or the toxin. This vision is basically one of the Structured Web.
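To make the Anthrax example concrete, here is a minimal sketch of what that markup buys you. Everything in it is an assumption made up for illustration: the `Page` class, the `entity_type` field, the type names and the example.org URLs are hypothetical and do not correspond to any real vocabulary or site.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    entity_type: str  # hypothetical annotation a page author would embed


# Made-up pages, one per sense of the word "Anthrax".
PAGES = [
    Page("http://example.org/anthrax-uk", "Anthrax tour dates", "PunkBand"),
    Page("http://example.org/anthrax-us", "Anthrax new album", "MetalBand"),
    Page("http://example.org/anthrax-fly", "Anthrax the fly: field notes", "Insect"),
    Page("http://example.org/anthrax-toxin", "Anthrax toxin and exposure", "Toxin"),
]


def keyword_search(pages, keyword):
    """Plain keyword match: every sense of the word comes back."""
    return [p.url for p in pages if keyword.lower() in p.text.lower()]


def typed_search(pages, keyword, entity_type):
    """Keyword match narrowed by the embedded type annotation."""
    return [
        p.url
        for p in pages
        if keyword.lower() in p.text.lower() and p.entity_type == entity_type
    ]


if __name__ == "__main__":
    print(keyword_search(PAGES, "anthrax"))         # all four pages
    print(typed_search(PAGES, "anthrax", "Toxin"))  # only the toxin page
```

Note that the sketch only works because somebody filled in `entity_type` correctly on every page, which is exactly the part nobody has volunteered to do.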

There are issues in my mind:

1) The Semantic Web has been around for years. During all those years the content of the web grew from nearly nothing to the mountain of (mostly unstructured) goo we all browse daily. Why/How will all that knowledge be 'structured'?

Take Home: People do not want to 'structure' knowledge themselves. They are writing their content for people and not machines (except the SEO people).

2) Formally structured data is an OLD idea in AI. See expert systems. How will the 'semantic web' overcome the basic problem that structuring human knowledge is DAMN hard? And by hard I mean making it consistent (this is what mostly broke expert systems).

Have you been following what the Cyc project (now Cycorp) has been doing since 1984? Attempting to structure human knowledge. These guys have invented whole new ways of representing human knowledge.. where is it on the web? Can anyone tell me an application that uses it? I am very certain that the Cycorp guys could build (and likely already have) a way to export their databases into RDF, OWL, etc.

Also.. the old white-haired guys of AI invented various forms of Semantics and 'Knowledge Representation' way-back in AI history (see chapter 10 of Russell-Norvig).

Take Home: Once you have the knowledge structured and embedded, what happens next? Magic? Merely inventing a representation of knowledge relies on the 'if you build it they will come' doctrine of AI.. which has NEVER been true.

3) Reasoning with said structured knowledge is unsolved in general. Given a specific knowledgebase (or Semantic database) and specific questions (or semantic queries) systems can reason about the question and deliver results.. but it's still a garbage-in-garbage-out world.

This is especially true when most people really expect a search engine to read their minds (Sorry Udi - I agree with Greg), and tend to give up on their search queries when it doesn't.
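The "specific knowledgebase, specific question" case above can be sketched in a few lines. This is a toy, not how any real reasoner works: the triples, the `is_a` predicate and the function names are all assumptions invented for the example.

```python
def infer_is_a(triples):
    """Forward-chain the transitive closure of 'is_a' over a set of triples."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(facts):
            for (c, p2, d) in list(facts):
                if p1 == p2 == "is_a" and b == c and (a, "is_a", d) not in facts:
                    facts.add((a, "is_a", d))
                    changed = True
    return facts


def ask(facts, subject, predicate, obj):
    """Answer a yes/no query by looking the triple up in the inferred facts."""
    return (subject, predicate, obj) in facts


# A small, clean knowledge base (hypothetical names).
GOOD = {
    ("Anthrax_US", "is_a", "MetalBand"),
    ("MetalBand", "is_a", "Band"),
}

# The same knowledge base plus one bad assertion.
BAD = GOOD | {("Band", "is_a", "Toxin")}

if __name__ == "__main__":
    print(ask(infer_is_a(GOOD), "Anthrax_US", "is_a", "Band"))   # True
    print(ask(infer_is_a(GOOD), "Anthrax_US", "is_a", "Toxin"))  # False
    print(ask(infer_is_a(BAD), "Anthrax_US", "is_a", "Toxin"))   # True
```

One wrong triple and the system happily "proves" that a heavy metal band is a toxin. The inference is sound; the input is garbage.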

How do we prevent such systems from becoming SEO spammed? I suppose a reputation system on the source of semantic markup data could be created.
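As a rough sketch of what such a reputation system might look like (purely an assumption on my part, with made-up source names and scores), each source's markup claim could be weighted by how much that source is trusted, instead of counting every claim equally:

```python
from collections import Counter, defaultdict


def majority_label(claims):
    """Naive approach: every source's claim counts the same."""
    if not claims:
        return None
    return Counter(label for _source, label in claims).most_common(1)[0][0]


def weighted_label(claims, reputation, default=0.1):
    """Weight each claim by its source's reputation (hypothetical scores).

    Unknown sources get a small default weight.
    """
    if not claims:
        return None
    scores = defaultdict(float)
    for source, label in claims:
        scores[label] += reputation.get(source, default)
    return max(scores, key=scores.get)


# (source, claimed entity type) pairs for one page; all names are invented.
CLAIMS = [
    ("spam1.example", "Toxin"),
    ("spam2.example", "Toxin"),
    ("spam3.example", "Toxin"),
    ("trusted.example", "MetalBand"),
]
REPUTATION = {"trusted.example": 0.9}

if __name__ == "__main__":
    print(majority_label(CLAIMS))              # Toxin
    print(weighted_label(CLAIMS, REPUTATION))  # MetalBand
```

Of course this just moves the problem: now somebody has to compute the reputation scores, and the spammers will go after those next.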

Take Home: How in the hell do I build a search engine that uses Semantics that really understands what I am looking for and delivers me the Answer? Such a system pretty much is an AI Oracle.

Ok.. enough with the half-empty-glass negativity! What can we really do with the Semantic Web NOW?

For sure we can build a semantically enhanced 'filter' of the web. Google/Yahoo/MSN/Ask are great, but in the end they are giant databases that serve you up link-graph weighted & keyword-filtered URLs.

However, if you are trying to build a money-making business, a new search box that returns URLs seems like an insane idea.. unless you can co-opt the browser and augment the results that the big boys are returning (See Search Radar). Or pull a StumbleUpon strategy.

For a business, the Semantic Web is a potential tool, one step along the path to creating a valuable application. Remember your history here.. creating a giant repository and/or formal structure of knowledge will not alone result in something novel.. nor is using it required to create novelty in AI.

I'd probably make the argument that delicious itself (and similar data) is a growing embodiment of a user-generated database that clever software could derive semantic data from.
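A minimal sketch of the kind of thing I mean, assuming bookmark data shaped like (URL, set of tags). The bookmarks below are invented and this is not delicious's actual data format or API; the point is only that tag co-occurrence already hints at the different senses of a word without anyone writing formal markup.

```python
from collections import Counter
from itertools import combinations

# Made-up bookmarks in the style of a social tagging site.
BOOKMARKS = [
    ("http://example.org/a", {"anthrax", "metal", "music"}),
    ("http://example.org/b", {"anthrax", "metal", "thrash"}),
    ("http://example.org/c", {"anthrax", "toxin", "health"}),
]


def cooccurrence(bookmarks):
    """Count how often each pair of tags appears on the same bookmark."""
    counts = Counter()
    for _url, tags in bookmarks:
        for a, b in combinations(sorted(tags), 2):
            counts[(a, b)] += 1
    return counts


def related_tags(bookmarks, tag):
    """Tags that co-occur with `tag`, with their co-occurrence counts."""
    related = Counter()
    for (a, b), n in cooccurrence(bookmarks).items():
        if a == tag:
            related[b] += n
        elif b == tag:
            related[a] += n
    return related


if __name__ == "__main__":
    for other, n in related_tags(BOOKMARKS, "anthrax").most_common():
        print(other, n)
```

Here "anthrax" pulls in both "metal" and "toxin", so even this crude count surfaces the band-versus-toxin ambiguity from what users have already typed.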

I am NOT arguing that the semantic web is a bad idea... but be careful of the hype you read. The Semantic Web is merely the first step (and a hard one) toward stitching together knowledge in a way that can be used to reason. The S-W is as necessary for a smarter web as databases are for useful applications... yet the database is the data-store and NOT the application logic.
