Thursday, November 1, 2007
When people ask whether FirstRain is contributing to the semantic web (as some of my more plugged-in friends do), I first have to ask what they mean.
There seem to be several definitions and misconceptions out there. The Holy Grail is clearly a web where machines can discover and use content to infer new information without human intervention. But how we achieve this is where the misunderstandings arise. There are many techniques—from author-added, in-document markup to computer-extracted, database-archived metadata—and each will probably contribute to the final goal.
The most commonly discussed idea is to encourage authors to add “semantic markup” to their HTML. This would allow computers to crawl pages and understand context: that an obscure number is actually a temperature, for instance, or that a word is actually a city. This approach seems most popular among a fastidious minority. More realistically, tools can be built that let motivated volunteers create markup overlaying existing pages. That solution, although distributed, is still unlikely to keep pace with the volume of valuable new documents.
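To make the idea concrete, here is a minimal sketch of how a crawler might exploit author-added markup. The class names (“temperature”, “city”) and the sample HTML are hypothetical, not any real standard, but they show how a machine can recover context a human reader takes for granted:

```python
# A toy crawler that understands hypothetical semantic markup.
# Only Python's standard library is used.
from html.parser import HTMLParser

# Hypothetical author-marked-up page fragment.
SAMPLE = ('<p>It hit <span class="temperature">102</span> in '
          '<span class="city">Phoenix</span> today.</p>')

class SemanticExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self._current = None   # semantic class of the currently open tag, if any
        self.facts = []        # (semantic type, text) pairs discovered

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in ("temperature", "city"):
            self._current = cls

    def handle_data(self, data):
        if self._current:
            self.facts.append((self._current, data.strip()))
            self._current = None

parser = SemanticExtractor()
parser.feed(SAMPLE)
print(parser.facts)   # [('temperature', '102'), ('city', 'Phoenix')]
```

Without the markup, “102” is just a number; with it, any machine that knows the vocabulary can treat it as a temperature.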
From another perspective, special sites have been developed to store structured information and distribute the labor of database maintenance among volunteers, wikipedia-style (e.g. metaweb.com). Even more ingeniously, software could be written that automatically parses existing pages and extracts information, allowing one to find and consume the data as desired (e.g. Powerset.com). This might allow those previously mentioned knowledge databases to be populated automatically, avoiding the armies of human volunteers and supporting extraction in almost real time; but it raises questions about the data’s quality and the differences between human and machine fallibility.
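The automatic-extraction approach, and its fallibility, can be sketched in a few lines. The pattern and example sentences below are hypothetical; real systems like Powerset use full linguistic parsing rather than a single regular expression, but the failure mode is the same in spirit:

```python
# A toy sketch of machine extraction from unmarked prose.
import re

# Naively treat "NN degrees in City" as a (city, temperature) fact.
PATTERN = re.compile(r"(\d+)\s+degrees\s+in\s+([A-Z][a-z]+)")

def extract_facts(text):
    """Return (city, temperature) pairs guessed from raw text."""
    return [(city, int(temp)) for temp, city in PATTERN.findall(text)]

print(extract_facts("It was 102 degrees in Phoenix but only 58 degrees in Seattle."))
# [('Phoenix', 102), ('Seattle', 58)]

print(extract_facts("The survey spans 10 degrees in Latitude."))
# [('Latitude', 10)] -- a false "fact", illustrating the data-quality question
```

The second call shows why machine fallibility differs from human fallibility: the extractor never doubts itself, so its mistakes are systematic rather than careless.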
So, where does FirstRain fit into this? We specialize in identifying the subtle qualities of new information, such as relevance, obscurity, and investment value, which have real meaning to our market. We’re firmly part of the next generation of tools that leverage combinations of textual information, semantic markup, and semantic database resources to sift through newly published data and deliver relevant results to our users.