Wednesday, December 5, 2007

To auto-categorize or human tag?

It’s not obvious how to make people and machines work well together. We deal with the interface between targeted human effort and computer automation every day at FirstRain, and our technology for categorizing content is an integrated mix of human analysts and algorithms. It helps that we have a fairly uniform set of customers (investment managers, analysts, CxOs and marketing types) compared with the broad audience the consumer search companies must serve. But because interests vary widely within that base, we filter, sort, and tag documents with a range of techniques, from purely human, librarian-like research to purely mechanical, algorithm-based tagging.

In the history of consumer search applications, this dichotomy is interesting. Yahoo’s early success was built on a human-generated directory, which was extremely effective on the early, smaller web. A map was both feasible and necessary, since the web was small enough to catalog by hand and people hadn’t yet embraced its breadth. The Open Directory Project (DMOZ) was (and is) an attempt to duplicate Yahoo’s human effort through a team of volunteers. Google, of course, is almost entirely automated. Attempting to swing the pendulum back is Mahalo, a “human-powered” search engine based on the belief that the list of commonly searched terms is short enough for people to cover, and that human common sense can produce better results than Google for that small set of terms.

FirstRain solves a different but adjacent problem. We support very demanding clients who require precise information across differentiated sources about global markets, events, and trends. Our technology is honed to identify the documents that carry this information, allowing us to ignore much of what consumer tools must digest, and to deliver extremely high-quality, targeted results in as close to real time as possible.

So we wrestle with the human/machine interface every day, and incrementally improve our technology and methodology so that we can scale and still get the best possible results in information discovery.
