Saturday, July 14, 2007
Since Chris Anderson coined the term “Long Tail” in Wired Magazine three years ago, many articles have been written about new businesses that use technology to solve long tail problems. Examples abound in internet companies, the movie industry, simulation systems and network traffic, to name just a few. The internet companies Amazon, eBay, Google and Netflix unlock value in their markets by giving consumers access to products (in these cases books, auction products, general information or movies) that live in the long tail of the distribution curve of available products.
The business of investment research is a classic long tail problem: it is very expensive to solve with traditional research approaches but, like the examples above, it can be solved with search technology.
Consider the problem of doing qualitative research on a company. The high-frequency information is straightforward to find (it sits in the head of a Pareto distribution curve of the information): it’s in news feeds and articles from the mainstream sources of information. Researchers combine this with old-fashioned research: talking to management teams, trying to find non-obvious sources of information and modeling the financial behavior of the company.
But for stock research, significant information lies in the long tail of what is available: we see more than about 50% of the meaningful information there. We process millions of documents from the web each week and, after we have filtered out all the obvious junk, about half are in the head of the curve: news, press releases or near duplicates of the same as news travels through the web. The remainder are unique and need to be mined for fit.
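To make the idea of "near duplicates" concrete, here is a minimal sketch of one common way to separate repeated news from unique documents: compare overlapping word triples (shingles) with Jaccard similarity. This is an illustration only, not a description of our production pipeline; the function names, the shingle size and the 0.6 threshold are all assumptions chosen for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def split_unique_and_duplicates(docs, threshold=0.6):
    """Keep the first copy of each story; flag later near copies as duplicates.

    threshold is a hypothetical cut-off picked for this sketch.
    """
    kept = []        # (document, shingle set) pairs seen so far
    duplicates = []
    for doc in docs:
        s = shingles(doc)
        if any(jaccard(s, seen) >= threshold for _, seen in kept):
            duplicates.append(doc)
        else:
            kept.append((doc, s))
    return [doc for doc, _ in kept], duplicates


docs = [
    "Acme Corp announces record quarterly earnings today",
    "Acme Corp announces record quarterly earnings today, sources say",
    "Local blogger notes long queues at the new Acme store in Leeds",
]
unique, duplicates = split_unique_and_duplicates(docs)
```

In this toy input the second document is flagged as a near copy of the first, while the blog post survives as a unique document to be mined. Comparing every document against every kept one is fine for a sketch but would not scale to millions of documents a week.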
The first long tail lies in the ecosystem of a quiet company. By ecosystem I mean the trends in the market affecting the company, its competitors, suppliers or customers. For a quiet company, or even a noisy one in a complex market, information is available on all aspects of the ecosystem but there are too many dimensions for a human being to search every day. As one of our customers said to me after an evaluation, “we could not duplicate the results with infinite resources.” But vertical search technology can model the ecosystem and persistently mine the long tail, across hundreds of topics and many thousands of qualified sources, storing and sorting the results to make the information accessible. For topic-driven search like ours the company does not even have to be mentioned; it is enough that the topic itself is referenced in some way the search engine can detect.
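As a rough illustration of topic-driven matching, the sketch below models a company's ecosystem as a list of topics and flags a document when any topic is referenced, whether or not the company is named. Everything here is hypothetical: the company name, the topics, the trigger phrases and the simple phrase test are stand-ins for whatever detection a real search engine would use.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    name: str
    phrases: list  # hypothetical trigger phrases for this topic


# A made-up ecosystem model: company -> topics that affect it.
ECOSYSTEM = {
    "ExampleRetailCo": [
        Topic("supplier: cotton prices", ["cotton prices", "cotton futures"]),
        Topic("competitor store openings", ["opens new store", "store opening"]),
    ],
}


def match_topics(document, ecosystem):
    """Return (company, topic name) pairs for every topic the document references."""
    text = document.lower()
    hits = []
    for company, topics in ecosystem.items():
        for topic in topics:
            if any(phrase in text for phrase in topic.phrases):
                hits.append((company, topic.name))
    return hits


doc = "Cotton futures climbed again this week after poor harvests."
hits = match_topics(doc, ECOSYSTEM)
```

The document never mentions ExampleRetailCo, yet it is routed to that company because it touches a topic in the company's ecosystem. A plain substring test is the weakest possible detector and is used here only to keep the example short.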
Consider the analogy to Amazon, first researched in 2003. By building a model of all books available, by subject matter and title, and then connecting them together with user reviews, Amazon made you, or me, able to find books of great interest to us that would never have made it to our local bookstores. The reason is that interest in such a book is not frequent enough to justify a store carrying it, but there are enough people interested in the book to make money on it if you can match reader to book.
Now consider following a company in your portfolio. There may be over 1000 sources of information (like local newspapers, blogs, local filings etc.) that can give you information that would deepen your understanding or even change your mind, and maybe 950 of those sources are in the long tail. We have many examples of stocks like this in our system, for example in source-rich sectors like technology and retail. But the frequency of any one source yielding information is probably too low for you to check the source every day, and the documents typically won't get correctly caught and filtered by Google because they don't fit a simple keyword match. There aren’t many hedge fund managers who have the time or the inclination to check 950 sources a day, and so the information never gets factored in. This problem is perfect for vertical search technology.
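A little arithmetic shows why this matters. The sketch below assumes, purely for illustration, that each of the 950 tail sources independently has a 0.2% chance of publishing something relevant on a given day and that a manual check takes one minute; neither number is measured data.

```python
TAIL_SOURCES = 950        # from the example above
P_HIT_PER_DAY = 0.002     # assumed chance that one source yields something today
MINUTES_PER_CHECK = 1     # assumed time to check one source by hand

# Expected number of relevant items across the whole tail each day.
expected_hits = TAIL_SOURCES * P_HIT_PER_DAY

# Chance that at least one tail source has something relevant today,
# treating sources as independent.
p_at_least_one = 1 - (1 - P_HIT_PER_DAY) ** TAIL_SOURCES

# Cost of checking every tail source by hand, in hours per day.
manual_hours = TAIL_SOURCES * MINUTES_PER_CHECK / 60

print(f"expected relevant items per day: {expected_hits:.1f}")
print(f"chance of at least one per day:  {p_at_least_one:.0%}")
print(f"manual checking time per day:    {manual_hours:.1f} hours")
```

Under these assumptions no single source is worth a daily visit, yet taken together the tail produces something relevant on most days, at a manual cost of more hours than a working day contains.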
Just like in the Amazon case, where the solution requires both technology and the hard work of collecting all the descriptions of books in one database, in our world the collection and research of sources is, in itself, a long tail problem. The web is populated by billions of sites and more than 80 million blogs, and the numbers continue to grow exponentially. However, more interestingly for the investor (although I can’t prove this yet), we see that, while there is a lot of junk to be discarded, the number of interesting blogs with quality writing is growing incredibly fast as informed people discover the blogging world and the publishing world gets turned on its head. Keeping up requires both technology and the research to find and catalog the sources contributing to the long tail.
Vertical search for investors, which we call search-driven research, is a new field. There are just a couple of companies doing it, all of us startups and each trying a different approach, but it is so clearly a long tail problem, and so well suited to search technology, that it’s taking off fast.
If you want to read more about the Long Tail, check out the excellent definition by Chris Anderson on Wikipedia and on his blog.