selberg.org

Archive for March 26th, 2007
3/26/07
12:30 am
The problems with metasearch

Greg Linden makes some interesting points regarding metasearch, aka federated search. When I created MetaCrawler back in 1994, I did so with the belief that combining the results of multiple engines (at the time, WebCrawler, Lycos, InfoSeek, and OpenText) would provide better results than any single engine. It turns out this is true. More importantly, it’s still true. Google, Yahoo!, and MSN / Live Search all provide good results, and when they differ, a simple voting strategy for combining their results makes the whole better than any of the individual result lists.
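To make "a simple voting strategy" concrete, here is a minimal sketch in Python. It uses a Borda-style count, which is one common way to vote across ranked lists and is an assumption on my part here, not a description of what MetaCrawler actually runs. The engine names, URLs, and the `merge_by_votes` helper are all hypothetical.

```python
from collections import defaultdict


def merge_by_votes(rankings, depth=10):
    """Combine ranked URL lists with a Borda-style vote (an assumed scheme)."""
    scores = defaultdict(int)
    for ranked_urls in rankings.values():
        for position, url in enumerate(ranked_urls[:depth]):
            # Position 0 earns `depth` points, position 1 earns depth - 1, and so on.
            scores[url] += depth - position
    # Highest score first; ties are broken alphabetically so the output is stable.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical top-3 results from three hypothetical engines.
rankings = {
    "engine_a": ["x.example", "y.example", "z.example"],
    "engine_b": ["y.example", "x.example", "w.example"],
    "engine_c": ["y.example", "z.example", "x.example"],
}
merged = merge_by_votes(rankings, depth=3)
```

A URL that several engines rank highly floats to the top, while a URL only one engine likes sinks. That is the sense in which the engines "vote".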

So why isn’t MetaCrawler dominant instead of just a minor blip compared with the big three?

One of the key downsides of metasearch is performance: the metasearch engine has to wait for the engines it queries and then merge their answers, so it is always a little slower than the average engine. But is performance the only issue? Is it even the dominant one? Not clear.
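As a rough illustration of where the extra latency comes from, the sketch below fans a query out to several simulated engines in parallel and gives up on any that miss a deadline. The deadline is my own assumption, one common way to trade completeness for speed, and `fake_engine`, `fan_out`, and the delays are made up for the example.

```python
import asyncio


async def fake_engine(name, delay):
    # Stand-in for a network call to a real search engine (hypothetical).
    await asyncio.sleep(delay)
    return name, [f"{name}-result"]


async def fan_out(delays, deadline):
    # Query every engine concurrently, then wait at most `deadline` seconds.
    tasks = [asyncio.create_task(fake_engine(n, d)) for n, d in delays.items()]
    done, pending = await asyncio.wait(tasks, timeout=deadline)
    for task in pending:
        task.cancel()  # give up on engines that missed the deadline
    return dict(task.result() for task in done)


# Simulated response times in seconds; "slow" will miss the 0.2 s deadline.
delays = {"fast": 0.01, "medium": 0.03, "slow": 1.0}
answers = asyncio.run(fan_out(delays, deadline=0.2))
```

Without the deadline, the response time is set by the slowest engine plus the merge step. With it, the response is faster but is built from fewer votes.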

What about operational reliability? An issue with federation is that each federated service needs to be available and reliable, and each needs to be able to handle the load. With federation, especially when federating external systems, operational issues are more likely and scaling is more difficult. As query volume increases, each federated service needs to be scaled up appropriately as well.
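A bit of back-of-the-envelope arithmetic shows why federation makes operational issues more likely. The sketch assumes failures are independent and that a query only counts as fully served when every federated service answers. Both are simplifying assumptions, and the 99.9% figures are invented for illustration.

```python
from math import prod


def combined_availability(availabilities):
    # Assumes independent failures and that every service must answer.
    return prod(availabilities)


# Three hypothetical services, each available 99.9% of the time.
single = 0.999
all_three = combined_availability([single, single, single])
```

Under these assumptions the combined figure is always lower than that of any single service, and each service added to the federation lowers it further.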

Could it be the brand? Certainly, brand plus quality is better than either in isolation, but does a strong brand with good quality beat a weak brand with better quality? Again, not clear.

I can’t say for sure… it’s probably some combination of all of the above. That said, having thought about it, I suspect operational reliability and, to an extent, control over data will win out from a business perspective. Federation means others are in control of part of the solution, and even if those others are different groups within the same company, it can still be difficult to ensure that the federated services are in sync with the entry point. So it should come as no surprise that folks like Google advocate putting everything in one index.