Dr. William Chang gave a number of talks at WWW2008. A brief history: PhD from Berkeley, one of the main developers of InfoSeek, long-time advisor to and lately (Jan 2007) CTO of Baidu. Here’s a distillation of those talks, and as always these are just my interpretations of what he said:
Working in China
He spoke at length about the issues of having a company in China; in particular, the perils of an established non-Chinese company trying to move into the Chinese market. He gave a number of examples of markets where foreign companies haven’t done well: search (Google and Yahoo, with Baidu being dominant), auctions (eBay, overtaken by TaoBao), and instant messaging (where Tencent’s QQ is dominant). I need to ask him how he views Joyo.com (Amazon.com.cn!). Anyway, here are the keys to success in China that he mentioned (there were a few others, but these were my main takeaways):
- Be focused on China. Other companies moved in as an afterthought and stayed focused on their primary markets (typically the US).
- Sell to the market you have, not the one you want. OK, pardon the pun, but essentially, non-Chinese companies often assumed China would be like the US and Europe, and tried to market their services accordingly.
- Managers should be local and domain experts. In other markets, the US for example, the manager of a division would typically be a domain expert in the business as well as local and able to speak the language. Often, foreign companies would send either a domain expert who couldn’t speak Chinese, or someone who was local but not a domain expert. Neither is a good choice.
- Ground troops are plentiful. Use them. It’s cheap and easy to get big fast, so ground troops are necessary. Apparently Baidu has well upwards of 3,000 sales people and another 3,000 indirect sales people, far more than its roughly 1,000 developers. Most are graduates of local colleges; apparently nearly all hiring is straight from college. Yes, this requires a different type of management. Chang described a poll-like model where people are rated on 10 categories, which lets management improve both individuals and groups along these axes. Microsoft does something a bit similar with the MS Poll, but that’s more for identifying problems with managers and groups.
Overall, the first and last were the two key points I took away. If you assume China is a different market, which, having been here a few times now, I can certainly accept, then it makes absolute sense to have a local division that is completely empowered to focus on the local market, with lots of ground troops to do it. From prior experience, if the organization has a central (say US-based) development effort and some small feature is needed for a remote site (say China), it can be frustratingly difficult for the US-based group to do it given its other priorities. Yahoo solved the problem to a degree in search by having a group dedicated to handling non-US requests that the core search team wouldn’t do. But really, when you can find ground troops locally, get them.
Three Generations of Search
Chang also spoke at length about the Three Generations of Search. I’ll cut to the chase:
- First Generation (1993 - 2001): IR at scale
- Phrases
- Source rating / prior
- Multiple facets vs single relationships
- “Web Oracle” model - the web will know the answer
- User-generated content
- Tagging
- Wikipedia / Baidupedia
- Question Answering (Naver in Korea, Yahoo Answers in US, Sina’s iAsk and Baidu’s iKnow in China)
- People Search (LinkedIn and Facebook)
- Personalization
- Integration of Search and Recommendation
- Predictive recommendation with feedback
For the most part, the culmination of the First Generation would be sites like AltaVista or InfoSeek. The Second Generation brings us the current offerings from Google, Yahoo, and Microsoft. The Third? Well, he mentioned that Amazon was doing a number of useful things there, but nobody else had been successful yet. Which raises the question: will the existing second generation engines turn into the third?
More seriously, the main focus of the Third Generation is using massive data and computation per individual rather than per aggregate. Right now, Google, Microsoft, and Yahoo are able to make various generalizations using the prior history of their millions of customers and sessions. However, the question is how to make use of an individual’s history and related people’s history to make meaningful personal recommendations.
As an example, one of the papers I was most disappointed with was Spatial Variation in Search Engine Queries by Backstrom, Kleinberg, Kumar, and Novak (Cornell and Yahoo). It showed that queries emanate disproportionately from different regions; for example, queries for baseball team names are concentrated around each team’s home region. Yeah, OK. And then? For example, do people click on different results for the same query issued from different locales? Or do terms carry implicit, or explicit, meaning that differs by locale? It was mentioned that “Cardinals” had two hotspots, Arizona (the football team) and St. Louis (the baseball team), so presumably people want and will click on different things. But in what way? What are the metrics? And as you get closer to the midpoint of Arizona and Missouri, how do you merge the results?
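To make the “how do you merge” question concrete, here is a minimal sketch of one possible answer: weight each interpretation of a query by how close the user is to that interpretation’s hotspot. To be clear, this is my own illustration and not something from the paper or from Chang’s talk. The inverse-distance weighting, the `alpha` and `floor_km` parameters, the function names, and the approximate city coordinates are all assumptions made for the example.

```python
import math

# Hypothetical hotspot centers for the query "Cardinals".
# Coordinates are approximate and for illustration only.
HOTSPOTS = {
    "football": (33.45, -112.07),  # roughly Phoenix, Arizona
    "baseball": (38.63, -90.20),   # roughly St. Louis, Missouri
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def intent_weights(user_location, hotspots=HOTSPOTS, alpha=1.0, floor_km=1.0):
    """Return inverse-distance weights per intent, normalized to sum to 1.

    alpha controls how quickly a hotspot's influence decays with distance
    (assumed value); floor_km avoids dividing by zero at the hotspot itself.
    """
    raw = {
        intent: max(haversine_km(user_location, center), floor_km) ** -alpha
        for intent, center in hotspots.items()
    }
    total = sum(raw.values())
    return {intent: weight / total for intent, weight in raw.items()}
```

Calling `intent_weights((33.45, -112.07))` puts nearly all the weight on the football interpretation, while a location about halfway between the two cities gives close to an even split. A ranker could use these weights to interleave results for the two interpretations. Whether that matches what users actually click on is exactly the metrics question the paper leaves open.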
I’m not sure if it’s an extension of the Second Generation or the Third, but I’d say solving those types of issues is clearly part of the next wave of things. And it’s good to see that Baidu is also pushing on that, in addition to the Big Three (and in fairness, Baidu is #3 world-wide… yup… and Naver #4). No wonder Microsoft wants to buy Yahoo!