selberg.org Home

Archive for April, 2008
4/24/08
11:04 pm
William Chang, CTO Baidu, on Search

Dr. William Chang gave a number of talks at WWW2008. A brief history: PhD from Berkeley, one of the main developers of InfoSeek, longtime advisor and lately (Jan 2007) CTO of Baidu. Here’s some distillation of them, and as always these are just my interpretations of what he said:

Working in China
He spoke at length about the issues of having a company in China; in particular, the perils of having an established non-Chinese company try to move into the Chinese market. He gave a number of examples of companies that haven’t done well against local players, such as Google and Yahoo (Baidu being dominant in search) and eBay (being overcome by TaoBao), along with Tencent (which runs QQ, the dominant instant messenger in China) as another local leader. I need to ask him how he views Joyo.com (Amazon.com.cn!). Anyway, here are various keys to success in China that he mentioned (there were a few others, but these were my main takeaways):

  • Be focused on China. Other companies were moving in as an after-thought and focused more on their primary markets (typically the US).
  • Sell to the market you have, not the one you want. OK, pardon the pun, but essentially, non-Chinese companies often assumed China would be like the US and Europe, and tried to market their services as such.
  • Managers should be local and domain experts. In other markets, the US for example, managers of a division would typically be domain experts of the business as well as local and able to speak the language. Often, foreign companies would send either a domain expert that couldn’t speak Chinese, or someone that was local but not a domain expert. Neither is a good choice.
  • Ground troops are plentiful. Use them. It’s cheap and easy to get big fast, so ground troops are necessary. Apparently Baidu has well upwards of 3,000 sales and another 3,000 indirect sales people - well in excess of its roughly 1,000 developers. Most are graduates from local colleges - apparently nearly all hiring is straight from college. Yes, this requires different types of management; Chang described a poll-like model where people are rated on 10 categories, and this allows management to improve both people and groups along these axes. Microsoft does something a bit similar with the MS Poll, but that’s more for identifying problems with managers and groups.

Overall, the first and last were the two key points I took away. If you assume China is a different market, which having been here a few times now I can certainly accept, then it makes absolute sense to have a local division be completely empowered to focus on the local market and get lots of ground troops to do it. From prior experience, if the organization has a central (say, US-based) development effort and some small feature is needed for a remote site (say China), it can be frustratingly difficult for the US-based group to do it given other priorities. Yahoo solved the problem to a degree in search by having a group dedicated to handling non-US requests that the core search team wouldn’t do. But really, when you can find ground troops locally, get them.

Three Generations of Search
Chang also spoke at length about the Three Generations of Search. I’ll cut to the chase:

  • First Generation (1993 - 2001): IR at scale
    • Phrases
    • Source rating / prior
    • Multiple facets vs single relationships
  • Second Generation (2001 - present): Data and heterogeneity
    • “Web Oracle” model - the web will know the answer
    • User-generated content
    • Tagging
    • Wikipedia / Baidupedia
    • Question Answering (Naver in Korea, Yahoo Answers in US, Sina’s iAsk and Baidu’s iKnow in China)
    • People Search (LinkedIn and Facebook)
  • Third Generation (??): Internet as a Matching Network
    • Personalization
    • Integration of Search and Recommendation
    • Predictive recommendation with feedback

    For the most part, the culmination of the First Generation would be sites like AltaVista or InfoSeek. The Second Generation brings us the current offerings from Google, Yahoo, and Microsoft. The Third? Well, he mentioned that Amazon was doing a number of useful things there ;) but nobody else had been successful yet. Which raises the question: will the existing second-generation engines turn into the third?

    More seriously, the main focus of the Third Generation is using massive data and computation per individual rather than per aggregate. Right now, Google, Microsoft, and Yahoo are able to make various generalizations about things using the prior history of their millions of customers and sessions. However, the question is how to make use of an individual’s history and related people’s history to make meaningful personal recommendations. As an example, one of the papers I was most disappointed with was Spatial Variation in Search Engine Queries by Backstrom, Kleinberg, Kumar, and Novak (Cornell and Yahoo). It showed that queries emanate disproportionately from different regions; for example, baseball team names are very focused around their local region. Yeah, OK. And then? For example, do people click on different results for the same query issued from different locales? Or is there implicit, or explicit, meaning in terms in different locales? It was mentioned that “Cardinals” had two hotspots - Arizona (the football team) and St. Louis (the baseball team)… so presumably people want and will click on different things. But in what way? What are the metrics? And as you get closer to the midpoint of Arizona and Missouri, how do you merge in the results?
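To make the “Cardinals” merging question concrete, here is a minimal sketch of one possible approach, not anything the paper or Chang described: weight each intent by the user’s distance to its hotspot, then interleave results by weighted rank. The hotspot coordinates (approximate city centers), the exponential decay, the `scale_km` parameter, and all function names are assumptions made up for illustration.

```python
import math

# Hypothetical hotspot centers for the two senses of "Cardinals".
# Coordinates are approximate city centers; the intent labels are illustrative.
HOTSPOTS = {
    "football": (33.45, -112.07),  # Phoenix, Arizona
    "baseball": (38.63, -90.20),   # St. Louis, Missouri
}


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def intent_weights(user_location, hotspots=HOTSPOTS, scale_km=500.0):
    """Return per-intent weights that decay with distance and sum to 1.

    The exponential decay and scale_km are assumptions, not measured values.
    """
    raw = {
        intent: math.exp(-haversine_km(user_location, center) / scale_km)
        for intent, center in hotspots.items()
    }
    total = sum(raw.values())
    return {intent: w / total for intent, w in raw.items()}


def blend_results(results_by_intent, weights, k=5):
    """Merge ranked lists: score = intent weight / (rank + 1), highest first."""
    scored = []
    for intent, results in results_by_intent.items():
        for rank, result in enumerate(results):
            scored.append((weights[intent] / (rank + 1), result))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [result for _, result in scored[:k]]
```

Near either hotspot one intent dominates the merged list; somewhere in between, the two lists interleave. Whether that matches what people actually click on is exactly the metric question left open above.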

    I’m not sure if it’s an extension of the Second Generation or the Third, but I’d say solving those types of issues is clearly part of the next wave of things. And it’s good to see that Baidu is also pushing on that, in addition to the Big Three (and in fairness, Baidu is #3 world-wide… yup… and Naver #4. No wonder Microsoft wants to buy Yahoo! ;) )

    4/23/08
    10:50 pm
    Themes from Beijing

    I’m attending WWW2008 in Beijing this week. It’s turned into a bit of a monster conference… nine simultaneous tracks over three days, not to mention a day of workshops and tutorials! Yow! And I’m seeing a number of colleagues from the usual haunts here as well. Kai-Fu Lee, head of Google China, and Harry Shum, head of Microsoft’s Live Search development, each gave keynotes, and I thought their themes were quite interesting and contrasting.

    Kai-Fu Lee’s theme was Cloud Computing, or moving to a world where data and computation are handled on remote, anonymous servers that applications then run on. He gave an overview of a number of Google applications that run on this - Search, Mail, etc. I was struck by one comment he made, which is that cloud computing frees people from the monopoly of a single company controlling everything. Except, of course, the company that runs everything in the cloud for you…. Meet the New Boss…. Same as the Old Boss! But digs at Microsoft aside, the path outlined was clearly focused on Web applications built out on cloud computing, with those applications all leveraging large scale, reliability, and naturally massive amounts of data to handle things.

    Harry’s talk was more of a Company Meeting talk, in which he handed the microphone to Graham Sheldon to show off some demos, in particular highlighting some of the cool things MSRA is doing as well as some of the latest on the Live Search release. They led off with what I thought was the best, which is some work from MSRA’s speech group that extracts speech from video and then enables you to see related videos while watching. It was put together well, so it wasn’t so much a “watch while on the Web” demo as an “imagine you’re watching TV” video. I’ll see if I can’t find a link, but good stuff. Also shown was Guanxi, which tries to do a people / relationship search… in this case, it showed who was related to Bill Gates. They also showed a demo where you could do query-by-image, which would show images related to a target image. I need to ask some of my former UW colleagues who did things like QBIC (Query By Image Content). The demos of released Live Search features were focused on new features in the News and Local Verticals, including some cool stuff from the Maps team (which continuously produces some great stuff). Oh, and they have a few things on health they’re experimenting with, and they’re trying to get things hooked up with the HealthVault.

    OK… so we have two “My company is doing cool stuff, come work for us!” keynotes. But do we have any insight here?

    Yes. Google, as widely reported often and everywhere, is busy making an operating system platform of cloud computing that they then build their services on. They’re not actually selling or providing a cloud - Amazon is, with EC2 and S3. But they’re creating the applications that depend on the cloud.

    Microsoft, on the other hand, isn’t really pushing the cloud platform. They have a number of components for that, but the demos shown are all slices on search. They’re certainly not talking about the power of their platform; they’re talking about cool features. I worry along that line, though. The problem they have, which they and Google are trying to address, is user flow. Users don’t go to a vertical, they go to search. So now the problem is to discover intent, to know when it’s appropriate to show essentially a house ad for a vertical with some content, and then to create a compelling, and consistent, experience as a user moves from “search” into “news” or “health exploration” or whatever they’re doing.

    What I can’t help but wonder is why neither appears to be really pursuing differentiated domains and brands. For example, I still don’t think of Google, Yahoo, or Microsoft when I think “news.” I think CNN. And really, I don’t think “news search” so much; I want more of a newspaper. Archival search is great, but should be from within the news portal. To that degree, I wonder why “Live News” isn’t more MSNBC, or even just a different URL, such as www.livenews.com (it’s some random news site… probably buyable!). Certainly there’s lots of direct visitation to www.youtube.com, and I’m still more familiar with www.mapquest.com than the URLs for Google, Yahoo, or Live maps.

    Anyway, food for thought… as always, I’ll lie about updating this later as the conference progresses.

    Update 4/25: We (a number of anonymous conference delegates, and yours truly) now have short synopses on all the keynotes. In order:

    • Kai-Fu Lee, Google: Use our stuff!
    • Harry Shum, Microsoft: We have stuff!
    • Sir Tim Berners-Lee, W3C: I invented stuff!
    • Robin Li, Baidu: I paid for this stuff!
    • David Belanger, AT&T Labs: We route stuff!

    In fairness, we’re sort of making up Robin Li’s synopsis. Sir Tim’s keynote was somewhat, uh, long and rambly, and after about 30 minutes of it the audience in the Great Hall of the People got restless and started heading to the drink counters for more beer and wine. Sadly, by the time Robin got to the stage, the audience was in no mood to listen and was already engaged in conversation, so we’re not really sure what he said. But Baidu did sponsor the banquet, which rocked, so we thanked him for that.

    David Belanger’s keynote was the best in my opinion… and not just because he didn’t do either a passive-aggressive product placement speech or an aggressive-aggressive product demo speech. He just talked about content, experience and devices, networking to them, and a lot of the challenges. For example, apparently as of 10 years ago when AT&T licensed out its rotary phone service, that was still upwards of a BILLION dollar business. For rotary phones. When a new touch-tone costs $10, or is often free. The main takeaways were that (a) there are loads of devices and endpoints, and it’s all increasing, and (b) the observation and re-iteration that old devices go away only slowly. The latter is ignored at people’s peril… people hold on to things a lot longer than nearly everyone else would like.

    4/21/08
    7:05 pm
    Microsoft acquires FareCast
    So in a surprising move (well, to me at least… ) Microsoft purchased FareCast, the latest startup from my advisor, Oren Etzioni, for a paltry (heh) $115 million. FareCast is a great concept… simply take historical price data from the airlines, and predict whether the price will go up or down. I first wrote about them in 2006, when I said:

    Personally, I give ‘em less than a year before they’re bought by Orbitz, Expedia, or Travelocity. I’m not sure if I’d bank on the prediction model of whether I should buy a ticket now or not, given that if I wait, I might not be able to get the flight I want or end up with a crap seat, just to save something like $10 (and I’m just estimating that based on playing with it… if you can save substantially more, this may be much more interesting, but I’m doubtful things will be that rosy). However, if Expedia / Orbitz / Travelocity could, on average, save $10 per ticket, then they’d just clean up. They buy Farecast, and lower their published prices by $5. Let’s say Expedia buys them. Expedia can now undercut Travelocity and Orbitz with a $5 cheaper ticket — and in a world where people shop by price and have no problems going elsewhere to get a better price on the same ticket, this causes Expedia to win. Plus, for each ticket, Expedia is now getting $5 more — as they’re saving $10 from Farecast. How are they doing that? Well, they’re just going by what Farecast says… buy the ticket now, or wait a bit and buy the ticket later. They just have to eat the occasional higher cost when it is higher, but if Farecast works, then statistics will cause things to win overall, and Expedia (or whomever buys Farecast) will win. Simple as that. And I’m just using $10 as a guess here… if it’s more like $20, it’s an even bigger win and no-brainer.
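The back-of-the-envelope arithmetic in that quote can be sketched as follows. The $10 and $5 figures come from the quote itself; the function names, the 75% accuracy, and the win/loss amounts in the second helper are assumptions made up for illustration, not Farecast numbers.

```python
def per_ticket_outcome(avg_saving, price_cut):
    """Split an average per-ticket saving between customer and agency.

    Returns (discount passed to the customer, extra margin the agency keeps).
    """
    return price_cut, avg_saving - price_cut


def expected_saving(p_correct, saving_when_right, cost_when_wrong):
    """Average saving per ticket if the buy-now-or-wait call is right with
    probability p_correct and costs extra when it is wrong.

    All three inputs are hypothetical; the quote only guesses at the average.
    """
    return p_correct * saving_when_right - (1 - p_correct) * cost_when_wrong


# Figures from the quote: save $10 on average, cut published prices by $5.
discount, extra_margin = per_ticket_outcome(avg_saving=10, price_cut=5)

# Made-up accuracy: right 75% of the time, $20 swing either way.
average = expected_saving(p_correct=0.75, saving_when_right=20, cost_when_wrong=20)
```

With the quote’s numbers the customer sees a $5 cheaper ticket and the agency keeps $5 more; the second helper just shows how “eating the occasional higher cost” still nets out positive as long as the predictions are right often enough.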

    So, my prediction of less than a year (meaning a purchase by July 2007) was off by a year. But you should know by now how accurate my predictions are. ;) The more interesting question here is: Why Microsoft? Not clear (and of course the parties aren’t going to comment much until everything is settled). It appears that it’s mostly for the MSN Travel side of the house… which to me seems somewhat suspect given I don’t see why MSN Travel would push for $115MM unless they’re thinking of doing Expedia II. But perhaps they are! I’m also not sure if this is a search play… certainly searching for tickets is a great and interesting concept on a search engine, but knowing if a price will go up / down seems like a very minor feature on a search engine compared to how it could be used in the actual purchase.

    But let’s get to the important business, which is congratulations to Oren and Jeff, among other people there. Great work all around, and it’s fantastic to see another successful startup!

    4/06/08
    11:35 pm
    The #2 Strategy

    I haven’t posted lately on the Microsoft / Yahoo bid. It’s been interesting seeing things unfold… I thought Yahoo would have been much more receptive to the offer. Or, more to the point, I thought Microsoft and Yahoo had already come to terms and this was more the public drama. But apparently not.

    At the Microsoft company meeting last year, Steve Ballmer, in his big rah-rah speech at the end, mentioned that the first thing needed for the Search team was a plan to be #2. #2? Yeah… if you’re #3 (or #5, behind Baidu and Naver… sucks when your global service is smaller world-wide than a dominant local service in China and South Korea!). But let’s focus on the US market, so the goal is to move beyond Yahoo into the #2 position.

    How, exactly, will that be done?

    Seriously… there are millions of people who have Yahoo as their home page and use their search engine. The #1 query on Google is “Yahoo.” Yahoo, while declining in share a bit, is still huge and will hold on for years and years. They’ve held most of their customers, and their customers aren’t going anywhere. Not like Google hasn’t been around for a few years now.

    So, let’s say Microsoft makes a search engine better than Yahoo. OK… will that get Yahoo customers? Doubtful. Why? There’s already a better engine: Google. Hasn’t been a flood of people moving over.

    OK, so it’s a better engine… and better mail, messenger, portal?

    I don’t buy that.

    OK… so I’m not seeing a clear answer to get Yahoo customers to go to some newer, better thing quickly. So how about attacking Yahoo’s financials? Kill the ad network… compete on price, offer advertisers more for less. Lose tons, but it kills them.

    OK, let’s say that works. You kill Yahoo financially, destroy the asset. Some advertisers have gone to Microsoft, others to Google. Hopefully the market hasn’t come down, even though in a recession it will and it looks like we’re in one. But the customers really aren’t moving… so Microsoft still has to buy Yahoo in the end.

    So ultimately, it’s a question of buying a mostly healthy asset now at a premium, or dumping billions into weakening them for a later purchase… and hope that Google hasn’t just run away with things.

    Thus, ultimately, I think Microsoft has to purchase Yahoo, and that they came to that conclusion earlier this year. Will there be tons of conflict? Yup. Problems integrating? Absolutely. A huge exodus of smart talent up 101 to Google? Damn straight.

    But more to the point… how else can Microsoft get to #2? I don’t see it. So, they bet the company and try to buy Yahoo… cry havoc!

    4/05/08
    5:29 pm
    Upgraded to 2.5…

    been having random DB problems, so let’s see if 2.5 actually solves things.