selberg.org

Archive for May, 2008
5/25/08
11:20 pm
Google’s trust-building

When I interviewed at Microsoft many years ago, one of my interviewers asked me what I thought the next big thing for search was. I said: “trust.” Right now, people get pages back, but there’s still a huge degree of distrust of what they see. People trust Amazon.com, and (for better or worse) seem to trust Wikipedia. But random sites? Hmm. Some people are generally trusting, but many aren’t, and the continuous stories of identity theft and credit card theft make people more paranoid (which is probably a good thing).

I still stand by my statement. Of the “next big things” for search people keep talking about, such as blended search, personalization, social search, etc., I still believe that trust will be the big differentiator. There is a lot of crap out there, and I suspect it’s growing a lot faster than quality pages.

Which brings me to the following. The other day, I received the following in my inbox:

Dear site owner or webmaster of selberg.org,

While we were indexing your webpages, we detected that some of your pages were using techniques that are outside our quality guidelines, which can be found here: http://www.google.com/webmasters/guidelines.html. This appears to be because your site has been modified by a third party. Typically, the offending party gains access to an insecure directory that has open permissions. Many times, they will upload files or modify existing ones, which then show up as spam in our index.

The following is some example hidden text we found at http://selberg.org/2008/02/:

buy viagra
buy viagra online
viagra online
discount viagra
order viagra
cheap viagra
generic viagra
generica viagra
viagra buy
viagra price
order viagra online
viagra generic
viagra pill
where buy viagra
buy viagra cheap
viagra order
get viagra
buy online viagra
online viagra
viagra sale online
where to buy viagra
cheapest viagra
purchase viagra
cheap viagra online
viagra buy online
buying viagra
buy viagra on
generic viagra canada
prescription viagra
buy viagra norway
generic viagra pack

[...]

In order to preserve the quality of our search engine, we have temporarily removed some of your webpages from our search results. Currently pages from selberg.org are scheduled to be removed for at least 30 days.

We would prefer to have your pages in Google’s index. If you wish to be reconsidered, please correct or remove all pages (may not be limited to the examples provided) that are outside our quality guidelines. One potential remedy is to contact your web host technical support for assistance. For more information about security for webmasters, see http://googlewebmastercentral.blogspot.com/2007/09/quick-security-checklist-for-webmasters.html.

When you are ready, please visit https://www.google.com/webmasters/tools/reinclusion?hl=en to learn more and submit your site for reconsideration.

Sincerely,
Google Search Quality Team

My first reaction was, WTF? I run my own blog, and I know I’m not spamming. Somebody phishing me? Nope… links are legit… so I go to the page in question, and sure enough on my “My advisor’s WSDM” post, there was a hidden block with a number of links:

<font style="position: absolute;overflow: hidden;height: 0;width: 0">
<a href="http://www.bigbadbookblog.com/?menu=1" title="buy viagra">buy viagra</a><br />
<a href="http://www.bigbadbookblog.com/?menu=2" title="buy viagra online">buy viagra online</a><br />
<a href="http://www.bigbadbookblog.com/?menu=3" title="viagra online">viagra online</a><br />
...

And since it was on that post, it was also on the archive page for February, which was the link Google found.
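For anyone who wants to check their own posts for this kind of injection, here is a minimal sketch of a scanner. It is only a heuristic of my own, not how Google detects hidden text: the style patterns in `HIDDEN_STYLE`, the class name `HiddenLinkFinder`, and the helper `find_hidden_links` are all made up for illustration, and it only looks at inline `style` attributes like the one in the block above.

```python
import re
from html.parser import HTMLParser

# Inline-style patterns that usually mean "invisible to readers".
# This list is an assumption for illustration, not Google's actual rules.
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|(?:height|width)\s*:\s*0(?![.\d])",
    re.IGNORECASE,
)

# Tags that never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class HiddenLinkFinder(HTMLParser):
    """Collects hrefs of links that sit inside an element hidden by inline CSS."""

    def __init__(self):
        super().__init__()
        self.stack = []          # one flag per open element: is it, or a parent, hidden?
        self.hidden_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inherited = self.stack[-1] if self.stack else False
        hidden = inherited or bool(HIDDEN_STYLE.search(attrs.get("style") or ""))
        if tag == "a" and hidden and attrs.get("href"):
            self.hidden_links.append(attrs["href"])
        if tag not in VOID_TAGS:
            self.stack.append(hidden)

    def handle_endtag(self, tag):
        # Assumes reasonably well-nested HTML; badly broken markup may confuse it.
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()


def find_hidden_links(html):
    """Return the hrefs of all links found inside hidden elements."""
    finder = HiddenLinkFinder()
    finder.feed(html)
    return finder.hidden_links
```

Feeding it the HTML of a post and getting back a non-empty list would be a reason to go look at the post source by hand. It will miss anything hidden through an external stylesheet or a script.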

I went into panic mode, and first edited the post, then went to Google to ensure my blog wasn’t removed. I was led through an interesting process, where I ended up registering my site, then registering myself as owner (by putting in a special META tag and having Google confirm it was there), and then acknowledging that I’m behaving and all is well.
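The META tag step is simple in principle: the service hands you a token, you put it on your home page, and the service fetches the page to confirm it is there. Here is a rough sketch of the confirming side, assuming the page HTML has already been fetched. The tag name `site-verification-token`, the token value, and the function `owner_is_verified` are hypothetical placeholders, not Google's actual names or implementation.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Gathers <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name = attrs.get("name")
            if name:
                # Tag names are compared case-insensitively; tokens are not.
                self.meta[name.lower()] = attrs.get("content") or ""


def owner_is_verified(html, tag_name, expected_token):
    """True if the page carries the expected token in the named META tag."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.meta.get(tag_name.lower()) == expected_token


# Hypothetical example page and token, for illustration only.
page = '<html><head><meta name="site-verification-token" content="abc123" /></head></html>'
print(owner_is_verified(page, "site-verification-token", "abc123"))
```

The point of the design is that only someone who can edit the site's pages can make the check pass, which is what ties a registered account to a real owner.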

This is a great process, in that they now have a known owner for a site they can contact. And they know the site is active and somebody on the other end cares… whether they’re a spammer or not, to be determined later. It also means that Google can now alert me to problems with my site… such as poor indexing, or if my site or a post gets hijacked again (still not sure how it happened, but I’ve updated WordPress at least!). Google is doing a ton of analysis and has published some of what they’re doing, such as massive map-reduce scans looking for malware landing pages, in a technical report “All Your iFrames Point to Us.” They’re also highlighting sites that they believe may be harmful to your computer (but aren’t sure of enough to remove from their index).

Upon reflection, here’s what they’re doing, and here’s what I now believe:

  • Google is building up a network of sites and site owners, getting to know them better;
  • Google is creating a framework to help registered site owners ensure their sites are legit;
  • Google is actively trying to identify and remove bad and malicious content from their index;
  • Google is being (surprisingly) public about what they’re doing.

What this means to me is that the sites that appear on Google are likely more trustworthy than those that appear on competitors, such as Microsoft. Now, I know Microsoft has a security group, and they’re doing a lot to go after malware and phishing (for example, the recent anti-phishing plugin on IE is a great step in that direction). But are they connecting all the dots? And are they doing so publicly? Because frankly, without telling people what you’re doing for them, they’re very unlikely to give you proper credit for what you’ve done.

5/20/08
11:10 pm
Live Search Cashback is up

Check it out now… http://search.live.com/cashback

Lots of coverage out there; I saw Todd report it first.

I’ll post some more tomorrow when I can play with it a bit more. Initially, it looks like Live Search Cashback is basically eBates on steroids. They’ve got a number of merchants signed up, and they’re doing both comparison shopping and splitting the commission on each sale with the purchaser, thus giving a good reason to do comparison shopping @ Live Search. However, unlike Amazon’s Pro Merchant program, which I highly recommend, the customer goes directly to the shopping store. Amazon actually facilitates the transaction, thus the merchant in question doesn’t actually see the customer’s billing information and the merchant must abide by Amazon’s A-to-z Guarantee, esp. on returns. This is nice when dealing with notoriously sketchy merchants, such as digital camera dealers in New York that aren’t B&H Photo Video; it forces them to behave (and they are on Amazon!).

5/04/08
1:36 am
And so it ends

It looks like Microsoft’s attempted acquisition of Yahoo! has come to an end. Apparently, $46 billion wasn’t good enough, but $50 billion would have been. So what’s $4 billion between friends? Ah well. Mini already has a post up having popped a cork, and I’m sure MSFTExtremeMakeover will have something shortly. And I’m sure there will be plenty of analysis posts as to why, why it’s a good thing, why it’s a bad thing, what might have been, and so on.

So here’s mine!

First, so if I understand properly, Microsoft bid $41 billion, Yahoo! wanted $50 billion. So Microsoft came up $5B more, met ‘em halfway… Yahoo! still wanted the full $50B. OK… so if you can come up with $5B, why not $10B? And yeah, I understand, these are scarily huge numbers. But hey, if you’re going to sit down at the World Cup of Poker, you know it’s not a $10 buy-in. I actually wonder if it’s too much of a bet-the-company move… e.g. Microsoft can currently afford anyone that’s $46B or less, but more… not so much.

Second… so what’s next? Well, let’s see….

Option 1: Keep at it! Keep at it! Keep at it!

Well, Satya, Brian, Harry, and the gang have to do something. And now they won’t have the distraction of integrating Yahoo!. Plus, this means that most of Microsoft will now align very closely with services, focusing on ads and search. A search bar in every application, every desktop, every skin. And renewed focus on new frontiers, such as Xbox and mobile - especially Xbox.

Option 2: Buy! Buy! Buy!

Buy someone else! Or elses! But who? Well, how’s this little gem from comScore:

Baidu Ranked Third Largest Worldwide Search Property by comScore in December 2007


To aid in your research and coverage of Baidu’s recent announcement to enter the Japan market with www.baidu.jp, relevant comScore qSearch worldwide data are provided below.

In December 2007, 66.2 billion search queries were conducted worldwide.

In December 2007, Baidu.com Inc. was the third ranked search property worldwide with 3.4 billion searches, capturing 5.2 percent of worldwide search share.

Worldwide Search Top 10
December 2007
Total World Age 15+, Home and Work Locations*
Source: comScore qSearch 2.0

                           Searches (MM)   Share of Searches (%)
Total Internet                    66,221                   100.0
Google Sites                      41,345                    62.4
Yahoo! Sites                       8,505                    12.8
Baidu.com Inc.                     3,428                     5.2
Microsoft Sites                    1,940                     2.9
NHN Corporation                    1,572                     2.4
eBay                               1,428                     2.2
Time Warner Network                1,062                     1.6
Ask Network                          728                     1.1
Yandex                               566                     0.9
Alibaba.com Corporation              531                     0.8

Baidu is the dominant engine in China; NHN is www.naver.com, which is the dominant engine in South Korea. Oh, and today, 5/4/2008, NHN is worth about $11.25B (current price, in KRW), and Baidu is worth $12.36B (current price in USD).

Naver hasn’t shown any propensity to move outside of Korea, and for the most part their stranglehold on South Korea comes from their huge question-and-answer site (which is what Yahoo! Answers, Microsoft QnA, and Baidu’s iKnow are based upon). Their search, last I knew, wasn’t terribly great.

But Baidu…. Baidu is doing real search. Baidu just launched in Japan earlier this month. And they have the currently dominant question and answer site, although Tencent, which runs QQ, the dominant instant messenger in China by far, is looking to create their own version that may cause some trouble. And Baidu has got heavy competition from Google.
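To put the comScore numbers in perspective, here is a quick back-of-the-envelope sketch that recomputes the share column from the search counts and adds up a couple of hypothetical combinations. The search counts come straight from the table above; the `share` and `combined_share` helpers and the pairings are mine, and simply summing shares assumes every search carries over after a merger, which is optimistic.

```python
# Searches in millions, December 2007, from the comScore qSearch table above.
TOTAL_SEARCHES = 66_221
SEARCHES = {
    "Google Sites": 41_345,
    "Yahoo! Sites": 8_505,
    "Baidu.com Inc.": 3_428,
    "Microsoft Sites": 1_940,
    "NHN Corporation": 1_572,
}


def share(searches_mm):
    """Percent of worldwide searches, rounded to one decimal like the table."""
    return round(100 * searches_mm / TOTAL_SEARCHES, 1)


def combined_share(*names):
    """Naive combined share: assumes no searches are lost in a merger."""
    return share(sum(SEARCHES[name] for name in names))


for name, count in SEARCHES.items():
    print(f"{name:<18}{share(count):>6}")

print("Microsoft + Yahoo!:", combined_share("Microsoft Sites", "Yahoo! Sites"))
print("Microsoft + Baidu: ", combined_share("Microsoft Sites", "Baidu.com Inc."))
print("Microsoft + NHN:   ", combined_share("Microsoft Sites", "NHN Corporation"))
```

Even under that generous assumption, none of these pairings gets anywhere close to Google's share.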

Now, there are certainly issues with buying Baidu due to the Chinese government. But… well… at the end of the day, those Yahoo customers aren’t going anywhere quickly - not to Google, not to MSN. That’s one of the key reasons why, IMHO, Microsoft wanted to buy them. But that isn’t happening, so those customers stay with Yahoo. Now, Microsoft still needs to get some additional customers somehow, somewhere. If not from Yahoo, and if not from Google… well, for me, I’d start looking abroad really quickly myself.