selberg.org

Archive for June, 2006
6/30/06
10:40 am
AIRWeb 2006 Call for Participation - Updated with papers!

[Ed: I’m glad to see my pal Greg Linden talking at AIRWeb 2006. We’ve also got some great papers… hope you can make it!]

AIRWeb 2006 Call For Participation

The attraction of hundreds of millions of web searches per day provides significant incentive for many content providers to do whatever is necessary to rank highly in search engine results, while search engine providers want to provide the most accurate results. The conflicting goals of search and content providers are adversarial, and the use of techniques that push rankings higher than they belong is often called search engine spam. Such methods typically include textual as well as link-based techniques, or their combination.

AIRWeb 2006 provides a focused venue for both mature and early-stage work in web-based adversarial IR. The workshop solicited technical papers on any aspect of adversarial information retrieval on the Web. Submissions were reviewed by a program committee of search experts and accepted papers (listed below) cover state-of-the-art research advances to address current problems in web spam.

AIRWeb 2006 brings together both researchers and industry practitioners and will be held on August 10, 2006, after the SIGIR 2006 conference, in Seattle, Washington. Early registration rates end June 30.


Workshop Program

This, the second AIRWeb workshop, builds on last year’s successful meeting in Chiba, Japan as part of WWW2005. This year we will have both full and short presentations on aspects of adversarial information retrieval on the Web.

In addition to the papers listed below, we will have an invited talk by Jan Pedersen, Yahoo! on sponsored search, and an expert panel discussion on blog spam, including:

  • Dennis Fetterly, Microsoft Research
  • Natalie Glance, Nielsen BuzzMetrics
  • Jeremy Hylton, Google
  • Greg Linden, Findory.com
  • Andrew Tomkins, Yahoo! Research
  • someone to be determined, Ask.com


Accepted Papers

Four papers were accepted for full presentation:

  • Link-Based Characterization and Detection of Web Spam
    Luca Becchetti, Università di Roma “La Sapienza”
    Carlos Castillo, Università di Roma “La Sapienza”
    Debora Donato, Università di Roma “La Sapienza”
    Stefano Leonardi, Università di Roma “La Sapienza”
    Ricardo Baeza-Yates, Yahoo! Research Barcelona
  • Link-Based Similarity Search to Fight Web Spam
    András A. Benczúr, Hungarian Academy of Sciences and Eötvös University
    Károly Csalogány, Hungarian Academy of Sciences and Eötvös University
    Tamás Sarlós, Hungarian Academy of Sciences and Eötvös University
  • Improving Cloaking Detection using Search Query Popularity and Monetizability
    Kumar Chellapilla, Microsoft Live Labs
    David Maxwell Chickering, Microsoft Live Labs
  • Tracking Web Spam with Hidden Style Similarity
    Tanguy Urvoy, France Telecom R&D
    Thomas Lavergne, France Telecom R&D
    Pascal Filoche, France Telecom R&D

Two additional papers were accepted for short presentation:

  • Adversarial Information Retrieval Aspects of Sponsored Search
    Bernard J. Jansen, Pennsylvania State University
  • Web Spam Detection with Anti-Trust Rank
    Vijay Krishnan, Stanford University
    Rashmi Raj, Stanford University


Organizing Committee


Program Committee

  • Sibel Adali, Rensselaer Polytechnic Institute, USA
  • Lada Adamic, University of Michigan, USA
  • Einat Amitay, IBM Research Haifa, Israel
  • Andrei Broder, Yahoo! Research, USA
  • Carlos Castillo, Università di Roma “La Sapienza”, Italy
  • Abdur Chowdhury, AOL Search, USA
  • Nick Craswell, Microsoft Research Cambridge, UK
  • Matt Cutts, Google, USA
  • Dennis Fetterly, Microsoft Research, USA
  • Zoltan Gyongyi, Stanford University, USA
  • Matthew Hurst, BuzzMetrics, USA
  • Mark Manasse, Microsoft Research, USA
  • Jan Pedersen, Yahoo!, USA
  • Bernhard Seefeld, Switzerland
  • Erik Selberg, Microsoft Search, USA
  • Bruce Smith, Yahoo! Search, USA
  • Andrew Tomkins, Yahoo! Research, USA
  • Tao Yang, Ask.com/Univ. of California-Santa Barbara, USA


Contact Email

  • airweb(at)cse.lehigh.edu
6/28/06
11:05 pm
Farecast!

My advisor, Oren Etzioni, just had his latest company launch its first product publicly. Check out Farecast at www.farecast.com!

The idea is rather simple — study airline pricing and determine if you should buy a ticket now or wait. Right now, it only really works if you’re travelling from Seattle or Boston, but they’re adding cities left and right.

Personally, I give ‘em less than a year before they’re bought by Orbitz, Expedia, or Travelocity. I’m not sure if I’d bank on the prediction model of whether I should buy a ticket now or not, given that if I wait, I might not be able to get the flight I want or might end up with a crap seat, just to save something like $10 (and I’m just estimating that based on playing with it… if you can save substantially more, this may be much more interesting, but I’m doubtful things will be that rosy). However, if Expedia / Orbitz / Travelocity could, on average, save $10 per ticket, then they’d just clean up. They buy Farecast, and lower their published prices by $5. Let’s say Expedia buys them. Expedia can now undercut Travelocity and Orbitz with a $5 cheaper ticket, and in a world where people shop by price and have no problems going elsewhere to get a better price on the same ticket, this causes Expedia to win. Plus, for each ticket, Expedia is now getting $5 more, as they’re saving $10 from Farecast. How are they doing that? Well, they’re just going by what Farecast says… buy the ticket now, or wait a bit and buy the ticket later. They just have to eat the occasional higher cost when it is higher, but if Farecast works, then statistics will cause things to win overall, and Expedia (or whoever buys Farecast) will win. Simple as that. And I’m just using $10 as a guess here… if it’s more like $20, it’s an even bigger win and no-brainer.
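For the curious, here’s the back-of-the-envelope math above as a tiny Python sketch. The $10 and $5 figures are just my guesses from the paragraph above; the function names and the probabilities in the second example are made up purely for illustration and have nothing to do with how Farecast actually works.

```python
def extra_margin_per_ticket(avg_saving: float, price_cut: float) -> float:
    """Margin gained per ticket when part of the buying-side saving
    is passed on to the customer as a lower published price."""
    return avg_saving - price_cut


def expected_saving(p_drop: float, avg_drop: float,
                    p_rise: float, avg_rise: float) -> float:
    """Average saving per ticket from waiting: the wins when the fare drops,
    minus the occasional higher cost that has to be eaten when it rises.
    All four inputs are hypothetical."""
    return p_drop * avg_drop - p_rise * avg_rise


# My guess from the post: save $10 per ticket, cut the published price by $5.
print(extra_margin_per_ticket(10, 5))
# If the saving is more like $20, the same $5 price cut leaves a lot more.
print(extra_margin_per_ticket(20, 5))
# Hypothetical odds: the fare drops $20 three times out of four, rises $20 otherwise.
print(expected_saving(0.75, 20, 0.25, 20))
```

The point of the second function is just that the occasional loss doesn’t matter as long as the average comes out positive.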

So, hats off Oren (and Jeff!) and hope things work out. The beta looks great!

6/25/06
7:55 am
RIP Lemon Yellow

Last night after the ride I noticed my lone large Lemon Yellow (Labidochromis caeruleus) was in distress. I did a quick water change, added some Clout and Maracyn Plus… no luck. When I awoke this morning to watch Ecuador vs England in the World Cup, my poor fish had expired. :(

DSC_27631.jpg
6/24/06
8:55 pm
UZ to Fall City loop, 93.2 miles

Ruben, Megan, Paul, Scott, and I did a lovely ride on a hot, sunny day today! Scott met us at Marymoor and just did the 50 mile Tour de Cure loop with us, while the rest of us trucked to Marymoor and back from UZ via the trail. We made good time: overall, we averaged about 14 miles an hour moving, including the time spent on a couple of good hills.
My GPS log: http://trail.motionbased.com/trail/episode/view.do?episodePk.pkValue=956939

6/22/06
4:10 pm
Good Day, Bad Day

Good Day: My Polar S720i’s LCD decided to die… the rest of the unit is fine, I just can’t see anything. So I went down to REI, and they gave me a new box and unit, no fuss, no muss. REI rocks.

Bad Day: Two of my Labidochromis caeruleus up and died on me… I had just put in a few new fry-sized lemon yellows. So now I’m down to one big yellow… but my Melanochromis johannii are doing great — starting from 2, I’ve now got maybe ten good sized ones (all tank born).

6/15/06
10:43 pm
Why do alarms not come with a big “OFF!” button?

So, my wife and I awoke to the screech of the carbon monoxide alarm going off at about 2 AM this morning. No, we didn’t have a sudden influx of CO… just some battery somewhere was dying, and thus time to get us to fix it by going off. Well, after fumbling around looking for screwdrivers to unscrew the 3 screws holding the box, cycling the power in a couple rooms a few times, and generally being frantic and cranky in our underwear, we finally got the damn thing off.

But really… all this just to say, “Yes, I hear you, alarm, please stop now.” At 2 AM, you’re not going to be able to replace the battery (and no, it isn’t a nice 9V, but one of these weird form factor ones), so you just want things to end so you can go back to sleep. So why doesn’t someone put in a big “OFF” switch? Or at least a timed “SLEEP” switch for like 12 hours or something? Or even have the damn thing just “beep!” every 30 seconds like the smoke alarms do?

Gah. Stupid alarms…

6/11/06
5:35 pm
A tale of two rides…

A bunch of us did a century, +/- 5 miles (I was -) this weekend. We decided to truck down to The Hill, which is the biggest hill on STP — all 300 feet of it (OK, it’s not a big hill). This is approximately 45 miles away from our starting point, University Zoka, or UZ as we call it. We did most of the initial STP route, but instead of taking the West Valley Highway, which has a fair amount of traffic, we hopped onto the Interurban Trail, which is a great trail — no roots, minimal traffic, nice and speedy. We made great time, averaging about 15 mph and making it down to this great espresso store at the corner of Milwaukee & Valley Ave. in Sumner - The Coffee Corner if I can make the picture out (stupid phone took it at itty-bitty resolution).

IMAGE_223.jpg

Sadly, right before this point, we had a flat — and no spare tube for a 650 tire! Tire repair took a bit longer than usual, but such is life. Anyway, we took off and did the hill. Triumphant, we quickly cashed it back in, zipping down to the red light at the base of the hill (d’oh!). We then trucked back up Interurban, then decided to take a bit of a detour and take the Green River Trail to the Duwamish Trail, which cuts across the east side of West Seattle. We made it to downtown Seattle, and at this point we split up — Scott and I had to book it home, while the rest of the gang decided to do a bit more of a ride to get in a full century. Anyway, I don’t feel too bad about missing it… I ended up doing 95 miles, so I’m feeling pretty reasonable about everything.

Next week: Flying Wheels!

Erik’s log: http://trail.motionbased.com/trail/episode/view.mb?episodePk.pkValue=900369

Ruben’s log: http://trail.motionbased.com/trail/invitation/accept.mb?senderPk.pkValue=13403&unitSystemPkValue=2&episodePk.pkValue=897495

6/04/06
7:10 pm
WWW 2006

So, while posting my cycling episodes was fun, the main purpose of my trip to Edinburgh and Cambridge was as a delegate to WWW 2006, the 15th International World Wide Web Conference. This is one of the two conferences I’ve typically been involved with, the other being SIGIR, coming up in August. In theory, I had planned on blogging during the conference, but it turns out that the wifi at the Edinburgh International Conference Center wasn’t up to the task of providing access to some 1000 people with laptops… and I ended up getting too busy to actually blog with some thought (turns out posting a narrative on pictures is far easier! probably more entertaining to my readers as well. nonetheless…)

Anyway, now that I’ve had some time to write down some thoughts, without further ado here they are:

The International World Wide Web Conference was held in Edinburgh, Scotland, from May 22nd to 26th, 2006. What follows is a brief trip report on the conference, including some of the talks and, as is often the case, the more interesting conversations that happened outside the talks.

Attendees
The conference was well attended, although it did feel that student attendance was less than it had been in previous years (perhaps due to the rather steep increase in registration costs). Microsoft had over 30 employees attend, from various groups. Yahoo and Google were also there in force, although the delegates from Yahoo were some of their more senior people like Prabhakar and Andrei. There were also a number of people there from IBM and Ask.com. The conference itself was split mostly into two main camps, as it has been in years past: those interested in search, and those interested in the semantic web.
 
Interesting Talks
Google, Yahoo, and Microsoft all had very heavily attended talks, which were for the most part quite good. I wasn’t able to personally see them all, but will provide some short summaries for those I was able to catch. Microsoft had 7 papers there, which is quite a good showing. I was also impressed that Google appears to be trying to get back into having its people publish. They’ve been given a fair amount of grief for people going to Google and then stopping all publishing. I’m not sure if the 3 papers done by Google are an answer to that or just coincidence, but I suspect they may be trying to counter public opinion. Yahoo also had a tutorial on ads, by their big guns (Andrei Broder, Prabhakar Raghavan, Ricardo Baeza-Yates).
 
Google Talks (3):
 
A Web-based Kernel Function for Measuring the Similarity of Short Text Snippets
Mehran Sahami
Timothy D. Heilman
 
Mehran gave a great talk on a new kernel function that they used to measure short text snippets; in particular, this was useful for discovering similar queries (although it should be noted that this would be useful for other things, not just finding similar queries).
 
Retroactive Answering of Search Queries
Beverly Yang
Glen Jeh
 
Beverly’s talk showed an interesting glimpse of where Google is going with personalization. The paper dealt with an alternative approach to Alerts. Google found that most people don’t sign up for alerts (meaning alerting them when a new result for a query arrives), and that when they do, it’s for a very short time. What they’re trying to do is create alerts automatically based on a search history. The idea is primarily in concept phase; for evaluation, they just used a bunch of Google engineers. But it does seem to be an interesting way to handle alerts and create some personalization that users can understand.
 
Browsing on Small Screens: Recasting Web-Page Segmentation into an Efficient Machine Learning Framework
Shumeet Baluja
 
Yahoo Talks (3):
 
Visualizing Tags over Time
Micah Dubinko
Ravi Kumar
Joseph Magnani
Jasmine Novak
Prabhakar Raghavan
Andrew Tomkins
 
I didn’t see this talk, but heard it was quite good.
 
Generating Query Substitutions
Rosie Jones
Benjamin Rey
Omid Madani
Wiley Greiner
 
Rosie gave a talk that highlighted some of what Yahoo is doing as far as query substitutions, which includes spelling. They were especially interested in handling this in the advertising case, so they could more easily match ads against a query even though the advertiser may not have used the most appropriate terms.
 
Searching with Context
Reiner Kraft
Chi Chao Chang
Farzin Maghoul
Ravi Kumar
 
Reiner’s talk presented a couple different ways to augment a query using terms extracted from the search context (for example, the web page the user is currently browsing, or terms from previous queries).
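
To make the idea concrete, here’s a deliberately naive Python sketch of augmenting a query with context terms. This is NOT the method from the paper; it’s just my own toy illustration, assuming the context is plain text and that “most frequent non-stopword terms” is a reasonable stand-in for term extraction. The function name, the stopword list, and the parameter k are all made up.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; a real system would use something far better.
STOPWORDS = frozenset({"a", "an", "and", "at", "in", "is", "of", "on", "the", "to"})


def augment_query(query: str, context: str, k: int = 2) -> str:
    """Append the k most frequent context terms that are not stopwords
    and not already in the query. A toy stand-in for real term extraction."""
    query_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    counts = Counter(
        term
        for term in re.findall(r"[a-z0-9]+", context.lower())
        if term not in STOPWORDS and term not in query_terms
    )
    extra_terms = [term for term, _ in counts.most_common(k)]
    return " ".join([query, *extra_terms])


# Context: text of the page the user is currently browsing (invented example).
page_text = "The jaguar is a big cat. Jaguar habitat includes rainforest; the cat hunts at night."
print(augment_query("jaguar", page_text, k=1))
```

With that page as context, an ambiguous query like “jaguar” picks up a term pointing at the animal rather than the car, which is the general flavor of what the talk was after.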
 
A Google AND Yahoo Talk (technically, an IBM paper, as the G & Y authors were both at IBM at the time):
 
Using Annotations in Enterprise Search
Pavel A. Dmitriev
Nadav Eiron
Marcus Fontoura
Eugene Shekita
 
Interesting Talks – Academia & Other
 
Off the Beaten Tracks: Exploring Three Aspects of Web Navigation
Harald Weinreich
Hartmut Obendorf
Eelco Herder
Matthias Mayer
Winner of Best Student Paper Award
 
Random Sampling from a Search Engine’s Index
Ziv Bar-Yossef
Maxim Gurevich
Winner of Best Paper Award
 
Microsoft Talks (7):
 
Time-Dependent Semantic Similarity Measure of Queries Using Historical Click-Through Data
Qiankun Zhao
Steven C. H. Hoi
Tie-Yan Liu
Sourav S Bhowmick
Michael R. Lyu
Wei-Ying Ma
 
A Comparison of Implicit and Explicit Links for Web Page Classification
Dou Shen
Jian-Tao Sun
Qiang Yang
Zheng Chen
 
Beyond PageRank: Machine Learning for Static Ranking
Matthew Richardson
Amit Prakash
Eric Brill
 
Finding Advertising Keywords on Web Pages
Wen-tau Yih
Joshua Goodman
Vitor R. Carvalho
 
Detecting Online Commercial Intention (OCI)
Honghua Dai
Lingzhi Zhao
Zaiqing Nie
Ji-Rong Wen
Lee Wang
Ying Li
 
Detecting Spam Web Pages through Content Analysis
Alexandros Ntoulas
Marc Najork
Mark Manasse
Dennis Fetterly
 
The case for multi-user design for computer aided learning in developing regions
Joyojeet Pal
Udai Singh Pawar
Eric Brewer
Kentaro Toyama

6/04/06
7:05 pm
Cycling around Cambridge

My final day in Cambridge, I went cycling around the city with Nick and Andrew, another researcher from MSRC. I had found what looked like a nice route at CTC Cambridge: http://www.ctc-cambridge.org.uk/Cycle-Routes.html that went around the city. So, starting from MSRC, we biked mostly due west to connect to the loop, and then took it clockwise. Turns out this wasn’t the best idea… the cue sheet wasn’t the easiest to follow, and we were going backwards — it called for counter-clockwise.

Anyway, we managed to get through the first half of the loop without much difficulty, but then managed to get a bit lost… oops. Also, turns out that there was a problem with the camera… the shutter stuck in a halfway up position, with the D70 flashing “Err.” Gonna have to look at that when I return.

After a brief stop for lunch, we took a bit of a wrong turn and went due west vs southerly… after a bit, we arrived at a sign for the town of Babraham… yes, it’s real. And apparently, there’s a nearby zoo… Linton Zoo…. so, it’s Babraham Linton Zoo. Fun fun fun… :)

Anyway, we turned south and then circled about, completing the loop more in a plus-shape versus a loop. But hey, we still got our 40 miles, and all is good.

40 miles, GPS log: http://trail.motionbased.com/trail/episode/view.mb?episodePk.pkValue=865867

6/01/06
2:10 am
Cambridge to Ely

A few days after a trip to Loch Ness, Nick and I teamed up with Dushyanth, another colleague from MSRC (and, as it turns out, CMU), for a quick ride from MSRC in Cambridge up to Ely (pronounced EE-lee).

MotionBased GPS log: http://trail.motionbased.com/trail/episode/view.mb?episodePk.pkValue=841914 - 37.65 miles. Don’t believe the 7:30:09 time… we took quite some time at lunch (Pizza Express) and at the Ely Museum waiting for the rain to let up.

Here’s Nick and Dushyanth:

DSC_3373.jpg

and here’s yours truly at the River Great Ouse (pronounced ows, although here it’s more like ooze), the fifth-longest river in the UK.

DSC_3375.jpg

A few seconds later, this flotilla of swans decided to pose for a photo:

DSC_3376.jpg

There wasn’t a whole lot between Cambridge and Ely; it’s relatively flat, and aside from a freak hail storm (d’oh!), we got to the cathedral at Ely quite handily:

DSC_3379.jpg
DSC_3382.jpg
DSC_3383.jpg

We couldn’t actually go inside the cathedral, as they were filming Elizabeth: The Golden Age starring, I believe, Cate Blanchett. Guess this means I’ll have to see the first Elizabeth movie, as I’m dying to see the interior of the cathedral! Of course, I suspect I could just look for other images out there… :)