selberg.org

Archive for the 'Random' Category
7/25/07
10:25 am
Server crash… sorry

Welp, I can see from abroad that my server crashed while everyone was away. Sigh… sorry about that, folks; it’s hard to keep the reliability up on a shoestring budget. Plus, I haven’t bothered to invest much in even half-cheeked redundancy. Lame, I know.

7/25/07
10:25 am
SIGIR 2007

I’m at SIGIR 2007 this week… if you’re also here, send me a ping. I also created a Facebook group for SIGIR as an experiment… feel free to join if you’re on Facebook!

7/19/07
9:35 pm
Positively False

While en route to Pittsburgh, I read Positively False, a memoir by Floyd Landis, winner of the 2006 Tour de France. For those who have forgotten, Floyd tested positive for doping after his Stage 17 victory in the 2006 TdF, and is currently fighting to prove the test was an error, and thus keep his TdF victory and reputation. Floyd was in Seattle recently for a book signing, and while I was in Mountain View, some friends went and picked up an autographed copy for me. Thanks, guys! The book is a first-person narrative and an easy read (with big print, so great on an airplane in poor light!).

The book is an overview of Floyd’s life in terms of how he became a pro racer, emphasizing his training and work ethic. It then details his participation in the Tour de France, both as a member of Team Postal with Lance and then as the leader of Team Phonak. While a member of Team Phonak, he worked closely with Allen Lim from CycleOps, who trained him heavily using a PowerTap. He credits racing with the PowerTap, a power meter, for enabling him to mount his come-from-behind stage win. I’ll skip the details, but the main point is that a power meter lets a racer figure out how much wattage they can put out without tiring. By racing with a PowerTap, Floyd was able to attack and break away early on a hilly Stage 17, which normally would be a crazy strategy, and then pedal without tiring too early and time trial his way to closing an 8-minute gap on the leader down to 30 seconds.

His story shifts quickly from the victory to the positive test for doping and his current (as of July 2007) status: fighting to prove the test is false in an arbitration case against USADA, the United States Anti-Doping Agency. He explains a lot of the early reporting, for example the numerous explanations, as naivety with the press, and some of the other surprising outcomes, such as the revelation of Greg LeMond being abused as a child, as frustration at the situation.

While I am engrossed and interested in the story, the book is not actually the story of what happened — it’s his public defense. Two or three years from now, there might be a resolution to the case, and then perhaps the story will be complete. At the end of the day, the question really is “Did Floyd dope to win?” Certainly, for doping there is no difference in actions between a guilty person caught and an innocent person falsely accused — they’ll both protest innocence. And there are lots of cases where after further review, it turns out the protestations of innocence were, in fact, false. But, presumably, some are truly innocent.

After reading Floyd’s book, I’m still unsure if he’s clean or not. Certainly, he provides a good story, but again if you’re doping it’s very easy to say that your body is just adjusting better to training and conditioning, and thus it’s easy to lie through that. And it’s also easy to keep a conspiracy like that under wraps… take baseball. It’s turning out lots of people are using, and lots of people know, and only a few are coming out and stating this. Cycling has been going through the same thing, with people like Bjarne Riis, a former TdF winner, admitting to doping (although the statute of limitations has passed, so he can’t be stripped of anything officially). From what I understand of USADA, for the most part they’re 165-0, meaning they’ve never ruled in an athlete’s favor. Personally, that feels suspicious to me… perhaps USADA is reasonable enough that they can see a false positive and respond to that before arbitration, thus maintaining a perfect record. But this seems unlikely, as eventually a false positive is something that can’t easily be ignored all the time. Perhaps they only go after really clear cases, and ignore defensible ones, again maintaining a perfect record. But that would imply more that USADA is incompetent and sports is filled with doping, and USADA can only catch the really obvious ones. While possible, that’s fairly depressing.

Anyway, it does make for a good read, if only to promote training with power and offer a glimpse inside the world of professional cycling. Plus, I suspect my copy will make a great memento, no matter which way this thing finally ends!

6/14/07
1:55 pm
107 goldfish!

As some of you know, I have a fishtank in my office. Well, after a recent move, I stocked it with about 10 goldfish to cycle the water. 7 were still alive and kicking some months later, and all other fish I’ve tried to add to the tank have passed prematurely. So, while I was at Petco the other day, they received a new shipment of goldfish at $0.10 a pop. So I dropped $10 and picked up 100… and so far, no fatalities! We’ll see what the weekend brings.

5/03/07
12:45 pm
Decision theories

Greg makes a great recommendation on “Hard Facts, Dangerous Half-Truths, and Total Nonsense” by Stanford Business School Professors Jeffrey Pfeffer and Robert Sutton. I’ve been doing some thinking on this myself. As always, I’m speaking for me, #include <std_disclaimer.h>.

Recently, I’ve seen a number of companies adopt a “data-driven decision process.” This appears to come mostly from Google saying they use data, not politics, and a lot of other companies are now echoing that as they believe (a) deciding with data (vs intuition, common sense, etc.) is a good thing, and (b) if they say they’re using the same strategy as Company A, and Company A is kicking butt, people will believe their strategy will also kick some butt.

I’ve now come to really hate this strategy, and cringe when people say they’re “data-driven.” It’s not that it’s completely without merit, but the devil is in the details, and those are often what’s lacking.

First, the biggest example of why I think data-driven tends to be a bad decision model: New Coke. For those who don’t remember, New Coke was a sweeter, more Pepsi-like drink Coca-Cola introduced to replace its much-loved Coke, aka Coke Classic. This was done largely as a reaction to Pepsi’s Pepsi Challenge, where they had people on commercials try both Coke and Pepsi and say they liked Pepsi better. Well, after a lot of research done by the Coca-Cola company, and lots of data, they decided to go ahead and change the formula from Coke to New Coke. This then turned into one of the greatest marketing disasters people can remember.

The Coke folks had data. In fact, they made the decision largely on data. They weren’t about to change the formula that drastically unless they had lots of data saying their move would be a success. And yet, it wasn’t.

And this brings me to the reasons I hate data-driven decision theory. A data-driven approach has to start with the right question, followed by experiments that provide data that is properly interpreted to provide the answer. Typically, people fail either by starting out with the wrong question or by conducting poor experiments that produce flawed data. An evil twin of flawed data is what I call Executive Data Bias. A decision-maker will have a certain bias on what to do, and is looking for data to back up that decision. Thus, flawed data that backs up the decision is accepted without much probing, while good data and its implications are rejected, typically by asking for “more experiments” or “more data,” or questioning assumptions made in the experiment or question.

Data-driven can work. But it requires knowing the right experiments, understanding the data, and being prepared for an unpopular outcome. For example, in television, most decisions to kill programs are data driven. If a program doesn’t do well, meaning it isn’t watched, the executives try to move it to a better slot. If that doesn’t work, they kill it. The quality of the show doesn’t matter (although they’ll certainly try more things to keep what they believe are quality assets), just whether or not people are watching. I remember at one point that Charles S. Dutton, who played Roc Emerson on Roc back in the early 90s, made an interesting statement to that effect. When Fox canceled the critically acclaimed Roc, a number of fans were extremely grumpy and vocal, especially as at the time it was one of the few shows to put black families in a positive light. However, Dutton came right out and said that it didn’t matter, what mattered were numbers, and they weren’t good enough, sadly. Update: actually, he blasted Fox as deciding to cancel the show because of network racism versus numbers. I must have misremembered, but luckily the Internet keeps old quotes!

However, most decisions aren’t as cut and dried as whether or not to cancel a program, and thus getting the data and using it to inform is often very, very tricky. This isn’t to say it’s worthless, but rather that data is one component in what should be a thoughtful decision-making process.

4/30/07
12:20 am
Davies sentencing delayed until June 15th

According to the Redditch Advertiser, the Lilly case has been adjourned until June 15:

Lilly case adjourned

THE sentencing of a woman in connection with the case of Baby Lilly, the infant whose body was found by the banks of the River Alne, has been adjourned.

Rachel Davies, 26, of Wharrage Road, Alcester has pleaded guilty to concealing the birth of a child she gave birth to by secretly disposing of the child’s dead body.

She was due to be sentenced on Friday at Warwick Crown Court.


The hearing is now expected to take place on Friday, June 15.

I previously discussed the guilty plea and arrest. There are a fair number of comments from people in the area on the arrest post that present some differing viewpoints, and I encourage you to read them.

4/23/07
11:55 pm
Closer on Virginia Tech…

This editorial from the WSJ says some good things about what to do to prevent another Virginia Tech massacre (h/t John Cole):

Diagnosis from afar is the purview of talk-shows hosts and other charlatans, and I will not attempt to detail the psyche of the Virginia Tech slaughterer. But I will hazard that much of what has been reported about his pre-massacre behavior–prolonged periods of asocial mutism and withdrawal, irrational anger and hatred, bizarre writing and speech–is not at odds with the picture of a fulminating, serious mental disease. And his age falls squarely within the most common period when psychosis blossoms.

No one who knew him seems surprised by what he did. On the contrary, dorm chatter characterized him explicitly as a future school-shooter. One of his professors, the poet Nikki Giovanni, saw him as a disruptive bully and kicked him out of her class. Other teachers viewed him as disturbed and referred him for the ubiquitous “counseling”–an outcome that is ambiguous to the point of meaninglessness and akin to “treatment” for a patient with metastasized cancer.

But even that minimal care wasn’t given. The shooter didn’t want it and no one tried to force him to get it. While it’s been reported that he was involuntarily committed to a “Behavioral Health Center” in December 2005, those reports also say he was released the very next morning. Even if the will to segregate an obvious menace had been in place, the legal mechanisms to provide even temporary “warehousing” were absent. The rest is terrible history.

That is not to say that anyone who pens violence-laden poetry or lets slip the occasional hostile remark should be protectively incarcerated. But when the level of threat rises to college freshmen and faculty prophesying accurately, perhaps we should err on the side of public safety rather than protect individual liberty at all costs.

If the Virginia Tech shooter had been locked up for careful observation in a humane mental hospital, the worst-case scenario would’ve been a minor league civil liberties goof: an unpleasant semester break for an odd and hostile young misanthrope who might’ve even have learned to be more polite. Yes, it’s possible confinement would’ve been futile or even stoked his rage. But a third outcome is also possible: Simply getting a patient through a crisis point can prevent disaster, as happens with suicidal people restrained from self-destruction who lose their enthusiasm for repeat performances.

This is good. Ever since the 70s, there’s been a clear relationship between mental illness and homelessness, due in large part to the closing of state hospitals. At my daughter’s preschool, we pulled up the carpets in the entry this weekend because it smelled like urine. The reason it smells is that there’s a transient who lives in the area who, among other problems, has bladder and bowel control issues, and will walk into the entryway during the day and soil himself. We’re getting the funding for a keypad for the exterior, but the reality is that this guy doesn’t belong wandering the streets near UW. I doubt he’s a threat, but I’d like to think that our society could at least make him comfortable and put him out of harm’s way.

The article then goes south in a hurry:

The best predictor of future violent behavior is past violent behavior, yet we regularly grant parole to murderers, serial rapists, chronically assaultive individuals and habitual pedophiles. Even when we do attempt to segregate low-impulse multiple offenders with effective tools such as with three-strikes laws, liberationist clamor never ceases.

At some point, if an inmate has done his or her time, then we let them go. And for parole, there are a number of criteria that must be met before someone is paroled. Does it always work? No. I read? heard? somewhere that a third of felons end up committing another crime. But then again, if that’s true, this means two-thirds don’t. If the author wants to argue that the sentences are too short, that’s one thing. But in any society governed by laws, once a punishment is meted out, society has to live with that punishment, even if after the fact people want more.

Talk to anyone who’s tried to commit a dangerously violent child or parent for even a few days: A stranger with a law degree will show up at the hearing and paint you as a fascist. So it’s far too much to expect anything resembling a decisive approach to those whose level of threat remains at the verbal level.

Given the excesses of the past–husbands committing troublesome wives, involuntary sterilization of those judged defective–extreme caution is warranted. But like drunk drivers, we sway from one side of the legal road to the other and find the sensible center lane elusive.

The problem here isn’t that it’s too hard to commit someone, or that there’s too much abuse when it’s too easy to commit someone. It’s that as a society, we don’t care that much about mental health. We are chronically building more jails instead of closing them, meaning we’re putting more people away. However, we aren’t doing nearly enough to help people that need it. And that’s what needs to be fixed.

4/17/07
12:00 am
Sonics stadium falls down, goes boom

Apparently, the Sonics’ stadium is dead. As a longtime listener of KJR Sports Talk Radio, 950 on YOUR AM dial, I’ve been following the latest Seattle stadium initiative with some interest. For the most part, there are two arguments, and they’re the same arguments that were made about a decade ago when the decision was made to level the Kingdome and build Safeco Field and Qwest Field for the Mariners and Seahawks.

Pro public funding of a stadium: Professional sports teams generate revenue, and thus the city / county / state should invest in a stadium.

Against public funding of a stadium: The public should not give millions to billionaires. If a stadium is such a good investment, then private financing should be readily available.

The argument against is stronger, in my opinion. As near as I can see, stadiums don’t by their nature generate a positive cash flow to a locality. Sure, there’s an influx of people to the stadium and a resulting tax dollar increase. OK, let’s do some quick math for a basketball team. 82 games per year, 41 home games. Say an average of 15,000 people attend each game (Key Arena holds a bit over 17,000). Say the average spend per person is $100 total, and we’ll say the tax is 10%, so $10 per person. So 41 * 15,000 * $10 = $6,150,000 per year. So, about $6 million per year. Say I’m off by a factor of 3… OK, that’s $18,450,000 per year. Call it $20 million. King County’s 2007 proposed budget is $507 million, so that’s about 4% of the yearly budget, being very generous. Less generous, and it’s 1-2%. However, when the initial outlay is $300 million or so (or perhaps just $250 million… the news reports $150 million from King County and $100 million from Renton), it doesn’t seem like a great investment.
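For anyone who wants to fiddle with the guesses, here is the same napkin math as a small Python sketch. Every input is one of the rough assumptions above, not a measured figure, and the function names are made up for illustration. The payback helper is my own addition and is deliberately naive: it ignores interest, upkeep, and everything else a real financing plan would include.

```python
# Back-of-the-envelope stadium tax math, using the rough guesses from the post.
# Every input here is an assumption, not a measured figure.

HOME_GAMES = 41               # half of an 82-game NBA season
AVG_ATTENDANCE = 15_000       # Key Arena holds a bit over 17,000
SPEND_PER_PERSON = 100        # dollars spent per person per game (a guess)
TAX_PERCENT = 10              # assumed flat tax rate on that spending
COUNTY_BUDGET = 507_000_000   # King County 2007 proposed budget, per the post


def yearly_tax_revenue(fudge_factor=1):
    """Estimated tax dollars per year, scaled by a 'how wrong am I' factor."""
    tax_per_person = SPEND_PER_PERSON * TAX_PERCENT // 100
    return HOME_GAMES * AVG_ATTENDANCE * tax_per_person * fudge_factor


def share_of_budget(revenue):
    """Revenue as a percentage of the county budget."""
    return 100 * revenue / COUNTY_BUDGET


def simple_payback_years(outlay, revenue):
    """Years to recoup the outlay, ignoring interest and upkeep (hypothetical helper)."""
    return outlay / revenue


if __name__ == "__main__":
    scenarios = [
        ("base guess", yearly_tax_revenue()),
        ("off by 3x", yearly_tax_revenue(3)),
        ("rounded up", 20_000_000),
    ]
    for label, revenue in scenarios:
        print(f"{label}: ${revenue:,} per year, "
              f"{share_of_budget(revenue):.1f}% of budget, "
              f"{simple_payback_years(300_000_000, revenue):.0f} years to repay $300M")
```

Even at the generous $20 million a year, the naive payback on a $300 million outlay is 15 years, and at the base guess of about $6 million it is closer to half a century.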

Personally, I think the pro argument is the wrong one. Governments spend money on public buildings for professionals all the time — opera houses, symphony halls, theaters, and such, not to mention college stadiums where everyone makes money except the athletes. For the most part, buildings like symphony halls aren’t great investments. Symphonies and the like don’t make a lot of money. However, I find that they’re vital as far as the culture they bring to the community. Performing arts are critical to a community’s culture. And just like arts, sports are also critical to a community’s culture. Yes, there is a ton of money involved. Pop culture can do that. But take it away, and the community loses something.

That being said, it seems to me that the entire Sonics debacle is a crisis brought on by short-term greed. It seems the way to make money as a professional sports team owner is to buy, run the team for a number of years, and sell at a huge profit. Many owners can profit while they own the team, but some can’t, or don’t care to. Such is life. Cities and counties can also pay for stadiums, but they can’t necessarily pay for them in any given year. Sometimes a region feels relatively generous and can pay for things, other times it will feel pinched. Like if major bridges and freeways are in need of repair, and two other stadiums have been recently built. Now, the Sonics have been in Seattle for over 40 years, and have a huge fan base, so at some point the city or county will be up for building a stadium. But not now. So, a rational plan is to wait and keep pushing, and in a few years, something will go through.

Instead, a deal has to get done now; otherwise, the new owners will move ’em out. It’s classic blackmail. Sadly, at this point the city / county / state isn’t in the mood to cave. Again, there are times when investing in culture makes sense, and other times when it doesn’t. Now, it doesn’t.

So, looks like the Sonics are moving to Oklahoma. Bah.

4/16/07
1:35 am
The joy of taxes

Well, it’s that time of year again, when the 300 million or so Americans file their federal, and in most cases state and local, income tax returns. This year, like the past few years, I’ve used TaxCut, mostly because I’m getting pretty good at weird cap gains issues and because TurboTax annoyed me a few years ago with their overly aggressive product activation thing. What I’ve observed over the years is that even though I’m using the same program that loads in data from the previous return, I seem to be rooting through my yearly files for random documents more than before. Sure, there are the receipts for charities, and, in WA state, sales tax (but hey, unless it’s a big-ticket item, just go with the standard deduction). But I gotta keep track of the WA state tab cards that state how much tax I pay, and various transaction details and such. But the big takeaway is that I’m realizing that it’s becoming much tougher to fill out even a modest 1040 without software. I’m not talking about the simple math. Instead, it’s the ton of speculative calculations and mini-worksheets that need to be filled out.

At times like this, I often wonder about the difference between the “progressive” income tax and a “regressive” sales tax. At least in the US, income tax is used both to collect revenue in a way people can afford (e.g., the rich pay more than the poor), as well as to encourage behavior via credits and deductions. For example, if both parents work, then some child care expenses (daycare) can be deducted. But if one spouse doesn’t work, then no deduction. The idea clearly is to encourage spouses, in particular mothers, to either stay home with the kids or work. However, putting the kid in daycare or a preschool environment without working (such as you might do when one kid is getting older and needs to socialize with other kids, and there’s a newborn that needs lots of mom time)… well, that’s just to be discouraged.

In Washington state, every now and then people grumble that we have a very regressive sales tax and no income tax, instead of what Oregon has, which is a high income tax and no sales tax. However, what a lot of people fail to recognize is that approximately 1/3 of purchases are made by businesses, thus businesses pay a third of the sales tax burden. Meanwhile, with an income tax, there are all sorts of ways to avoid it. So, even though it runs against my normal instincts, I sometimes do pause to wonder about how the economy of Washington seems to be doing much better than Oregon’s, and whether it makes sense to try and simplify things a bit. But that’s just crazy talk!

4/13/07
2:25 pm
AIRWeb 2007 Accepted Papers out

AIRWeb ’07, to be held on May 8th in Banff as part of WWW 2007, published the list of accepted papers today. We’ve got ten great full papers and three short papers. Hopefully, links to the actual papers will be coming soon, but until then, be sure to make those reservations for Banff!

Full papers

  • Kumar Chellapilla and Alexey Maykov: A Taxonomy of JavaScript Redirection Spam.
  • Debora Donato, Mario Paniccia, Maddalena Selis, Carlos Castillo, Giovanni Cortese and Stefano Leonardi: New Metrics for Reputation Management in P2P Networks.
  • Ye Du, Yaoyun Shi and Xin Zhao: Using Spam Farm to Boost PageRank.
  • Georgia Koutrika, Frans Effendi, Zoltán Gyöngyi, Paul Heymann and Hector García-Molina: Combating Spam in Tagging Systems.
  • Yu-Ru Lin, Hari Sundaram, Yun Chi, Junichi Tatemura and Belle Tseng: Splog Detection Using Self-similarity Analysis on Blog Temporal Dynamics.
  • Xiaoguang Qi, Lan Nie and Brian Davison: Measuring Similarity to Detect Qualified Links.
  • Krysta Svore, Qiang Wu, Chris Burges and Aaswath Raman: Improving Web Spam Classification using Rank-time Features.
  • Baoning Wu and Kumar Chellapilla: Extracting Link Spam using Biased Random Walks from Spam Seed Sets.
  • Josiane Xavier Parreira, Debora Donato, Carlos Castillo and Gerhard Weikum: Computing Trusted Authority Scores in Peer-to-Peer Web Search Networks.
  • Dengyong Zhou, Christopher Burges and Tao Tao: Transductive Link Spam Detection.

Short papers

  • András A. Benczúr, István Bíró, Károly Csalogány and Tamás Sárlós: Web Spam Detection via Commercial Intent Analysis.
  • Qingqing Gan and Torsten Suel: Improving Web Spam Classifiers Using Link Structure.
  • Hiroo Saito, Masashi Toyoda, Masaru Kitsuregawa and Kazuyuki Aihara: A Large-Scale Study of Link Spam Detection by Graph Algorithms.