<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: In defense of AOL&#8230;</title>
	<atom:link href="http://selberg.org/2006/08/07/in-defense-of-aol/feed/" rel="self" type="application/rss+xml" />
	<link>http://selberg.org/2006/08/07/in-defense-of-aol/</link>
	<description>Erik Selberg's Homepage &#038; Blog</description>
	<pubDate>Wed, 20 Aug 2008 13:16:13 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: head</title>
		<link>http://selberg.org/2006/08/07/in-defense-of-aol/#comment-2420</link>
		<dc:creator>head</dc:creator>
		<pubDate>Sat, 12 Aug 2006 23:56:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.selberg.org/2006/08/07/in-defense-of-aol/#comment-2420</guid>
		<description>Yes.. try out the AOL search database yourself.. It is just fun to look at some of the search data..

http://data.aolsearchlogs.com/log/random.cgi</description>
		<content:encoded><![CDATA[<p>Yes.. try out the AOL search database yourself.. It is just fun to look at some of the search data..</p>
<p><a href="http://data.aolsearchlogs.com/log/random.cgi">http://data.aolsearchlogs.com/log/random.cgi</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sara Astruc</title>
		<link>http://selberg.org/2006/08/07/in-defense-of-aol/#comment-2218</link>
		<dc:creator>Sara Astruc</dc:creator>
		<pubDate>Wed, 09 Aug 2006 14:20:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.selberg.org/2006/08/07/in-defense-of-aol/#comment-2218</guid>
		<description>&lt;i&gt;So, certainly, a sample of one does not a proof make.&lt;/i&gt;

Well, you asked &lt;i&gt;Find me a single user that is identifiable by these queries (and you cannot just self-identify!). People said this with Alta Vista and Excite released their logs, but nothing came of it. So, let’s try this again… if you can identify someone via their queries, then you win, I’m wrong, and I’ll say so on this blog. &lt;/i&gt;

You asked for one, I gave you one.</description>
		<content:encoded><![CDATA[<p><i>So, certainly, a sample of one does not a proof make.</i></p>
<p>Well, you asked <i>Find me a single user that is identifiable by these queries (and you cannot just self-identify!). People said this with Alta Vista and Excite released their logs, but nothing came of it. So, let’s try this again… if you can identify someone via their queries, then you win, I’m wrong, and I’ll say so on this blog. </i></p>
<p>You asked for one, I gave you one.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: mc</title>
		<link>http://selberg.org/2006/08/07/in-defense-of-aol/#comment-2206</link>
		<dc:creator>mc</dc:creator>
		<pubDate>Wed, 09 Aug 2006 09:25:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.selberg.org/2006/08/07/in-defense-of-aol/#comment-2206</guid>
		<description>You are certainly right about IP addresses alone not being enough, thanks to the nice "efficient" proxies AOL likes to use :)

Still, I'm glad the nytimes proved my point. The story seems to imply they tracked her down using queries alone, but they don't explicitly say so. I hope they don't get into trouble for using the information for "commercial use" by exposing that user :-s

Thanks for the discussion!</description>
		<content:encoded><![CDATA[<p>You are certainly right about IP addresses alone not being enough, thanks to the nice &#8220;efficient&#8221; proxies AOL likes to use <img src='http://selberg.org/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Still, I&#8217;m glad the nytimes proved my point. The story seems to imply they tracked her down using queries alone, but they don&#8217;t explicitly say so. I hope they don&#8217;t get into trouble for using the information for &#8220;commercial use&#8221; by exposing that user :-s</p>
<p>Thanks for the discussion!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Warrior</title>
		<link>http://selberg.org/2006/08/07/in-defense-of-aol/#comment-2205</link>
		<dc:creator>Warrior</dc:creator>
		<pubDate>Wed, 09 Aug 2006 09:08:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.selberg.org/2006/08/07/in-defense-of-aol/#comment-2205</guid>
		<description>Make your mistakes and take your risks with your own personal data/organs/minds (or with those of consenting persons).</description>
		<content:encoded><![CDATA[<p>Make your mistakes and take your risks with your own personal data/organs/minds (or with those of consenting persons).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Erik Selberg</title>
		<link>http://selberg.org/2006/08/07/in-defense-of-aol/#comment-2202</link>
		<dc:creator>Erik Selberg</dc:creator>
		<pubDate>Wed, 09 Aug 2006 07:08:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.selberg.org/2006/08/07/in-defense-of-aol/#comment-2202</guid>
		<description>So, certainly, a sample of one does not a proof make. However, finding out that all IP addresses in a log file are random AOL proxies does in fact discount the ability to discover more info about a given user. That's proof by induction.

Certainly, if I can identify more about the user than the IP... like his or her name... then yeah, I've got more. But IP address alone won't cut it; I do need to have more information.

BTW, I was talking to someone from the New York Times today who has access to all their data. He didn't say whether he'd be able to track down an individual; he wasn't sure. YMMV.</description>
		<content:encoded><![CDATA[<p>So, certainly, a sample of one does not a proof make. However, finding out that all IP addresses in a log file are random AOL proxies does in fact discount the ability to discover more info about a given user. That&#8217;s proof by induction.</p>
<p>Certainly, if I can identify more about the user than the IP&#8230; like his or her name&#8230; then yeah, I&#8217;ve got more. But IP address alone won&#8217;t cut it; I do need to have more information.</p>
<p>BTW, I was talking to someone from the New York Times today who has access to all their data. He didn&#8217;t say whether he&#8217;d be able to track down an individual; he wasn&#8217;t sure. YMMV.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sara Astruc</title>
		<link>http://selberg.org/2006/08/07/in-defense-of-aol/#comment-2199</link>
		<dc:creator>Sara Astruc</dc:creator>
		<pubDate>Wed, 09 Aug 2006 05:48:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.selberg.org/2006/08/07/in-defense-of-aol/#comment-2199</guid>
		<description>&lt;a href="http://www.nytimes.com/2006/08/09/technology/09aol.html?ref=technology" rel="nofollow"&gt;Here&lt;/a&gt; is your one identifiable person: Meet Thelma Arnold, a 62-year-old widow hailing from Lilburn, GA.

If the public perception of AOL's search is such that they are afraid to use it, that their personal search details could be released-- whether to the police, Homeland Security, or thousands of schmucks online-- the public will no longer trust that search engine with their searches. Aggregating search data for potential release has the potential to seriously devalue the brand. And all those pretty ads that the search engines hope to earn money per click? If everyone's afraid to use the search engines, no one's there to click.</description>
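		<content:encoded><![CDATA[<p><a href="http://www.nytimes.com/2006/08/09/technology/09aol.html?ref=technology" rel="nofollow">Here</a> is your one identifiable person: Meet Thelma Arnold, a 62-year-old widow hailing from Lilburn, GA.</p>
<p>If the public perception of AOL&#8217;s search is such that they are afraid to use it, that their personal search details could be released-- whether to the police, Homeland Security, or thousands of schmucks online-- the public will no longer trust that search engine with their searches. Aggregating search data for potential release has the potential to seriously devalue the brand. And all those pretty ads that the search engines hope to earn money per click? If everyone&#8217;s afraid to use the search engines, no one&#8217;s there to click.</p>
]]></content:encoded>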
	</item>
	<item>
		<title>By: donturn</title>
		<link>http://selberg.org/2006/08/07/in-defense-of-aol/#comment-2198</link>
		<dc:creator>donturn</dc:creator>
		<pubDate>Wed, 09 Aug 2006 05:28:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.selberg.org/2006/08/07/in-defense-of-aol/#comment-2198</guid>
		<description>See this &lt;a href="http://www.nytimes.com/2006/08/09/technology/09aol.html?ei=5090&#38;en=f6f61949c6da4d38&#38;ex=1312776000&#38;partner=rssuserland&#38;emc=rss&#38;pagewanted=all" rel="nofollow"&gt;NY Times Story&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>See this <a href="http://www.nytimes.com/2006/08/09/technology/09aol.html?ei=5090&amp;en=f6f61949c6da4d38&amp;ex=1312776000&amp;partner=rssuserland&amp;emc=rss&amp;pagewanted=all">NY Times Story</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Morgan Schweers</title>
		<link>http://selberg.org/2006/08/07/in-defense-of-aol/#comment-2197</link>
		<dc:creator>Morgan Schweers</dc:creator>
		<pubDate>Wed, 09 Aug 2006 02:37:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.selberg.org/2006/08/07/in-defense-of-aol/#comment-2197</guid>
		<description>Greetings,
If the information is going to be out there, I'd rather it were made public, instead of being in only a few people's hands.  For instance, say that you lose your bet (like I think you're about to), and someone can be personally identified.  Someone, probably a number of someones, will contact them and tell them that their info is out there.  (They'll probably sue AOL, but that's beside the point.)

If the info were in a smaller number of hands, and a 'black-hat' got ahold of it, there wouldn't be anyone telling those users that they are in danger.  This is (roughly) the premise behind the 'transparent society'...

Now, I'll provide a few potential answers.
Nelson Gill, anonymized user 4539634.
Barbara Jean Leighton, anonymized user 5167434.
tki189@aol.com, anonymized user 18942526.
Evidently tibrown1964@aol.com signed up for Passport, according to anonymized id 12423492.

How about the query:
locate keith ivan thompson born 3 may 64 social security #### last address was 7th street apt 317 aurora colorado

It doesn't expose the user, but it exposes someone else's personally identifiable information.

or

birth certificate for debra ann collins 1-28-59 ss ####

There are a few dozen others like this.  I'm astonished that people searched for their own SSNs, and the SSNs of other people, but they did, and do, and even associate their names with them.

The worst exposure is this:

kristy nicole vega hammond la. social secruity number ### birth date 03 08 81 drivers license number la. ### address 41178 rene dr. hammond la.

There aren't very many passwords, thankfully, but there are some.

That all said, I'm a member of SIGIR (although not the mailing list :( ), and I'm absolutely GIDDY with excitement over what I can do with this query information, in terms of useful research.  This is going to be wonderful information, in a purely abstract sense.  For those who do NOT have dark desires in their hearts, this is a great boon.  For those who lean to the dark side...well, it is also, but I believe the potential for misuse is limited by the responsibility of the white-hats who'll find the people with personally identifiable information and contact them, and let them know their info has been compromised.

--  Morgan</description>
		<content:encoded><![CDATA[<p>Greetings,<br />
If the information is going to be out there, I&#8217;d rather it were made public, instead of being in only a few people&#8217;s hands.  For instance, say that you lose your bet (like I think you&#8217;re about to), and someone can be personally identified.  Someone, probably a number of someones, will contact them and tell them that their info is out there.  (They&#8217;ll probably sue AOL, but that&#8217;s beside the point.)</p>
<p>If the info were in a smaller number of hands, and a &#8216;black-hat&#8217; got ahold of it, there wouldn&#8217;t be anyone telling those users that they are in danger.  This is (roughly) the premise behind the &#8216;transparent society&#8217;&#8230;</p>
<p>Now, I&#8217;ll provide a few potential answers.<br />
Nelson Gill, anonymized user 4539634.<br />
Barbara Jean Leighton, anonymized user 5167434.<br />
<a href="mailto:tki189@aol.com">tki189@aol.com</a>, anonymized user 18942526.<br />
Evidently <a href="mailto:tibrown1964@aol.com">tibrown1964@aol.com</a> signed up for Passport, according to anonymized id 12423492.</p>
<p>How about the query:<br />
locate keith ivan thompson born 3 may 64 social security #### last address was 7th street apt 317 aurora colorado</p>
<p>It doesn&#8217;t expose the user, but it exposes someone else&#8217;s personally identifiable information.</p>
<p>or</p>
<p>birth certificate for debra ann collins 1-28-59 ss ####</p>
<p>There are a few dozen others like this.  I&#8217;m astonished that people searched for their own SSNs, and the SSNs of other people, but they did, and do, and even associate their names with them.</p>
<p>The worst exposure is this:</p>
<p>kristy nicole vega hammond la. social secruity number ### birth date 03 08 81 drivers license number la. ### address 41178 rene dr. hammond la.</p>
<p>There aren&#8217;t very many passwords, thankfully, but there are some.</p>
<p>That all said, I&#8217;m a member of SIGIR (although not the mailing list <img src='http://selberg.org/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> ), and I&#8217;m absolutely GIDDY with excitement over what I can do with this query information, in terms of useful research.  This is going to be wonderful information, in a purely abstract sense.  For those who do NOT have dark desires in their hearts, this is a great boon.  For those who lean to the dark side&#8230;well, it is also, but I believe the potential for misuse is limited by the responsibility of the white-hats who&#8217;ll find the people with personally identifiable information and contact them, and let them know their info has been compromised.</p>
<p>&#8211;  Morgan</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: donalds</title>
		<link>http://selberg.org/2006/08/07/in-defense-of-aol/#comment-2191</link>
		<dc:creator>donalds</dc:creator>
		<pubDate>Tue, 08 Aug 2006 23:09:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.selberg.org/2006/08/07/in-defense-of-aol/#comment-2191</guid>
		<description>&#62; So, I decided to try out the experiment you suggested. Turns out it doesn’t work.

As a researcher (with presumably a background in math), you should know that a sample of one does not say much at all. People should go through hundreds of IDs and check them. Your bet is lost with one identifiable person. And even a very small percentage of ID-to-name matches still means thousands of people with a big problem in their lives.

Also,  you call the people who peek into the data 'hypocrites'. I do not think this is true. No harm can be done with data that is now available on every corner. It is impossible to hide it again - as I see it, this is the often cited *observation* that 'information wants to be free'.

With private data, the harm is done when it is first made public. Once it is public, there can't be any additional harm. This is, I think, the idea of the bloggers and I can understand that very well.

People/corps/govs with evil intentions do not even think about whether they are 'evil' if they use this data.</description>
		<content:encoded><![CDATA[<p>&gt; So, I decided to try out the experiment you suggested. Turns out it doesn’t work.</p>
<p>As a researcher (with presumably a background in math), you should know that a sample of one does not say much at all. People should go through hundreds of IDs and check them. Your bet is lost with one identifiable person. And even a very small percentage of ID-to-name matches still means thousands of people with a big problem in their lives.</p>
<p>Also,  you call the people who peek into the data &#8216;hypocrites&#8217;. I do not think this is true. No harm can be done with data that is now available on every corner. It is impossible to hide it again - as I see it, this is the often cited *observation* that &#8216;information wants to be free&#8217;.</p>
<p>With private data, the harm is done when it is first made public. Once it is public, there can&#8217;t be any additional harm. This is, I think, the idea of the bloggers and I can understand that very well.</p>
<p>People/corps/govs with evil intentions do not even think about whether they are &#8216;evil&#8217; if they use this data.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Erik Selberg</title>
		<link>http://selberg.org/2006/08/07/in-defense-of-aol/#comment-2186</link>
		<dc:creator>Erik Selberg</dc:creator>
		<pubDate>Tue, 08 Aug 2006 22:26:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.selberg.org/2006/08/07/in-defense-of-aol/#comment-2186</guid>
		<description>So, I decided to try out the experiment you suggested. Turns out it doesn't work. Here's one of two users that hit this blog:
&lt;blockquote&gt;6131368 free audiovox phone unlocking   2006-03-03 18:46:12     10      http://www.selberg.org&lt;/blockquote&gt;
Here's the entry in my log (and you're about to see why I don't have any problems posting it):
&lt;blockquote&gt;cache-ntc-ac03.proxy.aol.com - - [03/Mar/2006:16:08:58 -0800] "GET /2005/07/22/unlock-an-audiovox-5600-for-free/ HTTP/1.0" 200 120179 "http://aolsearcht3.search.aol.com/aol/search?invocationType=topsearchbox.search&#038;query=free+AUDIOVOX+phone+unlocking" "Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1)"&lt;/blockquote&gt;
Um, great. I know this guy came in through cache-ntc-ac03.proxy.aol.com. Yup, that narrows him or her down. And he or she uses the AOL 9.0 client... probably rare for someone that uses AOL.

So really, turns out the AOL proxy masks a ton... and I really don't have more information, even though I can match the log entry in my blog with a log entry in the data set.</description>
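		<content:encoded><![CDATA[<p>So, I decided to try out the experiment you suggested. Turns out it doesn&#8217;t work. Here&#8217;s one of two users that hit this blog:</p>
<blockquote><p>6131368 free audiovox phone unlocking   2006-03-03 18:46:12     10      http://www.selberg.org</p></blockquote>
<p>Here&#8217;s the entry in my log (and you&#8217;re about to see why I don&#8217;t have any problems posting it):</p>
<blockquote><p>cache-ntc-ac03.proxy.aol.com - - [03/Mar/2006:16:08:58 -0800] "GET /2005/07/22/unlock-an-audiovox-5600-for-free/ HTTP/1.0" 200 120179 "http://aolsearcht3.search.aol.com/aol/search?invocationType=topsearchbox.search&#038;query=free+AUDIOVOX+phone+unlocking" "Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1)"</p></blockquote>
<p>Um, great. I know this guy came in through cache-ntc-ac03.proxy.aol.com. Yup, that narrows him or her down. And he or she uses the AOL 9.0 client&#8230; probably rare for someone that uses AOL.</p>
<p>So really, turns out the AOL proxy masks a ton&#8230; and I really don&#8217;t have more information, even though I can match the log entry in my blog with a log entry in the data set.</p>
]]></content:encoded>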
	</item>
</channel>
</rss>
