Tuesday, April 09, 2013

Academics Go To Jail – CFAA Edition

Though the Aaron Swartz tragedy has brought some much needed attention to the CFAA, I want to focus on a more recent CFAA event—one that has received much less attention but might actually touch many more people than the case against Swartz.

Andrew “Weev” Auernheimer (whom I will call AA for short) was recently convicted under the CFAA and sentenced to 41 months in prison and $73K in restitution. Orin Kerr is representing him before the Third Circuit. I am seriously considering filing an amicus brief on behalf of all academics. In short, this case scares me in a much more personal way than the cases discussed in my prior CFAA posts. More after the jump.

Here’s the basic story, as described by Orin Kerr:

When iPads were first released, iPad owners could sign up for Internet access using AT&T. When they signed up, they gave AT&T their e-mail addresses. AT&T decided to configure their webservers to “pre load” those e-mail addresses when it recognized the registered iPads that visited its website. When an iPad owner would visit the AT&T website, the browser would automatically visit a specific URL associated with its own ID number; when that URL was visited, the webserver would open a pop-up window that was preloaded with the e-mail address associated with that iPad. The basic idea was to make it easier for users to log in to AT&T’s website: The user’s e-mail address would automatically appear in the pop-up window, so users only needed to enter in their passwords to access their account. But this practice effectively published the e-mail addresses on the web. You just needed to visit the right publicly-available URL to see a particular user’s e-mail address. Spitler [AA’s alleged co-conspirator] realized this, and he wrote a script to visit AT&T’s website with the different URLs and thereby collect lots of different e-mail addresses of iPad owners. And they ended up collecting a lot of e-mail addresses — around 114,000 different addresses — that they then disclosed to a reporter. Importantly, however, only e-mail addresses were obtained. No names or passwords were obtained, and no accounts were actually accessed.

Let me paraphrase this: AA went to a publicly accessible website, using publicly accessible URLs, and saved the results that AT&T sent back in response to each URL. In other words, AA did what you do every time you load up a web page. The only difference is that AA did it for multiple URLs, using sequential guesses at what those URLs would be. There was no robots.txt file that I’m aware of (this is the file that tells search-engine spiders which URLs they should not crawl). There was no user notice or agreement that barred use of the web page in this manner. Note that I’m not saying such things should make the conduct illegal, but only that such things didn’t even exist here. It was just two people loading data from a website. Note that a commenter on my prior post asked this exact same question--whether "link guessing" was illegal--and I was noncommittal. I guess now we have our answer.
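For readers who have not seen one, here is a minimal sketch of how a well-behaved crawler would consult a robots.txt file before loading a URL. The file contents, the bot name, and the example.com paths are all hypothetical; nothing here is taken from AT&T's actual site, which (as far as I know) had no such file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /accounts/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A path under the disallowed prefix versus an ordinary page.
blocked = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/accounts/12345")
open_page = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/index.html")
print(blocked, open_page)
```

Note that the file is purely advisory: the check happens on the visitor's side, and nothing on the server stops a request that skips it.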

The government’s indictment makes the activity sound far more nefarious, of course. It claims that AA “impersonated” an iPad. This allegation is a bit odd: the script impersonated an iPad in the same way that you might impersonate a cell phone by loading http://m.facebook.com to load the mobile version of Facebook. Go ahead, try it and you’ll see – Facebook will think you are a cell phone. Should you go to jail?
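To make the "impersonation" point concrete, here is a rough sketch under one assumption: that identifying oneself as a particular device amounts to sending a User-Agent header naming that device. The indictment's technical details are not reproduced in this post, so treat that as my assumption. The header value and URL below are made up, and the request is built but never sent.

```python
from urllib.request import Request

# Hypothetical device-identifying string, for illustration only.
MOBILE_UA = "Mozilla/5.0 (iPad; CPU OS 3_2 like Mac OS X)"


def build_request(url: str, user_agent: str) -> Request:
    """Build (but do not send) an HTTP request carrying the given User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("https://example.com/", MOBILE_UA)
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

The header is a self-reported label that any client can set to any string; the server has no independent way to verify it.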

So, readers might say, what’s the problem here? AA should not have done what he did – he should have known that AT&T did not want him downloading those emails. Yeah, he probably did know that. But consider this: AA did not share the information with the world, as he could have. I am reasonably certain that if his intent was to harm users, we would never know that he did this – he would have obtained the addresses over an encrypted VPN and absconded. Instead, AA shared this flaw with the world. AT&T set up this ridiculously insecure system that allowed random web users to tie Apple IDs to email addresses through ignorance at best or hubris at worst. I don’t know if AA attempted to inform AT&T of the issue, but consider how far you got last time you contacted tech support with a problem on an ISP website. AA got AT&T’s attention, and the problem got fixed with no (known) divulgence of the records.

Before I get to academia, let me add one more point. To the extent that AA should have known AT&T didn’t desire this particular access, the issue is one of degree, not of kind. And that is the real problem with the statute. There is nothing in the statute, absolutely nothing, that would help AA know whether he violated the law by testing this URL with one, five, ten, or ten thousand IDs. Here’s one to try: click here for a concert web page reached by a deep link whose URL contains a numerical code. Surely Ticketmaster can’t object to such deep linking, right? Well, it did, and sued Tickets.com over such behavior. It claimed, among other things, that each and every URL was copyrighted and thus infringed if linked to by another. It lost that argument, but today it could just say that such access was unwanted. For example, maybe Ticketmaster doesn’t like me pointing out its ridiculous argument in the Tickets.com case, making my link unauthorized. Or maybe I should have known because the Ticketmaster terms of service say that an express condition of my authorization to view the site is that I will not "Link to any portion of the Site other than the URL assigned to the home page of our site." That's right, Ticketmaster still thinks deep linking is unauthorized, and I suppose that means I risk criminal prosecution for linking to it. Imagine if I actually saved some of the data!

This is where academics come in. Many, many academics scrape. (Don’t stop reading here – I’ll get to non-scrapers below.) First, scraping is a key way to get data from online databases that are not easily downloadable. This includes, for example, scraping of the US Patent & Trademark Office site; although data is now available for mass download, that data is cumbersome, and scraper use is still common. That the PTO data is public does not help matters. In fact, it might make it worse, since “unauthorized” access to government servers might receive enhanced penalties!

Academics (and non-academics) in other disciplines scrape websites for research as well. How are these academics to know that such scraping is disallowed? What if there is no agreement barring them from doing so? What if there is a web-wrap notice as broad as Ticketmaster's, purporting to bar such activities but with no consent by the user? The CFAA could send any academic to jail for ignoring such warnings—or worse—not seeing them in the first place. Such a prosecution would be preposterous, skeptics might say. I hope the skeptics are right, but I'm not hopeful. Though I can't find the original source, I recall Orin Kerr recounting how his prosecutor colleagues said the same thing 10 years ago when he argued the CFAA might apply to those who breach contracts, and now such prosecutions are commonplace.

Finally, non-scrapers are surely safe, right? Maybe it depends on whether they use Zotero. Thousands of people use it. How does Zotero get information about publications when the web site does not provide standardized citation data? You guessed it: a scraper. Indeed, a primary reason I don’t use Zotero is that the Lexis and Westlaw scrapers don’t work. But the PubMed importer scrapes. What if PubMed decided that it considered scraping of information unauthorized? Surely people should know this, right? If it wanted people to have this data, it would provide it in a Zotero-readable format. The fact that the information on those pages is publicly available is irrelevant; the statute makes no distinction. And if one does a lot of research, for example, checking 20 documents, downloading each, and scraping each page, the difference from AA is in degree only, not in kind.

The irony of this case is that the core conviction is only tangentially a problem with the statute (there are some ancillary issues that are a problem with the statute). “Unauthorized access” and even “exceeds authorized access” should never have been interpreted to apply to publicly accessible data on publicly accessible web sites. Since they have been, I am convinced that the statute is impermissibly broad and must be struck down. At the very least it must be rewritten.

Posted by Michael Risch on April 9, 2013 at 10:21 PM in Information and Technology, Web/Tech | Permalink

Comments

Curses, foiled again! I've responded to Orin's comment over on Michael's cross-post on Madisonian to get around the spam filter problem.

Posted by: Bruce Boyden | Apr 11, 2013 2:09:44 PM

Orin, that looks like a nice bright-line technological distinction, but I think it breaks down into a mushy social one on closer examination. First, the web is not really a publishing platform in the sense that everything put on it is public. There are networked computers, but some parts of the network are private and some are not. Pages on both private and public portions of the network are written in HTML and requested and transmitted using HTTP via a web browser. Merely knowing that something is on the network somewhere and retrievable by a web browser doesn't really tell us whether access by general members of the public is authorized or not, even as a default.

Your proposed distinction -- pages retrievable by typing stuff in the address bar are public, pages that require typing something into a field on a page are not -- strikes me as both too narrow and too broad. Too narrow because it's possible to create a login page that transmits the login information entered in fields on the page -- username and password -- in the URL, via a "GET" request. That's a password-type control that demands login credentials, just the same as any other login page, and I think most people would say that account pages retrieved by typing in the right username and password are not public. Sure, it's *dreadfully insecure*, but whether access is authorized or not shouldn't depend on the strength of the security measure; as I think you yourself have stated, what matters is the signal the security measure sends. And I don't think that the particular portion of the page request where the password is transmitted to the site should matter either. For another example, how about a buffer overflow or SQL-injection attack that either retrieves restricted data or results in administrator access to the server? My understanding is that both can be accomplished through the URL portion of a page request. But certainly both are unauthorized access, even though any member of the public could type malformed URLs into their browser and achieve the same result.
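To illustrate the GET-request point, here is a small sketch of the URL a browser would request if a login form submitted its fields with the GET method. The host, field names, and credentials are hypothetical, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def login_url(base: str, username: str, password: str) -> str:
    """Build the URL a browser would request for a login form using method=GET."""
    # With GET, the form fields are encoded into the query string of the URL.
    return base + "?" + urlencode({"username": username, "password": password})


url = login_url("https://example.com/login", "alice", "s3cret")
print(url)

# The server recovers the same fields by parsing the query string.
fields = parse_qs(urlsplit(url).query)
print(fields)
```

The same string could be typed straight into the address bar, which is why "entered in the URL" versus "entered in a field on the page" is hard to treat as a clean technical line.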

It's also too broad. There are sites that require login and passwords, but where defeating that requirement seems questionable as unauthorized access. I'm thinking of sites that say, e.g., "No government agents allowed. If you are not a government agent, type "NO" to be allowed entry." Typing NO lets you into the site. But it's not really a password control that visitors understand keeps the pages restricted only to specific people previously designated by the site owner. The same with sites like newspaper sites that provide free access, based on providing only an email address. Suppose someone finds a way around the login page for such a site (other than by typing something into the URL field of their browser). Is that unauthorized access? The site is essentially open to the public after a trivial hurdle. I can see a jury saying that bypassing that hurdle, like my "Type NO to proceed" or even just clicking a button, does not have the social significance necessary to make entry trespass, just as it might make that determination in a real-property type situation.

Posted by: Bruce Boyden | Apr 11, 2013 1:42:36 PM

Michael, I know you're a guest and wouldn't have known, but please don't retrieve Bruce and Orin's posts. We have our filter specifically set up to make sure the two of them, and just them, can't post.

Posted by: Paul Horwitz | Apr 11, 2013 10:18:30 AM

Pulled yours out, Orin. The filter is obviously on overdrive.

Posted by: Michael Risch | Apr 11, 2013 7:46:26 AM

Uh oh, I think it happened to my comment, too.

Posted by: Orin Kerr | Apr 10, 2013 11:46:08 PM

Bruce,

In my view, access to information that can be retrieved by entering a URL line into a web browser is authorized access. The World Wide Web is a publishing platform: If you set up a computer such that it responds with information when a particular URL line is entered, then you are publishing that information and have authorized access to it. See Pulte Homes v. LIUNA (6th Cir. 2011). In my view, United States v. Phillips is distinguishable because it involved entering a password into a prompt on a webpage rather than just visiting a webpage. As the Court noted, "Neither Phillips, nor members of the public, [could] obtain such authorization from UT merely by viewing a log-in page, or clicking a hypertext link." Rather, once that webpage was loaded, the user could try to enter a password and try to gain access to the information.

Posted by: Orin Kerr | Apr 10, 2013 10:03:28 PM

The spam filter does not appear designed with verbose law profs in mind. Or perhaps my comments are particularly spammy.

Posted by: Bruce Boyden | Apr 10, 2013 4:32:18 PM

I pulled it out of the spam filter for you. Oy.

Posted by: Michael Risch | Apr 10, 2013 4:19:59 PM

Typepad is not letting me reply.

Posted by: Bruce Boyden | Apr 10, 2013 4:14:14 PM

Orin, I've got a question pending too, which is how is typing a particular code into a URL accompanying a page request for a page that a reasonable person would think was not meant to be seen by the public meaningfully different from typing a password that in combination with a page request leads to a page that a reasonable person would think was not meant to be seen by the public? In neither case, let's say, are there terms of service expressly prohibiting such access. Assuming that you think US v Phillips is correctly decided (which you may not, perhaps your position is that the very idea of computer trespass makes no sense), there are some circumstances where using information to navigate to a page in a way that any member of the public could do subjects a person to criminal sanctions -- the question is just where the line is. I'll take a crack at drawing the line at where social conventions would clearly put it. Obviously social conventions with respect to Internet use are still not fully developed even though we are nearly 20 years past the launch of the web. But I think that's the only line that we have that makes password-guessing unauthorized access.

I think terms of service are basically irrelevant. As I said above, what the website operator puts in T&C that no one ever reads is not the right place to look for whether access is authorized or not. To analogize to real space (I think the analogies at some point get twisted enough to be not helpful, but not in this respect), the idea of notice through terms and conditions is a bit like posting No Admittance signs inside broom closets. Of course, the next question is, OK, what about a site that puts on every page "unauthorized visitors not allowed," but otherwise doesn't restrict access in any way (not even by requiring hard-to-guess unpublished URLs) -- the problem with analogizing this to real space is that we're talking about web pages that, if there's a link to them, there is generally a practice of the public freely visiting them that is hard to imagine in real space -- a shopping mall in which some open areas of the mall are designated "no admittance," but without any walls or even rope lines, perhaps. I think I'd go so far as to say that social convention *overrides* site owner wishes in that instance, so that making the prohibition express is not sufficient. But I need to think that through more carefully, and in any event that's not this case.

Posted by: Bruce Boyden | Apr 10, 2013 4:04:27 PM

Certainly for criminal punishment, I would agree it has to be pretty well established that the norms prohibit access, not merely murky. Lots of sites claim to prohibit some (usually not all) scraping, but since Google and other search engines do it, it seems at least "murky." So perhaps that's what the dispute really should be about here -- not whether link-guessing is trespass, but link-guessing to get this sort of page, and whether that's *clear*. I think it's reasonably clear, but beyond-a-reasonable-doubt clear? I'm less sure.

Posted by: Bruce Boyden | Apr 10, 2013 12:54:30 PM

Bruce, in your view, what kind of visiting a public website is "off limits" to a reasonable person, such that visiting the website is a federal crime? If the website owner publishes Terms of Service prohibiting the visit, does that ward off a reasonable person? What if there are no Terms of Service?

Posted by: Orin Kerr | Apr 10, 2013 12:52:40 PM

Bruce, I agree with you in general. One could, for example, liken the numeric IDs to "passwords" where he simply guessed at them to see which ones worked. I tried to address this with my "difference in degree, not in kind" discussion. You allude to the same when you talk about "severity of the misconduct."

My concern, though, is that when it comes to scraping for research, it is unclear what a reasonable person would expect and what constitutes sufficient severity. I've spoken with people who scrape over the years, and they all use different methodologies.

Posted by: Michael Risch | Apr 10, 2013 12:32:15 PM

There's two issues here, and I think they need to be distinguished. One is whether "computer trespass" can be defined solely by looking at the intentions of the *owner*, as manifested in some document no one has ever read (as my contracts professor once said "literally, no one" -- perhaps not even the person who drafted it). I think that's wrong-headed for all of the reasons you and Orin have identified.

But there's a separate issue presented by this case, and I'm less convinced by your arguments. What about access to a computer that a reasonable person should know is not permitted? I think this removes the sting from your deep-linking and academic scraping hypotheticals. We don't require every property owner to put up signs at every entrance to their house or buildings specifically forbidding entry -- socially understood cues that everyone is presumed to know serve the purpose of indicating where entry is and is not permitted. You keep saying that the page was "publicly accessible," but that seems to me to presume the answer you are trying to reach, which is that some sort of barrier -- a locked door -- is necessary to constitute trespass. What about intentionally accessing pages that the site owner obviously (not just possibly or inconspicuously) intends to keep private (not just accessible through a particular method)?

How is deciphering likely links for account pages meaningfully different from the behavior in United States v. Phillips, 477 F.3d 215 (5th Cir. 2007), in which the defendant accessed a publicly accessible page, and entered information guessing at how to proceed beyond that page -- i.e., people's login passwords? One difference is the severity of the misconduct -- one is just getting email addresses (without even associated names), the other was trying to steal financial information -- but that's all about what the purpose of the trespass was, not whether there was one.

Posted by: Bruce Boyden | Apr 10, 2013 12:15:07 PM

Michael writes: "That's right, TicketMaster still thinks deep linking is unauthorized, and I suppose that means I risk criminal prosecution for linking it. "

Indeed, the same office that prosecuted AA also brought a criminal case under the CFAA for circumventing Ticketmaster's restrictions on buying multiple tickets. The defendants pled guilty after the court denied a motion to dismiss. The case is United States v. Lowson, and the district court's opinion is here: http://www.jdsupra.com/legalnews/opinion-denying-motion-to-dismiss-superc-07377/

Posted by: Orin Kerr | Apr 9, 2013 11:42:04 PM
