The Overcollection Problem Identified in the 2011 FISC Opinion

The FISC’s newly-declassified 2011 Opinion on the NSA’s implementation of Section 702 surveillance is both dense and fascinating. In this post, I thought I would just bring readers up to speed on the basic factual problem identified in the opinion (at least to the extent I can understand it). In later posts, I’ll consider the legal implications of those facts, both as Judge Bates interpreted them and as I see them.

Here’s the context. Section 702 of the Foreign Intelligence Surveillance Act permits wiretapping of communications through the “targeting of persons reasonably believed to be located outside the United States to acquire foreign intelligence information.” The government has to present its plan for how to do that to the FISC, and the FISC needs to determine whether the government’s “procedures are reasonably designed to” achieve compliance with the statute.

In this opinion, Judge Bates concludes that one aspect of the government’s procedures failed that test: specifically, the NSA’s “upstream collection” protocols. To understand what that means, you need to realize that the government can get Section 702 information in two ways: 1) by going directly to the major providers such as Facebook, Microsoft and Google and getting data from them, or 2) by installing its own devices at major Internet hubs and scanning for traffic. According to the opinion, 91% of the data collected under Section 702 comes directly from ISPs, through the so-called PRISM program.

The FISC opinion deals only with the remaining 9% of Section 702 acquisition, which is obtained directly by the NSA in what the opinion calls “upstream collection.” “Upstream collection” is the term used for collection using NSA surveillance tools installed at Internet traffic hubs; it is “upstream” in the sense that it collects the traffic before it has reached individual providers like Facebook or Gmail. The 9% of Section 702 traffic picked up through “upstream collection” is still a ton of traffic: According to the opinion, in the first six months of 2011, the 9% of traffic involved 13.25 million “Internet transactions.” As best I can tell from Footnote 23, “Internet transactions” are sets of packets of Internet traffic that belong together.

The NSA’s tools are programmed to filter for Internet communications that fall within the parameters allowed by law. In this opinion, though, Judge Bates is trying to determine whether the tools are working in a way that satisfies the statutory standard. How can he know that? He’s just a judge in chambers; he can’t know how the protocols are working Internet-wide. At his request, however, the NSA manually looked at a sample of about 50,000 collections of the 13 million in the previous six months to get a snapshot of what is happening. Judge Bates figured that the protocols probably will work in the future in the same way they did in the past. By extrapolating from the study of the 50,000 collections, Judge Bates can get a rough sense of whether the NSA’s collection protocols are sufficiently careful that they only collect the communications that they are supposed to collect.

The apparent problem with the NSA’s filter involves what the opinion refers to as MCTs, “multi communication transactions.” As the Administration has explained, MCTs are packets of internet traffic that combine multiple messages:

One example of this is if you have a webmail email account, like Gmail or Hotmail or something like that, you know that when you go and you open up your email program, you will get a screenshot of some number of emails that are sitting in your inbox. In the case of my server, what I get is the date of the email, the sender, the subject line, and the size of the email message. But I may get 15 of them at one time.

Those are all transmitted across the Internet as one communication, even though there are 15 separate emails mentioned in them. And for technological reasons, NSA was not capable of breaking those down into their — and still is not capable — of breaking those down into their individual components.

When that happened, the NSA was inadvertently collecting purely domestic communications that were bundled into those transactions. The NSA’s review found that of the 50,000 communications reviewed in its sample, about 5,000 were MCTs. Of those 5,000 MCTs, a total of 10 were known to have at least one purely domestic communication embedded in them. Judge Bates then estimated that the NSA was probably collecting purely domestic communications unintentionally on the order of thousands of times per year, because messages are bundled together in ways that the NSA could not separate out.
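To see how a handful of hits in a sample turns into “thousands” per year, here is a back-of-the-envelope sketch in Python. It uses only the rounded figures quoted in this post, and it is my own naive scaling, not the method the opinion itself uses. It assumes that the sample is representative of all upstream collection and that the second half of the year looks like the first. The function and variable names are just for illustration.

```python
# Rounded figures as quoted in this post; the opinion's exact numbers may differ.
SAMPLE_SIZE = 50_000               # transactions manually reviewed by the NSA
SAMPLE_MCTS = 5_000                # of those, multi-communication transactions
SAMPLE_DOMESTIC_MCTS = 10          # MCTs known to contain a purely domestic communication
UPSTREAM_SIX_MONTHS = 13_250_000   # upstream "Internet transactions" in six months


def extrapolate(sample_hits, sample_size, population):
    """Scale a count observed in the sample up to the full population.

    Assumes the sample is representative (an assumption, not a finding).
    """
    return sample_hits * population / sample_size


# Estimated MCTs among all upstream transactions in six months.
mcts_six_months = extrapolate(SAMPLE_MCTS, SAMPLE_SIZE, UPSTREAM_SIX_MONTHS)

# Estimated MCTs containing at least one purely domestic communication.
domestic_six_months = extrapolate(
    SAMPLE_DOMESTIC_MCTS, SAMPLE_SIZE, UPSTREAM_SIX_MONTHS
)

# Assumption: the second half of the year resembles the first.
domestic_per_year = 2 * domestic_six_months

print(f"Estimated MCTs in six months:          {mcts_six_months:,.0f}")
print(f"Estimated domestic MCTs in six months: {domestic_six_months:,.0f}")
print(f"Estimated domestic MCTs per year:      {domestic_per_year:,.0f}")
```

On these rounded inputs the yearly figure comes out at roughly 5,300, which fits the “on the order of thousands” description. Note what it counts: MCTs known to contain at least one purely domestic communication, not individual domestic messages, so it is a rough floor on affected transactions rather than a count of emails.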