Every time you send an email or a text message or save a file to the cloud, pieces of data related to your activity are created: the type of computer you are using, your IP address, where you are, the types of files that you’re uploading. Big data[1] and the tools that have been developed to aggregate, store and use big data raise a number of significant legal, ethical and privacy-related issues.

Law School Courtyard at Yale Law School, New Haven, CT.

In the summer of 2013, it came to light that companies are not the only ones finding a use for consumer data. Documents that former United States (US) National Security Agency (NSA) contractor Edward Snowden leaked to The Guardian revealed that the US government has been using various programs to collect data on US and foreign citizens alike. Yale Law School hosted a Big Data Symposium on April 6, 2014, to address the impact that the US government’s mass surveillance is having on US relations with other countries.[2] The event brought together notable scholars and experts in national security, information privacy, civil liberties and international law, who shared their thoughts on the international implications of the US government’s mass surveillance and collection of big data. Roshni Patel attended the symposium on behalf of the Wikimedia Foundation (WMF) legal team to get an update on the latest developments in privacy law.

Like individuals and organizations all over the world, WMF was surprised to learn the extent of US government domestic and foreign surveillance. Because WMF has users and contributors all over the world, the debate about international law and how privacy rights are protected internationally is relevant to WMF and its projects. This post describes some of the interesting and pertinent topics that were covered at the symposium.

Finding viable foreign intelligence information: a needle in a haystack

The first panel discussed the tension between finding viable intelligence information and minimizing the amount of data that is collected. The global population creates an enormous amount of data using modern technology, and that amount is growing at an exponential rate.[3] Technology has also made it possible to collect and analyze that data.[4]

However, when the data of a US person is collected in the course of a government agency’s investigation of a foreign person, that data is subject to minimization requirements, meaning that the agency must minimize its acquisition and retention and may not disseminate it.[5] Following the Snowden revelations, President Obama issued Presidential Policy Directive 28 (PPD-28), which restricts the collection of data to foreign intelligence and counterintelligence purposes and, for the first time, extends minimization procedures to data collected about foreign persons. Although this appears to limit the information that agencies can collect, minimization principles apply only when it is clear that something is not foreign intelligence information. People being targeted are rarely upfront about their intentions and plans, so it is extremely difficult to separate information that has “foreign intelligence” value from information that does not.

With vast amounts of data at its fingertips through programs like XKeyscore, how does a government agency like the NSA implement minimization procedures? Data is kept only briefly before a subset is selected for further analysis and the rest is discarded. That subset is selected through targeted searches. Rather than simply searching the data collected domestically for the word “bomb,” agents instead conduct searches based on hard selectors such as a phone number, email address, or IP address.
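To make the keep-briefly-then-filter idea concrete, here is a toy Python sketch. It is not how XKeyscore or any NSA system actually works: the `Record` type, its fields, the `select_for_analysis` function and the three-day retention window are all hypothetical placeholders chosen for illustration. It only shows the general shape of the process: records sit in a short-lived buffer, and only those that match a hard selector (an email address or IP address) survive for further analysis.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical record type: a few metadata fields from one intercepted message.
@dataclass(frozen=True)
class Record:
    captured_at: datetime
    sender: str
    recipient: str
    ip: str

# Hypothetical retention window; the text does not state a real value.
RETENTION = timedelta(days=3)

def select_for_analysis(buffer, hard_selectors, now, retention=RETENTION):
    """Keep only fresh records that match a hard selector; discard the rest."""
    kept = []
    for rec in buffer:
        if now - rec.captured_at > retention:
            continue  # aged out of the short-term buffer, so it is discarded
        if {rec.sender, rec.recipient, rec.ip} & hard_selectors:
            kept.append(rec)  # matches a targeted selector, so it is retained
    return kept

if __name__ == "__main__":
    now = datetime(2014, 4, 6, 12, 0)
    buffer = [
        Record(now - timedelta(hours=1), "a@example.org", "xyz@serviceprovider.com", "198.51.100.7"),
        Record(now - timedelta(hours=2), "b@example.org", "c@example.org", "203.0.113.9"),
        Record(now - timedelta(days=5), "d@example.org", "xyz@serviceprovider.com", "192.0.2.1"),
    ]
    kept = select_for_analysis(buffer, {"xyz@serviceprovider.com"}, now)
    print([r.sender for r in kept])  # ['a@example.org']
```

In this toy run, the third record also involves the targeted address but has already aged out of the buffer, so only the first record is kept.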

But how are these hard selectors generated? One way of extracting hard selectors is through the use of soft selectors. Soft selectors are more general characteristics, such as what language the communication is in, what region the communication is to or from, what characteristics the software being used has, what type of encryption suite is being used, what operating system is being used, or what cookies are present. A set of soft selectors can be used to find communications in the data pool; hard selectors are then pulled from those communications. So while the end result is a targeted search for anyone sending an email to xyz@serviceprovider.com, the backend of this process often involves analysis using abstract characteristics.
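The soft-to-hard selector process can be illustrated with a short, purely hypothetical Python toy. `Communication`, its fields, the selector values and the helper functions are invented for illustration and do not reflect any real agency tool. Soft selectors are modeled as required attribute values (language, region, operating system, encryption suite); the email addresses found in matching communications become hard selectors, which then drive an ordinary targeted search.

```python
from dataclasses import dataclass

# Hypothetical communication record: a few "soft" characteristics plus
# the two email addresses that can serve as hard selectors.
@dataclass(frozen=True)
class Communication:
    language: str
    region: str
    os: str
    cipher_suite: str
    sender: str
    recipient: str

def matches_soft(comm, soft):
    """True if the communication has every characteristic listed in `soft`."""
    return all(getattr(comm, attr) == value for attr, value in soft.items())

def derive_hard_selectors(pool, soft):
    """Pull email addresses out of communications that match the soft selectors."""
    hard = set()
    for comm in pool:
        if matches_soft(comm, soft):
            hard.update((comm.sender, comm.recipient))
    return hard

def targeted_search(pool, hard):
    """Return every communication sent to or from a hard selector."""
    return [c for c in pool if c.sender in hard or c.recipient in hard]

if __name__ == "__main__":
    pool = [
        Communication("fr", "region-A", "linux", "suite-1", "u1@example.org", "xyz@serviceprovider.com"),
        Communication("en", "region-B", "windows", "suite-2", "u2@example.org", "u3@example.org"),
        Communication("en", "region-C", "macos", "suite-3", "u4@example.org", "xyz@serviceprovider.com"),
    ]
    hard = derive_hard_selectors(pool, {"language": "fr", "region": "region-A"})
    print(sorted(hard))                      # ['u1@example.org', 'xyz@serviceprovider.com']
    print(len(targeted_search(pool, hard)))  # 2
```

Note that the targeted search also returns the third communication, which matches none of the soft selectors but involves a derived email address. This mirrors the point above: a search that looks narrowly targeted can rest on broader, abstract characteristics.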

The role that international law plays in government surveillance

After the panelists described some of the intricacies of data collection and analysis, the focus shifted to the legality, under international law, of mass surveillance programs that target foreign citizens. The second panel addressed two important questions: whether international law applies to such programs and, if so, whether those programs violate it.

The International Covenant on Civil and Political Rights (ICCPR) is perhaps the world’s most significant human rights instrument, and Article 17 of the ICCPR guarantees every individual the right to be free from “arbitrary or unlawful interference with his privacy, family, home or correspondence.” Professor Peter Margulies discussed whether the ICCPR has extraterritorial application, that is, whether a country is obligated to protect privacy rights outside of its territory.

Article 2.1 of the ICCPR obligates a member country “to respect and to ensure to all individuals within its territory and subject to its jurisdiction the rights recognized” in the ICCPR. The US government holds the position that two criteria must both be met for the ICCPR to apply: the individual must be within US territory and must be subject to US jurisdiction. This interpretation has enormous implications for mass surveillance. Because most intelligence gathering involves foreign citizens located outside the US, the US can argue that the ICCPR does not apply and that it is therefore not obligated to respect the privacy rights of those individuals. The European Court of Human Rights (ECHR), on the other hand, holds the position that countries are obligated to respect privacy rights wherever they are in effective control of a territory. The concept of effective control has not been defined in the realm of electronic communications. Professor Margulies argued that it should extend to communications over which a country has virtual control: if a country has the ability to intercept and shape or alter data abroad, it is in virtual control of that data.

April Glaser, staff activist at the Electronic Frontier Foundation (EFF), argued that mass surveillance violates international human rights law such as the ICCPR, and presented the International Principles on the Application of Human Rights to Communications Surveillance. These principles were put together by legal experts and activists to advise countries on how to comply with international law.[6]


Overall, the Big Data Symposium presented an interesting international perspective on US government surveillance. Other topics the panelists covered included data localization measures taken by foreign governments, pending legislation in the European Union, and the possible applicability of the law of war where surveillance programs provide information used in drone strikes. Interested readers can watch recordings of the panels on the event’s website.

Roshni Patel, Privacy Fellow

Yana Welinder, Legal Counsel

  1. Big Data has been defined as datasets “whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” Ann Cavoukian, Ph.D. & Jeff Jonas, Privacy by Design in the Age of Big Data 3 (2012).
  2. The Symposium was hosted by The Information Society Project at Yale Law School, the Yale Journal of Law & Technology, and the Oscar M. Ruebhausen Fund.
  3. According to Ronald Lee, a partner at Arnold & Porter focusing on national security law, in 2012 the world’s population created an estimated 2.5 exabytes of data per day. That is equivalent to streaming 95,000 years of high-definition movies. Using Google Fiber, which is 100 times faster than the average broadband connection, it would take 633 years to download that much data. The amount of data in existence is growing at an exponential rate: 90% of the data available in 2012 was created within the previous two years.
  4. One of the Snowden revelations was a secret computer system used by the NSA called XKeyscore that collects content information and metadata about nearly everything that individuals do on the internet. The system indexes the data, allowing intelligence agents to conduct searches by, for example, a Facebook user name, type of browser used, or the language in which internet activity was conducted.
  5. Section 702 of the Foreign Intelligence Surveillance Act (FISA) Amendments Act of 2008 authorizes “the targeting of persons reasonably believed to be located outside of the United States to acquire foreign intelligence information.” If the data of a US person is collected, it must be subject to minimization procedures, defined as “procedures… that are reasonably designed … to minimize the acquisition and retention, and prohibit the dissemination of non-publicly available information concerning unconsenting United States persons.”
  6. The Wikimedia Foundation has since signed on to the Principles.