

5 Controlling Usage of Collected Data
Pages 59-77

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 59...
... In understanding the security of any computer system, it is important to be clear about the "threat model," that is, the set of threats that the system must be defended against. In the context of bulk collection, there are three broad classes of threats: • Entities outside the Intelligence Community (IC)
From page 60...
... 5.2  CONTROLLING USAGE Chapter 4 states the committee's conclusion that refraining entirely from bulk collection will reduce the nation's intelligence capability and that there is no kind of targeted collection that can fully substitute for all of today's bulk collection. However, the committee believes that controlling the usage of data collected in bulk (and indeed all data)
From page 61...
... Deterrence requires technical capabilities to detect access, to identify the (authorized) accessing party, and to audit records of access to spot suspicious patterns of access.3 In addition, both ...
3 Note that, to date, the only allegations that information collected in bulk has been used for an unauthorized purpose were the so-called "LOVINT" incidents, in which some NSA analysts inappropriately used this data to track the activities of significant others.
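The auditing step can be illustrated with a small sketch. It assumes a hypothetical audit log of (analyst, selector) records and a hypothetical rule for a "suspicious pattern of access": an analyst who queries the same selector that is not on an approved list at least three times. The record format, names, and threshold are illustrative assumptions, not an IC mechanism.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    """One entry in a hypothetical audit log: who queried which selector."""
    analyst: str
    selector: str


def flag_suspicious(log, approved_selectors, threshold=3):
    """Return analysts who repeatedly query selectors outside the approved set.

    Counts accesses per (analyst, selector) pair, ignoring approved selectors,
    and flags any analyst with a pair seen `threshold` or more times.
    """
    counts = Counter(
        (rec.analyst, rec.selector)
        for rec in log
        if rec.selector not in approved_selectors
    )
    return sorted({analyst for (analyst, _), n in counts.items() if n >= threshold})


log = [AccessRecord("a1", "+1-555-0100")] * 2 + [AccessRecord("a2", "+1-555-0199")] * 4
print(flag_suspicious(log, approved_selectors={"+1-555-0100"}))  # ['a2']
```

Real audit analysis would look for many more patterns; the point is only that detection requires a durable access log that someone actually examines.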
From page 62...
... If a policy decision is made to continue bulk collection, protection of privacy and civil liberties will necessarily rely on these rules. Once the results of a query are delivered to an analyst, other means must be used to control proper use of the data between queries and disseminated intelligence reports.
From page 63...
... Information flow control cannot do the kind of query-specific control that is described in this chapter; instead, it tends to push computed outputs to the highest level of classification, which is not useful in practice. However, it is the best technique known at present.
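The tendency to push outputs to the highest classification can be seen in a few lines. The sketch below assumes a simple "high-water mark" rule, one common form of information flow control: a computed value carries the highest label of any input it depends on. The level names and the `Labeled` class are illustrative, not taken from any IC system.

```python
from enum import IntEnum


class Level(IntEnum):
    """Illustrative ordered classification levels."""
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


class Labeled:
    """A value tagged with the classification level it inherits from its inputs."""

    def __init__(self, value, level):
        self.value = value
        self.level = level

    def combine(self, other, op):
        # The output may reveal something about either input, so it takes the
        # higher of the two labels (the "high-water mark").
        return Labeled(op(self.value, other.value), max(self.level, other.level))


count_a = Labeled(12, Level.CONFIDENTIAL)
count_b = Labeled(30, Level.TOP_SECRET)
total = count_a.combine(count_b, lambda x, y: x + y)
print(total.value, total.level.name)  # 42 TOP_SECRET
```

Because labels only ever rise, a long chain of computations that touches even one highly classified input ends up at the top level, regardless of what the particular query was.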
From page 64...
... has its own Civil Liberties and Privacy Office,6 and the ODNI Office of General Counsel and the Department of Defense Office of General Counsel have responsibility for oversight as well. Continuing external oversight is provided by the Department of Justice, congressional oversight committees, and the Foreign Intelligence Surveillance Act (FISA)
From page 65...
... Purely automatic control of usage would mean that the rules are enforced by published mechanisms rather than by people. Then people outside the IC who are concerned about privacy and civil liberties would not have to trust that the IC has adequate procedures and follows them, which many of them are reluctant to do.
From page 66...
... Figure 5.2 shows the elements of this method, which is closely related to the standard access control method used in cybersecurity. The bulk data is cut off from the outside world by an isolation boundary.
From page 67...
... Traditionally in computer security, the guard implements an access control policy that, as shown in Figure 5.2, would specify which analysts are allowed to access which items of bulk data, by attaching to each data item some description of the analysts authorized to access it. NSA has reported that its analysts use some variant of this scheme within a private cloud.8 Although it is a useful line of defense, this mechanism cannot express more complex policies such as, "Report all contacts that are one hop away from this target and were in Afghanistan during the communication."
8 Dirk A.D.
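A minimal sketch of the traditional guard, assuming each data item carries the set of analysts allowed to read it and that every read crosses the isolation boundary through a single method. The class and field names are hypothetical; the text does not describe NSA's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """A bulk data record tagged with the analysts authorized to read it."""
    item_id: str
    payload: str
    authorized: frozenset


@dataclass
class Guard:
    """The only path across the isolation boundary: check the tag, then log."""
    store: dict
    audit_log: list = field(default_factory=list)

    def read(self, analyst, item_id):
        item = self.store[item_id]
        allowed = analyst in item.authorized
        # Every attempt is logged, allowed or not, so later audits can spot misuse.
        self.audit_log.append((analyst, item_id, allowed))
        if not allowed:
            raise PermissionError(f"{analyst} may not read {item_id}")
        return item.payload


guard = Guard({"r1": DataItem("r1", "call record", frozenset({"alice"}))})
print(guard.read("alice", "r1"))  # call record
```

Note what this design cannot do: whether a record may be returned depends only on that record's own tag, so a rule about contacts one hop from a target who were in a particular place has nowhere to live.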
From page 68...
... Furthermore, some kinds of preprocessing of the data may be much less effective, for example, working out all the tightly knit cliques of people who communicate with each other a lot, so that it is possible to quickly find all the cliques that an individual belongs to. Federation has clear advantages for safeguarding privacy and enforcing policies:
• The federated parties are separate from the intelligence agency and may have no incentive to break the rules, which would help reassure those who are concerned that NSA may have incentives to break the rules.9
• One party's misbehavior exposes only some of the collected data.
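Federation can be sketched in a few lines. The example assumes, purely for illustration, that call records are split among independent parties by hashing the caller, and that each party checks a list of approved targets itself before answering. No single party holds all the records, which is the privacy benefit listed above; the same split is what makes whole-dataset preprocessing, such as precomputing cliques that span several parties, harder. All names are hypothetical.

```python
import hashlib
from collections import defaultdict


def party_for(caller, n_parties):
    """Pick which party stores a record by hashing the caller (an assumption)."""
    digest = hashlib.sha256(caller.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_parties


def federate(records, n_parties):
    """Split (caller, callee) records into per-party shards."""
    shards = defaultdict(list)
    for caller, callee in records:
        shards[party_for(caller, n_parties)].append((caller, callee))
    return shards


def party_answer(shard, target, approved_targets):
    """Each party enforces the rule itself and answers only from its own shard."""
    if target not in approved_targets:
        return set()
    hits = set()
    for caller, callee in shard:
        if caller == target:
            hits.add(callee)
        elif callee == target:
            hits.add(caller)
    return hits


def one_hop(shards, target, approved_targets):
    """The agency fans the query out to every party and merges the answers."""
    result = set()
    for shard in shards.values():
        result |= party_answer(shard, target, approved_targets)
    return result


records = [("X", "A"), ("B", "X"), ("C", "D")]
shards = federate(records, n_parties=3)
print(sorted(one_hop(shards, "X", approved_targets={"X"})))  # ['A', 'B']
```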
From page 69...
... Here are a few examples: • Airgap. The most secure and most expensive isolation boundary is an airgap: separate physical machines, or networks of physical machines, inside and outside the isolation boundary that is breached only by a carefully controlled network connection.
From page 70...
... In between separate physical machines and separate virtual machines is a fairly new way of doing isolation, called an enclave in the implementation developed by Intel. This is like a virtual machine, but its isolation is provided directly by the central processing unit (CPU)
From page 71...
... The Apache Accumulo open-source database, for example, has this feature; it was originally developed by NSA, which transferred it to Apache, an organization that develops open-source software for the Internet. This kind of tagging is the standard way of doing access control in computer security; it is helpful for controlling usage of collected data, but not sufficient for enforcing a rule such as "trace contacts for at most two hops," which restricts the algorithm that processes the data rather than access to the data itself.
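The difference between tagging data and restricting the algorithm can be shown with contact chaining. In this hypothetical sketch every record might pass a per-item tag check, yet the rule "trace contacts for at most two hops" is enforced inside the traversal itself, by refusing to expand any node already at the hop limit. The graph format (plain caller/callee pairs) and function names are assumptions for illustration.

```python
from collections import defaultdict, deque


def build_graph(records):
    """Undirected contact graph from (caller, callee) call detail records."""
    graph = defaultdict(set)
    for caller, callee in records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def contact_chain(graph, seed, max_hops=2):
    """Breadth-first contact chaining that never expands past max_hops.

    Returns {contact: hop_distance} for everyone reached, excluding the seed.
    """
    if max_hops > 2:
        raise ValueError("policy allows at most two hops")
    seen = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # at the limit: do not look at this node's contacts
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return {node: hops for node, hops in seen.items() if node != seed}


g = build_graph([("X", "A"), ("A", "B"), ("B", "C")])
print(contact_chain(g, "X"))  # {'A': 1, 'B': 2}
```

C is never reported even though its record could be perfectly readable under a tag-based check; only the algorithm knows it is three hops away.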
From page 72...
... If the target is X, and each database entry represents a call detail record with a triple ⟨caller, callee, time⟩, verifying the proof means checking that every result endpoint Y is in an entry ⟨X, Y, t⟩ or ⟨Y, X, t⟩. Note that this does not prove that the result is correct, but it does prove that no extra information is disclosed.
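The verification step can be sketched as follows, assuming each entry is a (caller, callee, time) triple, the proof supplies one supporting entry per returned endpoint, and the verifier can check that an entry really is in the database. Here that check is plain set membership; a real verifier would need a way to do it without holding the data, which this sketch does not address. Function and variable names are hypothetical.

```python
def verify(target, results, proof, database):
    """Accept only if every returned endpoint is justified by a real entry.

    `proof` maps each result Y to a (caller, callee, time) triple that must be
    in `database` and must link Y directly to `target` in either direction.
    Passing shows nothing extra was disclosed, not that the results are complete.
    """
    for y in results:
        entry = proof.get(y)
        if entry is None or entry not in database:
            return False
        caller, callee, _time = entry
        if (caller, callee) not in {(target, y), (y, target)}:
            return False
    return True


db = {("X", "A", 1), ("B", "X", 2), ("C", "D", 3)}
proof = {"A": ("X", "A", 1), "B": ("B", "X", 2)}
print(verify("X", {"A", "B"}, proof, db))  # True
print(verify("X", {"A", "C"}, {**proof, "C": ("C", "D", 3)}, db))  # False
```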
From page 73...
... FIGURE 5.6  Smaller trusted computing base by encrypting bulk data at rest. [Figure not reproduced; surviving labels: Trusted Computing Base, Untrusted, Guard, Read, Verify/Decrypt + MAC, Policy, Audit log, Data processing, Storage, Encrypted bulk data, Bulk data block, Host (airgap, hardware, virtual machine monitor, operating system, etc.).]
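The idea in Figure 5.6 is that if bulk data blocks are encrypted and authenticated before they reach storage, the storage no longer has to be part of the trusted computing base; only the guard holding the keys does. The sketch below shows the encrypt-then-MAC pattern using only Python's standard library. The keystream (HMAC-SHA256 run in counter mode) is a toy stand-in chosen to avoid third-party dependencies; a real system would use a vetted authenticated-encryption mode such as AES-GCM from an established library. Function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def _keystream(key, nonce, length):
    """Toy keystream: HMAC-SHA256 of nonce||counter, concatenated (CTR-style)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def seal(enc_key, mac_key, plaintext):
    """Encrypt, then MAC, a bulk data block before it goes to untrusted storage."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ct = bytes(p ^ k for p, k in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def open_block(enc_key, mac_key, blob):
    """Inside the guard: verify the MAC first, and only then decrypt."""
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("block was modified in untrusted storage")
    stream = _keystream(enc_key, nonce, len(ct))
    return bytes(c ^ k for c, k in zip(ct, stream))


enc_key, mac_key = secrets.token_bytes(32), secrets.token_bytes(32)
blob = seal(enc_key, mac_key, b"call detail record")
print(open_block(enc_key, mac_key, blob))  # b'call detail record'
```

Separate keys for encryption and authentication, and checking the MAC before decrypting, are the usual encrypt-then-MAC conventions.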
From page 74...
... Many practical queries fall into one of these categories, and it is not too hard to modify an existing database system to make these queries work entirely on encrypted data.15 Work using an encrypted search may yield useful results in the future; see Section 6.3.1. The idea behind homomorphic encryption is that any basic computation on encrypted data, such as adding two numbers, comparing two strings for equality, or sorting a list of items, can be done (slowly)
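As a concrete instance of computing on encrypted data, the sketch below uses the Paillier cryptosystem, which is only additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. It is a swapped-in technique, much narrower than the homomorphic encryption described here, and it covers only the "adding two numbers" case. The primes are toy values chosen so the example runs instantly; the code assumes Python 3.9 or later (for `math.lcm` and modular inverse via `pow`).

```python
import math
import secrets

# Toy parameters: the known primes 2**31 - 1 and 2**61 - 1. Far too small and
# too well known for real use; chosen only so the example runs instantly.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N_SQ = N * N
G = N + 1                       # the usual simple choice of generator
LAM = math.lcm(P - 1, Q - 1)    # Carmichael function of N
MU = pow(LAM, -1, N)            # valid because G = N + 1


def encrypt(m):
    """Paillier encryption of an integer 0 <= m < N with fresh randomness r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c):
    ell = (pow(c, LAM, N_SQ) - 1) // N    # Paillier's L function
    return (ell * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the hidden plaintexts (mod N)."""
    return (c1 * c2) % N_SQ


c = add_encrypted(encrypt(20), encrypt(22))
print(decrypt(c))  # 42
```

Whoever computes `add_encrypted` never sees 20, 22, or 42; only the key holder can decrypt the result.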
From page 75...
... 5.4.2  Restricting Queries Automatically Restricting queries automatically is another way to control usage. The goal is to do this well enough that software can decide which queries are allowed by the policy, or at least drastically reduce the number of queries that require human approval.
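A minimal sketch of automatic query restriction, assuming a hypothetical policy with three outcomes: queries clearly outside the rules are denied, queries clearly inside are allowed, and only the rest go to a person. The hop limit, approved seeds, and purpose list are illustrative assumptions, not actual IC policy.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_HUMAN = "needs human approval"


@dataclass(frozen=True)
class Query:
    """A hypothetical contact-chaining request submitted by an analyst."""
    seed: str
    hops: int
    purpose: str


def decide(query, approved_seeds, max_hops=2,
           known_purposes=frozenset({"counterterrorism"})):
    """Settle clear cases automatically; escalate only the ambiguous ones."""
    if query.hops > max_hops:
        return Decision.DENY            # clearly outside the rule
    if query.seed in approved_seeds and query.purpose in known_purposes:
        return Decision.ALLOW           # clearly inside the rule
    return Decision.NEEDS_HUMAN         # software cannot decide; ask a person


approved = {"+1-555-0100"}
print(decide(Query("+1-555-0100", 2, "counterterrorism"), approved))  # Decision.ALLOW
```

The more of the policy that can be written as checks like these, the smaller the share of queries that needs human review.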
From page 76...
... Chapter 6 discusses some possible improvements. 5.5 CONCLUSION This chapter has reviewed a variety of feasible mechanisms, both manual and automatic, for controlling the way that collected data is used.
From page 77...
... Increased transparency can give people outside the IC more confidence that the controls are appropriate, although the need for secrecy about some of the details makes complete confidence unlikely. Whether any given method should actually be deployed is a policy question that requires determining whether increased effectiveness and apparent transparency are worth the cost in equipment, labor, and potential interference with the intelligence mission.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.