Ethical and Legal Requirements Associated with Data Dissemination
Session II of the workshop was structured to provide an overview of the ethical and legal issues related to data dissemination. Mary Ann Baily, Institute for Ethics, American Medical Association, presented a paper, commissioned for the workshop, titled “Regulating Access to Research Data Files: Ethical Issues.” Donna Eden, Office of the General Counsel, U.S. Department of Health and Human Services, outlined her assessment of recent and prospective legislative developments. Finally, Thomas Puglisi of the Office for Protection from Research Risks (OPRR), National Institutes of Health, provided an overview of the role of institutional review boards (IRBs).
Mary Ann Baily described underlying ethical issues raised by the use of microdata, especially longitudinal data that are linked to administrative records. In so doing, she articulated the conflicting rights and obligations of data subjects, producers, and users, and the role of government in providing a structure within which these conflicts can be resolved.
Baily outlined positions at both extremes of the policy debate over data access, then made the case for pursuing a middle ground—striking a balance between the right to be left alone and the obligation to cooperate in the pursuit of communal goals. She discussed activities that are essential to setting appropriate limits on data use and the organizational framework required to carry out these activities. She concluded with observations on the
problem of translating societal recognition of a moral right to privacy into enforceable public policy.
Cases For and Against Unrestricted Access to Microdata for Research
In fields such as health care, education, and economic policy, research based on microdata can illuminate the nature of social problems and the effects of public and private actions taken to ameliorate those problems. Members of society, including data subjects, can benefit from more efficient use of their pooled funds (such as tax dollars and health insurance premiums). Researchers themselves benefit as well, in terms of advancing research programs and fulfilling career goals.
Given these benefits, and given that the cost—often publicly funded—of developing databases is high, it may be asked why databases should not be made widely available to all. Such a policy would maximize benefits, and any costs associated with the additional access could be charged to users when appropriate.
The primary objection to a policy of unrestricted access arises from the potential effect on data subjects, since disclosure of personal information can be harmful. Disclosure of such information may result in being arrested for a crime, being denied eligibility for welfare or Medicaid, being charged with tax evasion, losing a job or an election, failing to qualify for a mortgage, or having trouble getting into college. Disclosure of a history of alcoholism, mental illness, venereal disease, or illegitimacy can result in embarrassment and loss of reputation. Less directly, research results based on personal data can cause harm by affecting perceptions about a group to which a person belongs.
Even in the absence of such concrete effects, disclosure can be seen as a harm in itself, a violation of the fundamental right to privacy, derived from the ethical principle of respect for individual autonomy. Informational privacy has been defined as “the claim of individuals, and the societal value representing that claim, to control the use and disclosure of information about them” (Fanning, 1998:1). Gostin (1995:514) highlights the importance of respect for privacy to the development of a sense of self and personhood: “It is difficult to imagine how, in the absence of some level of privacy, individuals can formulate autonomous preferences, or more basically, develop the capacity to be self-governing.” A lack of respect for privacy makes people reluctant to trust others with personal information; for example, they may conceal sensitive information needed by their physicians to provide effective treatment.
Those who argue for recognition of a strong right to informational privacy claim that access to data about individuals should require their explicit consent. Advocates of this position may acknowledge the research potential of personal information databases developed by public and private entities;
many accept the use of such data for socially useful research if the data are aggregated or otherwise processed to prevent identification with a specific person. They maintain, however, that if the data are personally identifiable, researchers must persuade the data subjects to agree voluntarily to each use. To holders of this view, then, the appropriate policy is no access to personally identifiable data without explicit, informed consent.1
Restricted Access: What Limits Are Appropriate?
Since unrestricted access can cause harm to individuals and also conflicts directly with respect for individual autonomy, it is not an appropriate policy. On the other hand, requiring explicit, informed consent for any access to personally identifiable data is also problematic. On a practical level, having to obtain meaningful informed consent for every use of data would make much valuable research prohibitively expensive. Meeting this requirement would be costly not only to the research enterprise, but also to data subjects, who would have to spend time submitting to the process.
On a philosophical level, such a policy is focused solely on an individual right and ignores individual responsibilities. The right to informational privacy has never been considered absolute. Governments must collect personal information to function, and members of society have a civic duty to cooperate. For instance, the U.S. Constitution requires that there be a decennial census. Governments require research results to determine areas in which policy action is needed and what form it should take; additional research is needed to determine whether policies have been effective. Research is also an essential element in the support of individual civil rights and the right to a fair trial. Moreover, private organizations must be able to produce research results to carry out their roles effectively. Individuals cannot refuse to provide personal information to private entities such as educational institutions, health care delivery organizations, and employers unless they are willing to forego education, health care, and employment.
In some cases, research can be done on data collected from volunteers, but in others, unacceptable bias would result. Moreover, if data collected from voluntary subjects for one purpose should later become essential for an unforeseen but even more beneficial purpose, obtaining individual consent to the new use may be impossible, or at least prohibitively expensive.

1Informed consent is defined in Private Lives and Public Policies as “a person's agreement to allow personal data to be provided for research and statistical purposes. Agreement is based on full exposure of the facts the person needs to make the decision intelligently, including any risks involved and alternatives to providing the data” (National Research Council and Social Science Research Council, 1993:23). See Chapter 3 of the same report for a full description of the terminology, as well as of the historical development and use of informed consent and notification procedures.
For the above reasons, it is unreasonable to allow individuals complete authority to control whether and on what terms they participate in socially important research. Therefore, a balance must be struck between the right to be left alone and the obligation to cooperate in the pursuit of societal goals. According to Baily, the appropriate policy is somewhere in between unlimited access to personally identifiable data and access only with explicit informed consent, with the chosen policy being supported by sufficient security to maintain confidentiality. The difficulty is in reaching agreement on just where the point of balance exists in a particular context. For instance, the appropriate policy is likely to be different for research databases than for other types of data, such as hospital records and marketing information.
Establishing and Enforcing Appropriate Limits
Baily suggested that three kinds of activities are inherently part of imposing appropriate limits on access to microdata:
Weighing of relative benefits and costs—First, data managers must develop information on the benefits associated with research use of personal data and the harm that could result from granting access. This activity includes investigating the attitudes of data subjects. Both benefits and costs may vary significantly in different contexts. The benefits must then be weighed against the costs to determine access policy, with guidance from and accountability to the community as a whole through democratic institutions.
Maintenance of confidentiality—Data managers must be able to enforce whatever limits are established. It is impossible to eliminate the possibility of improper use entirely, but security measures must be adequate to protect legitimate privacy interests. This requires developing information on the risk of misuse in each case, given the nature of the data and their potential uses, and tailoring security measures accordingly. There must also be effective sanctions for violations of confidentiality policies, aimed at preventing improper use, not merely punishing those responsible after misuse occurs.
Public education/notification/consent—Data managers must inform data subjects about information policy and obtain their consent to use of the data when appropriate. Decisions about what people must know, how they should be told, and when consent rather than simple notification is morally necessary are complex, however. It is generally agreed that respect for privacy requires openness about the existence of databases containing personal information and the uses made of the data, regardless of whether explicit consent is required for every use. In practice, however, principles of fair information practice—such as those that form the basis for the federal Privacy Act of
1974—are surprisingly ambiguous.2 There appears to be acceptance, if not explicit acknowledgment, of the fact that personal data will be collected, some of it without subjects' explicit, freely given consent.
In clarifying the obligations of data managers to inform and seek consent from data subjects, it is useful to think in terms of three levels. There should be a base level of education about the role of data and research in making society run well. The goal is to make sure people understand that information about them is collected and used, with confidentiality safeguards, as a matter of routine, and that it is their civic responsibility to accede to this in exchange for medical progress, an effective educational system, protection of their civil rights, and so on. This level implies a category of “ordinary” research uses producing substantial social benefits with a low risk of harmful disclosure. For research in this category, there is no obligation to notify data subjects about each use or to seek explicit consent, although there should be a way for subjects to learn what research is being done with the data if they wish to do so.
The second level pertains to research uses that differ substantially from the routine, making it reasonable to notify data subjects and provide justification for the use. For example, a new hypothesis about the cause of an illness might lead to new analysis of old data that promises significant benefits with little risk. Alternatively, a significant change in the underlying benefit/cost picture might lead to new kinds of research or new interest in variables not previously examined.
The third level pertains to uses for which explicit, informed consent is required. A research use might fall into this category because the potential for harm is significantly greater relative to societal benefits, or because the degree of actual or perceived harm varies substantially across individuals. Uses for private rather than social benefit also fall into this category.
The above categories suggest a way to think about informing potential participants of current and future uses of the data. The immediate research goals could be explained, and the participants could be informed that the data might also be used for routine socially beneficial research in the future, with confidentiality safeguards and without further notification. An important issue to be addressed is where the linkage of survey data (especially longitudinal data) to administrative records falls along this spectrum. Can confidentiality safeguards and accountability be effective enough, and the risk of harm to individuals low enough, to allow most such linkages to be considered “routine” or “ordinary” research uses in the sense discussed above?

2These principles—as set forth by Alan Westin and cited by George Duncan in Chapman (1997:336)—are as follows: (1) there must be no secret personal data record-keeping system; (2) there must be a way for individuals to discover what personal information is recorded and how it is used; (3) there must be a way for individuals to prevent information about them that was obtained for one purpose from being used or made available for other purposes without their consent; (4) there must be a way for individuals to correct or amend a record of information about themselves; and (5) an organization creating, maintaining, using, or disseminating records of identifiable personal data must ensure the reliability of the data for their intended use and must take reasonable precautions to prevent misuses of the data.
Baily concluded her presentation by offering the following assessment:
There is consensus on the existence of a right to informational privacy, but not on the extent to which policies should go to protect that right or on how to implement such policies in practice. In a pluralistic society, translating a moral right into enforceable policy is a political problem; inevitably, no one is entirely satisfied with the result.
It is easier to reach an agreement most people can live with if people understand that the goal is a practical compromise among competing moral visions, not the triumph of their own point of view. Also, the process of achieving compromise must both be and be perceived to be one in which there is ongoing democratic accountability for what happens as a result.
Finally, it is easier to agree on change when the existing situation is unsatisfactory to nearly everyone, so that there is much to gain from an improved system. At present, personal privacy, even in highly sensitive areas such as medical information, is far less protected than most people realize. The opportunity exists to both improve safeguards on the use of data and increase access to data for socially useful research if the right policies are instituted.
RECENT AND PROSPECTIVE LEGISLATION
During her presentation, Baily asserted that implementation of effective access and confidentiality policies requires carefully constructed social mechanisms. Paramount among these is a consistent national legal framework, designed to indicate to both data producers and users what standards are appropriate and to aid in imposing sanctions for misuse. The need for such a framework, which does not now exist, is a recurrent theme in the literature on privacy and data use. Confirming Baily's assessment, Donna Eden provided an overview of recent and prospective legislative developments. Speaking primarily about health data, Eden observed that coordinated federal privacy legislation to protect health records does not yet exist. This is somewhat surprising, given that Congress has been working to enact
comprehensive privacy legislation in health and other areas for many years. In fact, in the Health Insurance Portability and Accountability Act (HIPAA) of 1996, Congress set an August 1999 deadline for the enactment of privacy legislation to protect health records, and provided that if that deadline were not met, the Secretary of Health and Human Services would be required to issue regulations to protect the privacy of certain health data.
Consensus on privacy legislation has not been forthcoming; instead, protections are based on a piecemeal system that originates from both federal and state levels. The primary laws governing data access are the Freedom of Information Act, the Privacy Act, privacy and confidentiality laws specific to individual agencies or purposes, and state laws. The wide variability in statutes governing access to administrative records nationwide makes it difficult for researchers and others to understand applicable rules. Laws dictate what data can be collected, the linkages that can be made, and the protections that are required for data use.
Bills are pending that would establish federal statutes applying to large databases, including proposals to mandate copyright protection for some. The first major component of legislation pertaining to health information is directed toward encouraging adoption of standards, particularly in the electronic context, to replace an array of currently existing formats. HIPAA requires the Secretary of Health and Human Services to adopt standards developed by organizations accredited by the American National Standards Institute whenever possible. The legislation requires that all health care information transmitted by health plans, clearinghouses, and those providers who conduct business using electronic transactions comply with these standards within 24 months (or, for small health plans, 36 months) of their formal adoption by the Secretary.
Standardization is slated to include the creation of unique national identifiers for health care providers, health plans, employers, and individual patients that will technically facilitate linkages across sources. Standardization will also ensure that different variables—for example, diagnosis and procedure codes—are coded identically across data sets. The standard for individual patient identifiers is on hold until comprehensive privacy protections are in place. One of the major gaps in current HIPAA requirements is that the standards will not apply to exactly the same data when those data are held by employers, some insurers, and government agencies.
HIPAA also requires the adoption of standards for the security of information transmitted or maintained electronically, and for electronic signatures used in standard health care transactions. The Department of Health and Human Services will issue compliance and enforcement requirements to provide assurance that, if information is misused, there will be redress. This move toward standardization will clearly affect researchers' data collection efforts, particularly with regard to the types of data that can be linked. Standardization will make reading of medical records much easier, and should significantly simplify the mechanics of data matching and analysis.
Because Congress failed to enact comprehensive health privacy legislation, the Secretary of Health and Human Services is now required to issue privacy regulations. These regulations will be based on recommendations for privacy legislation prepared by the Secretary for Congress in 1997. Statutory language requires that the privacy regulations address (1) recognition of the rights of individual subjects, (2) procedures designed to enforce those rights, and (3) the uses and disclosures permitted. There is as yet no clear administrative mechanism or funding to activate the provisions, which may create additional delays. Moreover, since the new privacy standards will be issued as regulations, not as a statute, state privacy laws will not be preempted; all existing federal statutes, such as the Privacy Act and the Public Health Act, also remain in place.
At this point, it is unclear how effective these efforts to control the uses of data and protect individual confidentiality will be. Uncertainty about effects on access and the potential for disclosure will persist until legislation is formulated. Eden reviewed several current circuit court cases that indicate possible directions the privacy legislation may take.
Under HIPAA, individual data subjects have some limited rights and protections. The statute provides both criminal and civil penalties for disclosure of data in violation of the various standards, along with very limited monetary penalties. Many of the workshop participants expressed the view that these rules need to be strengthened and criminal penalties stiffened. The Freedom of Information Act gives the public certain rights to data held by the federal government. Federal, state, and local governments authorize themselves to use data for basic operations and particular social purposes. Researchers and data collectors have very few explicit legal rights to data.
In conclusion, Eden offered her assessment of practical resolutions for current issues. Given that the prospects for passage of comprehensive privacy laws appear to be remote, she envisions a continued piecemeal approach to legislation. The potential for practical solutions may be greatest in the areas of copyright law and contracts. The Internet offers a wide range of possibilities for creating instant contracts and user agreements. Eden predicted a broad expansion in the use of click-on and other technology-facilitated agreements, most of which offer the promise of enforceability through existing contract law. Additionally, these mechanisms do not require special recognition by Congress of a separate private right of action. Agreement violators can be taken directly to state or, in certain cases, federal court. In some states, data subjects have the right to take legal action against a secondary user or licensee if terms and conditions are violated.
As noted earlier, a number of participants expressed skepticism about the ability of privacy laws to keep pace with technology and to effectively target
individuals and groups that pose the greatest risk to data security. The protection offered by HIPAA, as well as by regulations that will be adopted under this legislation, is limited by its restriction to health plans, clearinghouses, and those providers who conduct business electronically. Eden noted that the Secretary of Health and Human Services is on record as supporting the need for comprehensive federal legislation in this area.
ROLE OF INSTITUTIONAL REVIEW BOARDS
The IRB is the most direct regulatory link between researcher and research data. Thomas Puglisi presented a paper coauthored by Jeffery Cohen, also of OPRR, titled “Human Subject Protections in Research Utilizing Data Files,” providing an overview of IRB procedures, limitations, and prospects.
OPRR is in charge of enforcing federal policy for the protection of human research subjects. Nearly every executive branch agency of the federal government that supports human-subject research is a signatory to this policy, which makes researchers subject to a common set of regulations. The IRB process and the requirement for informed consent are the two core protections provided to individuals by these regulations.
Federally funded researchers become subject to the IRB process as soon as they access potentially identifiable private information about living individuals. Information is considered private if an individual could reasonably expect that it would not be used for purposes other than that for which it was provided. The IRB is charged with judging whether risks to data subjects are “reasonable” in relation to the anticipated benefits of data release and whether the risks are minimized by a sound research design, as well as with ensuring that informed consent is acquired and that confidentiality protections underlying data dissemination are adequate.
With regard to federally mandated informed consent, regulations require that subjects be notified about the degree to which the confidentiality of their records will be maintained. Meeting this requirement is not problematic for certain specific-use data sets, but it is typically not possible to provide accurate notification about information that will be linked or otherwise be incorporated in larger data sets. It can be difficult to predict the level of security for data that are extended beyond their original use, as is often the case with clinical or administrative data not initially collected as part of a defined research project.
Similarly, it is frequently impossible to acquire informed consent for research of the type discussed in the workshop session on case studies. In survey-based social science research, IRBs may waive the consent requirement if they find that three conditions are met: (1) the risk to subjects is minimal, (2) use of the information for research will not adversely impact the rights and welfare of the subjects, and (3) it would not be practicable to
obtain informed consent. The second and third conditions usually pose little difficulty; the first, however, requires more judgment. As noted above, there is no set of standards an IRB can apply in deciding whether a given application should be considered as involving minimal risk and whether the confidentiality protections in place are adequate. In fact, it probably is not possible to establish a universal standard, given the case-by-case variation in key parameters.
With regard to data access policy, Puglisi noted that the inherent challenge of the IRB mandate is in interpreting a standard, since the regulations do not establish one. Judgments must be made about what type of data can be released, and in what form, for each research project. The IRB must weigh the risk of information disclosure and the potential ramifications associated with it against the anticipated benefits to research and policy, and then decide on an appropriate level of protection.
During this process, IRB staff must first evaluate the nature and sensitivity of the data. Could disclosure put data subjects at risk of criminal or civil liability? Could it be damaging to financial standing, employability, insurability, or the reputation of an individual or group? Obviously, the more sensitive the information is, the more stringent the protections must be. Risk is a function of the level of identifiability (and data protection) and the sensitivity of the data.
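The relationship described here, with risk growing in both the identifiability of the records and the sensitivity of their content, can be sketched as a simple scoring rule. The following is a hypothetical illustration only, not an OPRR or IRB standard; the function names, the multiplicative form, and the protection tiers are all assumptions introduced for exposition.

```python
def disclosure_risk(identifiability: float, sensitivity: float) -> float:
    """Toy disclosure-risk score; both inputs range from 0 (none) to 1 (high).

    A multiplicative form captures the intuition that thoroughly
    de-identified data (identifiability near 0) carry little risk no matter
    how sensitive the content, while identifiable, sensitive data demand
    the most stringent protections.
    """
    if not (0 <= identifiability <= 1 and 0 <= sensitivity <= 1):
        raise ValueError("inputs must lie in [0, 1]")
    return identifiability * sensitivity


def required_protection(risk: float) -> str:
    """Map a risk score to a coarse, illustrative protection tier."""
    if risk < 0.1:
        return "routine safeguards"
    if risk < 0.5:
        return "restricted release (e.g., licensing agreement)"
    return "enclave access only"


# A public-use file with suppressed identifiers versus linked clinical records:
print(required_protection(disclosure_risk(0.05, 0.9)))  # routine safeguards
print(required_protection(disclosure_risk(0.9, 0.9)))   # enclave access only
```

The point of the sketch is only that the two factors interact: lowering identifiability (through de-identification or disclosure limitation) can offset high sensitivity, which is precisely the trade-off an IRB weighs case by case.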
The probability of disclosure and subsequent damage often appears trivial from the perspective of the researcher, but disclosure of private information does occur. Puglisi provided several real-world examples of inadvertent disclosure—one involving a case study and one involving exposure of notes at a professional meeting. He also noted instances in which data were released by investigators to news reporters and to congressional committees. The risks are real, but difficult to quantify.
Frequently, IRBs are advocated in proposed legislation as the vehicle for resolving data access and confidentiality tensions. There are hurdles to overcome, however, if IRBs are to serve this purpose optimally. IRB personnel are often no more competent than other groups to make unstructured decisions. Federal regulations do not require that IRB members have training in statistical disclosure limitation techniques and other methods of protecting confidentiality. Moreover, the federal requirement that IRBs have the scientific expertise to judge the research they review is not being met uniformly. These factors can make it impossible to carry out the required cost/benefit assessment, a failure that is, strictly speaking, a violation of federal regulations.
Another problem pointed out by Puglisi is that IRB standards tend to become increasingly restrictive as more procedural constraints are adopted; staff who are inadequately trained may have incentives to err on the safe side. With no increase in the expertise of IRB personnel, this trend is likely to continue. IRB administrators have an obligation to acquire (either internally
or through outside consultants) enough professional expertise to combat fears that an institution will be held liable for a mistake. Communication between researchers and IRBs is also key to improving knowledge levels and enhancing the performance of IRBs.
Another important aspect of efficient IRB utilization involves matching a research project with the appropriate IRB. In most instances, a researcher's local IRB will not be the one best suited to evaluate research value and data security risks. The model Puglisi recommends would require that data access proposals be reviewed at the location where data are maintained. The host IRB is in the best position to balance research potential against confidentiality risks. This approach would allow the host IRB to play an educational role as well, which is appropriate since its staff should be most knowledgeable about specific data characteristics, research applications, and conditions under which data should be shared. In acting as the responsible gatekeeper, the host IRB could provide information to local IRBs that would help streamline the approval process. Researchers could submit judgments from the overseeing IRB, demonstrating to the local IRB that a knowledgeable, respected body has approved confidentiality protections. Robert Willis noted that this is essentially the model that has been implemented successfully at HRS. Given the expanding role of data enclaves, it is likely that, along with auditing procedures developed by the National Center for Education Statistics, the Bureau of Labor Statistics, and the National Science Foundation, IRBs will for the foreseeable future continue to be the central mechanism for monitoring researchers' access to data.