Changing the Culture of Research
Key Messages Identified by Individual Speakers
• Given the substantial time and cost involved in conducting clinical trials, as well as the current system of career advancement in academia, academic researchers need incentives for releasing datasets, such as receiving credit for secondary analyses of their data.
• Clinical trial funders can influence the data-sharing actions of researchers by making grants contingent on compliance with data-sharing policies.
• Where journals can agree on principles and the means of enforcing those principles, they, too, can shape data-sharing policies.
• New policies at the European Medicines Agency on the release of clinical trials data could have implications for data sharing worldwide.
• Engaging patients in research and being open and honest with them can lead to patient-driven mechanisms for data sharing.
• As organizations increasingly offer data analysis services and medical advice over the Internet, the traditional health care and biomedical research enterprise may need to adapt to keep up with the changing culture.
The norms and expectations of the various groups involved in data sharing came up repeatedly during the workshop. In the context of the previous chapters, for example, rigorous standards and working models for data sharing can go only so far if not supported by the prevailing culture.
Many participants acknowledged the difficulty of changing well-established cultures. Educational preparation, vigorous enforcement, and consistent leadership are steps in the process of cultural change. Cultures also can be transformed by profound technological change, as is happening today with the application of Internet-based technologies and practices to health care.
Hans-Georg Eichler, senior medical officer at the European Medicines Agency (EMA), which regulates drugs and biologicals in Europe, discussed recent major policy changes at EMA regarding sharing of data from clinical trials. EMA is a public agency, said Eichler, and as a public body it is obliged to be fully transparent. The only exceptions, he said, are personal protected data and commercial confidential information. Given the overriding public health interest, EMA recently has taken the position that clinical trial data will no longer be considered commercial confidential information (Eichler et al., 2012). This has “huge implications,” according to Eichler.
Currently, EMA is providing trial reports retroactively with personal information redacted. In the future, however, it will publish trial reports proactively. The next step is making all data held by EMA publicly available, including data from prelicensing clinical trials, pharmacovigilance activities, and observational studies. If someone asks for de-identified patient-level data, “we will make it available,” said Eichler. EMA is approaching this objective “gingerly,” he continued. The release of data puts many people, particularly from industry, outside their comfort zones. However, science is moving toward a new model of openness in which data are made available for others to reanalyze and combine with other data.
The open question is whether making clinical trials data available will be a boon or a bane for drug development and public health. One barrier to making data available is that clinical trials data include personal information that needs to be protected. However, Eichler said, “that’s probably an addressable problem.” What will likely happen is that EMA will tell industry that as of a certain date, all clinical trials data submitted to the agency will be available to anyone else, so it should not
include protected patient health information that could be used to identify individuals.
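At its simplest, this kind of de-identification means stripping direct identifiers from each record before release. A minimal sketch follows; the field names and record structure are hypothetical illustrations (loosely modeled on the HIPAA Safe Harbor identifier categories), not EMA's actual submission format:

```python
# Sketch: drop direct identifiers from a patient-level trial record
# before release. Field names are hypothetical, not a real format.

# Fields treated as direct identifiers in this example.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email",
                      "medical_record_number", "date_of_birth"}

def deidentify(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed
    and the exact birth date coarsened to a birth year."""
    cleaned = {k: v for k, v in record.items()
               if k not in DIRECT_IDENTIFIERS}
    if "date_of_birth" in record:
        cleaned["birth_year"] = record["date_of_birth"][:4]  # keep year only
    return cleaned

patient = {
    "name": "Jane Doe",
    "date_of_birth": "1957-03-14",
    "medical_record_number": "MRN-00123",
    "treatment_arm": "active",
    "adverse_event": "none",
}
released = deidentify(patient)
# 'released' keeps the analysis variables but no direct identifiers
```

Real de-identification is harder than this sketch suggests: quasi-identifiers (age, location, rare diagnoses) can still permit re-identification in combination, which is why release policies also address who may receive data and for what purpose.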
Another risk of data sharing is that reanalysis of data may produce phantom risks and health scares. Neither regulators nor industry likes to be blindsided by reports that a drug or vaccine has an unreported side effect, but Eichler predicted that many licensed drugs could come under attack based on such reanalyses. As an example, he cited a meta-analysis of a drug called tiotropium bromide that found a slightly increased risk (relative risk of 1.6) of adverse cardiovascular events in chronic obstructive pulmonary disease patients using the drug (Singh et al., 2008). In this case, however, the company responsible for the drug had a study under way looking at long-term clinical endpoints that ultimately found no increased risk (Michele et al., 2010). This is a risk of meta-analyses and the use of observational data. How many beneficial drugs will be lost through mistaken analyses, and how will people be persuaded to participate in postmarketing trials of drugs if they perceive a possibility of drug-related harm, Eichler asked.
Despite the risks of data sharing, there are clearly considerable benefits to patients and the research community. Open science could support the development of predictive models for patient selection to appropriate treatments or doses based on patient characteristics. A second advantage is that different therapies could be compared to determine relative efficacies without the expense of direct comparison trials. “This will be a boon for comparative effectiveness research,” said Eichler.
EMA is still in the process of determining how best to make data available, and it is engaging many stakeholder groups in this discussion. However, Eichler said that data will not be released until the agency has made a regulatory decision on the product based on its assessment of the data. Also, EMA intends to ask for the preregistration of protocols for data reanalysis to avoid data dredging that is unlikely to produce meaningful results. EMA wants to know in advance whether studies on requested data will be exploratory or confirmatory in nature.
Shortly after the workshop summarized in this volume, EMA conducted its own workshop to bring together stakeholders to provide input to the development of its policies. Issues addressed included standards for storing and sharing of data, the level of data to be released, standards for protection of personal data, quality standards for meta-analyses, and rules of engagement among stakeholders (EMA, 2012).
The implication of the EMA policy change for the Food and Drug Administration’s (FDA’s) policy of nondisclosure was raised during the
discussion period. Unlike EMA, said a participant, FDA is prohibited from releasing patient-level data by statute and regulation, presenting a major legal barrier to data sharing. Robert Califf, Duke University Medical Center, contended that the reports companies send to FDA should be made public, along with the internal FDA analysis. Today, if a drug does not get to the market, federal law prohibits the release of these documents, but companies still could make these reports public, he said, even if FDA currently cannot.
Steven Goodman, who, in addition to his academic appointment at Stanford University School of Medicine, is also associate editor at Annals of Internal Medicine and editor at Clinical Trials, discussed the role of journals in promoting data sharing and the challenges they face. In a paper published in Annals of Internal Medicine in 2007, Goodman and several colleagues announced a new policy the journal was adopting to require that manuscripts include a reproducible research statement (Laine et al., 2007a). Such a statement would say whether the study protocol, code, and dataset are available and how to get each. Goodman labeled this a “weak” solution, but he also said that if Annals of Internal Medicine makes demands that are difficult to fulfill, authors will simply publish their articles elsewhere. “Journals are competitive with each other. They also want to publish the best stuff. And they can’t put up barriers that nobody else is putting up,” Goodman said. The requirement has at least shined a light on the problem, but some authors have simply said that data are not available or have referred readers to the large databanks from which the data in the study were derived. Polling by journal staff has indicated that the number of requests authors are receiving for data, statistical code, and protocols is still fairly low.
Journals cannot be effective acting alone, said Goodman. To really shift the culture surrounding data sharing, journals will need to agree on a common set of principles and sanctions, such as requiring that the authors of articles share data on request. Although a few other journals have adopted the reproducible research statement policy, in general, journals are taking their own approaches to dealing with data sharing and the issue of reproducibility, and some have no such policies at all. One success story mentioned by Goodman was clinical trial registration. Though the system still needs to be improved, he said, it has worked well
because it is done through one central repository, and having a legislative mandate and collective support by the medical journals has helped with enforcement (Laine et al., 2007b). Journals cannot be the custodians of all research data and protocols, and they cannot be the sole guarantors of scientific quality because they have neither the staff nor often the technical capability.
Goodman also briefly touched on the role that funders have in promoting data sharing. For example, the National Institutes of Health (NIH) requires a data-sharing plan for research projects funded at levels above half a million dollars, and the National Science Foundation recently started requiring all grantees to have plans for sharing data in a timely fashion and at nominal cost. Goodman pointed out that Howard Hughes Medical Institute (HHMI) has a very detailed requirement that funded researchers make any materials, data, databases, and software deemed integral to the publication freely and expeditiously available for use by other scientists, with no restrictions on use. Interestingly, HHMI actually specifies that researchers may not insist on collaboration, coauthorship, or prior review of manuscripts generated using their shared data and materials. Other funders have their own policies, but the extent to which these policies are being followed is difficult to determine, said Goodman. However, a recent joint statement by a group of funding organizations that was published in the Lancet (Walport and Brest, 2011) indicates that funders are aware of the role they can play in changing the culture of data sharing. The statement indicated an intention to work together to increase the scientific community’s access to research data generated with funding from their organizations.
As an alternative to making funding contingent on adherence to specified data sharing policies, a suggestion was raised during a discussion period that funders consider track records in data sharing as a significant factor in the scoring of funding proposals.
The issues associated with data sharing are a major concern of the NIH leadership, said Josephine Briggs, director of the National Center
for Complementary and Alternative Medicine and acting director of the Division of Clinical Innovation in the National Center for Advancing Translational Sciences. Almost a decade ago, the NIH implemented a data-sharing policy for cooperative agreements (through which many NIH-run clinical trials are funded) and for grants exceeding a half-million dollars, but that is just a “baby step” compared with the many things that need to be done to promote data sharing, said Briggs. She briefly described three ways in which the NIH is investing in data sharing. First, it is investing in data standards, which, as described in Chapter 5, can facilitate pooling of shared data from different sources and comparison of results from independent studies. For example, the National Institute of Neurological Disorders and Stroke has developed sets of common data elements (CDEs) for specific disease areas and now requires researchers who receive funding from the Institute to ensure that their data collection is compatible with those standards. One concern she raised, however, was that data elements defined for different disease areas will use different demographic variables. The trans-NIH BioMedical Informatics Coordinating Committee, which is being led by the National Library of Medicine, is collating a list of available CDEs, which may draw attention to needs for harmonization. Second, the NIH is supporting data-sharing resources in order to make datasets easier to find, accessible, and available. These resources are generally disease specific and are led by a single Institute or Center. Briggs pointed to the National Heart, Lung, and Blood Institute (NHLBI) as a great model, given its clear and unambiguous expectations for data sharing in large trials and to a certain extent even in smaller trials. 
Data can be shared through the data repository managed by the NHLBI’s Biological Specimen and Data Repository Information Coordinating Center (BioLINCC) or directly among investigators as part of their continued collaboration on NHLBI-funded work.
The final way that the NIH promotes data sharing is by funding secondary analyses on existing datasets. Clinical trials can be extremely complex to organize and run, often requiring large collaborations, but secondary analyses of trials are “an incredibly important way for individual investigators to participate in the generation of new knowledge,” said Briggs.
For studies with budgets of less than $500,000, NIH policies are not clear regarding expectations for data sharing, Briggs acknowledged. But the NIH controls the purse strings, and by creating expectations for smaller grants that datasets should be shared, it could exert a powerful influence.
INCENTIVIZING CHANGE BY ENSURING CREDIT
Ensuring that researchers who generate data get credit for it was raised by several workshop participants as an important incentive to promote data sharing, particularly in the academic community, where career advancement depends on publications and citations. Throughout the workshop, different mechanisms for giving trial organizers credit were discussed, including offering coauthorship, listing trialists as collaborators, and assigning datasets unique identifiers that researchers can track to show downstream use of their data.
Code of Conduct for Conducting Secondary Analyses
While discounting many other arguments commonly raised against data sharing, Andrew Vickers, Memorial Sloan-Kettering Cancer Center, acknowledged the validity of one of the major cultural arguments against data sharing—that researchers have a right to exploit data that they have spent years collecting. Researchers do need incentives to collect data. But blocking access to data forever is far from the only available alternative. The investigators who have collected the data will have the opportunity to publish the first paper on those results, and an embargo period during which they alone can use the data would be simple to arrange, said Vickers. Systems conferring credit for the reuse of data are being discussed and are needed to incentivize data sharing in academia. We know already that papers for which the data are made available are cited more than papers for which the data are not available (Piwowar et al., 2007).
Vickers (2006) suggested that a code of conduct governing the use of shared raw data could help to ensure that the original data collectors get fair credit for their work. He suggested that a code of conduct could include the following: an independent investigator planning to publish a new analysis of previously published data should contact the trialists, those who ran and published on the original clinical trial, before undertaking those analyses; if a reanalysis of the data is to be published, the trialists should be offered coauthorship or an opportunity to write a commentary to be published alongside the new analysis; journals should refuse to publish the new analysis unless this step has been taken; and finally, the original publication should be cited in any new analysis of the data.
Researchers and companies that continue to resist the release of data are swimming against the tide of history, Vickers said. When open access to scientific papers was first proposed, it was widely resisted, as was clinical trial registration, yet today both are widely accepted. “A whole bunch of things seemed very radical at the time. I think data sharing is one of those,” Vickers observed.
Trial Organizers as Collaborators on Secondary Analyses
Myles Axton, editor of Nature Genetics, has been involved in several experiments to allow greater access to research data, including databases of genotypes and phenotypes, micro-attribution as a way to incentivize community annotation of the human genome, and peer review on an open data platform. However, at the workshop, he focused on a different means for ensuring that investigators get credit for data they generate. He argued against the separation of people who have invested their time in a clinical trial from the data generated by the trial. The trial organizers should, of course, be able to continue to use their data. But in a second track, the trial organizers should be cited as collaborators and not authors. This would allow the original trial organizers to distance themselves from the conclusions of others who reuse their data while remaining associated with those data. Data need to be analyzed independently, but the people who spent years organizing the trial also should receive credit for the generation of those data—even if subsequent conclusions end up being critical of the trial, Axton said. An additional step forward would be to universally identify exactly what each person did in the production of new knowledge. “There should never be a discussion again about authorship order,” he asserted.
Unique Dataset Identifiers
Steven Goodman, Stanford University School of Medicine, proposed yet another mechanism by which due credit could be ensured. Currently, academic researchers have only two ways to gain credit for their work. They are an author on a paper, or their paper is cited. What is needed, said Goodman, is a way to measure use of someone’s data for the generation of novel findings and publications. This would require that each dataset have a unique identifier, like the PubMed ID for a paper. “Every
single time that dataset is used, [that identifier] needs to be in the paper that used it.” These citations could then go on the CVs of academic researchers and factor into hiring and promotion decisions. Some organizations are already doing this. For example, iDASH (Integrated Data for Analysis, Anonymization, and Sharing), a data repository established at the University of California, San Diego, specifically for health research, assigns unique identifiers to all datasets it provides via its Web-based distribution system. According to Goodman, applying the approach more broadly is key to solving the incentive problem. “We have to create a culture and a reality where people benefit as much from everyone sharing their data for all purposes as they currently do from protecting it.”
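Mechanically, the scheme Goodman describes resembles existing persistent identifiers such as DOIs: each dataset is minted one identifier, every publication that uses the dataset records it, and reuse can then be counted. A toy sketch of that bookkeeping follows; the identifier formats and registry structure are invented for illustration, not drawn from iDASH or PubMed:

```python
# Toy registry illustrating the proposal: datasets receive persistent
# identifiers, citing papers record them, and reuse is counted so the
# depositing investigator can claim credit. All identifiers are invented.
from collections import defaultdict

class DatasetRegistry:
    def __init__(self):
        self._owners = {}                    # dataset id -> depositor
        self._citations = defaultdict(list)  # dataset id -> citing papers

    def register(self, dataset_id: str, investigator: str) -> None:
        """Mint an identifier for a deposited dataset."""
        self._owners[dataset_id] = investigator

    def cite(self, dataset_id: str, paper_id: str) -> None:
        """Record that a paper used the dataset."""
        if dataset_id not in self._owners:
            raise KeyError(f"unknown dataset {dataset_id}")
        self._citations[dataset_id].append(paper_id)

    def reuse_count(self, dataset_id: str) -> int:
        """Number of papers citing the dataset -- the credit metric
        that could appear on a CV alongside paper citations."""
        return len(self._citations[dataset_id])

registry = DatasetRegistry()
registry.register("ds:2012-0001", "trialist_a")
registry.cite("ds:2012-0001", "pmid:111")
registry.cite("ds:2012-0001", "pmid:222")
# registry.reuse_count("ds:2012-0001") now reflects two reuses
```

The essential design point is a single shared namespace: counting only works if every journal requires the same identifier in every paper that uses the dataset, which is why Goodman stresses collective adoption over journal-by-journal policies.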
PROTECTING AGAINST MISUSE OF SHARED DATA
One of the major barriers to data sharing identified by those in industry is fear over the misuse of data. Several workshop participants raised the possibility of controlled access as a means of protecting against the potential harms from poor-quality secondary analyses of shared data. Goodman described different models of data sharing that are intermediate between full access, where the data can be used for any purpose with no restrictions, and no access. For example, he said, data can be shared only for the purpose of reproducing the results that were published or for commenting on the results via a letter to the editor, with no original findings based on the data published without explicit permission from the original investigators. Alternatively, the data can be used to generate new findings, but any modifications to the data also need to be made available and/or the authors of the original data need to be cited. “There are ways to mediate this relationship that are not ‘I give you the data’ or ‘I don’t give you the data,’” Goodman said.
Similarly, Axton proposed that one way to obtain access to research results could be to have anyone wanting to reuse the results document his or her status as a bona fide researcher and provide a research plan detailing the objectives of the research to be performed. Such a request could specify the dataset that is necessary and sufficient to conduct the proposed research. It also could provide a detailed documentation of processes designed to ensure that the data will not be distributed to third parties and will be protected to safeguard the privacy of the research subjects. Under these conditions, the default should be that access is granted rapidly by the trial organizers and owners of the data. If this default is not achieved,
a data access moderation committee could be a source of recourse. According to Axton, such a committee should include members of the trial group, independent researchers, and participant representatives. It would be responsible for advising those who have been denied access on how to comply with conditions for access. In this way, it could protect research subjects while making data more available for useful research questions. It would be quicker than existing procedures and should work better because the trial group would remain involved in data reuse analyses and in publications.
PATIENT-DRIVEN SHARING OF CLINICAL RESEARCH DATA
Institutions that participate in the clinical research enterprise must comply with regulations such as the Health Insurance Portability and Accountability Act Privacy Rule and the Common Rule, which place clear boundaries on use of patient data. But when patients take data they have generated themselves to the Internet, these regulations do not apply, said Deven McGraw, director of the Health Privacy Project at the Center for Democracy and Technology, making such “patient-facing pathways” enormously attractive. People dealing with a serious illness often have different conceptions of privacy than people who are not and, therefore, may be more willing to share health information. “We need to acknowledge that there is a great range in the extent to which people care about their privacy and give more flexibility in that realm,” she said. As a privacy advocate, McGraw stressed the importance of protecting patients’ personal health information and postulated that both institutional and patient-facing pathways to data sharing rely too much on the consent process for this purpose. Most patients will sign almost anything put in front of them if they trust the person asking them to sign, but consent does not necessarily protect privacy, McGraw observed. Consent forms therefore create an obstacle for researchers without providing patients with much protection.
McGraw contended that the general type of consent form often used in online research is not specific enough with regard to how the patient’s information will be used. When someone uses such a form to consent to research with their data, it is others, not the person giving consent, who define what is and is not research. The same observation applies to other uses of the data, including commercial uses. “We need another
framework for thinking about how we make sure in this environment, both on the regulated side and on the unregulated side, that there is public trust and understanding of what we’re doing.”
McGraw suggested a different approach based on what are known as Fair Information Practices. These are models of data stewardship that build both privacy protections and public trust into the process. She presented a set of such practices drawn from the Markle Common Framework, which was issued by the Markle Foundation in 2006 as a framework for the exchange of information among health professionals:
• openness and transparency about how data will be used;
• purpose specification and minimization;
• collection limitation to only those data actually needed;
• use limitation;
• individual participation and control (e.g., patient consent);
• data integrity and quality;
• security safeguards and controls;
• accountability and oversight; and
• remedies.
McGraw expounded on some of these principles as follows: the users of data need to be open and transparent about the purposes for which they are using the data; investigators should take only the data they need to address a research question and not take data that are not needed; if data are to be used for purposes significantly outside the context for which they were collected (e.g., sale to third parties), permission needs to be obtained. The purpose of this kind of framework is to create a system “that works without necessarily relying on the patient to evaluate and say yes to each and every research question that we want to bring to the data,” said McGraw.
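In software terms, several of these principles become checks applied before a data request is fulfilled: is the stated purpose one the data holder permits, and is each requested field actually needed for that purpose? The sketch below illustrates purpose specification, use limitation, and collection limitation as such a check; the purposes and field lists are invented for illustration and are not drawn from the Markle framework itself:

```python
# Sketch: purpose specification, use limitation, and collection
# limitation enforced as a pre-release check on a data request.
# The permitted purposes and minimal field sets are invented.

# For each permitted purpose, the minimal set of fields it justifies.
PERMITTED = {
    "outcomes_research": {"treatment_arm", "outcome", "age_group"},
    "safety_monitoring": {"treatment_arm", "adverse_event"},
}

def review_request(purpose: str, requested_fields: set) -> set:
    """Return the fields that may be released: the purpose must be
    permitted (use limitation), and only fields needed for that
    purpose are granted (collection limitation)."""
    if purpose not in PERMITTED:
        raise PermissionError(f"purpose not permitted: {purpose}")
    return requested_fields & PERMITTED[purpose]

granted = review_request("safety_monitoring",
                         {"treatment_arm", "adverse_event", "name"})
# 'name' is not needed for safety monitoring and is withheld
```

The point of encoding the rules this way matches McGraw's argument: the system can enforce appropriate use without asking the patient to evaluate and approve each individual research question.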
The concept of data ownership is not very helpful in considering the sharing of health data, McGraw observed. A better and more workable concept is that holders of data have rights and responsibilities that accompany them. “The patient has a right to transparency about data, to be able to get copies of data, to take data and to use it in ways that they want to, including to donate it for research projects if they want to do that,” she said. Research organizations that have data in their possession have a responsibility to think about sharing that data in ways that protect the rights of patients. “If we’re struggling with notions of who owns [data]
and when can it be given away, we’re starting in the wrong place. The holders of the data have responsibilities.”
Beginning in 2014, clinicians participating in the Meaningful Use of Electronic Health Records incentive program will be required to provide patients with the capability to view, download, and transmit clinical data that are part of an electronic medical record. This will create a “very interesting dynamic,” said McGraw, as patients gain more control over their health data.
Williams Syndrome as an Example
Beth Kozel, instructor of pediatrics in the Division of Genetics and Genomic Medicine at St. Louis Children’s Hospital and the Washington University School of Medicine, works with individuals who have Williams syndrome, a rare genetic condition affecting approximately 1 in 10,000 individuals. Kozel described health effects associated with Williams syndrome, including significant cardiovascular anomalies, hypertension, neurocognitive effects, predisposition for obesity and diabetes, and endocrine abnormalities. However, each characteristic varies in severity among people with Williams syndrome, which is likely caused by differences in genetic background and environmental exposures. This constellation of features leads to a complicated health picture for these individuals, but it also leads to the confluence of research groups interested in these many different phenotypes.
As a clinical geneticist, said Kozel, she would like to have genomic or environmental information that she could present to families to let them know what might happen to a child, rather than giving families a long list of things that might go wrong. The problem is that the sample sizes needed to study the effects of genetic backgrounds or environmental exposures are large; several hundred patients may be needed in a study to detect an association. To do such studies, people who work with Williams syndrome need to pool their data because most investigators work with relatively small numbers of people. But several major barriers have limited such sharing to date. Kozel works with the Williams Syndrome Association, which is an organization that brings together people with the syndrome and their families. It provides information for families, teachers, and others who work with people with Williams syndrome. It also includes a registry that allows families interested in research to interact directly with researchers. As part of its efforts to promote collaborations among researchers, it sent a survey to 30 individuals and groups known to be active in research on the syndrome. Only 15 surveys were returned, and of those 15 respondents, 9 said they had no samples. Six said they had samples, but went on to cite various challenges to sharing. “There were absolutely zero investigators who said, ‘Yes, I have samples and I would love to share them with you,’” Kozel said.
One important barrier identified by Kozel involves the issues that arise in other genetic studies. Genetic signatures may be identifiable in public databases, particularly with a small community where people know each other. Contributors of data may expect to receive results back. Some of the people in studies were consented before the molecular diagnosis was even known, and reconsenting them for new studies would be a challenge. Other samples were collected when someone was a child and is now an adult. Regulations or restrictions imposed by institutional review boards (IRBs) may place limits on doing research on genetic material collected in the past.
Other barriers involve the culture of academia. Investigators may be worried about getting credit for contributing samples. Being included in the middle of a long list of authors is not going to help a junior investigator receive tenure. Scientific “clout” may be associated with an investigator’s access to rare samples. Some investigators “have accumulated hundreds of samples and have reputations with the families—and that is who they are,” said Kozel. “If they let that go, their clout in the community becomes different.”
Kozel suggested that patients and patient groups have a role to play in overcoming some of these barriers. IRBs could allow patients and families to become active partners in making decisions about issues such as genetic confidentiality. For example, the registry of the Williams Syndrome Association has an online forum where families can discuss changes in protocols and then make decisions about whether to continue with research. Social media and new technologies also could increase the engagement of patients and families, which could lead to better acquisition of data. As an example, longitudinal data could be acquired on changes in phenotypes over time. Family groups can educate their members about the pros and cons of data sharing. They could ask members to look for and ask about sharing statements in consent forms, and when data sharing is not allowed, ask why. If “researchers are aware that the individuals giving their time to the study want the data shared, [it] may put more impetus on the researchers to make it happen.”
Some barriers are beyond the reach of patients and family groups, and other stakeholders will need to step up, said Kozel. For example, the expense of well-run biobanks is too large for small family groups to support. Funding organizations could consider establishing central biobanks for rare diseases. When samples are limited, the provision of downstream data, such as sequence or expression data, may be preferable to storing and distributing samples. Journals can continue to require such genomic data to be deposited in protected but accessible sites online. They also could consider mechanisms for connecting the authors of underpowered studies so that their data can be combined, rather than publishing individual lower-powered studies to be reexamined later by meta-analysis. “It doesn’t serve our rare disease community or science itself for all of this data to be sitting in people’s drawers,” Kozel concluded. But the acquisition of large numbers of rare samples will require coordinated efforts among multiple groups, and changes in practice will likely be needed from all stakeholders.
Public-Driven Sharing of Clinical Research Data
Clinical research data, said John Wilbanks, director of Sage Bionetworks, is more than the information historically contained in folders at a physician’s office. Those folders, which have now been reproduced in electronic medical records, contain only the information generated during episodic trips to the doctor. New technologies, biomedical as well as ubiquitous sensors such as cell phones and computers, now enable people to collect longitudinal data on their health and other aspects of their lives, regardless of whether they are in a traditional clinical research study.
A week before the workshop, Wilbanks got his genotype from the company 23andme and posted it on openSNP, a wiki based in Europe that was created by a postdoctoral fellow to enable genomics research. Within 2 days he got an e-mail from another wiki, SNPedia, with an annotation of his genotype, which indicated that he had a genetic variant conferring an increased risk of hypertension, along with another variant that seems to prevent baldness. This is happening "outside of any sort of regulated direct-to-consumer system," said Wilbanks. Although he would prefer to get this kind of information from health care providers, who have the training and resources to substantiate the information they provide, "I'm not getting this service from the health system as an individual and my capacity as an individual to generate data about myself is exploding," he said. Services in the marketplace now enable individuals to obtain their genotypes and distribute them to people who will interpret them and return the results via e-mail. "People who are frustrated are increasingly going to find these services and start using them" despite a lack of standards, site protections, and privacy.
Wilbanks also uploaded the genome file he received from 23andme into the Sage Bionetworks Synapse system, which is a self-contributed data repository for genomics research. The system, which includes an online informed-consent process, allows data scientists to conduct collaborative research on individual-level data that are provided in a standard format and have been cleared with respect to privacy protections.
With the computational and consent infrastructures in place, the last piece in the democratization of clinical research is something that begins to change the role of the individual, “so it’s not just ‘I’m a patient and I see my doctor x number of times a year.’ You can be a participant,” said Wilbanks. Bridge, which is the newest piece of the Sage system, demonstrates the power of this kind of model. It provides a means for people who have data about themselves to come together and commission researchers to build the computational disease models. For example, he said, “50 people with early-onset Parkinson’s could come in and say, ‘we’ve got genomics data, we’ve got all sorts of other omics data, we’ve got metabolic and molecular data, it’s in a standard format—$50,000 prize to the first person who builds a successful computational model.’”
Wilbanks proposed a simple set of standards to guide this kind of public-driven data sharing. First, he said, be honest with people. If people send their genomes to a shared system where data are at least moderately public, their privacy is unlikely to be permanently protected. Contributors of data need to know about the risks they face, but society should also have some tolerance for people, such as those with a rare disease, who judge that the value of sharing their data outweighs the risks. Second, data should be reusable, which to Wilbanks meant computationally useful. The scanned paper records that patients typically receive from their doctors when requesting their medical records, for example, are not reusable. Finally, data should be portable so they can be shared among institutions, doctors, laboratories, and studies. When the control group from one study can also serve as a cohort control for another, "it begins to accelerate the system exponentially," said Wilbanks.
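Wilbanks's reusability criterion can be made concrete. A raw genotype export is "computationally useful" because a few lines of code can parse and cross-reference it, whereas a scanned paper record cannot. The sketch below is illustrative only: it parses a 23andme-style tab-separated export (rsid, chromosome, position, genotype) and looks up calls in a small annotation table; the sample lines and the annotation table are invented, standing in loosely for the kind of lookup a SNPedia-style service performs.

```python
def parse_raw_genotype(text):
    """Parse a 23andme-style raw-data export into {rsid: genotype}.

    Lines beginning with '#' are header comments; data lines are
    tab-separated: rsid, chromosome, position, genotype.
    """
    calls = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        rsid, chromosome, position, genotype = line.split("\t")
        calls[rsid] = genotype
    return calls


# Hypothetical annotation table (invented rsid and note), standing in
# for an external SNPedia-style annotation service.
ANNOTATIONS = {("rs0000001", "AG"): "example variant annotation"}

# Hypothetical two-line export in the 23andme raw-data layout.
sample = "\n".join([
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t1234567\tAG",
])

calls = parse_raw_genotype(sample)
for rsid, genotype in calls.items():
    note = ANNOTATIONS.get((rsid, genotype))
    if note:
        print(f"{rsid} ({genotype}): {note}")
```

The point of the sketch is the contrast Wilbanks draws: because the format is plain, structured text, it is trivially portable among studies and services, while an image of the same information would require manual transcription before any reuse.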