4 Models of Data Sharing
Key Messages Identified by Individual Speakers
• Registration of clinical trials and summary trial results has been a major step forward, but ambiguous protocols and discrepancies between protocols and results raise concerns about the integrity of clinical research data.
• Greater transparency of study protocols and amendments, statistical analysis plans, informed consent forms, clinical study reports, and adverse event reports would both improve clinical trials and facilitate sharing of trial results.
• The de-identification process can be complicated and expensive when studies are not designed with data sharing in mind.
• Collaborations need to be clear about common goals, realize the unique value each party brings to the effort, and strive for open inclusiveness.
• Companies can be fierce competitors, but still cooperate on precompetitive research to meet common needs.
• If patients provide information for a research project, they should receive information in return that can help them make meaningful health care decisions.
• Treating patients as partners in research would acknowledge their expertise in managing and understanding their conditions.
SHARING CLINICAL RESEARCH DATA
Clinical trial data are a public good, but many stakeholders in addition to the public have interests in those data, observed Jeffrey Nye, Janssen Research & Development, in his introduction to the session on models of data sharing. Participants in a trial have interests in the information a trial generates, as do the researchers conducting a trial. Pharmaceutical companies are another stakeholder, along with researchers from either the private or public sectors doing reanalyses or meta-analyses of study data. Regulators have the objective of safeguarding public health and guiding and advising companies as they develop new products, while citizen scientists may be studying the data to derive information they can apply in their own lives.
Seven speakers at the workshop described different models designed to increase the sharing of clinical research data. All of these models have strengths and limitations. Although the optimal path forward is not yet clear, all of these models offer lessons that can inform future initiatives.
CLINICALTRIALS.GOV
Three key problems interfere with the practice of evidence-based medicine, said Deborah Zarin, director of ClinicalTrials.gov at the National Library of Medicine, National Institutes of Health (NIH). Not all trials are published. Publications do not always include all of the prespecified outcome measures. Unacknowledged changes made to trial protocols can affect the interpretation of findings.
These problems led to the establishment in 2000 of ClinicalTrials.gov, which serves as a registry of clinical trials at the trials’ inception (Zarin et al., 2011). The registry now contains key protocol details of more than 130,000 trials from around the world. In 2008 the registry added a results database, which now contains the summary results of more than 7,000 trials.
ClinicalTrials.gov does not accept participant-level data, Zarin emphasized, but it has considerable experience with other kinds of data generated by clinical trials. Clinical trials data take many forms, from uncoded, participant-level data to analyzed summary data; only the latter are posted at ClinicalTrials.gov. At each step in the process leading from the raw data to the summary data, information is lost (see Figure 4-1). Also, each vertical drop involves subjective judgments that are not transparent, but can influence the reproducibility of results. The users of summary data generally assume that they reflect the underlying participant-level data, with little room for subjectivity. That assumption is not always correct, said Zarin.
FIGURE 4-1 Information loss as clinical trials data progresses from raw uncoded data to summary data.
SOURCE: Zarin, 2012. Presentation at IOM Workshop on Sharing Clinical Research Data.
The results database at ClinicalTrials.gov was launched in response to the Food and Drug Administration Amendments Act of 2007 and was based on statutory language and other relevant reporting standards. It requires that the sponsors or investigators of trials report the “minimum dataset,” which is the dataset specified in the trial protocol in the registry. The data are presented in a tabular format with minimal narrative. They cover participant flows, baseline patient characteristics, outcome measures, and adverse events. The European Medicines Agency is currently developing a similar results database.
Although ClinicalTrials.gov has checks for logic and internal consistencies, it has no way of ensuring the accuracy of the data reported. ClinicalTrials.gov does not dictate how data are analyzed, but does require that the reported data make sense. For example, if the participant flow had 400 people and results are presented for 700, it asks the trial organizers about the discrepancy. Similarly, time to event must be measured in a unit of time, and the mean age of patients cannot be a nonsensical number like 624. “That is the kind of review we do,” Zarin said.
ClinicalTrials.gov was established on the assumption that required data are generated routinely after a clinical trial based on the protocol for the trial, so the burden of reporting to ClinicalTrials.gov would be due mainly to data entry. Instead, the experience at ClinicalTrials.gov has shown that protocols are often vague, are not always followed, or in some cases may not even exist. In addition, summary data are not always readily available even for trials that have already been published. For many trials, no one can explain the structure of the trial or the analysis of the data, said Zarin. “What we learned is there is not an objective, easy-to-describe route from the initial participant-level data to the summary data. Many people and many judgments are involved.”
Structural changes to trials are also common. A trial can start as a two-arm study and then become a four-arm study. Participants come and go, so that the number of participants changes over time. Participant flow and baseline characteristic tables describe different populations than the outcomes table. Data providers often cannot explain the “denominators” for their results, the groups from which outcomes or adverse events are collected. Zarin described a study in which a year of close work was required with statisticians to figure out who the people in the study were and where they went as a result of structural changes to the study. “These are brilliant statisticians. They were in charge of the data. [But] this trial was basically too complicated for them to figure out. They were giving outcome measures without actually knowing what the denominators were. That is one kind of problem we have seen.”
In other cases, outcome measures were changed: a quality-of-life scale was replaced with a depression scale; 1-month data were replaced with 3-month data; the number of people with an event was replaced with time to an event; and all-cause mortality was replaced with time to relapse.
Sometimes discrepancies are obvious. In one study, the mean for hours of sleep per day was listed as 823.32 hours. Another study of 14 people included data on 36 eyeballs. “As a consumer of the medical literature, these are not reassuring things,” Zarin observed. In a study of 100 matched pairs of ClinicalTrials.gov results and publication results, 82 percent had at least one important discrepancy.
The inevitable conclusion is that summary data may not always be an accurate reflection of participant-level data. Although the deposition of clinical trial protocols and summary data into registries is a huge step forward in the direction of transparency, the validity and reproducibility of summary data are called into question by such inconsistencies. “This is a big problem,” Zarin asserted.
Providing more transparency about the process of converting one type of data into another type would help inspire trust, she said. Documents that may help explain this journey include the protocol and amendments, the statistical analysis plan, informed consent forms, clinical study reports, and adverse event reports. Greater transparency would also help everyone involved with clinical trials to engage in internal quality improvements.
THE DATASPHERE PROJECT
In contrast to the declining mortality rates for heart disease (see Box 2-2), mortality rates for cancer have dropped only slightly in recent decades, noted Charles Hugh-Jones, vice president and head of Medical Affairs North America for Sanofi Oncology. Changes in risk behaviors, an increase in screening, and new therapeutics have all contributed to this decline in cancer, “but we are not being as effective as we would like to be.” At the same time, the price of cancer treatment has skyrocketed, which is not sustainable in an era of fiscal austerity. We need to find better ways of reducing cancer mortality rates, said Hugh-Jones, and “one of the solutions of many that we need to address is data sharing.”
Data sharing in the field of oncology could lead to faster and more effective research through improved trial designs and statistical methodology, the development of secondary hypotheses and enhanced understanding of epidemiology, collaborative model development, and smaller trial sizing, said Hugh-Jones. For example, as oncology researchers divide cancers into smaller subgroups with particular molecular drivers, data increasingly need to be pooled to have the statistical power to determine the most effective treatments for each subgroup.
Hugh-Jones described an ideal data-sharing system as simple, systematic, publicly accessible, and respectful of privacy issues. DataSphere, which is an initiative of the CEO Roundtable on Cancer, is designed to achieve these objectives.
The CEO Roundtable on Cancer consists of the chief executive officers (CEOs) of companies involved in cancer research and treatment who are seeking to accomplish what no single company can do alone. DataSphere will rely on the convening power of CEOs, together with support from patients and advocacy groups, to secure and provide data. Initially, it will seek to provide comparator arms, genomic data, protocols, case report forms, and data descriptors from industry and academia. DataSphere will include data from both positive and negative studies because a negative study is often as revealing from an epidemiological point of view as a positive study. De-identification will be standardized, and DataSphere will then work with third-party data aggregators to pool the data in meaningful ways, a significant challenge when hundreds of cancer drugs are being developed at any given time and thousands of studies are registered in ClinicalTrials.gov.
At the outset, said Hugh-Jones, the originators of DataSphere asked three questions. Why would people want to share their data? If I wanted to share my data, how would I do it? Finally, where would I put it once it was ready to post? DataSphere has established incentives for data contributors that call attention to the increased productivity, cost savings, citations, and collaboration that can accompany sharing. It also is looking at micro-attribution software that could extend credit for sharing to the contributors of data. Similarly, incentives for patients emphasize the benefits of making data available and the security precautions that have been taken. It has even been looking into the possibility of competitions among researchers to enhance the sharing of data.
Tools to enable sharing, continued Hugh-Jones, include a standard de-identification system being developed in collaboration with Vanderbilt University that is consistent with Health Insurance Portability and Accountability Act (HIPAA) regulations, a single online data use agreement form, how-to guides for de-identification, and tools for advocacy. Finally, it has been working closely with the database company SAS to produce a simple but secure, powerful, and scalable website where everything needed to share data is automated.
Sanofi is contributing de-identified data from two recent Phase III clinical studies to start the ball rolling. The goal, said Hugh-Jones, is to have at least 30 high-quality datasets in the database by the end of 2013 and then expand beyond that. “With the sort of environment we have demonstrated here, this is something that can be successful.”
THE YALE-MEDTRONIC EXPERIENCE
One paradigm for facilitating dissemination of industry data and ensuring high-quality independent review of the evidence for efficacy is exemplified by the Yale-Medtronic experience, as described by Richard Kuntz, senior vice president and chief scientific, clinical, and regulatory officer of Medtronic, Inc., where proprietary data were released to an external coordinating organization that contracted other organizations to perform systematic reviews of the study results.
In 2002, according to Kuntz, the Food and Drug Administration (FDA) approved a product from Medtronic called INFUSE, which was designed to accelerate bone growth in cases of anterolateral lumbar interbody fusion. Approval was based on one pilot randomized controlled study and two pivotal randomized controlled studies. A series of subsequent peer-reviewed publications supported by Medtronic provided additional data on the use of the product.
In June 2011, Kuntz continued, a major challenge was raised regarding the validity of all the published literature on INFUSE. The principal focus was on the results presented in the peer-reviewed literature and on general study designs and endpoints. The challenge was published in a dedicated issue of a medical journal and consisted of more than 10 articles. The company quickly reviewed its data to ensure that the dossiers it had were accurate. “We are convinced that the data were good, and talked to the FDA immediately to make sure that they felt the same.” However, the issue was being discussed extensively in the media. “We had to make some quick decisions,” said Kuntz.
Within less than a month, Kuntz said, the company announced its decision to contract with Yale University as an independent review coordinator. In August, Yale announced its plan to establish an independent steering committee and contract with two systematic review organizations to carry out reviews of the research. Medtronic agreed to supply Yale with all de-identified patient-level data, including non-label studies, along with all FDA correspondence and adverse event reports. It also agreed to allow Yale to establish a public transparency policy and process for the entire INFUSE patient-level dataset. The publication of the systematic reviews was scheduled for the fall and winter of 2012, with summary manuscripts prepared and submitted for publication in the Annals of Internal Medicine at the time of the workshop.
The project has been undertaken by the Yale University Open Data Access (YODA) project, which, according to Kuntz, serves as a model for the dissemination and independent analysis of clinical trial program data. This project is based on the rationale that a substantial number of clinical trials are conducted but never published, and even among published clinical trials, only a limited portion of the collected data is available. As a result, patients and physicians often make treatment decisions with access to only a fraction of the relevant clinical research data. Clinical trials are conducted with both public and private funding, but several issues are particularly important among industry trials. Industry funds the majority of clinical trial research on drugs, devices, and other products, both premarket and postmarket. Also, industrial research is proprietary, with no requirement for publication or dissemination, and the public perception is that industry has a financial interest in promoting “supportive” research and not publishing the rest of the data.
The YODA project has been designed to promote wider access to clinical trial program data, increase transparency, protect against industry influence, and accelerate the generation of new knowledge. The public has a compelling interest in having the entirety of the data available for independent analysis, but industry has legitimate concerns about the release of data, Kuntz said. Steps therefore are needed to align the interests of industry and the public, particularly when concerns about safety or effectiveness arise.
Yale and Medtronic spent a year working through issues involved in assembling the data and giving those data in the most unbiased way possible to reviewers so they could do a full systematic review. To maintain transparency and independence, formal documentation of communications between Yale and Medtronic was necessary along with clarity about what kinds of discussions could and could not be held. For example, Kuntz said, Medtronic did not want to send Yale previous reviews or interpretations of the data done by outside groups because the company did not want to taint the information. The query process among the reviewers, Yale, and Medtronic also had to be carefully managed.
The de-identification process was complicated and expensive. De-identifying the necessary HIPAA fields and information took several months and the efforts of about 25 people, which contributed substantially to the overall $2.5 million cost of the project. The HIPAA Privacy Rule was not designed for this kind of activity, Kuntz observed. As a result, the YODA project’s approach to de-identification was a “Rube Goldberg contraption” and clearly not scalable.
Given that paper case report forms and studies going back to 1997 had to be reviewed, the project was “an outlier example of how complicated it would be to de-identify [data].”
Industry has several reasons for participating in this kind of process, according to Kuntz. It allows fair and objective assessment of product research data, as opposed to speculative analysis based on incomplete data. It supports competition on the basis of science rather than marketing. It promotes transparency and advances patient care. Although committed to transparency, Medtronic was concerned about potential misuses of the data. For example, is everyone seeking access to the data interested in the truth? Litigant firms may be interested in making money, “but litigant firms also can find the truth,” said Kuntz. In the end, Medtronic sought to provide the data and initiate conversations about its use.
However, Kuntz raised a large number of questions that the Yale-Medtronic project has not fully answered:
• Would it be possible for an independent group to determine whether a question requiring the use of data serves the public interest or a special interest?
• Should queries be limited to single questions, and should the methods used to answer the questions be prespecified?
• Should there be an initial time period during which data remain proprietary?
• What portion and level of the dataset are necessary?
• Should there be a time limit or license for data access?
• Who controls the data distribution?
• Are there a priori questions and hypotheses to be tested, or is there an interest in data exploration?
• Is the requester competent to do the proposed analysis?
• Should a trusted third-party analysis center be contracted? May the requester share the data with others?
• Should there be controls on the dissemination of results, such as a requirement for peer review before dissemination?
• What methodological review is required?
• Should industry be involved in the peer review of results derived from its data?
All of these questions need better answers than exist today, said Kuntz. Nevertheless, the bottom line is that industry has a responsibility to do studies with regulatory agencies to produce results in a faithful and trusted way and to disseminate them under the law. It needs to competently and ethically contract or execute the required clinical studies and perform timely filing of the data and results dossier. Industry makes products that “we sell to people,” said Kuntz. “We are responsible for the health of those individuals.”
The movement from keeping data concealed to sharing data will require foundational changes, Kuntz concluded. One important step will be involving patients as partners rather than “subjects,” which will help lower at least some of the barriers to the use of data.
THE BIOMARKERS CONSORTIUM
The Biomarkers Consortium of the Foundation for the National Institutes of Health (FNIH) is a precompetitive collaboration designed to increase the efficiency of biomarkers-related research. Its goals are to facilitate the development and validation of new biomarkers; help qualify these biomarkers for specific applications in diagnosing disease, predicting therapeutic response, or improving clinical practice; generate information useful to inform regulatory decision making; and make Consortium project results broadly available to the entire scientific community.
John Wagner, vice president for clinical pharmacology at Merck & Co., Inc., described the validation of adiponectin as a biomarker as an example of the work of the Consortium. Adiponectin is a protein biomarker discovered in the 1990s that is associated with obesity and insulin sensitivity. Certain drugs can drive up adiponectin levels very quickly in healthy volunteers and in patients, and attention was focused on the use of adiponectin as a predictive biomarker to identify patients who would or would not respond to particular therapies.
Though considerable data about adiponectin existed in the files of companies and academic laboratories, relatively few data about the use of adiponectin as a biomarker were publicly available. The Biomarkers Consortium took on the task of compiling these data as a proof-of-concept project for the collaboration. A number of companies agreed to combine their data into a blind dataset derived from many trials involving more than 2,000 patients. Using these data, the consortium concluded that adiponectin is a robust predictor of glycemic response to peroxisome proliferator-activated receptor agonist drugs used in the treatment of diabetes. The results confirmed previous findings and investigators concluded that “the potential utility of adiponectin across the spectrum of glucose tolerance was well demonstrated” (Wagner et al., 2009).
Wagner drew several important lessons from this experience. The project demonstrated that cross-company collaboration was a robust and feasible method for doing this kind of research. However, the project took a relatively long time to complete, which is a real problem, according to Wagner. The Consortium has since learned how to collaborate more efficiently, but time remains a concern. The pace was set based on the amount of time team members had to dedicate to this project. The Consortium was not the first priority of everyone involved in the project. “It was the evening job for many people, myself included.” Good project management skills have helped to address this problem, as has the development of new collaboration tools.
The Consortium struggled with data-sharing principles and standards, Wagner admitted. Negotiating a data-sharing plan with even a small number of companies was challenging and having a single legal liaison for each of the companies was found to be critical. Standard definitions were not all obvious. In some cases, people would fail to pass on crucial information before leaving for another position. However, in the end the project created a template for the Biomarkers Consortium for data-sharing plans, which should speed the work in subsequent projects. Also, FDA currently has an initiative to require uniform data submissions using standardized data fields, which would result in data that are much more amenable for sharing, Wagner observed. Furthermore, health care reform is also expected to harmonize data practices, in part to reduce costs and improve care.
The existing data had many limitations, Wagner indicated. The original studies were not designed to answer the research question investigated by the Consortium. The adiponectin data also had limitations because different companies used different assays to measure the protein, which required more work to ensure that the data could be combined reliably.
Broader issues also arose. The clarity of the research question is very important for defining the type of collaboration. The existence of a neutral convener (in this case the FNIH) was critical in gaining the trust of all the stakeholders involved in the project. Still, motivations were an issue. Depending on the question being asked, the openness of the contribution and of the output can change. In the case of the Biomarkers Consortium, the output is completely open, which is a good model for generating new knowledge.
The nature of the collaboration also depends on whether it is developing standards and tools, aggregating data, creating new knowledge, or developing a product, Wagner said. Collaborations depend on trust and openness. Being clear about common goals, realizing the unique value each party brings to the effort, and striving for open inclusiveness can greatly improve collaborations.
THE NEWMEDS CONSORTIUM
NEWMEDS, which is a project sponsored by the European Union, stands for Novel Methods for Development of Drugs in Depression and Schizophrenia. As discussed by Jonathan Rabinowitz, academic lead of NEWMEDS at Bar Ilan University, the NEWMEDS consortium was established to facilitate sharing of clinical trials data (in particular, coded participant-level data) from industry and academia to examine research questions in the precompetitive domain. According to Rabinowitz, the schizophrenia database, which includes data from AstraZeneca, Eli Lilly, Janssen, Lundbeck, and Pfizer, encompasses 64 industry-sponsored studies representing more than 25,000 patients, along with studies sponsored by the National Institute of Mental Health and the European Union. The depression database, with data from several of the same companies, includes 26 placebo-controlled, industry-sponsored studies covering more than 8,000 patients.
Rabinowitz went on to describe some of the major findings and lessons learned from the schizophrenia database. When looking at patient response, analysis of the database revealed that results at 4 weeks were nearly the same as at 6 weeks, implying that studies could be shorter. Females show more pronounced differentiation between placebo and active treatment than males. Thus, the inclusion of more females in studies, previously underrepresented, could show heightened differences from placebo. Patients with a later onset of disease showed more pronounced improvements, irrespective of their allocation to active treatment or placebo groups, but differentiation from placebo was not affected by age of onset. For unknown reasons, the active-placebo differentiation varies by geographical region, with considerably more differentiation in Eastern Europe than in North America. All of this information, which is useful in its own right, can be used to design more effective and efficient clinical trials with smaller treatment groups and shorter study durations, Rabinowitz stated, which together could significantly reduce costs of drug discovery trials.
Rabinowitz described some of the lessons learned from his personal experiences with the Consortium. Just locating the data was a challenge. It might sound mundane, but it can be very complex, he said. For example, companies are bought and sold, and products are exchanged among companies. “To locate who houses data [required] almost the work of a detective.” Also, competing internal resources and priorities mean that data sharing is not necessarily the top priority. Compared with the YODA project’s experience, de-identification was much less expensive and time consuming, said Rabinowitz, requiring about 2 weeks of programming time. In the context of the amounts spent on clinical trials and the potential markets for new products, though, even rather expensive de-identification projects can be justified. The formulation of research questions and interpretation of data also need to be the result of active collaboration so that understandings are shared as well as data.
Rabinowitz talked about the increasing difficulties of drug discovery as incentive for companies to collaborate through precompetitive challenges. These companies can be fierce competitors elsewhere, but they have common needs. Companies also need to send a clear message of support for collaboration to overcome various kinds of resistance, with ongoing support from the top levels of management. Previous relationships can be very helpful because they help foster the trust that companies need to provide data to a collaborative effort. Peer pressure among companies aided data sharing, in that “if one company [provided] all their data, the others wanted to follow suit. They did not want to feel inferior in terms of their performance.”
A paradigm shift is occurring that redefines data sharing as an “ethical imperative,” Rabinowitz concluded. Studies should be given extra credit if they are willing to share data. This could be taken into account by institutional review boards (IRBs), for instance, in judging the ethical validity of a study. “Allow yourselves to imagine what you might do in some therapeutic area that is near and dear to you if you had access to almost all of the data out there in your given area,” he said. “Just think about that for a second.”
PATIENTSLIKEME
PatientsLikeMe is a health information-sharing website for patients where they can form peer-to-peer relationships, establish profiles, provide and share health data, and make de-identified data available for research. Sally Okun, health data integrity manager at PatientsLikeMe, described some of the lessons learned from the website during its 7 years of operation.
A prominent mandate of the site is “give something, get something.” If patients provide information for a research project, they should receive information in return that can help them make meaningful decisions, said Okun.
Another motto is “patients first.” In a data-sharing environment, the interests of the patients need to come first, Okun said. “They have a lot more skin in this game than any of us in this room do. . . . They have the expertise in managing [their conditions] that as clinicians and as researchers we could never have.”
40 SHARING CLINICAL RESEARCH DATA That observation leads to a third mandate: Listen well. Patients want to share their information. When patients were asked in a recent survey whether their health data should be used to help improve the care of fu- ture patients who have the same condition, 89 percent agreed (Alston et al., 2012). Yet, when they were asked whether they thought their data were being shared, the majority said they either did not know or did not think so. âWe have a huge gap between what patients are telling us they want and what they perceive us to be doing.â The data patients provide involve intimate parts of their daily lives. These patients are not simply human subjects, said Okun; they are actual- ly members of the research team. âI would change our paradigm com- pletely and start thinking of patients as patient researchers or citizen researchers.â Okun quoted a recent blog post to the effect that patient engagement is the blockbuster drug of the century. If this is true, she added, and if this âdrugâ is not currently being used, the research com- munity is essentially engaged in malpractice. âThe system is never going to be perfect,â she said. But the biomedi- cal research system has evolved to the point that all stakeholders can be involved in decisions. âWithout patients, we would have no research. Letâs start thinking about how we can best honor them, respect them, and allow them to develop the trust that they need to participate with us.â DISTRIBUTED SYSTEMS FOR CLINICAL RESEARCH INFORMATION SHARING An alternative to widespread data sharing was described by Richard Platt, professor and chair in the Department of Population Medicine, Harvard Medical School, and executive director of Harvard Pilgrim Health Care Institute. Platt proposed that sharing information derived from the data while minimizing the sharing of data themselves nullifies some of the barriers discussed previously (Chapter 3). 
He went on to describe the Query Health Initiative, a system for sharing clinical information that has been promulgated by the Office of the National Coordinator for Health Information Technology. It uses the approach of sending the question to the data rather than bringing the data to the question. The question, in this case, is an executable program sent from the originator to the holder of data. The program then operates on a remote dataset and returns the answer to the sender.
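The idea of sending the question to the data can be sketched in a few lines of Python. This is a minimal illustration only and is not the Query Health implementation; the class `DataHolder`, the function `count_exposed_with_event`, the record fields, and the toy records are all hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, bool]  # hypothetical patient-level record


class DataHolder:
    """Hypothetical organization that keeps patient-level records locally."""

    def __init__(self, records: List[Record]) -> None:
        self._records = list(records)  # never returned to the caller

    def run(self, question: Callable[[List[Record]], int]) -> int:
        # The program executes where the data live; only its answer
        # travels back to the originator.
        return question(self._records)


def count_exposed_with_event(records: List[Record]) -> int:
    """The 'question': how many exposed patients had the event?"""
    return sum(1 for r in records if r["exposed"] and r["event"])


# Invented toy data, for illustration only.
holder = DataHolder([
    {"exposed": True, "event": True},
    {"exposed": True, "event": False},
    {"exposed": False, "event": True},
    {"exposed": True, "event": True},
])

answer = holder.run(count_exposed_with_event)  # an aggregate count, not records
```

The originator sees only `answer`; the records themselves stay with the holder, which is the property the approach is meant to provide.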
An alternative approach based on the same idea, Platt indicated, is to let a user log onto a remote system and do the analyses. The user needs to be able to access the system through a firewall, which many organizations are hesitant to permit. Other protections can be built into the system as well, such as a mechanism for determining whether the research has oversight by an IRB. A steering committee or IRB could be involved in reviewing and approving queries. Network management could provide for auditing, authentication, authorization, scheduling, permissions, and other functions. Local controls at the source of the data could monitor what kind of question is being asked, who is asking the question, and whether the question is worth answering.

A logical extension of such a system would be a multisite system in which research data from several different organizations are behind several different firewalls (see Figure 4-2). According to Platt, a single question could be distributed to multiple sites and the responses compiled to produce an answer. Source data, such as information from electronic health records, could flow into research systems through firewalls. The result would be a system in which remote investigators can gain the information they need to answer a question while data are protected.

Platt described a system developed by his group that implements this concept. The system, called Mini-Sentinel, is being used by FDA to do postmarket medical product safety surveillance.
It has a distributed database with data on more than 125 million people, 3 billion instances of drug dispensing, and 2.4 billion unique patient encounters, including 40 million acute inpatient stays. Each of the 17 data partners involved in the project uses a common data format so that remote programs can operate on the data. Data checks ensure that the data are correct. Data partners have the option of stopping and reviewing the queries that arrive before the code is executed. They also can stop and inspect every result before it is returned to the coordinating center. The amount of patient-level data that is transferred is minimized, with most of the analysis of patient-level data done behind the firewall of the organization that has the data. “Our goal is not to never share data. Our goal is to share as little data as possible.” The analysis dataset is usually a small fraction of all the data that exist, and the data can usually be de-identified.

FIGURE 4-2 Distributed networks can facilitate working remotely with research datasets derived from routinely collected electronic health information, often eliminating the need to transfer sensitive data. (Diagram labels: authorized user; analytic query web portal; steering committee/IRB approval and notification; network management, including auditing, authentication, authorization, scheduling, permissions, aggregation, and synchronization; Sites A, B, and C behind firewalls, each with source data, research data, and project data.)
SOURCE: Platt, 2012. Presentation at IOM Workshop on Sharing Clinical Research Data.

As an example of the kinds of projects that can be done using this system, Platt described a study looking at comparative risks of angioedema related to treatment with drugs targeting the renin-angiotensin-aldosterone system. The results of the study had not yet been released at the time of the workshop, but Platt concluded from the experience that data from millions of people could be accessed to do the study without sharing any patient-level data. Yet, from the perspective of the investigators, “essentially everything that was interesting in those datasets that could answer this question was accessible and was used to address the questions of interest.”

Using such a system, it would be possible to address a large fraction of the questions thought to require data sharing by instead sharing programs among organizations that are prepared to collaborate on distributed analyses, Platt insisted. Organizations also could participate in multiple networks, further expanding the uses of the data they hold. At the same time, every network could control its own access and governance.
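The workflow described above, in which each data partner may review an incoming query and inspect its result before release while a coordinating center combines only aggregate answers, might be sketched as follows. This is an illustrative sketch under stated assumptions, not Mini-Sentinel code: the names `DataPartner`, `coordinate`, and `events_by_drug`, the keyword-based query review, the minimum-count release rule, and all records are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

Counts = Dict[str, int]
Program = Callable[[List[dict]], Counts]


class DataPartner:
    """Hypothetical data partner holding records in a common format."""

    def __init__(self, name: str, records: List[dict], min_count: int = 0) -> None:
        self.name = name
        self._records = records
        self._min_count = min_count  # assumed local rule for releasing results

    def review_query(self, description: str) -> bool:
        # Stand-in for the partner's own review of an arriving query.
        return "safety" in description.lower()

    def review_result(self, result: Counts) -> bool:
        # Stand-in for inspecting a result before it leaves the site.
        return all(v >= self._min_count for v in result.values())

    def execute(self, description: str, program: Program) -> Optional[Counts]:
        if not self.review_query(description):
            return None
        result = program(self._records)  # runs behind the partner's firewall
        return result if self.review_result(result) else None


def events_by_drug(records: List[dict]) -> Counts:
    """The distributed program: count events per drug in local records."""
    counts: Counter = Counter()
    for r in records:
        if r["event"]:
            counts[r["drug"]] += 1
    return dict(counts)


def coordinate(partners: List[DataPartner], description: str, program: Program) -> Counts:
    """Coordinating center: send one program to every site, sum the answers."""
    total: Counter = Counter()
    for partner in partners:
        result = partner.execute(description, program)
        if result is not None:  # a site may decline or withhold its result
            total.update(result)
    return dict(total)


# Invented toy data, for illustration only.
partners = [
    DataPartner("Site A", [{"drug": "A", "event": True}, {"drug": "A", "event": True},
                           {"drug": "B", "event": True}, {"drug": "B", "event": False}]),
    DataPartner("Site B", [{"drug": "A", "event": True}, {"drug": "B", "event": True},
                           {"drug": "B", "event": True}]),
    DataPartner("Site C", [{"drug": "A", "event": True}], min_count=2),
]

combined = coordinate(partners, "Safety surveillance: events by drug", events_by_drug)
```

In this toy run, Site C withholds its result because its single count falls below its own release threshold, so `combined` reflects only Sites A and B. Only per-drug counts cross site boundaries; no patient-level record is returned.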
Today, only FDA can submit questions to Mini-Sentinel, but FDA believes it should be a national resource and is working on ways to make it accessible to others. Toward that end, the week before the workshop, the NIH announced the creation of the Health Care Systems Research Collaborative, which will develop a distributed research network with the capability of communicating with the Mini-Sentinel distributed dataset. Such systems, by sharing information rather than data, could make progress faster than waiting for all the issues surrounding data sharing to be resolved, said Platt.