
Currently Skimming:

2 Omics-Based Clinical Discovery: Science, Technology, and Applications
Pages 33-64

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 33...
... For instance, one might perform a proteomics study on normal human kidney tissues to better understand protein activity, functional pathways, and protein interactions in the kidney. Another common goal of omics studies is to associate the omics-based molecular measurements with a clinical outcome of interest, such as prostate cancer survival time, risk of breast cancer recurrence, or response to therapy.
From page 34...
... of the recommended omics-based test development process is discussed, beginning with examples of specific types of omics studies and the technologies involved, followed by the statistical, computational, and bioinformatics challenges that arise in the analysis of omics data. Some of these challenges are unique to omics data, whereas others relate to fundamental principles of good scientific research.
From page 35...
... Ideally, confirmation should take place on an independent sample set. Under exceptional circumstances it may be necessary to move into the test validation phase without first confirming the candidate test on an independent sample set if using an independent test set in the discovery phase is not possible, but this increases the risk of test failure in the validation phase.
From page 36...
... , have different spatial configurations and intracellular localizations, and interact with other proteins as well as other molecules. This complexity can lead to challenges in proteomics-based test development.
From page 37...
... , which is the complete set of lipids in a biological sample.
EMERGING OMICS TECHNOLOGIES AND DATA ANALYSIS TECHNIQUES
Many emerging omics technologies are likely to influence the development of omics-based tests in the future, as both the types and numbers of molecular measurements continue to increase.
From page 38...
... However, it is important to note that because next-generation RNA and DNA sequencing produces even more measurements per sample than traditional approaches do, these new technologies add to the challenge of extremely high data dimensionality and to the risk of overfitting computational models to the available data (see the section on Computational Model Development and Cross-Validation for a discussion of overfitting).
From page 39...
... Recent interest has focused on measuring multiple omics data types on a single set of samples, in order to integrate different types of molecular measurements into an omics-based test. Such multidimensional datasets have the potential to provide deep insight into biological mechanisms and networks, allowing for the development of more powerful clinical diagnostics.
From page 40...
... Systems approaches that integrate multiple data types in functionally based models can be advantageous for the development of omics-based tests. For instance, the analysis of omics measurements in the context of biomolecular networks or pathways can help to reduce the number of variables in the data by constraining the possible relationships between variables, ultimately leading to more robust and clinically useful molecular tests.
From page 41...
... STATISTICS AND BIOINFORMATICS DEVELOPMENT OF OMICS-BASED TESTS
In recent years, a large number of papers have reported new omics-based discoveries and the development of new candidate omics-based tests: that is, computational procedures applied to omics-based measurements to produce a clinically actionable result. However, few of these candidate omics-based tests have progressed to clinical use (Ransohoff, 2008, 2009).
From page 42...
... For the purpose of this discussion, the committee assumed that a clearly defined and clinically relevant scientific or clinical question or questions have been identified, and that an omics dataset from analyses of a set of patient samples, along with an associated clinical outcome for each patient, is available. For example, an investigator may ask whether gene expression measurements could be used to predict recurrence in node-negative breast cancer samples in a way that is substantially more accurate than standard clinical prognostic factors, such as tumor size and grade.
From page 43...
... Step 2: Computational Model Development and Cross-Validation
Once investigators have determined in Step 1 that the data are of adequate quality, a candidate omics-based test associated with a phenotype of interest, such as a biologic subgroup, preclinical responsiveness to a novel therapy, or a clinical outcome, can be developed on the basis of the omics measurements. An almost unlimited number of statistical tools can be used to perform this task; therefore, they are not enumerated here.
From page 44...
... In fact, in the absence of adherence to proper statistical procedures, it is likely that the data will be overfit. That is, given a typical omics dataset and an associated clinical outcome, it is nearly always possible to develop a computational model that fits the data perfectly, even in the absence of any true association between the omics measurements and the clinical outcome.
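The overfitting risk described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a method from the report: the data are pure random noise, the "model" is a hypothetical one-feature threshold rule chosen by exhaustive search, and all names (`best_single_feature_rule`, `n_features`, and so on) are invented for the example. With far more features than samples, some feature usually separates the outcomes almost perfectly by chance, yet the same rule performs at roughly coin-flip level on independent samples.

```python
import random

def predict(rule, row):
    """Apply a one-feature threshold rule given as (feature index, threshold, sign)."""
    j, t, sign = rule
    return 1 if sign * (row[j] - t) > 0 else 0

def accuracy(rule, X, y):
    """Fraction of samples whose predicted label matches the recorded outcome."""
    return sum(predict(rule, row) == label for row, label in zip(X, y)) / len(y)

def best_single_feature_rule(X, y):
    """Exhaustively search features and thresholds for the best training fit."""
    best_rule, best_acc = None, -1.0
    for j in range(len(X[0])):
        values = sorted(row[j] for row in X)
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring observed values
            for sign in (1, -1):
                rule = (j, t, sign)
                acc = accuracy(rule, X, y)
                if acc > best_acc:
                    best_rule, best_acc = rule, acc
    return best_rule, best_acc

rng = random.Random(0)
n_samples, n_features = 12, 500  # many more measurements than samples

# Pure noise: the simulated measurements carry no information about the outcome.
X_train = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y_train = [i % 2 for i in range(n_samples)]

rule, train_acc = best_single_feature_rule(X_train, y_train)

# Independent samples generated the same way, with coin-flip outcomes.
X_new = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(400)]
y_new = [rng.randint(0, 1) for _ in range(400)]
new_acc = accuracy(rule, X_new, y_new)

print(f"training accuracy:    {train_acc:.2f}")
print(f"independent accuracy: {new_acc:.2f}")
```

The gap between the two printed numbers is the point: the apparent fit comes entirely from searching many variables on few samples, not from any real association.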
From page 45...
... At this point, the fully specified computational procedures are locked down, and the investigator proceeds into Step 3, in which the chosen model is evaluated on an independent dataset.
From page 46...
... Though both the training set/test set approach and the cross-validation approach provide error rates that estimate the accuracy of the computational model on independent test samples, these error rates can be highly optimistic. Cross-validation error rates tend to be overly optimistic because by randomly splitting the data it is guaranteed that the training and test sets within each cross-validation fold are drawn from the same population distribution.
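The mechanics of the cross-validation approach mentioned above can be sketched as follows. This is an illustrative sketch, not the committee's procedure: the nearest-centroid classifier, the feature-selection step, and every name (`k_fold_indices`, `fit`, `n_keep`) are assumptions made for the example. The one design choice worth noting is that the entire model-building procedure, including feature selection, is repeated inside each fold, so held-out samples never influence the model that scores them.

```python
import random

def k_fold_indices(n, k, rng):
    """Shuffle sample indices and deal them into k disjoint folds."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]

def fit(X, y, n_keep=5):
    """Whole model-building procedure: select features, then store class centroids."""
    groups = {c: [row for row, label in zip(X, y) if label == c] for c in (0, 1)}

    def centroid(rows, j):
        return sum(r[j] for r in rows) / len(rows)

    n_features = len(X[0])
    gaps = [abs(centroid(groups[1], j) - centroid(groups[0], j)) for j in range(n_features)]
    # Keep the features whose class means differ most in the training data.
    keep = sorted(range(n_features), key=gaps.__getitem__, reverse=True)[:n_keep]
    return {c: {j: centroid(groups[c], j) for j in keep} for c in (0, 1)}

def predict(model, row):
    """Assign the class whose centroid is nearest over the selected features."""
    def dist(c):
        return sum((row[j] - m) ** 2 for j, m in model[c].items())
    return min((0, 1), key=dist)

def cross_validate(X, y, k, rng):
    """Return the fraction of samples classified correctly while held out."""
    correct = 0
    for fold in k_fold_indices(len(y), k, rng):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(y)

rng = random.Random(1)
# Simulated noise data: 40 samples, 200 features, outcomes unrelated to features.
X = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(40)]
y = [i % 2 for i in range(40)]

cv_acc = cross_validate(X, y, k=5, rng=rng)
print(f"cross-validated accuracy on noise: {cv_acc:.2f}")
```

Even done this way, every fold is drawn from the same sample collection, so the estimate says nothing about samples from a different site, batch, or population, which is the limitation the text describes.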
From page 47...
... 4 In this report, the computational model is referred to as fully specified computational procedures after the candidate test is locked down in Step 2.
From page 48...
... Therefore, cross-validation error rates provide insufficient evidence of a candidate test's performance. To avoid the wasted time, energy, cost, and resources associated with taking a test that has little chance of success into the later phases of the development process, it is important to confirm the computational model on an independent test set in the discovery phase.
From page 49...
... In some cases it will not be possible to obtain independent sets of specimens and associated clinical data with all of these characteristics; however, it is important to keep in mind that the quality of evidence provided by good model performance on an independent specimen set depends critically on the characteristics of that set.
From page 50...
... Step 4: Release of Data, Code, and the Fully Specified Computational Procedures to the Scientific Community
Once an omics-based measurement method and the locked-down computational procedures have been shown to perform well on an independent dataset (Step 3), the candidate omics-based test is ready to proceed to the test validation phase, in which analytical and clinical/biological validity are assessed, as described in detail in Chapter 3.
From page 51...
... 1. High dimensionality of the data: Omics datasets are typically characterized by measurements (e.g., genes or gene products such as RNA or proteins)
From page 52...
... The inventor must disclose enough information to enable "a person of ordinary skill in the art" to achieve the same result or make the same device when following the steps provided in the patent application; however, determining the level of detail needed to meet that goal is subjective. A patent can be obtained for inventions that are novel, useful, and nonobvious to fellow scientists in the same field. Various aspects of an omics-based test could potentially be patentable, including the assay used to make molecular measurements, and the code and computational procedures used to analyze the results.
From page 53...
... A U.S. court would likely apply the machine-or-transformation test to the exact methods and computational procedures used in each test and make a decision on a case-by-case basis.
From page 54...
... The Common Rule requires researchers to get informed consent from a person to use his/her private identifi...
... can often be obtained by beginning the omics-based test development process using a subset of the omics measurements for which a plausible biological mechanism is available. For instance, there was a plausible biological mechanism behind the HER2 tests and Oncotype DX to motivate their initial clinical trials, but less so for the Duke, MammaPrint, and Ova1 tests (discussed in Appendixes A and B).
From page 55...
... Hence, evidence of a computational model's performance based only on the dataset used to train the model, even if cross-validation is properly performed, provides little evidence of the model's suitability for future samples. A relevant example here is the OvaCheck case study, discussed in Appendix A, in which signals obtained on one dataset did not hold up when the analysis was applied to other independent sample sets (Baggerly et al., 2004).
From page 56...
... 6. Computational procedure lock-down: It is crucial that at the end of Step 2, the fully specified computational procedures be locked down before progressing into confirmation on an independent test set in Step 3.
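One way to make a lock-down checkable in practice is to record a cryptographic fingerprint of the fully specified procedure before the independent data are examined. The sketch below is only an illustration of that idea, not a practice prescribed by the report; the dictionary layout, the gene names `GENE_A` and `GENE_B`, the weights, and the helper names are all hypothetical.

```python
import hashlib
import json

def fingerprint(procedure: dict) -> str:
    """Return a SHA-256 digest of a canonical JSON encoding of the procedure."""
    canonical = json.dumps(procedure, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def confirm_unchanged(procedure: dict, recorded_digest: str) -> bool:
    """Check that the procedure still matches the digest recorded at lock-down."""
    return fingerprint(procedure) == recorded_digest

# Hypothetical fully specified procedure at the end of Step 2.
locked = {
    "preprocessing": "log2 transform, then per-sample median centering",
    "features": ["GENE_A", "GENE_B"],  # placeholder feature names
    "weights": [0.8, -1.2],            # placeholder model coefficients
    "threshold": 0.15,                 # placeholder decision cutoff
}
locked_digest = fingerprint(locked)  # record this before Step 3 begins

# Later, before scoring the independent set, verify nothing was altered.
print(confirm_unchanged(locked, locked_digest))

# Any change, such as retuning the cutoff after seeing new data, is detected.
tampered = dict(locked, threshold=0.20)
print(confirm_unchanged(tampered, locked_digest))
```

Sorting keys and fixing separators makes the encoding deterministic, so the same procedure always yields the same digest regardless of dictionary ordering.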
From page 57...
... , in which there was a lack of continuity in biostatistics personnel and numerous errors were identified in the statistical methodology and analyses.
COMPLETION OF THE DISCOVERY PHASE OF OMICS-BASED TEST DEVELOPMENT
A candidate omics-based test should be defined precisely, including the molecular measurements, the computational procedures, and the intended clinical use of the test, in anticipation of the test validation phase (Recommendation 1d).
From page 58...
... Cross-validation or a training set/test set approach can help reduce the risk of overfitting, but confirmation of all fully specified computational procedures and candidate omics-based tests on a blinded independent sample set is the "gold standard" for assessing the validity of any test. The importance of independent confirmation is also emphasized in the committee's recommendations for funders (see Chapter 5)
From page 59...
... A candidate omics-based test should be defined precisely, including the molecular measurements, the computational procedures, and the intended clinical use of the test, in anticipation of the test validation phase.
REFERENCES
Ahrens, C
From page 60...
... 2009. Beyond the HIPAA Privacy Rule: Enhancing Privacy, Improving Health through Research.
From page 61...
... 2010. NCI Address to Institute of Medicine Committee Convened to Review Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials.
From page 62...
... 2011. Recent workshops of the HUPO Human Plasma Proteome Project (HPPP)
From page 63...
... 2009. Metabolomic profiles delineate potential role for sarcosine in prostate cancer progression.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.