Suggested Citation:"References." National Academies of Sciences, Engineering, and Medicine. 2017. Refining the Concept of Scientific Inference When Working with Big Data: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/24654.

References

Angrist, J.D., and W.N. Evans. 1998. Children and their parents’ labor supply: Evidence from exogenous variation in family size. The American Economic Review 88(3): 450-477.

Barnett, I., R. Mukherjee, and X. Lin. 2016. The generalized higher criticism for testing SNP-set effects in genetic association studies. Journal of the American Statistical Association (accepted). doi: 10.1080/01621459.2016.1192039.

Bazot, C., N. Dobigeon, J.Y. Tourneret, A.K. Zaas, G.S. Ginsburg, and A.O. Hero. 2013. Unsupervised Bayesian linear unmixing of gene expression microarrays. BMC Bioinformatics 14. doi: 10.1186/1471-2105-14-99.

Begley, S. 2011. The best medicine. Scientific American 305(1): 50-55.

Belloni, A., V. Chernozhukov, and L. Wang. 2014. Pivotal estimation via square-root lasso in nonparametric regression. Annals of Statistics 42(2): 757-788.

Benjamini, Y. 2010. Simultaneous and selective inference: Current successes and future challenges. Biometrical Journal 52(6): 708-721.

Benjamini, Y., and Y. Hochberg. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57(1): 289-300.

Box, G.E.P. 1979. Robustness in the strategy of scientific model building. Pp. 201-236 in Robustness in Statistics (R.L. Launer and G.N. Wilkinson, eds.). New York: Academic Press.

Box, G.E.P., G.M. Jenkins, and G.C. Reinsel. 1994. Time Series Analysis: Forecasting and Control. 3rd edition. Englewood Cliffs, N.J.: Prentice Hall.

Brown, E.N., P.L. Purdon, and C.J. Van Dort. 2011. General anesthesia and altered states of arousal: A systems neuroscience analysis. Annual Review of Neuroscience 34: 601-628.

Bühlmann, P., and S. van de Geer. 2011. Statistics for High-Dimensional Data: Methods, Theory, and Applications. New York: Springer Science and Business Media.

Chen, R., G.I. Mias, J. Li-Pook-Than, L. Jiang, H.Y.K. Lam, R. Chen, E. Miriami, et al. 2012. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell 148(6): 1293-1307.

Ching, S., A. Cimenser, P.L. Purdon, E.N. Brown, and N.J. Kopell. 2010. Thalamocortical model for a propofol-induced α-rhythm associated with loss of consciousness. Proceedings of the National Academy of Sciences 107(52): 22665-22670.

Cimenser, A., P.L. Purdon, E.T. Pierce, J.L. Walsh, A.F. Salazar-Gomez, P.G. Harrell, C. Tavares-Stoeckel, K. Habeeb, and E.N. Brown. 2011. Tracking brain states under general anesthesia by using global coherence analysis. Proceedings of the National Academy of Sciences 108(21): 8832-8837.

Cornelissen, L., S.E. Kim, P.L. Purdon, E.N. Brown, and C.B. Berde. 2015. Age-dependent electroencephalogram (EEG) patterns during sevoflurane general anesthesia in infants. eLife 4. doi: 10.7554/eLife.06513.

Dattner, I., and C.A. Klaassen. 2015. Optimal rate of direct estimators in systems of ordinary differential equations linear in functions of the parameters. Electronic Journal of Statistics 9(2): 1939-1973.

Deng, J., W. Dong, R. Socher, L.J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami, Fla., June 20-25.

Duchi, J.C., M.I. Jordan, and M.J. Wainwright. 2014. Privacy aware learning. Journal of the ACM 61(6). doi: 10.1145/2666468.

Firouzi, H., B. Rajaratnam, and A.O. Hero. 2017. Two-stage sampling, prediction and adaptive regression via correlation screening (SPARCS). IEEE Transactions on Information Theory 63(1): 698-714.

Fithian, W., D. Sun, and J. Taylor. 2014. Optimal inference after model selection. arXiv preprint arXiv:1410.2597.

FTC (Federal Trade Commission). 2016. Big Data: A Tool for Inclusion or Exclusion? Understanding the Issues. https://www.ftc.gov/system/files/documents/reports/big-data-tool-inclusion-or-exclusion-understanding-issues/160106big-data-rpt.pdf.

Genberg, B.L., J.W. Hogan, and P. Braitstein. 2016. Home testing and counselling with linkage to care. The Lancet HIV 3(6): e244-e246.

Haneuse, S., and M. Daniels. 2016. A general framework for considering selection bias in EHR-based studies: What data are observed and why? eGEMS (Generating Evidence & Methods to Improve Patient Outcomes) 4(1): article 16. doi: 10.13063/2327-9214.1203.

Haris, A., D. Witten, and N. Simon. 2016. Convex modeling of interactions with strong heredity. Journal of Computational and Graphical Statistics 25(4): 981-1004.

Hawkes, A.G. 1971. Spectra of some self-exciting and mutually exciting point processes. Biometrika 58(1): 83-90.

Henderson, J., and G. Michailidis. 2014. Network reconstruction using nonparametric additive ODE models. PLoS ONE 9(4). doi: 10.1371/journal.pone.0094003.

Hero, A.O., and B. Rajaratnam. 2011. Large-scale correlation screening. Journal of the American Statistical Association 106(496): 1540-1552.

Hero, A.O., and B. Rajaratnam. 2012. Hub discovery in partial correlation graphs. IEEE Transactions on Information Theory 58(9): 6064-6078.

Hero, A.O., and B. Rajaratnam. 2016. Foundational principles for large-scale inference: Illustrations through correlation mining. Proceedings of the IEEE 104(1): 93-110.

Hsiao, K.J., A. Kulesza, and A.O. Hero. 2014. Social collaborative retrieval. IEEE Journal of Selected Topics in Signal Processing 8(4): 680-689.

Huang, Y. 2011. Integrative statistical learning with applications to predicting features of diseases and health [Ph.D. thesis]. University of Michigan, Ann Arbor, Mich.

Huang, Y., A.K. Zaas, A. Rao, N. Dobigeon, P.J. Woolf, T. Veldman, N.C. Øien, et al. 2011. Temporal dynamics of host molecular responses differentiate symptomatic and asymptomatic influenza A infection. PLoS Genetics 7(8). doi: 10.1371/journal.pgen.1002234.

Hurvich, C.M., and C.L. Tsai. 1990. The impact of model selection on inference in linear regression. The American Statistician 44(3): 214-217.

Hurvich, C.M., and S. Zeger. 1987. Frequency Domain Bootstrap Methods for Time Series. New York: New York University.

Imai, K., G. King, and E.A. Stuart. 2008. Misunderstandings between experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society: Series A (Statistics in Society) 171(2): 481-502.

Irizarry, R.A., C. Wang, Y. Zhou, and T.P. Speed. 2009. Gene set enrichment analysis made simple. Statistical Methods in Medical Research 18(6): 565-575.

Joffe, M.M., W.P. Yang, and H.I. Feldman. 2010. Selective ignorability assumptions in causal inference. The International Journal of Biostatistics 6(2). doi: 10.2202/1557-4679.1199.

Kass, R.E., V. Ventura, and E.N. Brown. 2005. Statistical issues in the analysis of neuronal data. Journal of Neurophysiology 94(1): 8-25.

Langfelder, P., P.S. Mischel, and S. Horvath. 2013. When is hub gene selection better than standard meta-analysis? PLoS ONE 8(4). doi: 10.1371/journal.pone.0061505.

Lee, J.D., Y. Sun, and J.E. Taylor. 2013. On model selection consistency of M-estimators with geometrically decomposable penalties. Pp. 342-350 in Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, Nev., December 5-10.

Lee, J.D., D.L. Sun, Y. Sun, and J.E. Taylor. 2016. Exact post-selection inference, with application to the lasso. Annals of Statistics 44(3): 907-927.

Liu, T.Y., T. Burke, L.P. Park, C.W. Woods, A.K. Zaas, G.S. Ginsburg, and A.O. Hero. 2016. An individualized predictor of health and disease using paired reference and target samples. BMC Bioinformatics 17. doi: 10.1186/s12859-016-0889-9.

Lock, E.F., K.A. Hoadley, J.S. Marron, and A.B. Nobel. 2013. Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. Annals of Applied Statistics 7(1): 523-542.

Lockhart, R., J. Taylor, R.J. Tibshirani, and R. Tibshirani. 2014. A significance test for the lasso. Annals of Statistics 42(2): 413-468.

Loh, P.L., and M.J. Wainwright. 2011. High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. Pp. 2726-2734 in Advances in Neural Information Processing Systems (J. Shawe-Taylor, R.S. Zemel, P.L. Bartlett, F. Pereira, and K.Q. Weinberger, eds.). Cambridge, Mass.: MIT Press.

Meng, Z., D. Wei, A. Wiesel, and A.O. Hero. 2013. Distributed learning of Gaussian graphical models via marginal likelihoods. Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS). Scottsdale, Ariz., April 29-May 1. http://www.jmlr.org/proceedings/papers/v31/meng13a.pdf.

Miettinen, O.S. 1983. The need for randomization in the study of intended effects. Statistics in Medicine 2(2): 267-271.

Morris, A.P., B.F. Voight, T.M. Teslovich, T. Ferreira, A.V. Segre, V. Steinthorsdottir, R.J. Strawbridge, et al. 2012. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature Genetics 44(9): 981-990.

Mukherjee, R., N.S. Pillai, and X. Lin. 2015. Hypothesis testing for high-dimensional sparse binary regression. Annals of Statistics 43(1): 352-381.

NITRD/NCO (National Coordination Office for Networking and Information Technology Research and Development). 2016. The Federal Big Data Research and Development Strategic Plan. Washington, D.C.: National Science and Technology Council. https://www.whitehouse.gov/sites/default/files/microsites/ostp/NSTC/bigdatardstrategicplan-nitrd_final-051916.pdf.

NRC (National Research Council). 2013. Frontiers in Massive Data Analysis. Washington, D.C.: The National Academies Press.

NRC. 2014. Training Students to Extract Value from Big Data: Summary of a Workshop. Washington, D.C.: The National Academies Press.

NSCI (National Strategic Computing Initiative). 2016. National Strategic Computing Initiative Strategic Plan. https://www.whitehouse.gov/sites/whitehouse.gov/files/images/NSCI%20Strategic%20Plan.pdf.

Pearl, J. 2000. Causality: Models, Reasoning, and Inference. Cambridge, U.K.: Cambridge University Press.

Pillow, J.W., J. Shlens, L. Paninski, A. Sher, A.M. Litke, E.J. Chichilnisky, and E.P. Simoncelli. 2008. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature 454(7207): 995-999.

Poole, D., and A.E. Raftery. 2000. Inference for deterministic simulation models: The Bayesian melding approach. Journal of the American Statistical Association 95(452): 1244-1255.

Purdon, P.L., E.T. Pierce, E.A. Mukamel, M.J. Prerau, J.L. Walsh, K.F.K. Wong, A.F. Salazar-Gomez, et al. 2013. Electroencephalogram signatures of loss and recovery of consciousness from propofol. Proceedings of the National Academy of Sciences 110(12): E1142-E1151.

Radchenko, P., and G.M. James. 2010. Variable selection using adaptive nonlinear interaction structures in high dimensions. Journal of the American Statistical Association 105(492): 1541-1553.

Ramos, E.A. 1988. Resampling methods for time series [Ph.D. thesis]. Department of Statistics, Harvard University, Cambridge, Mass.

Rau, A., G. Marot, and F. Jaffrézic. 2014. Differential meta-analysis of RNA-seq data from multiple studies. BMC Bioinformatics 15. doi: 10.1186/1471-2105-15-91.

Ravikumar, P., J. Lafferty, H. Liu, and L. Wasserman. 2009. Sparse additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71(5): 1009-1030.

Shen, R., A.B. Olshen, and M. Ladanyi. 2009. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 25(22): 2906-2912.

Simon, N., and R. Tibshirani. 2012. Standardization and the group lasso penalty. Statistica Sinica 22(3): 983-1001.

Singh, R., J. Xu, and B. Berger. 2008. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proceedings of the National Academy of Sciences 105(35): 12763-12768.

Smith, M. 2016. “Computer Science for All.” White House Blog, January 30. https://www.whitehouse.gov/blog/2016/01/30/computer-science-all.

Song, S., K. Chaudhuri, and A.D. Sarwate. 2015. Learning from data with heterogeneous noise using SGD. Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS). San Diego, Calif., May 9-12. http://www.jmlr.org/proceedings/papers/v38/song15.pdf.

Sripada, C., D. Kessler, Y. Fang, R.C. Welsh, K. Prem Kumar, and M. Angstadt. 2014. Disrupted network architecture of the resting brain in attention-deficit/hyperactivity disorder. Human Brain Mapping 35(9): 4693-4705.

Sun, T., and C.H. Zhang. 2012. Scaled sparse linear regression. Biometrika 99(4). doi: 10.1093/biomet/ass043.

Tian, X., and J.E. Taylor. 2015. Selective inference with a randomized response. arXiv preprint arXiv:1507.06739.

Tian, X., J.R. Loftus, and J.E. Taylor. 2015. Selective inference with unknown variance via the square-root LASSO. arXiv preprint arXiv:1504.08031.

Tukey, J.W. 1977. Exploratory Data Analysis. Boston, Mass.: Pearson.

van de Geer, S., P. Bühlmann, Y.A. Ritov, and R. Dezeure. 2014. On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics 42(3): 1166-1202.

VanderWeele, T.J. 2012. Invited commentary: Structural equation models and epidemiologic analysis. American Journal of Epidemiology 176(7): 608-612.

Wainwright, M.J., and M.I. Jordan. 2008. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1(1-2): 1-305.

Wang, K., M. Narayanan, H. Zhong, M. Tompa, E.E. Schadt, and J. Zhu. 2009. Meta-analysis of inter-species liver co-expression networks elucidates traits associated with common human diseases. PLoS Computational Biology 5(12). doi: 10.1371/journal.pcbi.1000616.

Wasserman, L., and K. Roeder. 2009. High-dimensional variable selection. Annals of Statistics 37(5A): 2178-2201.

Wilson, J.D., S. Wang, P.J. Mucha, S. Bhamidi, and A.B. Nobel. 2014. A testing-based extraction algorithm for identifying significant communities in networks. Annals of Applied Statistics 8(3): 1853-1891.

Woods, C.W., M.T. McClain, M. Chen, A.K. Zaas, B.P. Nicholson, J. Varkey, T. Veldman, et al. 2013. A host transcriptional signature for presymptomatic detection of infection in humans exposed to influenza H1N1 or H3N2. PLoS ONE 8(1). doi: 10.1371/journal.pone.0052198.

Wu, H., T. Lu, H. Xue, and H. Liang. 2014. Sparse additive ordinary differential equations for dynamic gene regulatory network modeling. Journal of the American Statistical Association 109(506): 700-716.

Wu, M.C., S. Lee, T. Cai, Y. Li, M. Boehnke, and X. Lin. 2011. Rare-variant association testing for sequencing data with the sequence kernel association test. American Journal of Human Genetics 89(1): 82-93.

Ying, R., M. Sharma, C. Celum, J.M. Baeten, H. van Rooyen, J.P. Hughes, G. Garnett, and R.V. Barnabas. 2016. Home testing and counselling to reduce HIV incidence in a generalised epidemic setting: A mathematical modelling analysis. The Lancet HIV 3(6): e275-e282.

Yuan, M., and Y. Lin. 2006. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68(1): 49-67.

Zaas, A.K., T. Burke, M. Chen, M. McClain, B. Nicholson, T. Veldman, E.L. Tsalik, et al. 2013. A host-based RT-PCR gene expression signature to identify acute respiratory viral infection. Science Translational Medicine 5(203). doi: 10.1126/scitranslmed.3006280.

Zhao, P., and B. Yu. 2006. On model selection consistency of Lasso. Journal of Machine Learning Research 7: 2541-2563.

Zhu, R., D. Zeng, and M.R. Kosorok. 2015. Reinforcement learning trees. Journal of the American Statistical Association 110(512): 1770-1784.

Zubizarreta, J.R., D.S. Small, and P.R. Rosenbaum. 2014. Isolation in the construction of natural experiments. Annals of Applied Statistics 8(4): 2096-2121.

The concept of utilizing big data to enable scientific discovery has generated tremendous excitement and investment from both private and public sectors over the past decade, and expectations continue to grow. Using big data analytics to identify complex patterns hidden inside volumes of data that have never been combined could accelerate the rate of scientific discovery and lead to the development of beneficial technologies and products. However, producing actionable scientific knowledge from such large, complex data sets requires statistical models that produce reliable inferences (NRC, 2013). Without careful consideration of the suitability of both available data and the statistical models applied, analysis of big data may result in misleading correlations and false discoveries, which can potentially undermine confidence in scientific research if the results are not reproducible. In June 2016 the National Academies of Sciences, Engineering, and Medicine convened a workshop to examine critical challenges and opportunities in performing scientific inference reliably when working with big data. Participants explored new methodologic developments that hold significant promise and potential research program areas for the future. This publication summarizes the presentations and discussions from the workshop.