1 Research Data in the Digital Age
Pages 11-32

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.

From page 11...
...  A petabyte represents a million billion characters, the equivalent of the text in one billion books.  Not all measures of computing power are increasing exponentially.
From page 12...
... Figure 1-1 shows one consulting firm's projection of how information and available storage will grow in the coming years. This exponential increase in computing power has had profound consequences for many aspects of modern society, including scientific, engineering, and medical research. Using digital technologies, researchers can measure, describe, and model phenomena much more comprehensively and in far greater detail than was possible in the past.
From page 13...
... Examples of the impact of digital technologies on research fields appear as sidebars throughout this report, and the number of such examples could be multiplied many times. The advances in digital technologies have caused a massive increase in the quantity of data generated by research projects.
From page 14...
... 3,500
1996 Las Campanas Redshift Survey (LCRS) 23,000
2003 2dF Galaxy Redshift Survey 250,000
2005 Sloan Digital Sky Survey (SDSS)
From page 15...
... Because the SDSS data archive is available to any astronomer, roughly half of the 2,100 refereed papers based on SDSS data have come from authors outside the project itself, and that proportion is rising. In fact, for the past 2 years, the SDSS has produced the most high-impact papers of any astronomical observatory.c At the same time, the project has extended the "reach" of those wishing to participate in frontier astronomy research or to simply enjoy the ability to "be there" as amateur aficionados.
From page 16...
... See http://cdsweb.cern.ch/record/42370. However, the most consequential changes being fostered by digital technologies involve issues that range beyond the quantities of data generated. Today, researchers can access a rapidly expanding range of digital information from around the world almost instantaneously.
From page 17...
... Digital communication technologies enable researchers to communicate and exchange data with colleagues around the world, creating electronic collaborations that can catalyze progress. By making it possible to address more complex and integrative questions, these technologies also catalyze interdisciplinary collaboration.
From page 18...
... Similarly, digital technologies have profound implications for scientific, engineering, and medical education.11 Students can have access to research information from instruments in distant locations.12 Computer owners around the world can contribute to the solution of particular research problems by allowing their computers to become parts of distributed computational networks.13 Data from cutting-edge research are being made available on the Internet for use not only by the research community but by educators or anyone else interested in the subject.14 Members of the public are participating in research projects as varied as analyses of genetic variation and galactic structure.15 Although fascinating, the full consequences of changing technologies for scientific, engineering, and medical education or for direct public participation in research lie outside the scope of this report.
11 National Research Council.
From page 19...
... In some cases, researchers may intentionally or unintentionally distort data in a misguided attempt to emphasize particular features and downplay others. In the worst cases, researchers can falsify or fabricate data, thereby violating both the ethical and methodological standards of research integrity.
From page 20...
... They must integrate such diverse datasets as cellular neuroimaging, gene expression data, genotype data, neuronal morphology, and clinical data. Making neuroscience data widely available holds tremendous potential for helping science and society.
From page 21...
... Many questions have arisen in developing these and other databases. Which digital data and data stored on film need to be stored?
From page 22...
... It includes raw data, processed data, published data, and archived data. It includes the data generated by experiments, by models and simulations, and by observations of natural phenomena at specific times and locations.
From page 23...
... However, the treatment of materials in research introduces issues that are beyond the subject matter of this report.18 Finally, our definition excludes information that can be important in research but is not used to generate research conclusions, including interpre ...
18 Issues related to sharing research materials in the life sciences have been addressed by a previous National Research Council report. See National Research Council.
From page 24...
... Nevertheless, a distinction exists, and we do not mean to imply that all of the information associated with research necessarily constitutes research data.
Metadata
As used in this report, the term "metadata" refers to descriptions of the content, context, and structure of information objects, including research data, at any level of aggregation (for example, a single data item, many items, or an entire database).
From page 25...
... However, raw data may need to be retained to validate research findings and, in some research fields, to support patent applications, investigate instances of research misconduct, or justify public policies. Data used to draw conclusions, derive findings, and build models may undergo many changes as they are processed, distributed, and archived.
From page 26...
... For observational data, data of high "quality" (a term that we sometimes will use as a synonym for data integrity) have been validated through comparison with data whose quality is known or by being generated with an instrument that has been adequately calibrated or tested.
From page 27...
... This tremendous variety within the research community complicates the task of arriving at conclusions that apply across all fields of research. Research fields are also characterized by diversity in the origins of data and by the size and other characteristics of data collections.
From page 28...
... Data generated through computer simulations are increasingly important in a variety of fields.26 Data generated entirely by computation can in principle be regenerated, assuming that enough is known about the hardware, software, and inputs used in the computation. However, each of these three components of a computation may be so complex or indeterminate that the computational data have some of the characteristics of observational data.
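As an illustration of the point above, the following minimal Python sketch records the three components the excerpt names (hardware, software, and inputs) alongside a toy computation, then regenerates the output from the recorded inputs. It is not taken from the report: the random-walk "simulation" and the function names run_simulation, digest, and provenance_record are hypothetical, and the hardware and software fields capture only a small part of what a real provenance record would need.

```python
import hashlib
import json
import platform
import random
import sys


def run_simulation(seed: int, n_steps: int) -> list[float]:
    """Toy random walk standing in for any computation (hypothetical example)."""
    rng = random.Random(seed)  # private generator, so global random state cannot leak in
    position = 0.0
    path = []
    for _ in range(n_steps):
        position += rng.uniform(-1.0, 1.0)
        path.append(position)
    return path


def digest(values: list[float]) -> str:
    """Hash the exact float representations so two runs can be compared bit for bit."""
    payload = json.dumps([v.hex() for v in values]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def provenance_record(seed: int, n_steps: int, values: list[float]) -> dict:
    """Record inputs, software, and hardware next to a fingerprint of the output."""
    return {
        "inputs": {"seed": seed, "n_steps": n_steps},
        "software": {
            "python": sys.version.split()[0],
            "implementation": platform.python_implementation(),
        },
        "hardware": {
            "machine": platform.machine(),
            "system": platform.system(),
        },
        "output_sha256": digest(values),
    }


original = run_simulation(seed=42, n_steps=1000)
record = provenance_record(42, 1000, original)

# Regenerate the data from the recorded inputs and compare fingerprints.
regenerated = run_simulation(**record["inputs"])
print(digest(regenerated) == record["output_sha256"])
```

Rerun in the same environment, the comparison prints True. On different hardware, interpreter versions, or numerical libraries the regenerated values may differ, which is the sense in which computational data can take on some of the characteristics of observational data.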
From page 29...
... Large-scale reference data collections may be the product of many small projects linked through digital networks. Or large projects may produce focused data collections that serve a narrow research purpose and never become publicly available.
From page 30...
... (Example: The Fluxes Over Snow Surfaces Project, http://www.atd.ucar.edu/rtf/projects/FLOSS.) Resource or community data collections serve a single science or engineering community.
From page 31...
... Chapter 4 discusses the stewardship of research data, that is, their long-term preservation in databases for various future research uses and other applications. Preserving data collections can be expensive and difficult -- so much so that it can compete with the conduct of research.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.