2 The Challenge: Preservation and Use of Scientific Data
Pages 13-32

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 13...
... The sections that follow describe briefly the two major types of data that are of critical importance in the physical sciences: experimental laboratory data in physics, chemistry, and materials sciences, and observational data in the earth and space sciences. In each of these broad areas the progress that has been made to date in terms of long-term preservation and accessibility is characterized, and the key issues identified.
From page 14...
... While the growing use of electronic ... Preserving Scientific Data on Our Physical Universe.
From page 15...
... OBSERVATIONAL DATA IN THE PHYSICAL SCIENCES
Over the past two decades, the National Research Council and other groups have issued numerous reports that have addressed data management issues, including long-term retention requirements, for digital observational data in the earth and space sciences (NRC, 1982, 1984, 1986a,b, 1988a,b, 1990, 1992b, 1993; GAO, 1990a,b; Haas et al., 1985; NAPA, 1991). Most of these reports have focused quite narrowly on the data management or archiving problems of specific disciplines or agencies, and
From page 16...
... One might therefore assume that it is the most highly processed data products that have the greatest value for long-term preservation, because they are more easily understood by a broader spectrum of potential users. In fact, just the opposite is usually the case for observational data, for it is only with the original unprocessed data that it will be possible to recreate all other levels of processed data and data products.
From page 17...
... The biggest data sets typically come from Earth observation satellite sensors and space science missions, and are challenging to some contemporary storage devices. However, it is clear that for the data set to exist at all, an adequate storage medium capable of capturing and maintaining the data for some time period must exist when the data are generated.
From page 18...
... Within NASA, space astronomy and astrophysics are organized in different wavelength-based disciplines, reflecting the organization in the scientific community. These disciplines include the infrared, whose main data center is the Infrared Processing and Analysis Center in Pasadena, California, where the data from the Infrared Astronomical Satellite mission are archived; the optical and ultraviolet, with data centers at the Space Telescope Science Institute in Baltimore, Maryland, where the Hubble Space Telescope data are archived, and at the NASA Goddard Space Flight Center in Greenbelt, Maryland, where the International Ultraviolet Explorer archive resides; and high-energy astrophysics, which maintains x-ray data at the Einstein Observatory Data Center in Cambridge, Massachusetts.
From page 20...
... The developers and current staff of the PDS recognize that the data from planetary missions make up the scientific capital of the agency's planetary exploration program and that these data are a national resource. The PDS tries to acquire all existing planetary data from NASA's missions and even from international ventures, in order to have a complete archive of our exploration of the solar system.
From page 21...
... Many data sets span decades, and a few span more than a century, with accompanying problems due to lack of homogeneity in measurement techniques and sampling strategies. The largest atmospheric science data holdings in the United States are those of the federal government.
From page 22...
... Some truly experimental data exist, including a few data sets that include the results from such work as sensor development and tests, fluid dynamics experiments, thermodynamic measurements, and laboratory chemical studies. Nevertheless, the vast majority of atmospheric science data describe observations of ever-changing phenomena, and thus they are unique, valuable, and irreplaceable.
From page 23...
... surface; daily, now 9,000 stations; 1900-1993; 94 years; 15 GB
Selected Analyses (mostly global)
Main National Meteorological Center analyses; two times per day, increasing at 4 GB/year; 1945-1993; 48 years; 50 GB
National Meteorological Center advanced analyses; four times per day, increasing at 19 GB/year; 1990-1993; 4 years; 58 GB
National Center for Atmospheric Research's ocean observations and analyses; thirty-eight data sets; 8 GB
European Center for Medium Range Weather Forecasting advanced analyses; four times per day, increasing at 8 GB/year; 1985-1993; 9 years; 76 GB
Selected Satellites
NOAA geostationary satellites; half-hour, visible and infrared; 1978-1993; 16 years; 130 TB
NOAA polar orbiting satellites; 1978-1993; 15 years
Sounders (TIROS Operational Vertical Sounder); 15 years; 720 GB
From page 24...
... Although barely two decades ago the study of climate was not a very high priority, today climate research issues are prominent; some of the nation's leading scientists specialize in climate studies, and policymakers seek information on likely climatic conditions of the future. The importance of old atmospheric data has become clear, but the reanalysis of these old data in the search for trends has often found them inadequate and poorly documented.
From page 25...
... Other examples are provided in the working paper of the Geoscience Data Panel (NRC, 1995). The Landsat database consists of multispectral images of the Earth's surface, which have been accumulating since the launch of Landsat 1 in July 1972.
From page 26...
... The Incorporated Research Institutions for Seismology (IRIS), a not-for-profit consortium of universities and private research organizations, is engaged in a major development of a global digital seismic network of about 100 continuously recording stations (the Global Seismic Network)
From page 27...
... Examples of their use for operational activities include tsunami warning and the rapid determination of the magnitude, location, and fault mechanism of destructive earthquakes and their aftershocks, both to inform the public and to assist in emergency response and special monitoring. On a longer time scale the data are used for hazard reduction and seismic safety in seismogenic regions, including local zoning decisions for future development, and siting and safety of critical facilities such as nuclear power plants.
From page 28...
... The data are archived in a broadly distributed manner. However, only a fraction of the archived data are under the direct control of federal government agencies, and it appears that many of these data sets are not considered official federal records.
From page 29...
... summary of the NODC's data holdings. The PO.DAAC is a major federally sponsored oceanographic data center, which is operated by the California Institute of Technology's Jet Propulsion Laboratory in Pasadena, California.
From page 30...
... 9,679; 4,290; 1,645; 1,557; 872; 125; 89; 68; 18,325; 12,841; 60,000; 4,743; 11,000
Subtotal: 88,584
Total Physical/Chemical: 106,909
Marine Biological Data
Master data files
Fish/shellfish: 115
Benthic organisms: 69
Intertidal/subtidal organisms: 30
Plankton: 32
Marine mammal sighting/census: 21
Primary productivity: 7
Subtotal: 274
Individual data sets, for example
Marine bird data sets: 52
Marine mammal data sets: 4
Marine pathology data sets: 4
Other (estimated): 200
Subtotal: 260
Total Biological: 534
Total Data Holdings: 107,443
Source: NOAA, private communication, 1994.
From page 31...
... during the active stages of projects and for some time afterward. Examples of notable successes include the NASA Planetary Data System, where the premise has been that the data have long-term value and must be accessible indefinitely into the future, and the NOAA National Data Centers, where the policy is to migrate archived data to new media every 10 years.
From page 32...
... Moreover, NARA's budget for its Center for Electronic Records, which has formal responsibility for archiving all types of federal electronic records, was only $2.5 million in FY 1994, a budget lower than that of many of the individual agency data centers reviewed by the committee in this study. Given NARA's current and projected level of effort for archiving electronic scientific data, it is obvious that NARA will be unable to take custody of the vast majority of the scientific data sets that require archiving.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.