Suggested Citation:"5 What to Archive." National Research Council. 2007. Environmental Data Management at NOAA: Archiving, Stewardship, and Access. Washington, DC: The National Academies Press. doi: 10.17226/12017.


5 What to Archive

PRINCIPLE #7: A formal, ongoing process, with broad community input, is needed to decide what data to archive and what data not to archive.

NOAA needs to establish a high-level, enterprise-wide approach to deciding what data to include in its archives. The decision to archive (or not to archive) data should be driven by societal benefits. It should ensure that irreplaceable environmental data are preserved. It should also explicitly incorporate broad community engagement and coordination with other agencies.

Before a diverse group of users can be provided with a wide spectrum of useful and reliable environmental data, it must first be decided what data to include in the data archive. The decision to archive or not to archive a data set is often made before the data actually become available. For instance, it may be made as part of the data management planning now required for all new NOAA data streams. The decision must then be repeated each time the data are revised or reprocessed in accordance with the data stewardship guidelines and practices described in Chapter 4. Figure 5-1 illustrates the iterative nature of this decision-making process. It extends Figure 4-1 to show the two points in the data life cycle at which the decision to archive or not to archive data is made.

Figure 5-1 The data management life cycle, including decision points. Data and associated metadata are brought to the attention of data stewards, at which point a decision is made either to discard the data and metadata or to integrate them into the archive and access system. As described in Chapter 4, data stewards work with users to evaluate the data over time, leading to additional knowledge about the data. This knowledge might include fixable problems with the data or scientific findings that add to the metadata. If improvements are possible, they should be applied, at which point the archiving, evaluation, and improvement cycle begins again. If the problems are not fixable, or if the data are determined to be no longer useful, the data may be discarded in accordance with the guidelines described in this chapter.

For some data sets, archiving requirements are explicitly spelled out in legislation, administrative orders, or agreements with other agencies. However, as noted in Chapter 2, these requirements are often not very specific. This leaves data managers with considerable flexibility, but also little guidance, for determining which data sets to archive.

Based on this committee’s review of current data management practices at NOAA, most archiving decisions appear to be made deliberately and with a concerted effort to meet user needs. There have also been some preliminary, sporadic attempts to involve users more directly and to improve communication with other agencies. However, the decision-making process is largely ad hoc from an enterprise-level perspective, and stakeholders are not engaged in a systematic way. Two things are missing: a complete, publicly available inventory of all federal environmental data holdings, and a formalized, inclusive process for making archival decisions. Their absence decreases the efficiency and effectiveness of NOAA’s entire data management enterprise. It also means that valuable environmental data could potentially “fall between the cracks” (see Box 3-1).

In addition, stakeholders often develop their own products from NOAA data sources. NOAA should remain engaged with these stakeholders. It should incorporate these supplementary products into the agency’s archives if they have the potential to prove useful for a significant number of users.

The committee recommends that NOAA establish a formal, ongoing process to evaluate the need for initial and continued archiving of all environmental data sets that may fall under NOAA’s mission, considered in the broadest possible context. Such a process would help ensure that all important environmental data are archived and would improve the overall efficiency and effectiveness of NOAA’s data management enterprise. This principle is a more focused form of principle #9 (explored in further detail in Chapter 7), that “effective data management requires a formal, ongoing planning process.”

As with other aspects of data management, the decision-making process for data archiving should be driven by its value to society. It should incorporate periodic input from users. It should also include ongoing coordination and collaboration with other agencies and international groups. Creating a complete, publicly available inventory of NOAA’s data would be a logical first step. The resulting guidelines and procedures for data archiving would need to be flexible and adaptable enough to handle two things: the ongoing nature of the data life cycle, and the wide spectrum of environmental data managed by NOAA’s National Data Centers and centers of data. These are the entities that would ultimately be responsible for implementation.
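The two decision points in the Figure 5-1 life cycle can be written as a small decision function. The sketch below is a minimal illustration under stated assumptions:

- The names `Action`, `intake_decision`, and `evaluation_decision` are hypothetical.
- The boolean inputs stand in for what are really judgments made by data stewards working with users.
- Metadata updates that arise from new scientific findings are not modeled.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes at the two decision points of the Figure 5-1 life cycle."""
    ARCHIVE = "integrate into the archive and access system"
    DISCARD = "discard the data and metadata"
    KEEP = "keep archived and continue evaluating with users"
    IMPROVE = "apply improvements, then bring the new version back for an archive decision"
    DISPOSAL_CANDIDATE = "may be discarded, following this chapter's guidelines"


def intake_decision(archive: bool) -> Action:
    """Decision point 1: data and metadata first reach the data stewards."""
    return Action.ARCHIVE if archive else Action.DISCARD


def evaluation_decision(problem_found: bool, fixable: bool, still_useful: bool) -> Action:
    """Decision point 2: stewards and users evaluate archived data over time."""
    if not still_useful:
        return Action.DISPOSAL_CANDIDATE
    if problem_found:
        # Fixable problems restart the archive-evaluate-improve cycle;
        # unfixable ones make the data a candidate for disposal.
        return Action.IMPROVE if fixable else Action.DISPOSAL_CANDIDATE
    # No problems found: the data stay archived and evaluation continues.
    return Action.KEEP
```

For instance, `evaluation_decision(problem_found=True, fixable=True, still_useful=True)` returns `Action.IMPROVE`. The improved version would then pass through `intake_decision` again, which is the iteration the figure describes.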
The remainder of this chapter explores the nature and nuances of data archiving decisions in more detail. It also provides general guidelines that NOAA and its partners can use to develop more specific rules and procedures for deciding which current and future data sets to include in their archives.

Data Types

Throughout this report, the term “environmental data” is used broadly. It covers all types of environmental Earth System observations (including physical samples as well as in situ and remotely sensed data), model output, and synthesized products derived from these data. NOAA has defined seven general types of data (see Appendix C). Five of these are likely to contain candidate data sets for archiving under NOAA’s mission:

1. Original Data;
2. Synthesized Products;
3. Hydrometeorological, Hazardous Chemical Spill, and Space Weather Warnings, Forecasts, and Advisories;
4. Natural Resource Plans; and
5. Experimental Products.

The guidelines that follow address the archiving requirements for some of these data types in detail and examine others in a more general sense. The committee recommends that NOAA develop and implement a formal, ongoing planning process. As part of that process, NOAA may wish to revise its data type definitions to include more specific guidance on the types of data that should be archived, which would improve enterprise-wide coordination.

Guideline: It is especially important to save the most primitive useful forms of all environmental data.

Original Data represent the most primitive useful form of environmental observations, and they should always be considered for long-term archiving. Most environmental observations are irreplaceable, since resampling is usually impossible. They are also expensive to obtain, especially compared to typical data management costs. It is extremely difficult to assess the quantitative value of most Original Data, especially for future applications. Even so, access to a broad and continuous collection of high-quality observational data is essential to long-term environmental monitoring, model development and testing, and many other areas of Earth System research. For instance, the report Temperature Trends in the Lower Atmosphere: Steps for Understanding and Reconciling Differences (CCSP, 2006) used archived radiosonde and satellite measurements to resolve a debate concerning the amount of warming observed in the troposphere.

Because Original Data cannot be regenerated or reproduced from other archived data, they should always be considered for archiving. They should also be preserved even when improved or reprocessed versions of the same data are developed later. In fact, it is advisable to begin each new iteration with Original Data so that the quality and error assessment of new versions is independent of previous data manipulations.
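A toy calibration example illustrates the advice to begin each new iteration with Original Data. In this sketch, all names and values are hypothetical:

- Version 1 applies an offset that is later found to be wrong.
- Building version 2 from the Original Data gives values that are independent of that error.
- Building version 2 on top of version 1 carries the error forward unnoticed.

```python
from collections.abc import Callable, Sequence

# A processing algorithm maps a series of values to a new series (hypothetical type).
Algorithm = Callable[[Sequence[float]], list[float]]


def new_version(original: Sequence[float], algorithm: Algorithm) -> list[float]:
    """Build a data version directly from the archived Original Data,
    never from an earlier processed version."""
    return algorithm(list(original))


def offset_calibration(offset: float) -> Algorithm:
    """Illustrative calibration step: add a constant offset to each value."""
    return lambda values: [v + offset for v in values]


original = [10.0, 11.0, 12.0]                        # illustrative raw observations
v1 = new_version(original, offset_calibration(0.5))  # offset later found to be wrong
v2 = new_version(original, offset_calibration(0.3))  # corrected offset, from Original Data
v2_chained = offset_calibration(0.3)(v1)             # anti-pattern: built on top of v1

# The chained version silently carries v1's erroneous 0.5 offset forward.
inherited_error = max(abs(c - i) for c, i in zip(v2_chained, v2))
```

Here `inherited_error` is about 0.5: the full size of the version 1 mistake, hidden inside what looks like a corrected product.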
It is critical to plan for the archive growth associated with current and future data streams. Even so, many applications, such as climate change detection and attribution, require data collected over extended periods of time. A number of Original Data sources critical to NOAA’s mission are currently at risk or essentially inaccessible, often because they are stored on deteriorating, substandard, or outdated media. Some of these data might be considered “permanently stored.” They cannot be considered properly archived, however, if they are inaccessible to researchers should an important future application arise. NOAA should make certain that critical at-risk observations are at least migrated to stable media; preferably, these data should be added to NOAA’s archives.

The Climate Data Modernization Program (http://www.ncdc.noaa.gov/oa/climate/cdmp/cdmp.html) has already initiated a number of effective data “rescue and recovery” efforts. These have drastically improved archiving of and access to many at-risk data sets. NOAA’s National Data Centers should continue and expand these efforts by working with users and other agencies to identify, obtain, and archive at-risk environmental data.

Guideline: It may be more cost-effective to regenerate certain kinds of environmental data on demand.

Some types of Synthesized Products could potentially be regenerated from other archived data. Examples include model output, reprocessed data, and analyzed or derived products. In some cases, it may be more cost-effective to regenerate these products on demand than to archive every version of every data set associated with a particular data stream. Two requirements must be met:

1. The hardware and software needed to perform the requested operation must remain supported.
2. Regeneration must provide exactly the same high-quality information as a true archive.

These requirements are challenging because processing algorithms, especially model code, are often complicated and machine dependent.

For on-demand processing or reprocessing, it is also imperative to maintain an easily accessible archive of “first-stream” data. These could be the Original Data (such as radiosonde data) or a product generated from Original Data (such as radar precipitation data). The evolution of the data should also be fully documented. This includes records of successive improvements, recalibrations, and other modifications (including changes to the model code) that affect regeneration. Proprietary algorithms should be avoided unless they are fully transparent and reusable. Box 5-1 describes how processing on demand is expected to reduce the archive volume for Moderate Resolution Imaging Spectroradiometer (MODIS) “Level 1b” data.
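Regenerating on demand is only as trustworthy as the record of how a product was made. The following sketch assumes a hypothetical provenance record that names:

- the archived first-stream input;
- the exact algorithm version and its parameters; and
- a checksum taken when the product was first generated.

Regeneration fails loudly if that algorithm version is no longer supported, or if the rebuilt product differs from the original. The record schema, the SHA-256 fingerprint over a JSON serialization, and all identifiers are illustrative assumptions. None of them describe an existing NOAA or NASA system.

```python
import hashlib
import json
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """How a derived product was made (hypothetical schema)."""
    product_id: str
    first_stream_id: str    # archived input the product is regenerated from
    algorithm_version: str  # identifies the exact processing code and calibration
    parameters: tuple       # (name, value) pairs passed to the algorithm
    sha256: str             # checksum recorded when the product was first generated


def checksum(values: list[float]) -> str:
    """Fingerprint a product so that a regenerated copy can be compared with it."""
    return hashlib.sha256(json.dumps(values).encode("utf-8")).hexdigest()


def regenerate(record: ProvenanceRecord,
               first_stream: dict[str, list[float]],
               algorithms: dict[str, Callable[..., list[float]]]) -> list[float]:
    """Rebuild a product on demand instead of reading it from the archive."""
    if record.algorithm_version not in algorithms:
        # The software needed for regeneration must remain supported.
        raise LookupError(f"algorithm {record.algorithm_version!r} is no longer available")
    inputs = first_stream[record.first_stream_id]
    product = algorithms[record.algorithm_version](inputs, **dict(record.parameters))
    if checksum(product) != record.sha256:
        # Regeneration must reproduce exactly what a true archive would return.
        raise ValueError(f"regenerated {record.product_id!r} does not match its checksum")
    return product


def scale_v1(counts: list[float], gain: float) -> list[float]:
    """Hypothetical calibration step: multiply raw counts by a gain."""
    return [round(c * gain, 3) for c in counts]


# Usage: record provenance when the product is first made, regenerate later.
first_stream = {"raw-0001": [100, 200, 300]}  # hypothetical archived input
product = scale_v1(first_stream["raw-0001"], gain=0.01)
record = ProvenanceRecord("prod-0001", "raw-0001", "scale-v1",
                          (("gain", 0.01),), checksum(product))
rebuilt = regenerate(record, first_stream, {"scale-v1": scale_v1})
```

The exact checksum comparison is deliberately strict. Processing code is often machine dependent, so a regenerated product that differs even slightly is flagged rather than silently served.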
NOAA should consider employing similar on-demand processing techniques for some of its data. This applies especially to the data expected from the National Polar-orbiting Operational Environmental Satellite System (NPOESS) and the NPOESS Preparatory Project (NPP). Doing so will require careful planning and close collaboration with federal agency partners (NASA and the Department of Defense) and private contractors.

However, some caution is warranted. There is no general consensus within the data management community about the true utility and reliability of processing or regenerating data on demand versus archiving all levels of each data set. In addition, even though some applications may benefit from a process-on-demand approach, there may be legal concerns with failing to archive all intermediate versions of a data set. For example, certain versions of climate data sets are cited in a variety of peer-reviewed publications and assessments. Future requests to access these versions for verification purposes need to be accommodated, even though more recent versions of the data are available. The next two guidelines address some of these concerns.

BOX 5-1 Regenerating Versus Archiving MODIS Data

NASA has taken responsibility for archiving and stewardship of Earth Observing System (EOS) satellite data. To improve the cost-effectiveness of the Moderate Resolution Imaging Spectroradiometer (MODIS) data archive, the data will be moved from the central Goddard Earth Sciences Distributed Active Archive Center (DAAC) to the decentralized MODIS Adaptive Processing System (MODAPS). In addition, the “Level 1b” data (calibrated, geo-located radiances based on “Level 1a” data of radiance counts) will be changed from “permanent archive” to “regenerate on demand” status (Maiden, 2006).

This change will extend existing MODAPS science team production systems, which in the past produced only MODIS land and atmosphere products above Level 1b. It will use commodity-based systems for processing and for online data storage and access. Anticipated benefits of this transition include:

• Reduction in archive growth through on-demand processing;
• Faster access to products, because all-online storage reduces processing time;
• Reduced costs due to use of commodity disks and lower operating costs at the DAAC;
• 10 percent fewer products and 90 percent reductions in total archive volume and archive growth at the Goddard Earth Sciences DAAC; and
• Closer involvement and control by the science community, which is expected to make the products, tools, and processing capabilities at the archive more responsive to scientific needs.
Guideline: The most obvious candidates for reduced archiving requirements are data that are obsolete or redundant, that could be regenerated on demand, or that clearly have only short-term uses. This includes older versions of reprocessed data and model output.

One strategy that could help data managers make practical but careful archiving decisions is to first identify those data that clearly have only short-term uses. Candidate data sets include:

- those whose original and anticipated purpose has been satisfied;
- those that may be cost-effectively regenerated from archived first-stream input, that is, when the cost of storing the data exceeds the cost of reproducing or regenerating them; or
- those that are obsolete or redundant.

These data may include Experimental Products (see Appendix C). They may also include data collected for specific short-term applications such as near-term operational decisions. Such data typically have little value after the decision, product, or improvement has been made, and they can often be reproduced if necessary.

Other candidates may include high-resolution data intended for modeling or image display applications that would be adequately served by lower-resolution versions of the data. They may also include data used in field research programs (for example, satellite data over a limited area) that are redundant with data held in global archives. Some of the data used to create Hydrometeorological, Hazardous Chemical Spill, and Space Weather Warnings, Forecasts, and Advisories (Appendix C) may also fall under this guideline. Examples include long-range and intermediate forecast model output. Another example is high-volume data (sub-second wind or radar returns, for instance) used to generate longer averages of specified parameters for operational reporting purposes.

Where multiple versions of derived products have been generated, it would be helpful to have a defined process in place to determine which versions need to be archived. The following three questions, for example, could form the basis for such decisions:

1. Is it feasible to retain multiple versions of the data?
2. Are the differences among the various versions sufficiently large and scientifically important to make it worth preserving multiple versions?
3. Is it too technically difficult to regenerate earlier versions?

If the answer to all three questions is yes, then multiple versions should be archived. For example, NCEP has produced and is currently extending both the NCEP-NCAR Reanalysis (see Box 4-2) and the NCEP/Department of Energy (DOE) Reanalysis (Kanamitsu et al., 2002).
These two data sets could be considered different versions during their period of overlap (1979 onward). However, they have some large and scientifically important differences. For example, the NCEP/DOE Reanalysis is generally viewed as having superior surface radiation analyses, while the NCEP-NCAR Reanalysis is critical for climate assessments because it begins in 1948. In addition, many scientific papers, assessments, and other publications have been based on earlier versions of the NCEP-NCAR Reanalysis. Furthermore, the models used to perform these reanalyses are complex, so archiving earlier versions would be much more cost-effective than regenerating them from archived first-stream input. Comparative use of multiple reanalyses, including multiple versions of the same reanalysis, is also likely to help those responsible for generating future versions of the reanalysis. Thus, none of the criteria for designating the earlier version as a candidate for disposal is presently satisfied.

In contrast, the MODIS team (see Box 5-1) has produced four versions of MODIS data, with the fifth version due to be delivered in 2007. All of the derived products included in the first two versions would be candidates for disposal, because all three of the disposal criteria described above are met:

- As initial products, their original purpose has been largely satisfied.
- The archive cost of these products is quite high.
- They have become obsolete because of their known serious deficiencies.

Similarly, the volume of ensemble weather prediction model output may be too large to retain complete data on all members of the ensemble. However, a variety of plans have been put forward to archive subsets of these forecasts for research purposes. An even more complicated example is described in Box 5-2.

Guideline: Archiving and access decisions are closely related. In general, when resources are limited, access to older or less commonly used data should be scaled back rather than the data being removed from the archive.

As noted previously, not all data sets are of equal value. Practical constraints also prevent all data from being archived and made readily accessible. At some point, therefore, certain data will need to be designated for reduced archiving and/or access requirements. Ideally, this decision would be based on the current utility and potential future value of the data. As noted at several points in this report, however, it is extremely difficult to assess even the current value of any particular environmental data stream (see, for example, Millard et al., 1998). It is virtually impossible to anticipate its potential future uses.
The decision-making process also needs to be ongoing. Data managers and stewards should continually review the data holdings under their purview to determine the appropriate level of service for each data set, given legal and mission requirements, user needs, and available resources. Determining the realistic and required level of service for each archived data set is difficult, but it could yield major economic benefits for NOAA. For example, large but little-used data sets could be placed in a “deep archive.” Such an archive would be secure and would offer a clearly defined access path that potential users could discover, but the data would not be readily available on demand.

This strategy would be particularly appropriate where a decision to stop archiving a data set would be irreversible, that is, where the data could not be regenerated or resampled. One such example might be the hard copies of original high-resolution (one-third of a nautical mile) Defense Meteorological Satellite Program (DMSP) satellite imagery. This imagery is archived at the National Snow and Ice Data Center for use in analyzing snow cover and is available only in hard copy format. However, it is imperative to make certain that data placed in a deep archive could still be retrieved if needed. Implementing this guideline at NOAA would likely require some modification to existing policies across the agency.

BOX 5-2 Multiple Versions of MSU Data

A Microwave Sounding Unit (MSU) is a satellite instrument that measures radiation emitted from the Earth’s atmosphere, with different frequencies corresponding to different atmospheric layers. These instruments have been installed on a series of polar-orbiting satellites operated by NOAA since 1979. Their data have been analyzed and pieced together to yield a global record of atmospheric temperature commonly referred to as the “MSU temperature record.” This record is constantly being extended. In addition, a number of subtle errors have been detected and removed, and different groups are producing different versions using different analysis techniques (Christy et al., 2000; Christy et al., 2003; Spencer et al., 2006; etc.). The Climate Change Science Program’s (CCSP’s) Synthesis and Assessment Product 1.1, Temperature Trends in the Lower Atmosphere: Steps for Understanding and Reconciling Differences (CCSP, 2006), discusses these and other issues associated with constructing records of global temperature change in the lower atmosphere.

The MSU temperature record poses an interesting and illustrative challenge for data archiving, because it is not immediately clear how many different versions of the data should be archived and made readily available to users. The most recent versions of the data will obviously be in high demand. Older versions might also be of interest for two reasons: the public controversy that has surrounded some of the revisions to the data, and the fact that some prominent publications (including international assessments) have referenced the older data sets. Thus, it would be prudent to provide some degree of access to at least the last few versions of the data.

Detailed descriptions of all revisions should be made available in the metadata for each data set. Ideally, the metadata for older versions will contain information about more recent versions of the data and about the publications that have used the data. Given the high level of public interest in the data, it would also be advisable to create a standing advisory group. This group would provide ongoing advice on the level of archiving and access needed for MSU data and other similar types of data.
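The three version questions can be combined with the preference for scaling back access over deletion. Together they give a simple planning rule for the versions of a derived product, such as the MSU record in Box 5-2. This is a sketch under several assumptions:

- The service levels, the `online_count` parameter (standing in for "the last few versions"), and the version names are hypothetical.
- The rule applies only to derived products, never to Original Data.
- A `DISPOSAL_REVIEW` outcome only means that the version enters the community review required before any disposal. It does not mean the version is deleted.

```python
from dataclasses import dataclass
from enum import Enum


class ServiceLevel(Enum):
    ONLINE = "readily available on demand"
    DEEP_ARCHIVE = "secure and discoverable, retrieved on request rather than on demand"
    DISPOSAL_REVIEW = "candidate for disposal, pending broad community review"


@dataclass
class VersionQuestions:
    """Answers to the three questions about keeping multiple versions."""
    feasible_to_retain: bool     # 1. Is it feasible to retain multiple versions?
    differences_important: bool  # 2. Are the differences large and scientifically important?
    hard_to_regenerate: bool     # 3. Is it too technically difficult to regenerate earlier versions?

    def keep_multiple(self) -> bool:
        # Multiple versions are archived only if all three answers are yes.
        return self.feasible_to_retain and self.differences_important and self.hard_to_regenerate


def plan_versions(versions: list[str], answers: VersionQuestions,
                  online_count: int = 2) -> dict[str, ServiceLevel]:
    """Assign a service level to each version of one derived product.

    versions must be ordered oldest to newest; online_count is a hypothetical
    policy setting standing in for "the last few versions".
    """
    if online_count < 1:
        raise ValueError("at least the newest version must stay online")
    plan: dict[str, ServiceLevel] = {}
    if answers.keep_multiple():
        recent = set(versions[-online_count:])
        for v in versions:
            # Older versions are scaled back to deep archive, not removed.
            plan[v] = ServiceLevel.ONLINE if v in recent else ServiceLevel.DEEP_ARCHIVE
    else:
        for i, v in enumerate(versions):
            # Only the newest version stays online; older ones go to review.
            is_newest = i == len(versions) - 1
            plan[v] = ServiceLevel.ONLINE if is_newest else ServiceLevel.DISPOSAL_REVIEW
    return plan
```

The rule never deletes anything. Any removal would still have to pass through the separate, broadly publicized disposal process.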

The Importance of Stakeholder Involvement

The benefit to society is the ultimate rationale for federally supported data archives. The decision to archive or continue archiving a data set, or a general category of data, should therefore be driven by the societal benefits that the data provide or could provide. However, this imperative raises a vexing problem. The societal benefits of environmental data are both ubiquitous and diverse, yet estimating the present and especially the future value of any particular environmental data set or data type is extremely difficult. This difficulty complicates all data management decisions, especially the decision of what data not to archive.

Data managers can use a variety of strategies to help inform their archiving decisions. The most effective and essential practice, however, is to actively engage a broad range of stakeholders on an ongoing basis. These stakeholders include:

- current and potential future users of environmental data;
- data managers from other program elements at NOAA; and
- data managers at other federal agencies, universities, and international entities.

This section focuses on the usefulness and importance of broad stakeholder involvement in data archiving decisions.

Guideline: It is essential to solicit user input when making decisions on whether to archive or continue archiving a data set.

Many aspects of data management benefit when users are engaged in a substantial, active, and ongoing manner. Data archiving decisions are inherently complex. For these decisions, feedback from users is critical in helping data stewards evaluate the current uses and potential future societal benefits associated with a particular data set or general category of data.
Such evaluations must necessarily be qualitative. Even so, a relative assessment of the current or likely future value of data is critical for three main purposes: (1) to ensure that all important data are preserved; (2) to identify data sets that could potentially be reprocessed or regenerated on demand; and (3) to inform decisions about the appropriate level of service for each data set. For example, some observations are taken at very high temporal, spatial, and/or spectral resolutions, such as radar or satellite data. For these, advice from users can help data managers decide on the appropriate resolution of data to include in the archive. The participation of expert users also helps to identify obsolete or redundant data. It also helps to identify important or irreplaceable at-risk data that are currently not in the archive.

Even when user input is solicited, deciding what environmental data to archive is difficult. The user communities are large and diverse, and each data set has a broad range of potential future applications. The archiving requirements for global change science applications are particularly difficult to predict because the field is young and rapidly evolving. These considerations reemphasize the need for a broad, formal, and ongoing process to inform data archiving decisions.

Data managers should engage “pure” users, who often focus on a particular application or type of data. They should also carefully weigh the advice of data scientists, both inside and outside NOAA, who critically evaluate data, integrate data from different sources, and develop value-added products. This advice matters in any decision to archive or stop archiving a particular data set. Data stewards should be aware of data quality, applicability, and limitations. They should also be aware of past uses of the data, and of new research, data sources, and processing algorithms that might influence the future usefulness of the data. Data usage metrics are a valuable tool for assessing the current status, potential future developments, and level of service required for different data sets, but communication with data providers is also essential. NOAA might wish to consider implementing a decision-making process similar to the one described in Box 5-3.

Guideline: Because the decision to stop archiving is normally irrevocable, extra attention to community engagement is needed before final disposal of any data.

One of the most important practical functions that users can serve is to identify data sets that could be considered for disposal or for reduced levels of service. This applies when all foreseeable uses of the data have been exhausted, or when more current or comprehensive information is available elsewhere. Because decisions to stop archiving data are irrevocable, it is essential to notify and actively engage the broadest possible spectrum and number of users before final disposal of the data.
Currently, NOAA data managers notify only known stakeholders about the potential loss of data, that is, historically active users and sometimes data managers at other agencies. Multidisciplinary analysis continues to expand. Improved data discovery and integration capabilities will also bring a growing number of potential new users (see Chapter 6). For these reasons, a better enterprise-level notification system should be established to advertise data management decisions. This applies especially to decisions to dispose of data, but also to decisions to significantly reduce levels of service. The decision to stop archiving a data set should at least be communicated to any current users. It could also be posted in the Federal Register at least one year before the scheduled disposal date. This would give all users an opportunity to comment on the decision or take over archiving responsibilities, including other federal agencies, nongovernmental organizations, universities, and international groups.
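If the one-year notice period suggested above were adopted, the earliest permissible disposal date could be computed as shown below. The function names are hypothetical. The sketch assumes two things:

- "One year" means the same calendar date in the following year.
- A notice posted on 29 February is extended to 1 March, so that a full year always elapses.

```python
from datetime import date


def earliest_disposal_date(notice_date: date) -> date:
    """Earliest date that leaves at least one full year after public notice."""
    try:
        return notice_date.replace(year=notice_date.year + 1)
    except ValueError:
        # Notice given on 29 February: that day does not exist next year,
        # so move to 1 March to keep the full year of notice.
        return date(notice_date.year + 1, 3, 1)


def disposal_permitted(notice_date: date, scheduled_disposal: date) -> bool:
    """True if the scheduled disposal respects the one-year notice period."""
    return scheduled_disposal >= earliest_disposal_date(notice_date)
```

For example, a notice posted on 1 June 2007 would permit disposal no earlier than 1 June 2008.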

BOX 5-3 Decision-making Process for Archiving Data at EROS

The U.S. Geological Survey (USGS) uses a records disposition schedule process at its Earth Resources Observation and Science (EROS) data center.[a] This process is an example of a formal, enterprise-wide policy for data disposal. It is designed to preserve essential data while maximizing cost-effective disposal of data that are no longer needed. The three objectives of the program are to:

1. Preserve records that have permanent value;
2. Destroy records of temporary value as soon as they have served the purpose for which they were created; and
3. Remove old records from office space and filing equipment to storage facilities, thereby improving use of files and reducing maintenance costs.

To accomplish these objectives, USGS developed mission-specific records schedules for geological, geographical, and water data. A schedule for its biology records program is currently being planned.[b] These records disposition schedules provide mandatory instructions on what to do with records (and non-record materials) that are no longer needed for current business. They also provide the authority to dispose of recurring or nonrecurring records. An online tool is available to facilitate this decision-making process.[c] Overall, this process provides an efficient, cost-effective, and transparent method for making data archiving decisions.

[a] http://edc.usgs.gov/.
[b] http://www.usgs.gov/usgs-manual/schedule/432-1-s1/howtouse.html.
[c] http://eros.usgs.gov/government/RAT/tool.php.

Guideline: NOAA should establish close partnerships with other national and international data holding institutions and engage these institutions as part of the archiving process. It is important to have clear agreement on which partner has what archival responsibility.

All environmental data collected or generated using governmental resources, or that fall under NOAA’s mission, need to be considered for archiving.
However, many different federal agencies, state and local agencies, universities, international organizations, and other groups are involved in the collection and management of environmental data. For example, some of the international organizations responsible for collecting and distributing environmental data relevant to NOAA's mission are:

- the Group on Earth Observations (GEO);
- the Committee on Earth Observation Satellites (CEOS);
- the International Council for Science (ICSU);
- the World Meteorological Organization (WMO);
- the Intergovernmental Oceanographic Commission (IOC); and
- the International Hydrographic Organization (IHO).

NOAA should coordinate with all of its domestic and international partners to establish common standards and protocols for the broad spectrum of environmental data that falls under its mission. This coordination would ensure that all important data and metadata are archived and made available to users. It would also maximize the overall efficiency and cost-effectiveness of government archiving activities. Chapter 4 discussed the importance of using agreed-upon standards and protocols to facilitate and promote data stewardship. Chapter 6 further discusses the benefits of interagency and international coordination for data discovery, access, and integration.

Before these higher-level benefits of interagency and international coordination can be realized, however, a formal process should be developed and implemented to determine which agency is responsible for archiving each data set. Because archiving and access are so closely related, this process would ideally be part of a full evaluation of roles and responsibilities throughout the data management life cycle. Coordination is essential to:

- improve cost-effectiveness;
- reduce duplication and redundancy;
- identify at-risk data streams; and
- ensure that all critical environmental data are archived.

The 1989 and 1992 memorandums of understanding (MOUs) between NOAA and NASA, discussed previously in Chapter 2 and Box 3-1, illustrate both the benefits and the pitfalls of interagency agreements. Originally, these MOUs called for careful coordination between the two agencies. They suggested that NOAA would be willing to provide the long-term archive for Earth Observing System (EOS) data generated by NASA satellites.
However, NOAA subsequently assumed archiving responsibilities for only a small fraction of EOS data (specifically, the Moderate Resolution Imaging Spectroradiometer [MODIS] Level 1b data), and the other EOS data were not being archived. This decision was presumably driven by lack of resources. It has been justified by citing a lack of relevance to NOAA's operational mission, even though much of the EOS data is critical for global change research. Moreover, a number of program elements within NASA were not made aware of NOAA's decision for some time. Fortunately, NASA has since found a way to take on archiving responsibilities for the remaining EOS data sets through several of its distributed data archives. It is still unclear, however, how discoverable these data will be to regular NOAA archive users. It is also unclear what further inefficiencies this unfortunate lack of coordination might engender.

In circumstances such as these, specific archiving responsibilities were not agreed to at the beginning of the data collection campaign. Closer ongoing collaboration is then needed to make certain that all environmental data are properly archived. If interagency agreements continue to prove ineffective, other mechanisms for improving coordination should also be considered. These could include oversight at the level of the CCSP, the Office of Management and Budget, and/or the Office of Science and Technology Policy.

The National Oceanic and Atmospheric Administration (NOAA) collects, manages, and disseminates a wide range of climate, weather, ecosystem and other environmental data that are used by scientists, engineers, resource managers, policy makers, and others in the United States and around the world. The increasing volume and diversity of NOAA's data holdings - which include everything from satellite images of clouds to the stomach contents of fish - and a large number of users present NOAA with substantial data management challenges. NOAA asked the National Research Council to help identify the observations, model output, and other environmental information that must be preserved in perpetuity and made readily accessible, as opposed to data with more limited storage lifetime and accessibility requirements. This report offers nine general principles for effective environmental data management, along with a number of more specific guidelines and examples that explain and illustrate how these principles could be applied at NOAA.
