1 Introduction
Pages 1-5

From page 1...
... Large-scale data integration refers to the challenge of aggregating data sets that are so large that searching or moving them is nontrivial, or to the challenge of drawing selected information from a collection (possibly large, distributed, and heterogeneous) of such sets.
From page 2...
... Data integration must overcome the challenge of finding disparate, distributed sources of data, which is often referred to as "data discovery," and the challenge of effectively utilizing the collective information in those sources to produce new insight -- a process known as "data exploitation." The workshop on which this report is based did not try to characterize comprehensively the various ways in which data integration is useful or necessary for the advance of science. The term "data integration" first emerged in connection with the need for organizations to provide data users "with a homogeneous logical view of data that is physically distributed over heterogeneous data sources" (Ziegler and Dittrich, 2004).
From page 3...
... Workshop participants were also aware that a growing number of opportunities require the aggregation of large numbers of modest-size datasets, and some of the workshop discussion reflects the challenges associated with those situations. To bound the discussion and produce the most useful outcomes, the workshop planning committee decided to focus on issues related to integrating scientific research data. The particular disciplines discussed include physics, biology, chemistry, Earth sciences, satellite imagery, astronomy, geospatial data, and research medical data.
From page 4...
... He gave the example of modeling the effects of hurricanes and storm surges, which requires bringing together a wide range of models and data, including satellite observations, atmospheric models, storm-surge models, wave models, levee models, traffic flow models, and so on. This increase in the prevalence of collaboration calls for cyberinfrastructure to support distributed teams of researchers who collaborate through sharing data.
From page 5...
... The goals were to identify areas in which the emerging needs of research communities are not being addressed and to point to opportunities for addressing these needs through closer engagement between the affected communities and cutting-edge computer science. The workshop also discussed policy barriers to widespread data sharing, considering the pros and cons of various ways forward.

