
11 Conclusions
Pages 161-166



From page 161...
... The study that led to this report reached the following conclusions:
• Recent years have seen rapid growth in parallel and distributed computing systems, developed in large part to serve as the backbone of the modern Internet-based information ecosystem. These systems have fueled search engines, electronic commerce, social networks, and online entertainment, and they provide the platform on which massive data analysis issues have come to the fore.
From page 162...
... In general, data analysis is based on assumptions, and the assumptions underlying many classical data analysis methods are likely to be broken in massive data sets.
• Massive data analysis is not the province of any one field, but is rather a thoroughly interdisciplinary enterprise.
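The report does not give a concrete case of a broken assumption. One familiar illustration, chosen here for demonstration and not taken from the report, involves the independence assumption behind the classical standard error s/√n. Large data sets are often pooled from many sources, and observations from the same source tend to share an offset. The simulation below assumes 50 hypothetical sources with 1,000 records each. It compares the naive standard error of the pooled mean with one that treats each source as the independent unit.

```python
import random
import statistics


def naive_se(values):
    # Classical standard error of the mean: assumes all observations are independent.
    return statistics.stdev(values) / len(values) ** 0.5


def cluster_se(clusters):
    # Treats each cluster (e.g. one data source) as the independent unit
    # and computes the standard error from the per-cluster means.
    means = [statistics.fmean(c) for c in clusters]
    return statistics.stdev(means) / len(means) ** 0.5


def simulate(n_clusters=50, per_cluster=1000, seed=0):
    # Hypothetical data: every record in a cluster shares one random offset.
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        shift = rng.gauss(0.0, 1.0)  # source-level effect shared within the cluster
        clusters.append([shift + rng.gauss(0.0, 1.0) for _ in range(per_cluster)])
    return clusters


clusters = simulate()
pooled = [v for c in clusters for v in c]
print(f"naive SE:   {naive_se(pooled):.4f}")
print(f"cluster SE: {cluster_se(clusters):.4f}")
```

With 50,000 records the naive formula reports a very small uncertainty. The clustered estimate is more than an order of magnitude larger, because the data contain only 50 effectively independent units. Adding more records per source does not shrink that gap.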
From page 163...
... • Massive data analysis creates new challenges at the interface between humans and computers. As just alluded to, many data sets require semantic understanding that is currently beyond the reach of algorithmic approaches and for which human input is needed.
From page 164...
... Statistical research has rarely considered constraints due to real-time decision-making in the development of data analysis algorithms, and computational research has rarely considered the computational complexity of algorithms for managing statistical risk.
• There is a major need for the development of "middleware" -- software components that link high-level data analysis specifications with low-level distributed systems architectures.
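The report calls for middleware but does not prescribe how it should work. The following is a minimal sketch of the idea under stated assumptions. A hypothetical `Query` object serves as the high-level specification, and in-memory lists stand in for partitions of a distributed store. A thread pool stands in for cluster workers. The "middleware" turns the declarative request into independent per-partition tasks plus an associative merge step. None of these names come from the report or from any real system.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical high-level specification: which statistic of which field."""
    statistic: str  # "sum", "count", or "mean"
    field: str


Partial = Tuple[float, int]  # (running sum, running count) for one partition


def _partial(rows: Sequence[Dict[str, float]], field: str) -> Partial:
    # Low-level work executed independently on a single partition.
    values = [row[field] for row in rows]
    return (sum(values), len(values))


def _combine(parts: List[Partial]) -> Partial:
    # Associative merge: partial results can be combined in any order.
    return (sum(p[0] for p in parts), sum(p[1] for p in parts))


# Maps each supported statistic to how it is read off the merged partial.
_FINALIZE: Dict[str, Callable[[Partial], float]] = {
    "sum": lambda p: p[0],
    "count": lambda p: float(p[1]),
    "mean": lambda p: p[0] / p[1],  # assumes at least one row overall
}


def run(query: Query, partitions: List[List[Dict[str, float]]], workers: int = 4) -> float:
    """Compile a high-level query into per-partition tasks, execute them, and merge."""
    if query.statistic not in _FINALIZE:
        raise ValueError(f"unsupported statistic: {query.statistic}")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda rows: _partial(rows, query.field), partitions))
    return _FINALIZE[query.statistic](_combine(parts))
```

For example, suppose `partitions = [[{"x": 1.0}, {"x": 2.0}], [{"x": 3.0}], [{"x": 4.0}, {"x": 5.0}]]`. Then `run(Query("mean", "x"), partitions)` returns `3.0`. The mean is assembled from mergeable (sum, count) pairs rather than by averaging per-partition means. Averaging the means would give the wrong answer whenever partitions differ in size.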
From page 165...
... It is perhaps premature to suggest curricula for such programs, particularly given that much of the foundational research in massive data analysis remains to be done. Even if such programs minimally solve the difficult problem of finding room in already-full curricula in computer science and statistics, so that complementary ideas from the other field are taught, they will have made very significant progress.
From page 166...
... Moreover, deployment of such a system will require modeling decisions, skill with approximations, attention to diagnostics, and robustness. As much as the committee expects to see the emergence of new software and hardware platforms geared to massive data analysis, it also expects to see the emergence of a new class of engineers whose skill is the management of such platforms in the context of the solution of real-world problems.

