3 Principles for Working with Big Data
Pages 13-21



From page 13...
... • Data preparation is an important, time-consuming, and often overlooked step in data analysis, and too few people are trained in it. (Juliana Freire)
From page 14...
... The course uses real-world data from a variety of sources, including Twitter, Wikipedia, and other companies. Teams of three students propose projects, including the data set to use, the expected results, and how to evaluate their results.
From page 15...
... BIG DATA MACHINE LEARNING -- PRINCIPLES FOR INDUSTRY Alexander Gray, Skytree Corporation Alexander Gray began by briefly describing the first three phases of machine learning: artificial intelligence and pattern recognition (1950s-1970s), neural networks and data mining (1980s and 1990s)
From page 16...
... And he pointed out the utility of visualizing data in a data-specific and domain-specific approach and indicated a need for improved exploratory data analysis and visualization tools. A workshop participant supported the use of visualization and emphasized the need to include the human in the loop; the user should be responsible for and involved in the visualization, not passive, and the visualization should enhance understanding of the data.
From page 17...
... Students tend to have had few computing and statistics classes on entering graduate school in a domain science. Temple Lang then described the data analysis pipeline, outlining the steps in one example of a data analysis and exploration process: 1.
From page 18...
... A small set of statistical measures can characterize scatter plots, and exploratory data analysis can be conducted on the residuals. A workshop participant noted the difference between a data error and a data blunder. A blunder is a large, easily noticeable mistake.
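The idea of summarizing a scatter plot and then exploring its residuals can be sketched as follows. This is only an illustrative example, not a method described in the workshop: it fits a least-squares line to (x, y) data and reports a few simple statistics of the residuals, which is where errors and blunders would show up.

```python
# Illustrative sketch: fit a least-squares line to scatter-plot data,
# then summarize the residuals with a few simple statistical measures.
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def residual_summary(xs, ys):
    """Fit a line and characterize the residuals with simple measures."""
    slope, intercept = fit_line(xs, ys)
    res = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    n = len(res)
    mean = sum(res) / n
    spread = (sum((r - mean) ** 2 for r in res) / n) ** 0.5
    return {"mean": mean, "spread": spread, "max_abs": max(abs(r) for r in res)}

xs = list(range(10))
ys = [2 * x + 1 for x in xs]     # perfectly linear data
summary = residual_summary(xs, ys)
print(summary)                   # all residual measures are ~0 for a perfect fit
```

A large, isolated residual in such a summary would flag a candidate blunder; a systematic pattern in the residuals would instead suggest that the fitted model is wrong.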
From page 19...
... The path from data to knowledge, she noted, is human-based and has many complicated elements. Freire explained that the CRA data analysis pipeline tasks can be classified into two categories: data preparation (which includes acquisition and recording; extraction, cleaning, and annotation; and integration, aggregation, and representation)
From page 20...
... She stated that data preparation takes a long time, is idiosyncratic, and can limit analyses. She also noted that new data sets continually provide new challenges in big data, and many needs are not met by existing infrastructure.
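The idiosyncratic character of data preparation can be made concrete with a small sketch. The records, field names, and cleaning rules below are invented for illustration; the point is that each rule is specific to one data set and rarely transfers to the next.

```python
# Hypothetical example of per-dataset cleaning rules: coerce types,
# normalize labels, and drop records that cannot be parsed.
raw_records = [
    {"city": " New York ", "temp": "21.5"},
    {"city": "new york",   "temp": "n/a"},   # unusable measurement
    {"city": "Boston",     "temp": "18"},
]

def clean(records):
    out = []
    for rec in records:
        try:
            temp = float(rec["temp"])       # coerce to a numeric type
        except ValueError:
            continue                        # drop records we cannot parse
        city = rec["city"].strip().title()  # normalize label formatting
        out.append({"city": city, "temp": temp})
    return out

print(clean(raw_records))
```

Rules like these must be rediscovered and rewritten for each new data set, which is one reason preparation dominates analysis time and is hard to support with general-purpose infrastructure.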
From page 21...
... She said that computer science and data management research have partly failed, in that they have not been able to create usable tools for end users. Freire stated that the complexity of data science problems is often underestimated.

