2 Knowledge for Data Scientists
Pages 12-34



From page 12...
... The ability to measure, understand, and react to large quantities of complex data can shape scientific discovery, social interaction, political interactions and institutions, economic practice, public health, and many other areas. Data science workflows not only consume data but also produce data (intermediate data sets, statistics, and by-products such as visualizations) that must themselves be understood.
From page 13...
... Data scientists often work at the interface of disciplines and can help develop new approaches to address problems in these areas. Data science applications have varying levels of risk.
From page 14...
... Increasingly, domains in the humanities, such as philosophy, rhetoric, history, and literary studies, embrace elements of data science, while issues of algorithmic bias raise moral and ethical questions. Data scientists have the potential to help address critical real-world challenges.
From page 15...
... the analysis of tissue images, promising a more accurate diagnosis than traditional techniques (Codella et al., 2017).
From page 16...
... Three-fifths of the data science and analytics jobs today are in the finance and insurance, professional services, and information technology sectors, but the manufacturing, health care, and retail sectors also are hiring significant numbers of data scientists (Markow et al., 2017).
From page 17...
... The lessons learned from other disciplines can help pave the way to ensuring the success of data science education.
Data Scientists of Today and Tomorrow
As was discussed in the previous section, there is a current shortage of workers with data science skills.
From page 18...
... • Data storage and access. Data scientists who focus on managing data storage solutions as well as extracting, transforming, and loading data for modeling should have the ability to manage exceptionally large data sets from a variety of heterogeneous data sources and in batch or streaming form, and to assess the predictive value of these data sources.
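To make the batch-versus-streaming distinction concrete, here is a minimal Python sketch. The helpers `batch_mean` and `running_mean` and the `records` data are hypothetical illustrations, not taken from the report. The batch form needs the whole data set in memory before it computes; the streaming form updates its result one record at a time and keeps only a count and a running total, so its memory use does not grow with the size of the data.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Batch form: the whole data set is loaded before computing."""
    return sum(values) / len(values)


def running_mean(values: Iterable[float]) -> Iterator[float]:
    """Streaming form: yield the mean of the records seen so far,
    keeping only a count and a running total in memory."""
    count = 0
    total = 0.0
    for v in values:
        count += 1
        total += v
        yield total / count


# Hypothetical example records
records = [4.0, 8.0, 6.0, 2.0]
print(batch_mean(records))          # 5.0
print(list(running_mean(records)))  # [4.0, 6.0, 6.0, 5.0]
```

Both forms agree on the final answer; the streaming version also works on an unbounded source (for example, a generator reading records as they arrive), which a batch computation cannot handle.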
From page 19...
... Beyond the differences among them, there is considerable variance in the lower-order and higher-order knowledge and skills that some data science jobs require. There are also many commonalities among the varied types of data scientists.
From page 20...
... This will result in programs that prepare different types of data scientists. Recommendation 2.1: Academic institutions should embrace data science as a vital new field that requires specifically tailored instruction delivered through majors and minors in data science as well as the development of a cadre of faculty equipped to teach in this new field.
From page 21...
... noted the need for data scientists who can face "essential questions of a lasting nature and [use] scientifically rigorous techniques to attack those questions." Students also need to learn how to ensure that outcomes are valid -- extracting the right insights and having confidence that, start to finish, what one says is true, within some margins of error.
From page 22...
... Finding 2.3: A critical task in the education of future data scientists is to instill data acumen. This requires exposure to key concepts in data science, real-world data and problems that can reinforce the limitations of tools, and ethical considerations that permeate many applications.
From page 23...
... Some data scientists and programs require a deeper understanding of mathematical underpinnings. This might include the following:
• Partial derivatives (to understand interactions in a model)
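As an illustration of why partial derivatives matter for interactions (an assumed example, not one given in the report), consider a linear model in which two predictors x₁ and x₂ enter both separately and as a product, with coefficients β₀ through β₃ and error term ε:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon
\qquad\Longrightarrow\qquad
\frac{\partial\, \mathbb{E}[y]}{\partial x_1} = \beta_1 + \beta_3 x_2
```

Because the derivative with respect to x₁ still contains x₂, the effect of changing x₁ depends on the current value of x₂ whenever β₃ ≠ 0; that dependence is exactly what an interaction term encodes.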
From page 24...
... While it would be ideal for all data scientists to have extensive coursework in computer science, new pathways may be needed to establish appropriate depth in algorithmic thinking and abstraction in a streamlined manner. This might include the following:
• Basic abstractions,
• Algorithmic thinking,
• Programming concepts,
• Data structures, and
• Simulations.
From page 25...
... A sound knowledge of basic theoretical foundations will help inform their analyses and the limits to their models. Successful graduates will be able to apply statistical knowledge and computational skills to formulate problems, plan data collection campaigns or identify and gather relevant existing data, and then analyze the data to provide insights." To avoid drawing invalid or incorrect conclusions, data science students need to understand the concept of inference, including sampling and nonsampling errors.
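Sampling error, the gap between a sample statistic and the population value it estimates, can be illustrated with a short simulation. The sketch below is an assumed illustration using only the Python standard library: the synthetic Gaussian population and the helper `sampling_spread` are hypothetical. It draws repeated random samples, measures how much the sample means vary, and compares that spread with the textbook standard error σ/√n. Nonsampling errors, such as a biased sampling frame or measurement error, are not modeled here and would not shrink as n grows.

```python
import random
import statistics


def sampling_spread(population, n, trials, seed=0):
    """Draw `trials` simple random samples of size n (without replacement)
    and return the standard deviation of the resulting sample means."""
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(trials)]
    return statistics.stdev(means)


# Hypothetical synthetic population of 10,000 values
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]
sigma = statistics.pstdev(population)

for n in (25, 100, 400):
    observed = sampling_spread(population, n, trials=2_000)
    predicted = sigma / n ** 0.5  # standard error of the mean
    # Sampling without replacement makes `observed` slightly smaller
    # (finite-population correction); with N = 10,000 the gap is small.
    print(f"n={n:4d}  observed SE={observed:.3f}  predicted SE={predicted:.3f}")
```

Quadrupling the sample size roughly halves the spread of the sample means, which is why sampling error can be controlled by design while nonsampling error cannot.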
From page 26...
... A key challenge for data scientists is to be able to tell a story with data and translate key aspects of the data science life cycle and outcomes of efforts to both users
From page 27...
...
Data Modeling and Assessment
Data scientists have a rich and growing set of models and methods at their disposal. The challenge is how to identify which models are most appropriate for a given setting and assess whether the assumptions and conditions needed to apply that method are tenable.
From page 28...
... control systems,
• Reproducible analysis, and
• Collaboration.
Communication and Teamwork
One major distinguishing attribute of the work of data scientists centers on their capacity to frame research questions well and then communicate the findings in writing, in graphical form, and in conversation.
From page 29...
... will help ensure that data scientists develop the capacity to pose and answer questions with data. Reinforcing skills and capacities developed in data science courses in the context of a specific domain will help students see the entire data science process.
From page 30...
... Students also need to be aware of legal requirements aimed at protecting individuals' privacy, such as the European Union's General Data Protection Regulation (GDPR), which aims to increase the rights of data subjects and provides penalties for individuals or organizations that violate those rights.7 Ethical considerations, in other words, lie at the heart of data science. Unique ethical considerations arise in each step of and throughout the data science life cycle (i.e., when posing a question; collecting, cleaning, and storing data; developing tools and algorithms; performing exploratory analysis and visualization; making inferences and predictions; making decisions; and communicating results)
From page 31...
... In addition to learning about standards for responsible behavior through such case studies, students would also benefit from instruction in developing specific skills to navigate the challenging ethical problems with which data scientists struggle. Key aspects of ethics needed for all data scientists (and for that matter, all educated citizens)
From page 32...
... Given the sensitive nature of certain types of data and the significant ethical implications of working with such data, efforts to establish a code of ethics for data scientists are under way throughout the field.8 Data science ethics might be codified in an "oath" similar to the Hippocratic Oath taken by physicians as a way to crystallize what is being asked of them. Although the specific content and form of an oath may be controversial, it can also underline the importance of the commitment being made.
From page 33...
... 2017. IBM predicts demand for data scientists will soar 28% by 2020.
From page 34...
... 2017. The Quant Crunch: How the Demand for Data Science Skills Is Disrupting the Job Market.

