Currently Skimming:

4 Overview of Data Science Methods
Pages 53-79

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 53...
... The second aspect involves data analytics and includes data mining, text analytics, machine and statistical learning, probability theory, mathematical optimization, and visualization. This chapter discusses some of the data science methods and practices being employed in various domains that pertain to the analysis capabilities relating to personnel and readiness missions in the Department of Defense
From page 54...
... Some of the important considerations discussed include how descriptive and predictive analytics methods can be used to better understand what the data indicate; how decision making under uncertainty can be enhanced through prescriptive analytics; and the usefulness and limitations of these approaches. The proliferation of data and technical advances in data science methods have created tremendous opportunities for improving these analysis capabilities.
From page 55...
... , flat files, and text documents) must first go through various data preparation methods to prepare them for analysis.
From page 56...
... The following sections discuss some of the tasks involved in preparing data for analysis and some techniques to assist analysts with those tasks.
Common Data Preparation Tasks
Data in their original form are not typically ready to be used without some initial work.
From page 57...
... For example, surveys can have a missing or inappropriate response or have entire sections that were not completed or failed to cover certain segments of the target population.
Data Uniformity and Reconciliation
If an analysis is to use multiple data sources, how a particular data item is interpreted or collected across sources can vary.
From page 58...
... This section describes some of the main data preparation methods.
Reusing Attention
Avoiding data preparation tasks by capturing results of previous work is the first step in reducing an analyst's load.
From page 59...
... This risk may not be tolerable for many of the sensitive and confidential sources in this domain.
Data Assessment, Cleaning, and Transformation
For structured data, such as relational databases, there is a wide range of mature commercial tools for data preparation, especially in connection with data warehousing and data integration activities (Rahm and Do, 2000)
From page 60...
... However, not all imputation methods are suitable for applying at the data preparation stage; rather, they are applied as part of analysis. For example, multiple imputation constructs several data sets from an initial data set with missing values, then runs the analysis on each and combines the results (Enders, 2010)
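As a rough illustration of the multiple imputation workflow described above (construct several completed data sets, run the analysis on each, combine the results), here is a minimal Python sketch. The data values, the function name `multiple_imputation_mean`, the normal imputation model, and the choice of the sample mean as the "analysis" are all assumptions made for illustration. The combining step follows Rubin's rules; a fuller implementation would also draw the imputation model's parameters rather than fixing them at their observed estimates.

```python
import random
import statistics


def multiple_imputation_mean(values, m=20, seed=0):
    """Estimate a mean from data with missing entries (None) via multiple imputation.

    Simplified sketch: missing entries are drawn from a normal distribution
    fitted to the observed values (an assumed imputation model).
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)

    estimates, variances = [], []
    for _ in range(m):
        # Step 1: construct one completed data set.
        completed = [v if v is not None else rng.gauss(mu, sigma) for v in values]
        # Step 2: run the analysis (here, the mean and its sampling variance).
        estimates.append(statistics.mean(completed))
        variances.append(statistics.variance(completed) / len(completed))

    # Step 3: combine the results with Rubin's rules.
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)
    between = statistics.variance(estimates)
    total_variance = within + (1 + 1 / m) * between
    return pooled, total_variance


# Hypothetical survey scores with two missing responses.
data = [4.1, None, 5.0, 3.8, None, 4.6, 5.2, 4.4]
pooled, total_variance = multiple_imputation_mean(data)
print(f"pooled mean = {pooled:.3f}, total variance = {total_variance:.4f}")
```

The between-imputation term is what distinguishes this from filling in a single value: it carries the extra uncertainty due to the missing data into the final variance.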
From page 61...
... Davenport and Harris (2007) define data analytics to be the "extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions." Using the often-cited diagram in Figure 4.2, they go on to characterize analytics as starting with statisti...
Footnote 3: These discussions occurred during the committee's meetings, site visits, and follow-up discussions with relevant individuals.
From page 62...
... (2010) to comprise descriptive, predictive, and prescriptive analytics, where descriptive analytics is defined as a set of technologies and processes that use data to understand and analyze an organization's performance; predictive analytics is defined as the extensive use of data and mathematical techniques to uncover explanatory and predictive models for an organization's performance as represented by the inherent relationship between data inputs and outputs/outcomes; and prescriptive analytics is defined as a set of mathematical techniques that computationally determine a set of high-value alternative actions or decisions given a complex set of objectives, requirements, and constraints, with
From page 63...
... defined analytics as comprising descriptive, predictive, and prescriptive analytics, focusing on predictive analytics based on methods from data mining, machine learning, and statistics, and on prescriptive analytics based on methods from stochastic modeling and mathematical optimization. Following in kind, this section presents data analytics methods in those same terms. As described in detail above, data cleaning and linking challenges need to be overcome before any analysis can occur.
From page 64...
... and subject to various constraints using stochastic models of uncertainty ... Going beyond an inferential data analysis ... but it affects your average risk.
From page 65...
... While there are a variety of ways to do this in a pictorial or graphic format, visualizations using bar charts, box plots, and scatter plots are common approaches.
Predictive Analysis
Predictive analysis goes a step beyond descriptive and exploratory analysis by extracting information from data sets to determine patterns and predict future outcomes and trends.
From page 66...
... dimensionality reduction to simplify a multidimensional data set; clustering input data into cohesive groups; multivariate querying to find the objects most similar to each other or a particular candidate; and density estimation of an unobservable underlying density function.
Linear Regression
Linear regression is a widely used approach to modeling the relationship between a dependent variable and one or more explanatory variables using linear predictor functions, where unknown model parameters are estimated from the data.
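To make the linear regression description concrete, the following is a minimal sketch of ordinary least squares with a single explanatory variable. The function name `fit_simple_linear_regression` and the data are hypothetical, and least squares is only one of several ways the unknown parameters can be estimated from data.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sample covariance of (x, y) divided by sample variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
intercept, slope = fit_simple_linear_regression(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

With more than one explanatory variable the same idea applies, but the estimates are usually obtained by solving the normal equations with a linear algebra library rather than by hand.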
From page 67...
... Nonlinear Regression
Nonlinear regression is a form of regression analysis in which observational data are modeled by a function that is a nonlinear combination of the model parameters and depends on one or more independent variables.
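As a sketch of what fitting such a model can look like, the example below fits the exponential curve y = a * exp(b * x), which is nonlinear in the parameter b, using Gauss-Newton iterations. The model form, the function name `fit_exponential`, the synthetic data, and the use of a log-linear fit for starting values are all assumptions chosen for illustration; the chapter excerpt does not prescribe a particular algorithm.

```python
import math


def fit_exponential(x, y, iterations=20, tol=1e-12):
    """Least-squares fit of y = a * exp(b * x) by Gauss-Newton; requires y > 0."""
    n = len(x)
    # Starting values: straight-line fit to (x, ln y).
    log_y = [math.log(v) for v in y]
    mean_x = sum(x) / n
    mean_ly = sum(log_y) / n
    b = (sum((xi - mean_x) * (li - mean_ly) for xi, li in zip(x, log_y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = math.exp(mean_ly - b * mean_x)

    for _ in range(iterations):
        s11 = s12 = s22 = g1 = g2 = 0.0
        for xi, yi in zip(x, y):
            e = math.exp(b * xi)
            r = yi - a * e             # residual at the current parameters
            ja, jb = e, a * xi * e     # partial derivatives w.r.t. a and b
            s11 += ja * ja
            s12 += ja * jb
            s22 += jb * jb
            g1 += ja * r
            g2 += jb * r
        # Solve the 2x2 system (J^T J) * delta = J^T r for the update step.
        det = s11 * s22 - s12 * s12
        da = (s22 * g1 - s12 * g2) / det
        db = (s11 * g2 - s12 * g1) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Synthetic data: a = 2, b = 0.5, plus small fixed perturbations.
x = [0.5 * i for i in range(8)]
noise = [0.05, -0.04, 0.03, -0.02, 0.04, -0.05, 0.02, -0.03]
y = [2.0 * math.exp(0.5 * xi) + ni for xi, ni in zip(x, noise)]
a_hat, b_hat = fit_exponential(x, y)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

Unlike the linear case there is no closed-form solution, so the fit is iterative and depends on reasonable starting values.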
From page 68...
... The goal of classification trees is to predict or explain responses on a categorical dependent variable; as such, the available techniques have much in common with the techniques used in the more traditional methods of discriminant analysis, cluster analysis, nonparametric statistics, and nonlinear estimation. The flexibility of classification trees makes them a very attractive analysis option (Hill and Lewicki, 2006)
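The core step in growing a classification tree is choosing the split that best separates the categories of the dependent variable. The sketch below shows that single step for one numeric predictor, using Gini impurity as the split criterion. The criterion, the function names `gini` and `best_split`, and the data are assumptions for illustration; a full tree would apply this step recursively to each resulting group.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, labels):
    """Return (threshold, weighted_gini) for the best rule 'x <= threshold'.

    The threshold is None if no candidate split improves on the unsplit data.
    """
    values = sorted(set(xs))
    best = (None, gini(labels))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [lab for x, lab in zip(xs, labels) if x <= threshold]
        right = [lab for x, lab in zip(xs, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical predictor values and categorical responses.
xs = [1, 2, 3, 6, 7, 8]
labels = ["no", "no", "no", "yes", "yes", "yes"]
threshold, score = best_split(xs, labels)
print(f"split at x <= {threshold}, weighted Gini = {score}")
```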
From page 69...
... Text Analytics
Text analytics (also called text data mining)
From page 70...
... , is a form of predictive analytics useful for scenario development and analysis and what-if studies. As with all forms of predictive analytics, it relies heavily on a descriptive understanding of the underlying population
From page 71...
... Simulation by itself can never say anything about the quality of the solution; it can only provide statistical analyses of possible outcomes for a fixed set of prespecified decision parameters.
Prescriptive Analysis
The role of prescriptive analytics is to provide recommendations in support of decision-making processes, where the objective is to determine a set of decisions and/or actions that gives rise to the best possible results based on various outcomes predicted by predictive analytics and subject to various constraints.
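The remark about simulation above can be illustrated with a small Monte Carlo sketch: the decision (here a capacity level) is fixed in advance, and the simulation only reports statistics of the outcomes under that decision. The scenario, the normal demand model, the parameter values, and the function name `simulate_shortfall` are all hypothetical.

```python
import random
import statistics


def simulate_shortfall(capacity, mean_demand, sd_demand, runs=5000, seed=1):
    """Simulate demand against a fixed capacity and summarize the shortfall."""
    rng = random.Random(seed)
    shortfalls = [
        max(0.0, rng.gauss(mean_demand, sd_demand) - capacity)
        for _ in range(runs)
    ]
    return {
        "mean_shortfall": statistics.mean(shortfalls),
        "prob_shortfall": sum(s > 0 for s in shortfalls) / runs,
    }


# The capacity of 110 is a prespecified decision, not something the
# simulation chooses.
summary = simulate_shortfall(capacity=110, mean_demand=100, sd_demand=10)
print(summary)
```

Nothing in the output says whether 110 is a good choice. Answering that requires comparing alternatives against an objective and constraints, which is the role the excerpt assigns to prescriptive analytics.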
From page 72...
... models of uncertainty, where the goal is to model different aspects of the decision-making problem and their various sources of uncertainty, often building on top of the results of predictive analytics. The second area concerns mathematical optimization of decisions within the context of stochastic models of uncertainty of different aspects of the problem, where the goal is to either (1)
From page 73...
... The domain knowledge needed for this area spans stochastic processes, probability theory, stochastic modeling, and simulation theory. The interplay of stochastic models and data creates critical and complex dependencies.
From page 74...
... More specifically, a general formulation of a multiperiod decision-making optimization problem can be expressed in terms of minimizing or maximizing over time an objective functional of interest subject to various constraint functionals. The time-dependent objective functionals and constraint functionals define the criteria for evaluating the best possible results over a given time horizon with respect to the time-dependent decision variables and other dependent variables, where these and related variables are based on the stochastic models of the system of interest.
From page 75...
... , where both the decision variables and the system are adapted to this filtration. Hence, mathematical optimization generally renders solutions that identify a set of decisions or actions at the start of the time horizon or identify a set of dynamic decision-making policies for dynamic adjustments to decisions or actions throughout the time horizon adapted to filtrations, in both cases having the goal of achieving the best possible results within the context of the stochastic models of the system of interest and subject to various constraints.
From page 76...
... Another form of mathematical optimization for prescriptive analytics concerns an elevated form of what-if and scenario analysis that explores optimal solutions across a spectrum of objectives, constraints, conditions, and other aspects of the problem formulation. As an illustrative example, a mathematical optimization under uncertainty method can be used to determine the solution of an instance of the single-period optimization problem above that maximizes the objective in expectation under a given set of standard deviations for the random variables involved, and then this step is repeated for different sets of standard deviations.
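The what-if procedure described above (optimize in expectation for one set of standard deviations, then repeat for others) can be sketched as follows. The single-period problem the excerpt refers to is not reproduced in this skim, so a simple stocking problem stands in for it: choose a quantity q to maximize expected profit when demand is random. The prices, the normal demand model, the integer search grid, and the function name `best_quantity` are assumptions for illustration only.

```python
import random


def best_quantity(sd, mean=100.0, price=10.0, cost=4.0, draws=2000, seed=7):
    """Return the integer quantity maximizing sample-average profit."""
    # The same seed is used for every sd (common random numbers), so
    # differences between runs reflect the change in sd, not sampling noise.
    rng = random.Random(seed)
    demands = [max(0.0, mean + sd * rng.gauss(0.0, 1.0)) for _ in range(draws)]

    def expected_profit(q):
        # Revenue on units actually sold, minus the cost of all units stocked.
        return sum(price * min(q, d) - cost * q for d in demands) / draws

    return max(range(80, 141), key=expected_profit)


# Repeat the optimization for different demand standard deviations.
best_by_sd = {sd: best_quantity(sd) for sd in (5.0, 15.0, 30.0)}
for sd, q in best_by_sd.items():
    print(f"sd = {sd:>4}: best quantity = {q}")
```

Comparing the resulting decisions across standard deviations shows how sensitive the recommendation is to the assumed level of uncertainty, which is the kind of insight this form of scenario analysis is meant to provide.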
From page 77...
... Optimal Solutions
It is important that prescriptive analytics methods provide the user with some basis for understanding why the optimal set of decisions or actions provided will give rise to the best possible results subject to the specified constraints. A key element for doing so involves providing the user with the predicted outcomes from predictive analytics for the optimal set of decisions or actions, as well as the predicted outcomes for alternative sets of decisions or actions for comparison.
From page 78...
... 2003. Exploratory data mining and data cleaning.
From page 79...
... 2005. Regularization and variable selection via the elastic net.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.