Vision for K–12 Data Science Education and Outcomes
Data science is in its infancy, and there is an opportunity to shape it in a way that will best serve students and future citizens. Given this early stage, there is a need to think deeply and creatively about how data science looks within K–12 and how it might develop in ways that do not replicate the disparities observed in other fields and disciplines. Moreover, to ensure that all students have access to high-quality programs and are able to make sense of the data around them, sessions were designed around the following questions: Why are we interested in data science at the K–12 level? What outcomes do we want for students, for the practice of data science education, etc.?
CREATING A VISION OF “HIGH-QUALITY” K–12 DATA SCIENCE EDUCATION
In this session of the workshop, speakers and participants explored what defines a valuable learning experience for students, what research has revealed about successful and unsuccessful curricular interventions, and how these learnings can be articulated into policy and practice. M. Wilkerson moderated the session by posing a series of questions to the panelists. A Q&A session with workshop participants followed the moderated session.
Importance of Data Science Education
Why do we think data science education is important? Why are we all here today?
The level of data knowledge that is now required is “historically high,” said Rob Gould (University of California, Los Angeles). For example, the p value crisis in the statistics world¹ demonstrates the need to adequately prepare students and practitioners to understand and use the fundamental tools of data analysis. Undergraduate students are coming into college with no foundation to build on, he said, which makes teaching introductory statistics challenging. At the same time, we live in a world where nearly every interaction, both personal and professional, is mediated through data. This situation presents both “perils and promises,” said Gould. The peril is that ignorance of data is harmful to individuals and society. The promise is that even a small amount of knowledge about how to make sense of data and how to communicate about them can be “tremendously empowering.” It is critical to begin to develop data acumen at the youngest ages and to include all students in these efforts. At this point, he said, “we are a long way from building [data knowledge] successfully and at the level that we require.”
Josh Recio (Dana Center) agreed and added that the way math education has been conducted has historically excluded and harmed some students, specifically Black, Latino, and lower-income students. The one-size-fits-all approach to math education has sent a message to these students that they do not belong and that they cannot be a “math person.” To address this issue and to reach all students with data literacy efforts, he said, it is essential that math education be presented in a relevant and useful way. For example, if students can see explicit links between math education and the world around them, they may be encouraged to see themselves as “math people” and pursue further education in the area. Recio said that data science presents an opportunity to make these types of linkages for students, and that using data science to make math more relevant makes it possible to change the message about “who belongs in math courses and who can be successful.”
There are three overwhelmingly important reasons to address data science in K–12 education, said Alfred Spector (Massachusetts Institute of Technology). First, our ability to sense, process, and promulgate data has changed the world, and this trend will continue to accelerate and have an enormous impact. With this as the backdrop, said Spector, data science is a critical part of K–12 education. Second, data science is informed and used by a wide variety of disciplines; incorporating data science into education provides an opportunity to challenge students and enrich disciplinary education. Third, data science, including the use of computational models, is becoming ever more dominant. Whether students eventually go into computer science, STEM, health, public policy, economics, or other fields, Spector concluded, there is a need for everyone to be able to understand and critically think about the use of data and the conclusions drawn from these data.
1 For more information, see Amrhein and colleagues (2019).
The Next Generation Science Standards (NGSS),² said Tricia Shelton (National Science Teaching Association), do not address data science specifically. However, they emphasize that students are expected to explain their world, to develop scientific explanations, and to design solutions to problems. As other speakers have noted, the problems and phenomena of our world are now complex and often involve large amounts of data. Empowering children to make sense of the rapidly changing and complex world can give them agency and an ability to be critical consumers and critical decision makers. Integrating data science into all disciplines, not just science, will prepare students to have an impact in whatever they choose to do. Today’s K–12 students are the future leaders of our world, added Trena Wilkerson (Baylor University; National Council of Teachers of Mathematics). Helping students to understand and be able to use and critique data science is critical for giving them a voice and preparing them to lead.
Identifying “Success” for K–12 Data Science Education
Given these reasons for why data science education is important, how can we be sure we are doing it well? What does it mean to be successfully educating students in terms of what they learn? Who ends up participating in data science, and what will data science look like in the future?
Spector noted the “dual nature” of data science education. One side is the mathematical, computational aspect of data science; the other is understanding the role of data and how to think critically about data. The second aspect is extremely important for all students and can be infused across all disciplines. Data are widely available, but unfortunately, he said, there is “very little understanding and care” about the proper use of data (e.g., the difference between correlation and causation). Spector mused that improper use of data to “make a point” is one of the reasons for the divisiveness in society today. There are ample opportunities throughout the K–12 curriculum to draw attention to this issue and to teach students about data. Whether students are preparing to enter STEM fields, pursuing other educational degrees, or entering the workforce after high school, data science education is critical to provide them with fundamental tools.
2 For more information, see https://www.nextgenscience.org/
One of the driving motivations for teaching data science in K–12 is the need for a data-literate workforce, said Recio. Ten or 15 years in the future, he said, we will be able to assess these efforts by looking at whether employees and students in higher education are coming in with the preparation and knowledge they need to succeed. However, he said, it will also be clear that efforts in this area have failed if there are clear lines between students who benefited and students who were harmed, in particular if there are lines between racial/ethnic groups or between income groups. It is important, said Recio, to carefully consider what is defined as “success” and for whom. T. Wilkerson added that one way to monitor whether data science education efforts are successful is to look at the pathways that are developed. She explained that pathways that carry students from pre-K through higher education could be inclusive and ensure that all students have opportunities or could create further barriers and reify structural issues. There is a need to continually assess the pathways for data science education to find what is working and what is not working; this assessment is particularly important because data science cuts across all disciplines.
Opportunities for Data Science in K–12 Education
Where does data science fit in K–12? Does it fit in STEM or other courses, or how should it be integrated?
“Data science should be integrated everywhere,” said Shelton. Two of the key elements in science education are relevance and authenticity, she said; studying complex and relevant phenomena helps to keep students engaged and interested. When considering data science education, the approach should be the same. When data science is integrated with other disciplines, and the questions about data are interesting and relevant to students, “magic in a classroom” can happen. One way to accomplish this, said Shelton, is to partner with those who have expertise in data science to develop instructional materials and approaches for integrating data science in the classroom.
Gould noted that although data science can be integrated across disciplines, it is also critical to focus on the “fundamental core” of data science. Looking at data science can be like looking at an elephant, he said; people from different disciplines and industries may all see a different part of the elephant. When implementing data science education into K–12, there is a need to focus on the essential ideas and approaches that apply to all disciplines. Gould argued that data science needs to be presented as a “subject worthy of itself” as well as explored in the context of other subjects.
Implications for Policy and Action
What recommendations do you have for this workshop? What should be done?
One critical step, said Gould, is to build bridges to the university level. If a high school student takes a statistics course, how does this course fit in with future studies in college? Are there opportunities for students to build on their learning and integrate it into coursework in computer programming, the use of algorithms, and problem solving? Gould said there is a need to build a curricular bridge between K–12 and higher education, and to develop clear and explicit pathways for learning. There is also a need to build a bridge across the K–12 grade levels; although there are courses, he said, we need a coherent scaffold.
The interdisciplinary nature of data science, said Recio, demands “a collective movement.” Stakeholders across multiple disciplines and from the education community, the workforce, and practitioners need to work together to organize and communicate about how to move data science education forward. Further, these stakeholders need to bring what they’ve learned back to their own institutions and push for change. T. Wilkerson challenged workshop participants to consider two areas in which progress needs to be made. First, which of the stakeholders and collaborators need to be brought into conversations? How could bringing in the voices of community members, families, students, and others help move data science forward? Second, said T. Wilkerson, it is critical to examine policies and practices that are inhibiting these efforts, and to identify ways to eliminate or mitigate them. For example, K–12 educators are unlikely to have the time necessary to sit down and collaborate and develop data science curriculum; mitigating this challenge may require providing teachers with more time or engaging more stakeholders in the process. Shelton added that professional learning will be a critical part of implementing data science education in K–12; research has found that many science teachers are hesitant and uncomfortable with the topic. Partnerships between science educators and those with expertise in data science can also be essential in supporting educators in moving data science forward, she said. In those partnerships, everyone is both “learner” and “contributor”: science educators bring their science expertise and learn more about data science, and vice versa.
In considering how to move data science forward, Spector encouraged workshop participants to use a “divide and conquer” method. Focusing on making progress on specific smaller goals may be easier than trying to solve every problem simultaneously, he said. For example, there are a large number of teachers who are untrained, educational districts have different requirements, and parents and communities have their own views and priorities for education; solving these issues simultaneously would be “quite hard.” One area in which there is tremendous opportunity, said Spector, is in using computational technologies to make education immersive, adaptive, and “actually fun.” Although technology cannot solve every curricular issue, there is an opportunity to infuse data, analysis, programming, and statistics into the curriculum to improve efficiency and learning.
How are we defining “data science”? How does it relate to data literacy? Is data science its own emerging discipline or a set of tools that lives across multiple disciplines?
There is a false dichotomy between teaching data literacy and data science, said Gould. Before teaching data science—and specific tools such as machine learning—students need a foundational understanding in data literacy. Teaching the skills before the understanding, he said, is “putting the cart before the horse.” Gould noted that he prefers the term “data acumen” rather than “data literacy.” Spector added that while data science has become its own discipline, it is built upon multiple other disciplines. Although it is important to teach a holistic and unified view of data science, it is also critical to provide opportunities for students to gain an in-depth understanding of the underlying disciplines (e.g., statistics, computation).
Given the siloed nature of courses in middle and high school, how do we ensure that data science occurs across all disciplines in parallel with learning the basics of statistics and data analysis? How can all teachers participate in adding elements of data science into their teaching?
T. Wilkerson responded that data science is a “perfect venue” for collaboration across disciplines. Although this type of collaboration is unfortunately not common, it is essential in this area to show students the transdisciplinary nature and importance of data science. To facilitate collaboration, time and opportunities for collaboration need to be provided to teachers throughout their training and careers. Recio added that moving data science forward will require investing in teachers; teachers need training and time to learn how to teach data science and to understand the ultimate goal for students. “If we don’t make that investment” in preservice and in-service training, said Recio, we will struggle to implement data science in K–12 education.
How do we address the tension between the rapidly changing world of data and technology, and the slow-to-change world of education?
Although technology itself changes quickly, said Gould, the fundamentals do not. Recio agreed and said that he is often asked which programming language teachers should use in the classroom, and his answer is that it doesn’t matter. Technologies are constantly changing, but exposing students to any technology will help them as they move forward. He added, however, that it is important to prepare teachers to be able to confidently work with available technologies and tools, and that they need continuous learning and support in this area. Gould said that the “elephant in the room” is the cost of technologies; obtaining and updating technology for the classroom is “extraordinarily expensive.”
A VISION FOR K–12 DATA SCIENCE LEARNING AND OUTCOMES
This session of the workshop was designed to explore evidence on learning and critical data literacy to consider what students should be able to do with data and how outcomes should be measured. Before the start of the session, participants were broken up into small groups to discuss outcomes for data science education. Discussions centered on both student-level outcomes (e.g., developing specific skills and proficiencies, developing interest or disciplinary identity) and outcomes related to policy and practice (e.g., access to opportunities, funding). The session, moderated by Horton, included two invited speakers and commentary by an invited panelist, followed by discussion with workshop participants including a summary of the small group conversations.
K–12 Data Science Learning Through the Lens of Agency³
“What do we know about data science learning?” asked Ryan “Seth” Jones (Middle Tennessee State University). This is a difficult question to answer for two reasons. First, data science is an emergent and diverse field. Many different communities are generating and revising knowledge using data science ideas and practices, and technologies and methods change quickly. Second, research on data science learning is happening in diverse contexts, including K–12 classrooms and undergraduate classrooms, and across diverse theoretical traditions. For these reasons, Jones said, it may be difficult to synthesize the research on student learning.
Jones told workshop participants about what he and his colleague found when examining the body of research on student learning. Their landscape review found that researchers in this area privilege different types of agency within data science learning: material agency, personal agency, and disciplinary agency. Material agency, Jones explained, reflects the idea that the world does things and is “hard to wrangle.” He noted that this includes worlds that emerge from computing tools. Personal agency describes the way that individual people respond to the world, whereas disciplinary agency occurs when communities create a new form of collective agency that constrains the choices of individuals. Jones clarified that these are not mutually exclusive categories, but that in general, most studies highlight one type of agency over the others.
3 For more information about the methods and resources described throughout the presentation, see https://www.nationalacademies.org/event/09-13-2022/docs/DD667E469D0EC5DD91A7D85BC839A9852491A3CF9F15
Studies that highlight material agency, said Jones, place emphasis on the role of data science in giving students the opportunity to learn fundamental ideas about variability, measurement, sampling, etc. Material agency supports learners to understand the context through which data came to be, and it can be productive for students to describe sources of variability, including variability due to measurement, the phenomenon under investigation, and random noise. Studies that focus on the role of personal agency, in contrast, highlight different aspects of learning. First, they recognize that learners possess epistemic (knowledge-related) assets and that it is important to recognize these assets. Second, these studies focus on the ways in which students can be positioned to see data as personally meaningful, can construct coherent data stories, and can develop data science approaches that resemble disciplinary approaches in meaningful ways. Jones noted that personal agency is not a “magic bullet.” It takes a lot of work to design environments and to scaffold and sequence learning so that learners can exert personal agency and also make progress toward generating answers to the questions on which they are working. Finally, said Jones, research that prioritizes disciplinary agency emphasizes the roles of computing and highly valued disciplinary procedures and tools. For example, many studies examine specific programming languages that are valued in the discipline (e.g., R and Python). These studies find that students can learn to use particular tools valued in a discipline when supported through careful instructional design.
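To illustrate the sources of variability mentioned above, the following minimal Python sketch simulates repeated measurements of many objects and estimates how much of the spread comes from the phenomenon itself and how much comes from measurement. It is not drawn from the workshop or from the studies Jones reviewed. All function names, parameter values, and the normal-distribution assumption are hypothetical, and random noise is folded into the measurement term for simplicity.

```python
import random
import statistics


def simulate_measurements(n_objects=200, repeats=5, phenomenon_sd=3.0,
                          measurement_sd=1.0, seed=0):
    """Return one list of repeated readings per object (hypothetical data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_objects):
        # Variability in the phenomenon: each object has its own true value.
        true_value = rng.gauss(50.0, phenomenon_sd)
        # Variability due to measurement (and noise): each reading is off a little.
        readings = [true_value + rng.gauss(0.0, measurement_sd)
                    for _ in range(repeats)]
        data.append(readings)
    return data


def decompose(data):
    """Estimate measurement variance and phenomenon variance from the readings."""
    repeats = len(data[0])
    # Average spread of repeated readings of the same object.
    within = statistics.mean([statistics.variance(r) for r in data])
    # Spread of the per-object averages.
    between_of_means = statistics.variance([statistics.mean(r) for r in data])
    # Per-object averages still carry measurement variance / repeats; remove it.
    between = max(between_of_means - within / repeats, 0.0)
    return within, between


data = simulate_measurements()
measurement_var, phenomenon_var = decompose(data)
print(f"estimated measurement variance: {measurement_var:.2f}")
print(f"estimated phenomenon variance:  {phenomenon_var:.2f}")
```

With the assumed standard deviations of 1.0 and 3.0, the two estimates should land near 1 and 9, respectively. Changing either value shows how each source contributes to the overall spread in the data.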
Examining these different types of agency, said Jones, led to several questions and considerations about how to move forward with the work of K–12 data science education:
- Should different forms of agency be prioritized depending on where the work is oriented?
- What is the role of systems-level organizations and support to synthesize the work?
- What new communities and voices are needed?
- How much weight should be given to the different dimensions of data science learning?
- Whose voice is highlighted in research on data science learning?
Critical Data Literacy: Creating a More Just World with Data⁴
People have long called for stronger data literacy in the general population, said Josephine Louie (Education Development Center, Inc. [EDC]). Large-scale surveys of business managers and the public have found that up to 80 percent of people have little confidence in their abilities to make sense of data. We now live in a world of “big data,” she said, which presents new questions about privacy and power. Powerful actors are collecting massive amounts of data from the public constantly, with or without consent, and these data are used to predict and modify behavior. Many writers have described the harms that arise when our data are used to decide “who we are, what we will buy, where we will go, what we will do, what messages we will gravitate toward, and what we will believe.” Within this context, said Louie, there are growing calls for critical data literacy.
General data literacy is anchored in statistical literacy and requires an understanding of and fluency with the data investigation process (Figure 2-1). This process involves formulating questions that can be answered with data, assembling data to address these questions, using statistical and other tools to analyze the data, drawing inferences, summarizing conclusions, and communicating findings to various audiences. Data literacy also involves familiarity with multivariable data and reasoning, said Louie, because large-scale data and social and natural phenomena are multivariable. In addition, the process involves constant questioning throughout all stages about the “who, what, when, where, and why” surrounding the data. Notions of internal and external validity of any claims based on data depend on answers to all of these questions, she said.
In the world of big data, there is a need to expand beyond these statistical foundations. Critical data literacy expands on data literacy to include an understanding of when and how data are collected from us, an understanding of algorithms and what they do, and a consideration of the ethical impacts of data collection and data-based decisions. Further, critical data literacy requires awareness of unequal power structures in society and how data can be used to perpetuate or worsen social inequality.⁵ Louie said that “there is a tradition in educational research and practice that looks critically at unjust structures in society and the role that education and data may play in questioning and countering these structures to create a more just world.” Applying these ideas, said Louie, critical data literacy can be seen as “learning to read and write the world with data.” Reading the world with data, explained Louie, could mean using data to raise students’ critical consciousness of social and political inequalities in the world. Writing the world with data could mean helping students to build social agency to take action, both individually and collectively, against unjust social arrangements. Part of this work, said Louie, is using data to help non-dominant groups build pride in their own cultural and social identities.
4 For more information about the methods and resources described throughout the presentation, see https://www.nationalacademies.org/event/09-13-2022/docs/D16254F310D01BBDA873920E4EFB8151F2D8334181AA
5 Books such as Data Feminism, Weapons of Math Destruction, and Algorithms of Oppression make these arguments.
Louie described some takeaways from a commissioned paper she wrote for the workshop.⁶ In the paper, she highlighted eight examples of interventions aimed at promoting critical data literacy in K–12 education. Four of the studies, she said, could be seen as helping students to read the world with data. Several of these studies took place in middle and high school mathematics classes, where students examined large-scale social and economic datasets to uncover persistent and large-scale patterns of inequality in housing, income, and the local lottery. Another study took place in a high school summer technology program; students were able to read the social media posts and private information of everyone within a specific geographic location, allowing them to see how technologies can violate people’s privacy. The other four studies, said Louie, could be seen as helping students write the world with data. These studies took place in science classes, social studies classes, media art classes, and a public library. Students used data to envision and propose how to improve local public spaces, to craft and relay stories about their own family histories, and to share an aspect of their own personal lives expressed artistically through the design of t-shirts. Outcomes reported by these studies included more critical perspectives of society and awareness of different forms of social and economic inequality, greater agency in authoring with data, and greater feelings of power to shape and control what data are shared or hidden.
These studies also reported that there were challenges involved in promoting critical data literacy, said Louie. For example, critical sociopolitical views may not arise naturally or easily, due to student resistance and persistence of existing views. In addition, discussing social and political inequality can be difficult, with the potential for invoking feelings of disempowerment and negative stereotypes of non-dominant groups. These discussions can be particularly harmful to students when teachers are not trained in how to conduct them well, she said.
In conclusion, Louie said that learning to read and write the world with data may be a helpful framework for promoting critical data literacy because reading and writing with text are already fundamental and familiar goals in K–12 education. The examples that Louie provided suggest promise in using data to build awareness of unequal social structures and build agency in authoring one’s own stories and social visions. However, questions and considerations remain. First, most studies did not report a strong focus on quantitative reasoning. If critical data literacy can be separated from quantitative reasoning, she said, we need to consider how it differs from critical information literacy or critical media literacy. If these competencies cannot be separated, there is a need for a framework that builds both competencies in a coordinated manner. Second, Louie said that promoting critical data literacy is an interdisciplinary effort, and there is a need for spaces and support for educators to work across disciplines. Third, most of the examples that Louie found were with small groups of students. If critical data literacy is important, there is a need to find ways to scale up interventions and tools to assess progress. Finally, said Louie, there is a need to determine whether and how interventions designed for certain groups will need to be adapted for other groups. For example, she said, there may be resistance among White students or parents when data are used to critique existing power structures.
Panelist Reflection: K–12 Data Science Learning and Assessment
Following the presentations by Jones and Louie, Jo Boaler (Stanford University) offered her reflections on the topics discussed and how they related to the issue of assessment. Boaler concurred that the presentations covered topics that are essential aspects of data science. A framework of agency highlights the importance of students asking questions, building models, interpreting data, and communicating, whereas critical data literacy focuses on students’ ability to understand and shape their world using data. When considering how to assess the effectiveness of data science education, said Boaler, it is important to capture these competencies and practices rather than simply assessing content knowledge. For example, she said, it is fairly simple to evaluate a student’s knowledge of statistics, programming, or multivariate reasoning. However, concepts like agency and critical data literacy are “the essence of data science” and if assessments do not capture these practices, she said, they will misrepresent data science and threaten the achievement of equitable outcomes.
Assessments that focus on capturing practices rather than merely content knowledge are critical to equity, said Boaler. For example, a large part of the variance in SAT math scores is predicted by race, once achievement is factored out. In contrast, the Smarter Balanced⁷ assessment for mathematics (which permits the use of a calculator) captures the ability to reason and use mathematical practices. When this assessment is used instead of the SAT, it results in a more socioeconomically and racially diverse group of students going forward. These assessments “really matter,” she said. For assessments to capture the “essence” of data science and to prioritize equitable outcomes, assessments must capture practices and competencies such as agency and critical literacy rather than merely evaluating content knowledge, said Boaler.
Participant Report Out and Q&A with Panelists
Following the reflections from Boaler, a subset of workshop participants discussed their conversations, asked questions of the panelists (Jones, Louie, and Boaler), and reflected on emerging themes. To set the stage for discussion, workshop participants contributed their answers to the question: “What are some desired outcomes for K–12 data science education?” A word cloud was made with their answers (Figure 2-2).
Zarek Drozda (Data Science 4 Everyone, University of Chicago) began by summarizing themes that emerged from his group’s discussion. First, he said, several group members emphasized the importance of explicating the motivations for teaching data science so that they can be critiqued and adapted over time. Second, multiple individuals stated that data science is essential not only for students entering tech careers but for all students to be educated and productive members of society. Students need to be able to understand how and when data are collected, how to analyze and critique data visualizations, and how to use data to make decisions and shape their world. M. Wilkerson shared two insights from her group. First, individuals discussed the importance of developing assessments that capture a broad range of student outcomes. Second, there was discussion about how to ensure that interested students are prepared to pursue data science at the college level, while not prematurely cutting off other students from pursuing this path. One idea that was floated, said M. Wilkerson, was developing summer bridge programs at universities to ensure preparedness. Horton commented that this idea points to the need for collaboration and coordination of data science curricula across multiple levels of education.
Relationship to Other Education Movements
A virtual workshop participant commented that if you look at computer science standards, the end goal is often working with real-world data; she asked whether this goal is consistent with the goals of the data science education movement. Horton said that whereas some aspects of computational thinking are different from aspects of data science, there is a “lot of overlap” and the ideas are often mutually reinforcing. Drozda added that there are opportunities for collaboration between computer science and data science; for example, a computer science educator can serve as a resource for other teachers to help with programming. Creating more explicit spaces for collaboration between educators working in related fields will be critical for moving data science forward, he said. Jones agreed and said that other fields, like science and mathematics, often have specific practices or ideas for developing competencies in students, and that data science education could lean on these experiences.
Louie said that her journey in data science has always been based in real-world data. For her, data were a way to understand her identity and how she fits into the world. One of Louie’s projects involves bringing the U.S. Census, particularly American Community Survey data, into math and social studies classes in order for students to ask questions and engage with the data. Louie said that the students in these classes (many of whom are immigrants and/or from non-dominant communities) welcomed the opportunity to learn about issues such as income inequality and immigration. Many said it was the best class they’d ever taken, said Louie, because the data were being used to answer questions and delve into topics they care about.
Eileen Manchester (Library of Congress) shared some information and resources about Indigenous data sovereignty. She said that this topic is critical for those working in data science education and encouraged workshop participants to read works by scholars including Stephanie Russo Carroll, Desi Rodriguez Lonebear, Lynn Lavalley, Eve Tuck, and Linda Tuhiwai Smith. These works, she explained, explore the issue of Indigenous and tribal groups having control over the data that are collected about them and retaining governance over such data.
Relationship to Citizen Science
The ideas explored in this workshop, said Allissa Dillman (Biodata Sage), echo the concept of “citizen science.” Citizen science projects, she explained, involve students going into their communities, collecting data, interpreting the results, and potentially advocating for changes. For example, a group of middle school students in Maine tested arsenic levels in their community water and testified in the state senate in favor of action. Louie commented that the idea of citizen science is related to the discussions around agency. In a citizen science project, there are multiple opportunities for students to have agency over asking questions, collecting and analyzing data, and communicating the data. She noted, however, that there is a need for scaffolding and structure to support students in all of these actions, particularly if the goal is for students to develop data science competencies. Jones agreed and said that citizen science can look very different from place to place, and it is important to pay attention to what types of agency are being prioritized. For example, a project in which students are measuring water pollutants may use methods given to them by a local researcher. In this example, said Jones, disciplinary agency is prioritized. While this is not necessarily a bad thing, it brings up questions of what the students are learning about water sampling, and whether they are understanding the various approaches and why certain methods may be used.