In the final sessions of the workshop, participants reflected on the presentations and discussions, and identified areas in which more research or work is needed.
On both days of the workshop, a “townhall” session was held to explore and discuss research needs for data science education. The first townhall was aimed at identifying the highest priorities for additional research, and the second townhall was designed to explore how practice can inform current and future research needs. Each session was moderated by a member of the planning committee, who invited in-person and virtual workshop participants to share their experiences, thoughts, and questions.
Townhall Day 1: What Are the Highest Priorities for Additional Research?
At the end of the first day, M. Wilkerson began the townhall session by asking workshop participants to reflect on the discussions throughout the day at the workshop, and to use these as a jumping-off point for identifying priorities for additional research in the area of data science education (see Box 5-1 for a summary of participant-identified priorities).
It is evident from the discussions at this workshop, said Santo, that there is a wide range of motivations, purposes, rationales, and potential outcomes for data science in K–12 education. Rather than treating these aspects like background information and immediately moving on to the design of programs, Santo encouraged stakeholders to make these an object of inquiry in order to clearly articulate them. For example, what are the different purposes of data literacy and data science, and how do these differences impact the way they are taught and the desired outcomes? It is important, he said, to be intentional and clear about the rationale behind teaching data science and not to take rationales at face value. For example, one rationale for teaching data literacy might be that it would eliminate issues such as vaccine resistance; however, the research on motivated reasoning suggests that this is unlikely to be the case. Making rationales an object of inquiry allows us to then consider how different rationales lead to different designs and institutional arrangements, said Santo; the rationale, purpose, and objectives should completely shape the design of the intervention.
Carol Bertram, a high school teacher in Chicago, asked that more work be done to connect math and statistics to social awareness and social justice. Bertram said that she gets “a lot of leeway” in the data science class she teaches, but she currently is on her own to find data, develop questions, and figure out how to engage students. She said she would love to see more prepared resources and curricula for teachers on data literacy, particularly given our experiences during the pandemic.
There has been a lot of discussion about ensuring that datasets and inquiries are relevant, authentic, and important to students, said Nathan Holbert (Teachers College, Columbia University). Several speakers have emphasized the importance of engaging students in inquiries that matter to them, their communities, and the world, he said. The missing piece, however, is enabling students to share their discoveries in a legitimate and impactful way. Holbert noted that when students have the opportunity to present their projects, it is often in a perfunctory manner. There is a need, he said, to make sure that the voices of these passionate learners can be heard in a way that gives them agency to change their world using data.
Data science education requires datasets, said Manchester. Although the Library of Congress has digital collections available, the datasets tend to be raw, very big, or in complex formats that require modification. There are different needs for datasets, depending on whether they are for pedagogical purposes or research purposes. It takes a lot of work to create data that are appropriate for classroom use, and we are still figuring out exactly how to do this, she said. More research in this area is needed.
Coming into this workshop from a background in computer science, said David Weintrop (University of Maryland), it is clear that there are a lot of overlapping challenges, issues, and opportunities between data science and computer science. Some of the questions being asked—about the goals of data science education, the role of tools, the role of industry, and the potential for integration with other disciplines—are questions that were explored for computer science education 10 or 15 years ago. Although data science is distinct from computer science in some ways, said Weintrop, there are also many similarities that offer the opportunity to learn from the experiences in that field. “It would be a real shame” if the field of data science was still struggling to answer some of these questions a decade from now, he said. Weintrop encouraged stakeholders to think critically about where data science education overlaps with other disciplines and where it is unique in order to identify where lessons could be learned. For example, integrating computer science into preservice teacher methods classes has been a struggle; the field of data science can learn from these experiences and lessons.
The work that needs to be done, said Colleen McEnearney (Tuva), is not in convincing teachers to buy in to data science education but in persuading districts and policymakers of its importance. McEnearney said she would have loved to use data science tools and techniques in her years as a high school math teacher, but she felt constrained by all of the requirements and testing that had to happen. In her experience, teachers would be excited to learn how to integrate data science into their classes, but they feel that “their hands are tied” by the state. There is a need to work with districts and policymakers to make room for data science in K–12 education, she said. Alternatively, it may be time to be “more radical” and advocate as a community for a different system for our students.
Townhall Day 2: How Can Practice Inform Current and Future Research Needs?
At the end of the second day, Horton asked workshop participants to think holistically, broadly, and in a “visionary fashion” about how teaching practice can inform current and future research needs in the area of data science education based on the discussions throughout the day, and then opened the floor for conversation (see Box 5-2 for a summary of participant-identified needs that are informed by teaching practice).
Steve Leinwand (American Institutes for Research) began the session by asking to broaden the perspective beyond practice to include experiences, products, and perspectives. Leinwand described himself as a former “hippie math teacher” and a “self-proclaimed mathematics education change agent.” Based on his experiences through the years, Leinwand emphasized the importance of clarifying exactly what is meant by K–12 data science. He analogized it to the Bay Area Writing Project,1 in which writing is taught from kindergarten through 12th grade and infused throughout. Data science needs to be conceptualized the same way. Data science needs a two-page explanation, he said, as well as a 50-plus-page document making the case for data science and data literacy. These documents need to be created by a broad, diverse group of stakeholders, including researchers, teachers, future educators, data scientists, and mathematicians, he said.
Gould agreed with the need for further clarification about the goals of conceptual understanding and statistics, and the need for a better understanding of issues such as the role of provenance, the synthesis of the investigative process with technology, and the ethical and societal implications. There is a need for a curriculum-agnostic and tool-agnostic assessment of what students are learning when they are learning data science. With this knowledge, he said, we could make a better argument to universities about the value of data science education and how it prepares students for higher education and the workforce. Louie suggested that one way to begin to understand these issues would be to survey existing data science courses to see what learning objectives have been identified. Heidi Schweingruber (the National Academies of Sciences, Engineering, and Medicine) said that along with research on the learning objectives of data science, research is needed to explore the tension between data science as a standalone subject and data science as a topic integrated into other disciplines. Some speakers have advocated for adding data science as a subject into schools, whereas others have advocated for using data science to solve problems and improve learning in existing disciplines. There are learning and teaching questions related to this tension, she said, such as the following: Do the skills and knowledge of data science exist outside of the content being interrogated? If so, to what extent are they abstract, and how can assessments best capture these skills and knowledge?
The impediments to data science education are far larger than what can be addressed in teacher training, said Stephanie Teasley (University of Michigan School of Information). Changes need to be made at multiple levels, including changes to state standards for licensure, district-level changes to curriculum, shifts in how high school counselors and college admissions officers think about data science, and increases in the accessibility of data tools. Until and unless these changes are made, we will be “pushing the rock uphill,” continued Teasley, even if teachers are motivated and prepared to teach data science.
On the topic of teachers, V. Lee said that an important research issue is the current circumstances and lived experiences of teachers with respect to teaching data science. In the two days that participants have been at this workshop, he said, typical teachers “have seen 160 students and put out 250 fires.” They are being asked to do socioemotional learning in the midst of a pandemic and recovery and, in some places, to do hybrid instruction without extra pay or support. Now we are asking them to integrate data, computing, and technology into the classroom, said V. Lee; how much are we asking, and can we support teachers in these efforts? Horton agreed that this is an important point and said that we need to consider what will be sustainable, what supports are needed, and where the point of exhaustion is.
“This is just the beginning of our work together,” said Nancy Lue (Valhalla Foundation). From the perspective of a philanthropist, Lue said that she sees a number of opportunities to chart a path forward for data science education. She shared three key takeaways from the workshop that are relevant to the issue of funding. First, the field needs more evidence on what is working and how to scale it. Lue noted that while the K–12 data science movement is quite nascent, there is a lot of momentum and many promising innovations; there is a need to accelerate research to better understand what is working, develop a clear evidence base of the impact, and scale the promising practices and programs to ensure that all students are able to succeed in a data-rich world. In particular, she said, there is a need to advance research on assessments, on student-level outcomes, and on the best ways to prepare and support educators. Second, the “promise and perils” of data mean that we must start developing data literacy skills for all students, starting with the earliest learners, using a student-centered approach that spans all of K–12. This may include standalone data science courses, data science that is integrated into other courses, or both. In addition, there is a need for a clear bridge to postsecondary education. Third, said Lue, as the field evolves, the needs of the most marginalized students need to be kept at the forefront, and programs, resources, and tools need to meet the needs of our diverse student population in an inclusive and equitable manner. Lue said that this is an “exciting juncture” for the field of data science education and invited two other funders to make remarks.
Ulrich Boser (Schmidt Futures) said that Schmidt Futures, which focuses on raising and supporting talent to solve pressing problems, has made several investments to improve the pipeline for data science. One of the interesting things about the data science field, he said, is that people can be considered experts at a young age, purely because of their expertise and abilities. This is clearly very different from other fields where a Ph.D. is the bare minimum to be considered an expert. Boser identified two key challenges that need to be addressed in data science in order to make progress. First, he said, how can these “incredibly exciting” demonstration projects be scaled across the nation? Second, there is a need to bring together people from diverse backgrounds, including those from both sides of the political aisle, in order to come to a shared understanding of data science and grapple with how to implement it in K–12 education.
The California Education Learning Lab, said Lark Park (California Education Learning Lab), is a state-funded higher education grant program that funds faculty doing innovative work to improve learning outcomes and close equity gaps at the three large public systems. Most of the funding has been dedicated to improving STEM education at the postsecondary level, which included some data science projects. There is a lot of overlap between the discussions at this workshop and the discussions happening at the Learning Lab, she said, and there is a close relationship between K–12 education and higher education. For example, as other speakers have noted, higher education sets expectations and opportunities for students pursuing postsecondary pathways by setting admissions standards and giving credit for courses. Park said that she looks forward to deepening this relationship and working together. One of the most exciting things about the workshop, she said, has been the emphasis on student agency and student voice, and the acknowledgment that data science presents an opportunity for students to understand, critique, and engage in the world. Park shared a recent grant opportunity from the Learning Lab that focuses on data science. After offering general grants for STEM, as well as specific grants for calculus reform, the Learning Lab became interested in creating an opportunity for data science because of its importance to other disciplines, the workforce, and productive citizens of the world. The Learning Lab announced a Grand Challenge in Building Critical Mass for Data Science2 to help create an inflection point in data science education. There are multiple opportunities available, including opportunities for faculty development, pathway development, and interdisciplinary collaboration. 
While this grant is targeted at California institutions, California is a big state and “what we do here can matter.” Park said that university and community college faculty are the target audience, but the Learning Lab is strongly encouraging collaboration with high schools.
In the closing session of the workshop, planning committee members offered their reflections on the discussions held at the workshop, identified key takeaway messages, and identified possible next steps for the field. H. Lee began the reflection session with an observation that the idea of co-design was frequently mentioned during the workshop, and she encouraged stakeholders to consider ways to bring teacher voices and student voices into their work. When planning teacher professional learning, teachers can be brought in as co-designers and become part of the team. When designing a tool for educational use, the student and teacher experience with using the tool should be kept front-and-center.
Clegg asked, “What is the end goal of data science education?” There are several overlapping and discrete goals, including training students to be data scientists and helping students become data-literate members of society. The broadness of data science requires developing a broad view of what counts as expertise and pulling from a diverse pool of expertise to support learners and teachers. For example, said Clegg, teachers from different disciplines, librarians, community members, and artists all have expertise that can be used to create learning opportunities for students. This diverse pool can also serve as resources, support, and collaborators for educators. Clegg said there is a need to think about new models of student learning and professional development that are able to leverage this expertise, both in and out of the school context.
Erickson gave workshop participants a brief “optimistic historical perspective.” He said that the field of data science has been around for only about 10 years. To people of a certain age, 10 years “is nothing.” But looking around the room, he noted that there were many young people to whom “10 years is a lifetime.” This is “fantastic” because it means that young researchers and young educators are getting involved in the field; he said he was “so glad” to see them there.
As one of the youngest people in the room, Drozda said that it has been an incredible honor to be involved in these conversations. He offered three observations about K–12 students and presented a brief framework of three practices to consider. First, he said, students today are inundated with data, from TikTok to Twitter. It is not a winning proposition to take away their phones or laptops because “you can’t win.” These are not distractions; instead, they are the tools of the future. Second, Drozda said that the cost to acquire information is “close to zero today,” which means that simply memorizing facts or formulas is not very valuable for students. Third, with the immense quality of online content consumed by children today, the bar for what is relevant or authentic has risen quite high. Drozda said that students have an “increasingly precise radar” for what is relevant versus what is corny, fake, or overly modified for the classroom. Throughout the workshop, he said, we have seen examples of how relevant, authentic data can engage students. Drozda offered three practices to keep in mind as new programs are considered. First, prove relevance to students up front, and tell them explicitly why the content matters. Second, pick topics that are not just relevant but also recent; “we need to be thinking of last week or last night.” Third, pacing is critical. Drozda said, “We have to create an aura of speed while trying to engage in content depth.”
What has been striking about this workshop, said Matuk, is how many learners and teachers across various subjects and contexts are already engaging in data literacy and data science, whether or not they are aware of it. In addition, data science is already naturally integrated in the professional work of many fields, including biology, social science, and journalism. Instead of trying to find places for data science in classes, Matuk suggested that stakeholders consider how to make learning experiences “more authentic to those domains in and of themselves.” That is, instead of teaching data science for its own sake, she said that educators should integrate data science because it is an integral part of the domain that is being taught.
Horton shared his reflections on the two days of the workshop. First, he noted that there is “tremendous excitement and enthusiasm” in the area of data science; although there are huge challenges ahead, we have the energy to tackle them, and this is the time to make a difference. Horton emphasized that one area that needs to be thought about is the levels at which students should be prepared to do certain things. From an academic statistics standpoint, the Ph.D. is the “pinnacle of some kind of pyramid,” and there has been work on building master’s programs as well. However, Horton said there is a need to think about how to prepare students with bachelor’s or associate’s degrees as well; he stated, “We can’t afford people to go through four years of expensive higher education and not do useful things.” Another opportunity, he said, is to do work on defining data science and associated terms like data literacy, data fluency, and data acumen. Using existing learning outcomes to begin to shape these definitions would be a good start, he said. Teacher preparation is a major challenge that will require major changes such as breaking down siloes, building new partnerships and bridges, and aligning incentives. In all the work that lies ahead, Horton emphasized that accessibility, diversity, equity, and inclusion must be at the forefront. In particular, the data divide must be addressed; it is critical to the future of the country and the work we are doing. On this point, he shared two quotes from the National Science Foundation report Keeping Data Science Broad:3
Consequently, as data-driven decision-making becomes more commonplace, having the skills to understand and make sense of data can provide a sense of power to the larger citizenry or conversely powerlessness to communities without these skills. This “Data Divide” separates communities that have access to devices and services that provide rich, data-driven services from those that don’t; it separates data-savvy individuals, and communities that have understanding and awareness of how their data is being collected and used to provide individualized services (and thus informed protections), from those that do not. The economic and social consequences of the Data Divide stratify populations, and severely limit the opportunities of those who are unable to take advantage of the data revolution. (Page 7)
If we do not make diversity and inclusion a priority now, we will not have it in the future. We do not want to repeat the mistakes of the past, so we must reverse the trend for the growing divide to make and keep data science broad. Diversity will bring a lot of ideas and voices to the table, which may lead to significantly fewer models producing biased results when trained using algorithms on biased data sets. (Page 30)
M. Wilkerson closed out the workshop with her reflections. First, she emphasized that there are multiple types of data expertise and multiple types of data experts. Whereas there is a pattern of talking about those experts with a Ph.D. in statistics, or those who end up at Google, other professionals—including journalists, biologists, and geoscientists—are deeply engaged with data. She encouraged stakeholders to think about these different areas of expertise and how these pathways can be made visible and accessible to students. “When we broaden what it means to be successful in data science,” she said, it invites students in and broadens their possible futures. M. Wilkerson’s second point was to rethink what is meant by success and failure. Although we want to avoid the false promises of the power of data, she said, we want to present a nuanced view of how data fit into the advocacy toolkit, along with tools such as storytelling and organizing. Finally, collaboration is critical for the changes that are needed in this area. This includes collaboration among teachers; across levels of education administration; and with community members, librarians, practicing scientists, and others. Building and sustaining these collaborations will require incentives and supports, she said, such as time, encouragement, and opportunity for teachers to pursue the skills and knowledge that it takes to teach data science to their students.