Current Landscape of Data Science Education
Not only is it important to have a vision of the goals and intended learning outcomes for K–12 data science education, but it is also crucial to understand what is happening and how data science is being incorporated into K–12 settings. This chapter summarizes the presentations and discussions from three sessions: one on hearing from practitioners who work in or with schools implementing data science education, one examining the broader contexts in which data science can be situated to offer more meaningful and relevant data science experiences for students, and the last focusing on the integration of data science with other content areas.
HEARING FROM PRACTICE
This session explored the reality on the ground in data science education, with a deep focus on the specifics of designing student learning opportunities. Topics included student learning progressions, opportunities for integration between data science and other subjects, and the wraparound resources needed for implementation. The session started with an overview by Zarek Drozda (Data Science 4 Everyone, University of Chicago), session moderator, of a commissioned paper on the national landscape of K–12 data science implementation. Four panelists who work in or with schools in implementing data science education gave brief overviews of their work and then participated in a moderated Q&A.
Overview of Data Science Programs
There has been an explosion of growth in data science programs over the last several years, said Drozda. However, despite this growth, data science education is currently reaching only a small percentage of students. In a paper commissioned for the workshop, Drozda and his colleagues used a variety of methods to survey the landscape of K–12 data science implementation, but he cautioned that the data are incomplete.1
Drozda shared a map (Figure 3-1) of the states that had statewide discrete data science education programs as of the summer of 2022; he noted that this map excludes hundreds of individual school and district programs. There are now 14 states with data science programs, compared to only two statewide programs in 2019, he said. Drozda estimated that there are over 1,600 schools or districts with active K–12 data science programs; this translates to over 2,000 teachers reaching over 180,000 students. Proportionally, this means that there are programs at around six percent of high schools, with 0.1 percent of high school teachers and 3.6 percent of high school students participating. He noted that the implementation of data science in K–12 schools is a global phenomenon, with programs in countries including Israel, China, South Korea, Australia, New Zealand, Germany, the United Kingdom, Sweden, and Mexico.
Drozda and his colleagues also looked at the frameworks and standards that guide content across school subjects, in order to determine where and to what extent data science is included. A survey of cross-state standards in mathematics, statistics, science, and computer science found 21 shared data-related topics across subjects, as well as multiple meaningful references to data. For example, he said, the College, Career, and Civic Life (C3) Framework for Social Studies State Standards2 uses the word “data” 53 times. However, Drozda cautioned that these existing guidelines are potentially not enough for data science education.
He and his colleagues also conducted a series of stakeholder interviews across the country to surface the perspectives of teachers. Teachers reported a number of barriers to data science implementation, including:
- Statistics is often cut from algebra courses, or data are brushed over or are secondary to a discipline;
- Low teacher confidence persists in data analysis, technology, or statistical thinking;
- A lack of technology exists;
- A lack of standards or guidance exists around privacy, bias in data sets, data monetization, or other data ethics issues; and
- Methods are too simplistic rather than reflecting the modern world of data and computing.
In statewide K–12 programs, data science content most frequently appears in mathematics classes, followed by career technical education, computer science, science, and social studies (Figure 3-2). Drozda noted that there is a tension between integration and focus; data science reaches more students when it is integrated into existing classes, but it is more likely to be taught with a dedicated focus and fidelity when it is offered as a discrete elective to fewer students. If classes are designed well, however, “you can have the best of both worlds.” Drozda gave an example from Oregon, where data science is infused into the core sequence that all students complete in their first two years of high school, and then elective classes are offered to enable some students to focus more deeply. He noted that there is no perfect pathway to teach data science, and that each student and their educational trajectory is individual. Drozda encouraged workshop participants to consider how they can increase options for students and how they can make data science in earlier grades work as a “launchpad” rather than a “filter” to encourage students to move toward data-related pathways.
Los Angeles Unified School District, California
In 2010, the Los Angeles Unified School District (LAUSD) and the University of California, Los Angeles (UCLA) were awarded a grant to bring statistical and computational thinking to math, science, and computer science classrooms, said Suyen Machado (UCLA). They began by creating modules to integrate data into algebra, biology, and computer science classes; she said this experience led to three insights. First, it is difficult to compete with established curricula. For example, teachers have a set of standards to meet over the course of the year in Algebra I. Many teachers did not see the connection between data science and the subject of the class, so there was a lot of attrition between summertime professional development and actual implementation. The second lesson was that a six-week module is not enough time to teach the concepts deeply, or for students to interact with data as much as they should. The third lesson, said Machado, was that the program suffered in part because of a lack of awareness about the importance of data science. At the time, it was difficult to even find schools that offered statistics to high school students.
With these lessons in mind, the partners created a high school course that introduced students to dynamic data analysis, tools (R via the RStudio class platform), and the techniques and principles necessary for reasoning about the world with data. Machado said there was a “perfect storm” of circumstances that allowed the Introduction to Data Science course to take hold. First, the Common Core State Standards were being adopted, which included calls for statistics to be included in high school level algebra and geometry classes. Second, there was a lot of momentum in the computer science area, including new computer science standards and the “computer science for all” movement. This was the “perfect opportunity” to create a course where students could authentically engage with data, and also make the standards for math practice come alive through data. Machado emphasized that although their original intention was to integrate data science into existing courses, their experiences led to the realization that there was a need for a standalone high school data science course. The course was piloted with 10 teachers in LAUSD in 2014 and has now been adopted nationally and internationally.
San Diego Unified School District, California
In 2020, when education was largely remote due to the COVID-19 pandemic, the San Diego Unified School District (SDUSD) began piloting a curriculum called CourseKata,3 said Stephanie Melville (SDUSD). One of the valuable aspects of CourseKata, she said, is its unique professional development structure. The curriculum includes ongoing professional development for teachers that includes content, pedagogy, workshops, office hours, and collaborative spaces. Melville explained that teachers get an “initial boost” at a study group, and then have ongoing opportunities to learn and share ideas with others. For example, two of the initial adopter educators did not know each other but soon began sharing quizzes, tests, and ideas for projects. This study group now has six teachers in it from four different schools; Melville noted that this type of opportunity to work with others is “kind of a big deal” because schools generally only have one standalone statistics teacher. The group also holds daily Zoom office hours to support teachers who are new to the program. Melville said that teachers participate in the ongoing professional development “even if they don’t have to.” They go to support each other, to learn different ways of doing things, and to learn how to best support students in their learning.
The plan is to expand data science education into all courses, from pre-K through grade 12, said Melville. However, there are a lot of details to figure out. She compared the K–12 space to a dimly lit room where there are flashlights in cabinets labeled “pedagogy,” “methodology,” and “curriculum”; everyone is searching for the fix that will make math relevant and meaningful to students. Prior to this program, there was a single track to graduation for math, and many students were opting to not continue to study math. Adding data science to the curriculum in SDUSD has created an entirely new pathway to graduation, said Melville, and has opened doors for many underserved populations including Black, Latino, and low-income students.
Fairview High School, Colorado
Paul Strode (Fairview High School) teaches high school biology and incorporates data and statistics into every part of the curriculum. His journey to this approach, he explained, began when he was a college student with no real understanding of how to use data. Statistics was not an available class in his high school, and he chose not to take a college statistics class in part because it was five days a week at eight in the morning. As part of his biology degree, he did an independent research project on water quality; he used some statistics in the project but didn’t understand what they meant. After graduation, Strode began teaching high school biology, and the only data analysis conducted in his classes was the chi-square test. He noted this is typical high school biology content, and he “didn’t understand even that simple calculation that well.” Once Strode began graduate school, he came to the realization that he needed to “pull back the curtain” on data and actually learn how statistics were calculated and what they meant. When he headed back to teaching, he brought this new understanding with him and wanted to convey it to his students. Data science is foundational and constant in his biology class, with students computing descriptive and inferential statistics by hand (e.g., t test, regression analysis, variance). This approach “has made all the difference” because the students understand the operations and what the data mean. Strode said that teachers sometimes ask him how long his statistics unit is, and he tells them, “There’s no unit… It’s just seamless. It’s part of every single unit that we cover.”
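As a rough illustration of what computing such statistics "by hand" can involve, the sketch below works out a sample variance and a two-sample t statistic step by step, without a statistics library. It is not drawn from Strode's classroom materials: the function names, the choice of Welch's unequal-variance form of the t statistic, and the measurements are all assumptions made for illustration.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def welch_t(a, b):
    """Two-sample t statistic (Welch's form, which does not assume equal variances)."""
    standard_error = sqrt(sample_variance(a) / len(a) + sample_variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical measurements, invented for this example only.
group_a = [2.0, 4.0, 6.0]
group_b = [1.0, 2.0, 3.0]

print(sample_variance(group_a))            # 4.0
print(sample_variance(group_b))            # 1.0
print(round(welch_t(group_a, group_b), 3)) # 1.549
```

Writing out each step (deviations, squares, division by n - 1, standard error) mirrors the idea of understanding the operations rather than reading a number off a calculator.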
Mobile City Science
Katie Headrick Taylor (University of Washington) said that her work with data science programs could be best described as “first doing the work and then naming it much later.” She explained that 12 years ago, when her work involved supporting young people to collect data about the lack of transportation in their cities, data science was not a construct that she was aware of. However, the process of collecting data and using those data to make arguments for resources in the community brought up many of the same issues and questions that have been discussed at this workshop, she said. For example, who gets to decide what “counts” as data? Once data are constructed and imbued with epistemic authority, what are the material consequences for people’s homes, schools, parks, neighborhoods, and lives? Taylor’s project on transportation, now called Mobile City Science,4 demonstrates the “promise and precarity” of data, the contentious and politically fraught decision-making process, and how wielding data can usurp relations of power. It is still an open question, she said, whether giving data and data practices to young people of color is sufficient support to undo and re-imagine problems that are the products of years of racist and misogynistic systems. Further, does calling this process “data science” move us any closer to that goal?
Moderated Discussion and Audience Q&A
Following the panelists’ remarks, Drozda moderated a Q&A session, which was followed by audience Q&A.
Drozda asked the panelists to comment on how educators could think about designing their own programs, and to describe what an “exciting, meaningful, and impactful” K–12 data science education experience looks like. Taylor responded that sometimes educators are already doing data science without knowing it. She noted that during the COVID-19 pandemic, nearly everyone was engaged in some form of data literacy, trying to make decisions about where to go and who to see when there were so many unknowns. The pandemic helped us, Taylor explained, to realize the consequentiality of data, and the fact that we need data to make decisions about how to stay safe and healthy. Connecting data to students’ everyday lives and facilitating advocacy opportunities can bring data science to life. Taylor’s work on Mobile City Science demonstrated how data are necessary for making decisions about the health and well-being of communities and neighborhoods. Young people wanted bike lanes to safely move around the community, so they collected geospatial data of daily routines, highlighted “mobility deserts” and dangerous intersections, and made visualizations to convey their findings. These data, combined with participation in neighborhood planning conversations and other efforts, resulted in a new bike lane that links outdoor areas, the school, and the public library. Taylor noted that there have been mixed results with respect to changes in the community for similar Mobile City Science efforts in other cities; some young people followed a very similar progression of data advocacy but did not achieve concrete changes in their neighborhoods. However, she emphasized that lack of a tangible change does not mean that the process was meaningless to the students, and that learning how to engage with data to advocate for their own neighborhoods is an invaluable educational opportunity.
Strode agreed that hands-on participation by students is essential for quality data science education. He shared several examples of his classroom activities. In his biology classes, students collect data on themselves; for example, students take measurements such as heart rate, arm span, and height, and use apps to visualize the data and determine if and where there are relationships. The students enjoy being able to see their own and their classmates’ data points on the screen, he said. Another project involves students collecting data to detect patterns in nature. Students choose a relationship to explore, collect data either systematically or randomly, and then use tools to visualize and analyze the data. For example, one group of students collected data on the height of a plant and the number of branches it had. By collecting 150 data points, they found a “beautiful” exponential relationship between the two measurements. The students, said Strode, realized that they would not have found the relationship with only a few data points, and they were very excited about their findings. Finally, Strode said that he uses published papers to create a dataset, then allows the students to analyze and visualize the data themselves. When they discover that their creations are similar or identical to those in the paper, “that’s exciting for them.”
Figuring out the story that’s in the data is “part of the magic” for students, said Melville. Rather than starting with the data or the calculations, she said she has found the most success when she allows students to “look out into the world” to see what relationships might exist and how they might detect them. Melville said that while practicing calculations is important, it doesn’t always bring more students to the table of conceptual understanding; her primary goal is ensuring that data science education is meaningful, impactful, and powerful, particularly for historically excluded populations. Her students often share what they are excited about, and they see math, coding, and data science as tools for figuring the world out. Some students see connections that motivate them to take action—for example, advocating for a crisis hotline phone number on student ID cards to address a lack of mental health services. Melville said that this educational process is “beautiful” to watch, and that her students “love the story so much they don’t even realize—maybe they don’t even care—that it’s hard math they’re doing.”
Machado described an exciting, meaningful, impactful data science experience as one that engages diverse learners and provides authentic learning experiences. The Introduction to Data Science course5 uses a model of “participatory sensing,” in which students use their devices or cell phones to collect data about themselves or anything they’re interested in. She emphasized that students are engaged and interested when the data are personal to them—whether because the data are about them or their interests, or because they are consequential to the students’ lives. Data science, said Machado, needs to be focused on the learner. Learners should have the ability to develop the questions that can be answered with data, they should be involved in the collection of the data, and the data should connect to their world. Machado gave the example of high school students posing a question about who is most likely to survive by the end of a horror movie (e.g., certain genders or races). Students can collect their own data, upload them into tools, and analyze the data to find their answers. In closing, Machado shared the “Data Science Sniff Test,” as created by Tim Erickson. She noted that there are sometimes questions about the difference between data science and statistics, and that Erickson’s description can make it clearer. The Sniff Test describes data science as a situation in which the participant is “doing data moves” and is “awash in data” (i.e., there are so many data, they are messy, and they need to be cleaned up), and the data have certain properties (i.e., are unruly and need to be unpacked), which goes beyond what you might get in a statistics class.
Drozda asked the panelists to identify the resources that teachers and others need to implement quality data science education. Strode responded that teachers need lots of training. He said that there is “lots of fear” around data, and teachers are asking for help. Machado agreed that training for teachers is critical, noting that teachers come with a wide range of knowledge but sometimes lack the conceptual understanding of data science. She added that another important stakeholder group is high school counselors and administrators. Machado and her colleagues bring these groups together for a professional development session to better understand data science, so that they are able to help students choose their courses and pathways. She noted that administrators also need education about the importance and role of data science because they are “bombarded” with mandates in all subject areas. Melville added that teachers can only do so much to convince students that data science is important and relevant. If counselors or university admissions staff continue to prioritize courses such as calculus over data science, there is a disconnect. She said implementing data science education will require changes in the messaging and actions in higher education. Taylor encouraged workshop participants to think about how to distribute the onus of teaching across a variety of professionals with disciplinary expertise in making sense of data. In addition, she said libraries can be a great resource for students and communities. There is a need for research in this area in order to determine the best models for teaching data science and how to involve people from multiple sectors and disciplines.
A virtual workshop participant asked about the role of assessments in data science education. Machado responded that there is a need to come to a consensus about what the learning progression for data science looks like before considering what assessments should look like. However, she emphasized that the conversations should focus on authentic assessment rather than “more multiple-choice tests.”
Victor Lee (Stanford University) said that he has been working in the field of data science for many years and has noticed a recent resistance to the idea of classes using data collected from students. Although student-based data are relevant and interesting to students, there are concerns about privacy. He asked the panelists to comment and to describe what they might say to policymakers or teachers about this issue. Taylor said that these concerns are substantiated and well founded, and it is critical to set up agreements with students about how their data will be used and who will be able to access them. For example, data may be anonymized but viewable at the aggregate level. These are thorny ethical issues, she said, and it is essential to communicate about boundaries and to allow students to opt out whenever they are not comfortable. Machado said that early in their Introduction to Data Science course, students collect data using their cell phones. Steps are taken to guard the data, including anonymizing and aggregating them, and giving students some opt-out options. Later in the course, students design their own data collection; as teachers, it is our job to guard those data, she said.
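The safeguards the panelists described (opt-out, anonymizing, and viewing data only at the aggregate level) can be sketched in a few lines. This is a hypothetical illustration, not the procedure used in the Introduction to Data Science course or in Mobile City Science: the record fields, the salt, the `min_group_size` threshold, and both function names are assumptions. Note also that a salted hash is pseudonymization rather than full anonymization, so it is only one layer of protection.

```python
import hashlib
from collections import defaultdict
from statistics import mean


def pseudonymize(student_id, salt):
    """Replace an identifier with a short salted hash so names are not shared."""
    digest = hashlib.sha256((salt + student_id).encode("utf-8")).hexdigest()
    return digest[:8]


def aggregate_by_group(records, group_key, value_key, min_group_size=3):
    """Average a value per group, skipping opted-out students and
    suppressing groups too small to hide an individual."""
    groups = defaultdict(list)
    for record in records:
        if record.get("opted_out"):
            continue  # respect the student's choice to opt out
        groups[record[group_key]].append(record[value_key])
    return {
        group: round(mean(values), 1)
        for group, values in groups.items()
        if len(values) >= min_group_size
    }


# Hypothetical class data, invented for this example only.
records = [
    {"student": "s1", "class_period": 1, "sleep_hours": 7.0, "opted_out": False},
    {"student": "s2", "class_period": 1, "sleep_hours": 8.0, "opted_out": False},
    {"student": "s3", "class_period": 1, "sleep_hours": 6.0, "opted_out": False},
    {"student": "s4", "class_period": 2, "sleep_hours": 9.0, "opted_out": False},
    {"student": "s5", "class_period": 1, "sleep_hours": 5.0, "opted_out": True},
]

summary = aggregate_by_group(records, "class_period", "sleep_hours")
print(summary)  # {1: 7.0}
```

In this made-up example, period 2 is suppressed because a single student's value would be identifiable, and the opted-out student's value never enters the average.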
Relationship with Other Disciplines
Christine Hirst Bernhardt (Einstein Fellow, University of Maryland) observed that many of the activities discussed at the workshop are already happening in science classrooms (e.g., videos of data visualizations from the National Aeronautics and Space Administration), and wondered whether there are conversations being held about how to strengthen data science education within science courses. Strode responded that although many teachers use visualizations of data to teach, fewer are finding or creating datasets for students to engage with. Some teachers say that there is not enough time for true data science education in their classes, he said, but he has found that incorporating data science throughout the curriculum does not add a significant amount of time. A participant added that teachers from multiple disciplines have complained about the time it takes to teach students how to interpret data and graphs “over and over again”; he noted that making investments upfront in data science education might ultimately save time in the long run.
CONTEXTUAL FACTORS FOR K–12 DATA SCIENCE EDUCATION
Previous speakers have noted that when learning is relevant to youth, they engage more deeply in it and begin to initiate and drive their own learning, said Tammy Clegg (University of Maryland). To create these types of learning opportunities, we need to understand how and in what contexts data science is relevant to learners, she said, as well as what topics they are already interested and engaged in. The goal of this session was to explore the research on the settings and contexts of K–12 data science education, with an emphasis on what data science looks like in these contexts and the connections with broader informal contexts that are relevant to K–12 learners’ lives. Each panelist in the session made brief remarks about their work, followed by a Q&A session moderated by Clegg.
Online Privacy and Security
“Most of my research is geared at trying to make the Internet a more trustworthy and inclusive space for all kinds of users, including K–12 learners,” said Marshini Chetty (University of Chicago). Since 2016, Chetty and her colleagues have been studying how K–5 learners deal with privacy and security concepts in the classroom, at home, and in other environments. The project focuses on issues including data access, data collection, data management, and questions about who collects data and why. Chetty noted that K–5 students generally have little opportunity to learn about data; teachers do not have the training or time to teach these concepts, and parents often defer to teachers on the topic. As a result, she said, these students lack critical data literacy. For example, students in an under-resourced school on the south side of Chicago may not be aware of data being collected on them from neighborhood cameras, or how this collection of data affects them or their communities.
Most efforts to teach children about online privacy and security start around middle school, said Chetty, because this is when children are spending time online and using social media. She compared this practice to “teaching kids how to swim in a pool when it’s already full of sharks,” and said that starting at a younger age is critical. Building a foundation about data science, privacy, and security as early as kindergarten allows for these concepts to be built upon as children begin to use technologies and the internet.
Leveraging Student Interest: Data Literacy Through the Arts
Kayla DesPortes (New York University) told workshop participants about her work on data literacy through the arts;6 the project is aimed at exploring the co-design of data art units with middle school art and math teachers to examine how various arts disciplines can couple with data literacy to be mutually supportive. DesPortes works on two different units: data dance and data comics. In the data dance unit, students choose topics they are interested in and analyze data on the topic. For example, a group of students interested in women’s rights explored bar graphs that reflected the proportion of respondents who agreed with the statement, “A woman’s most important role is to take care of her home and children.” The graphs showed how responses differed among geographic regions and between men and women. The students then choreographed a dance, which DesPortes described: “All four dancers are representing females. We start by doing movements lower and work toward higher levels to show women working up toward their goals over time. All dancers then walk in circles to show different roles women can take if the perception is changed like it has been in North America and Latin America.” This transdisciplinary practice changed how learners worked across both data science and dance, said DesPortes. Learners practiced embodying numbers and values in dance movements by, for example, exploring the shape of the graph or the numerical values. They embodied the context and implications of the data through an exploration of the variables and how they are situated in the world. Students worked together to make meaning through collaborative movements and practiced interpreting and critiquing dance through perspective-taking as an audience member.
Integrated Computational Thinking
“My work focuses broadly on the intersection between values, social futures, and institutional change in digital learning,” said Rafi Santo (Telos Learning). Santo shared his insights into the institutional dynamics related to bringing data science into K–12 settings, based on his experience with the Integrated Computational Thinking project.7 Through his work on computer science education, Santo and his colleagues found that K–12 administrators were looking to develop K–12 groups and sequences around computer science that were district-wide and comprehensive, and that middle school options were quite limited. Computer science tended to be integrated into science and math classes, and when administrators attempted to integrate the subject into other disciplines, “they often found themselves facing a coup.” Santo explained that while there is a close relationship between computer science and math and science, teachers in the humanities tended to push back against integrating computing and data into their courses. The goal of the Integrated Computational Thinking program was to identify design principles that would integrate computational thinking into language arts, social studies, and arts while enhancing disciplinary learning in these areas. In addition, Santo and his colleagues wanted to understand the institutional and disciplinary factors that mediate the possibilities for meaningful integration, and to develop evidence-based professional development resources that support computational thinking integration in these subject areas.
Santo highlighted three integration pathways: analyzing texts through computational methods; engaging in data practices for social studies inquiry; and seeing data in art and making data as art. Santo noted that in the course of their research, he and his colleagues identified a number of tensions that speak to the epistemic and cultural differences between computational thinking and the humanities: contextual reductionism, procedural reductionism, epistemic chauvinism, threats to epistemic identities, and epistemic convergence.8 He highlighted the tension of “epistemic chauvinism,” in which computational thinking epistemologies are elevated at the expense of the epistemologies of other subject areas; this can lead to the sidelining of existing ways of knowing, he said. Another tension, said Santo, can occur when learners who identify with humanities-related epistemologies feel alienated by the elevation of computational thinking identities (e.g., an “art kid” resists being asked to do math in art class). Santo emphasized that these theoretical tensions need to be taken seriously when considering whether and how to integrate data science into other subject areas.
8 Contextual Reductionism: valuation of the abstract and quantifiable overrides the valuation of nuance, particularly in historical events or literary texts; Procedural Reductionism: inappropriate application of algorithmic logics to knowledge production; Epistemic Chauvinism: elevation of computational thinking epistemologies at the expense of those related to a focal discipline, leading to “sidelining” existing ways of knowing; Threats to Epistemic Identities: elevation of computational thinking identities alienating learners who identify with epistemic identities associated with humanities disciplines; Epistemic Convergence: “reskinning” certain humanities practices as computational thinking, foreclosing possibilities for substantial interdisciplinary novelty in problem solving. Definitions provided through presentation by Rafi Santo, September 13, 2022.
Data Science for All
Data Science for All is an initiative to identify and address equity, literacy, and data science across sectors, said Stephen Uzzo (New York Hall of Science). The project is part of a regional initiative called Northeast Big Data Innovation Hub,9 which was created to build a community of practice for cultivating a data-literate society and to establish a baseline set of principles to define what it means to be data literate. Uzzo and his colleagues brought together experts in a process of collaborative inquiry, culminating in a workshop to identify pathways into literacy that could be developed through informal settings such as libraries, cultural institutions, and community centers. They also identified organizational needs, such as the need for resources and training for library staff so they could better serve their patrons.
Uzzo shared the details of one data literacy project that focused on young children. “Big Data for Little Kids” brought together early learners and their families from immigrant and under-resourced communities. Families worked together in a museum to collect data, compile and organize the data, and interpret the data. The study found that families with young children can actively engage in data science, and that parents and children can talk about a wide range of data science concepts, said Uzzo.
Another project brought together educators, designers, learning scientists, and tool developers to identify barriers and opportunities for equitable data science education at the high school level. This convening resulted in several conclusions, said Uzzo, including the need for inclusive tools, resources, and curricula; the need for teacher support and teacher enfranchisement in the process; the need for integrating data science across the curriculum; and the need to make data science available to all students. Based on these findings, Uzzo and his colleagues partnered with STEM Teachers NYC to conduct monthly co-design workshops in which teachers and tool and curriculum developers worked together to identify integration strategies for data science and learning progressions that could be applied to curricula. Each workshop, said Uzzo, functioned like a design charrette—teachers drove the agenda, provided use cases, and focused on integration of data tools and techniques.
In another collaborative project called DataJam, teams of students brainstorm a research question, formulate a hypothesis, use big data to uncover answers, make their own data visualizations, and present their findings. The teams receive assistance and guidance from DataJam mentors. In one example of this project, a team from the Parliament of Mission Indians conducted a water testing project and presented their findings in both English and their native language.
Moderated Discussion and Audience Q&A
Clegg moderated a Q&A session with panelists by posing her own questions as well as welcoming questions from workshop participants.
Working with Qualitative and Quantitative Data
Clegg began by asking panelists to comment on how they see students understanding and working with quantitative and qualitative data, and if there are ways that data science education needs to shift to better convey the rich context of data. Uzzo responded that using quantitative and qualitative data together to better understand issues such as climate change is an emerging area that educators need to focus on more in order to give students and community members a better understanding of what data are and how they can be used. Santo noted that the epistemic tensions he mentioned between computer science and the humanities are relevant to this issue. For example, contextual reductionism occurs when there is an overemphasis on quantitative data in the abstract rather than taking a holistic perspective of the context and how data fit in. Santo said that using data tools—such as textual analysis—can help students ask critical questions about the value of data, and to consider when and in what contexts quantitative data are useful. These types of practices can “build the muscle” of students to think critically about the use of data in context. DesPortes said that one benefit of the data dance project is that students get to work with quantitative data in a way in which there is no “wrong answer” and that allows students to reflect their interpretation of data in a creative representation of the context.
Buy-in from Stakeholders
During this session, said Clegg, many different stakeholders were mentioned, including teachers, parents, librarians, community members, artists, and others. She asked panelists to comment on how to get buy-in from stakeholders, and how engagement with data science can be done in a way that allows stakeholders to do it “on their own terms and in their own ways.” Chetty responded that “buy-in is really tough.” Educators have an enormous amount on their plate already and bringing in data science adds to their load. Data science learning needs to be everywhere, from math class to art class, but getting buy-in from teachers will require professional development and resources to give them the confidence and the tools they need to integrate data science into their classes. For example, said Chetty, “micro lessons” can be developed so teachers can easily incorporate a small lesson about data without adding an extra burden. Chetty added that in addition to buy-in from teachers, there is a need to get buy-in from school district officials and administrators to facilitate the implementation and prioritization of data science education in K–12. Santo concurred with Chetty, saying that making changes in the K–12 system requires participation and willingness of people along the entire chain of command. In order to get people on board, Santo said it is critical to take their concerns seriously. He noted that in his work, they start with the assumption that the teachers don’t care about computational thinking or data science. “Then it’s our job to figure out” how a data science approach can enhance the work of these teachers. He urged stakeholders to pay attention to what people say they care about rather than assuming they will be excited about the “new shiny thing.” Getting buy-in is not a transaction; it is a relational process that requires respect for teachers and their ideas and priorities.
Uzzo agreed that respect is critical to buy-in. Teachers “need to be, want to be, and have to be” involved in the development of tools, resources, and curriculum, and they must be viewed as respected experts in their field. Teachers have been enthusiastic participants during the co-design workshops that Uzzo has conducted, and their insight has been essential. For example, at one workshop, tool designers asked the teachers why they weren’t using the tools that were available. Teachers responded that one of the tools required a blog, and they weren’t allowed to have a blog. People were “gobsmacked,” said Uzzo; the tool designers had no idea that this was why tools weren’t being used. DesPortes added that stakeholders vary in terms of their ability or support to be co-designers; she stressed the importance of flexibility and building relationships with partners.
Creating a Collaborative Environment
Expanding on the idea of getting buy-in from stakeholders, Clegg asked panelists to comment on how data scientists need to think differently about data science to welcome stakeholders from across multiple disciplines and practices. Santo replied that “enormous amounts of listening” are always the first step to understand the context, the communities, and the concerns of potential partners. Further, he said, we need to be intentional about whose goals are being served and what communities are truly asking for. For example, if a community organization is interested in having a data science program because there are good jobs in that field, it is important to keep this specific end goal in mind when designing a program. He emphasized again that building relationships is critical to success. Uzzo agreed that listening is “job one.” Communities and stakeholders have essential questions that they want answered, and data scientists can help facilitate the process of answering them. For example, in his work on hazards in coastal communities, they use participatory modeling10 to help community members explore scenarios and make decisions about policies and resources that are needed.
Overcoming Math Phobia
A virtual workshop participant noted that “math phobia” is common, even among teachers, and asked how to overcome this barrier to incorporating data science. DesPortes responded that in her work with math and art teachers, each side was “phobic” of the other side’s discipline. She said that the teachers didn’t necessarily overcome these phobias but instead worked together to put scaffolding in place so that students had the support they needed to work across disciplines. Santo said that he is math phobic, but there are tools that allow him to willingly engage in data science. When there is an “authentic context and relationship to a problem to be solved” combined with useful tools and resources, people are motivated to work and dive into the data. Chetty commented that it could be very helpful for people to see that there are math-phobic data scientists like Santo; she said that “representation matters.” Uzzo shared his opinion that math phobia is largely due to the way that math is taught, and there is a need for change in math education. He said that math is often taught first as abstract ideas and equations rather than through the real-world problems that can be solved using math as a tool.
Goal of Data Science Integration
V. Lee reflected on the discussions about integrating data science into other disciplines and said he had heard three different potential goals or approaches for this work. First, he said, other disciplines can be seen as a “Trojan horse” that is used to get students to engage with data and become data literate. In this goal, data are the point. A second approach is to use data to improve students’ understanding of and engagement with the other disciplines; here, data are subservient. Finally, there is a third approach where data open up opportunities for new forms of expression and exploration. Clegg added that she thinks the third approach is the right one, with the ultimate goal of educating students to be literate participants in an increasingly connected world.
10 In this approach, “participants co-formulate the problem and use modeling practices to aid in the description, solution, and decision-making actions of the group” (https://participatorymodeling.org/).
INTEGRATION INTO OTHER CONTENT AREAS
This session explored the ways in which data science has been integrated with other subjects beyond mathematics. Panelists shared details of their own work and discussed various approaches for integrating data science into the study of other content areas in both school-based and out-of-school contexts. Camillia Matuk (New York University) moderated the session. Each of the five panelists described their approach to data science, followed by a Q&A session.
Four Key Elements to Data Science Integration
Emmanuel Schanzer (Bootstrap) introduced himself as a proud former public school math teacher, computer scientist, and co-founder of an organization called Bootstrap.11 Bootstrap reaches more than 30,000 students each year in 49 states, in grades 5–12. The program works to integrate data science into mathematics, physics, history and social studies, and life/Earth sciences. As a curriculum architect, Schanzer said he thinks of curriculum design like baking a cupcake. There is room for flexibility, but there are also certain ingredients that are essential to a cupcake. At Bootstrap, there are four ingredients that are considered key to data science education: statistics, computing, civic responsibility, and domain investment. Civic responsibility is necessary in order to put data and analysis in context. For example, using a random sample that is drawn from a “society filled with bias” will result in racially biased algorithms and weaponized social media. It should not solely be on the shoulders of math and computer science teachers to teach this concept (i.e., civic responsibility) to students, nor are they necessarily the best ones to do it. Data science can benefit from the expertise of teachers in other areas—such as history and social studies—and these teachers can benefit from data science.
Domain investment, Schanzer explained, reflects the idea that data science teachers need to be able to go deep in multiple different domains to provide students multiple on-ramps for interest. For example, teaching data science using a dataset about wines in Tuscany versus a dataset about stop and frisk policies is going to have profoundly different impacts on relevance and engagement for students. Data science education can and should draw on diverse datasets, such as climate data, data on COVID-19, and data on gerrymandering. Schanzer said that people who are “scratching their heads” trying to figure out how to integrate data science with non-STEM classes are focusing too heavily on the first two ingredients of data science education (statistics and computing) and not enough on the last two (civic responsibility and domain investment). Integration works when all four ingredients are valued. For example, he said, instead of trying to convince history teachers to make room for linear regression, we should be thinking about how data on redlining serve the existing goals of that classroom.
Justice-focused Data Science
Angela Calabrese Barton (University of Michigan) studies engagement with data in middle school science, most recently in the context of COVID-19 and sustainable communities. Calabrese Barton shared quotes from students talking about the pandemic to set the stage:
- “I don’t see myself in the data.”
- “It’s not numbers that will solve the pandemic but the stories people tell with numbers… If we don’t have different perspectives on the numbers, it will only offer one story. That won’t help everyone.”
- “It’s never only about COVID-19.”
Calabrese Barton explained that viewing the pandemic as a socioscientific issue makes visible how lives are rendered through data, quantifying and categorizing people in communities. In data science education, Calabrese Barton draws upon a data justice framework to explore how and why youth engage with data to make sense of, to make decisions about, and to take action on science-related issues. She shared three key takeaways from this justice-based framework. First, “datafication” does not impact all people equally. Data science education should consider the ways in which engaging with data takes place in socio-historical and political contexts, shaping what and whose data are made accessible and visible. Second, learning and engaging with and about data involves more than cognitive processes. Students also employ senses and feelings, and tools need to be developed to support youth in using these senses and feelings in making sense of data and taking action toward meaningful social change, she said. Third, data science involves epistemic (in)justice; that is, data are always produced and engaged from multiple epistemic positions, and how such productions are recognized as legitimate forms of knowing and doing in science matters. This has implications not only for what is taught, said Calabrese Barton, but also how it is taught and how it is assessed.
Enacting data justice in learning environments requires attention to how young people engage with and use data toward affecting their own and others’ lives, social relations, and possibilities, said Calabrese Barton. She gave several examples of young people’s engagement with data. One student, a 12-year-old named Prez, was experiencing mental anguish about COVID-19 in May 2020, in large part because of the data he found on the Centers for Disease Control and Prevention website about hospitalization and death rates in the Black community. He found that his anxiety was lessened when he watched YouTube videos that encouraged taking reasonable precautions and put the dangers of COVID-19 into context. Calabrese Barton described this process as Prez “producing critical data practices overlaying YouTube with the big data to critique power structures that determine what data and data narratives count, to care for himself when data, society, and science [were] not.” As a Black youth, Prez encountered dominant data narratives that invoked harmful racialization and he used critical data practices to move beyond critique toward liberatory outcomes.
Another example that Calabrese Barton shared was of a 15-year-old Black girl named Jasmine who curated a complex data infrastructure involving data from different sources and epistemological origins in order to decide how to safely participate in protests for racial justice. Jasmine researched modes of viral transmission, examined protest images online to analyze mask-wearing and social distancing practices among these groups, and looked at rates and patterns of infection and spread in her city. Jasmine stated, “I had to decide whether to protect myself and my family against injustice by protesting, or to protect myself and my family by not going.” Jasmine’s journey, said Calabrese Barton, reflected a complex analytic process of weighing different data inputs to mitigate health risks and to fight for justice.
Expanded Approaches to Data
Rahul Bhargava (Northeastern University) asked, “Is it a necessity that working with data requires us to use computational methods and means?” Bhargava said his work tries to demonstrate that the answer is no. He noted that the history of data is rooted in collecting data about the “other” from a position of power and privilege, and doing something different can be a “radical act.” For example, he works with community groups to look at data about the community, to produce data when there are gaps, and to find a story in the data to paint as a community mural. Practices like data murals are a way of restating what data are, who they are for, and why they are used, said Bhargava. As data become more central to civic and community decision-making, it is essential that we broaden the people who are invited to the table. This broadened approach can take numerous forms, including building critical data practices, embracing epistemological pluralism, using arts-based approaches, and considering social impact. Bhargava recalled the earlier workshop discussion about math phobia and said that reframing data and data practices is a way to invite all people to the table, regardless of their comfort with math.
Data as Narration of Worlds
Josh Radinsky (University of Illinois at Chicago) works on data science in three areas, all of which inform each other. First, he works in the classroom to create and improve curriculum and instruction. Second, Radinsky studies how people narrate their social worlds using data, texts, and tools. Third, he is engaged in the ongoing struggle for educational justice in Chicago, using data. He described each of these projects in turn. Radinsky has been working in the classroom for 25 years, mostly middle school through college. He works closely with teachers to design and explore ways of introducing data science to students. For example, a tool called Social Explorer12 allows students to work with Census data from 1790 to the present; Radinsky noted that the excitement of learning with Census data can overcome the barrier of math phobia for many teachers.
Radinsky shared some maps with workshop participants to illuminate how people use data to construct the meaning of themselves, their communities, and their cities. One map demonstrated the disappearance of the middle class in Chicago between 1970 and 2015; another showed the huge racial disparities in police killings in Chicago. These types of data come together to construct the story of Chicago and to reveal how Chicago is different for people of different backgrounds. Radinsky said that data science education should reflect this approach of using data to construct meaning for our everyday lives.
His third area of work involves using data to advocate for educational justice in Chicago. Some city projects have devastated public schools but at the same time sparked “incredible organizing” of parents, community groups, and others who are fighting in support of public schools. Data are a powerful tool in this debate, he said, and they are used by both sides to make their arguments. Overarching all of these projects, said Radinsky, is the idea that data, texts, and tools are central to how we learn about the social world and change it. He emphasized that narrating data should be understood as a way we engage with and change the world, not just a secret code for describing it. Data science education needs to teach about data as part of—not separate from—our human relationships.
Data in a Youth Newsroom
Lissa Soep (Vox Media, LLC) spoke with workshop participants about her work at YR Media (formerly Youth Radio). YR Media is a platform and learning center for emerging digital media creators to report on and produce stories about pressing social issues. Headquartered in Oakland, YR Media works with young people from across the country; most collaborators are youth of color and youth who have been ignored or betrayed by society. YR Media’s focus on arts and music is vital to how it uses data to produce stories, said Soep, noting that she comes from a humanities background and is “not a data nerd.” To her, the integration of data into journalism is a recognition that data are needed to move the conversation in the direction that young people are organizing for.
Soep shared several examples of the work being conducted at YR Media. The first example was a project called “Can You Teach AI to Dance?” This project, she said, stemmed from a discovery that Spotify uses a hidden, artificial intelligence (AI)-based rating system that scores songs on the basis of a number of qualities, including danceability. “These were fighting words” for the young people, who did not want to be told by AI which songs were danceable. They developed an interactive experience that invites users to compare their ratings against the AI ratings; in the process, the project reveals and unpacks how AI is operating in this space and what types of data are being used. The second example was called “Erase Your Face.” This project explored facial recognition tools by inviting users to digitally scribble over a face in order to determine how difficult it is to go undetected. The third project was “Surveillance U,” part of a year-long investigation into virtual proctoring tools13 that are becoming more commonly used, particularly during the pandemic. Soep said that this project was spurred by nearly 90 petitions filed by students to stop the use of these tools; students’ concerns included mental health impacts, harms to the learning environment, and privacy and bias issues.
There are several frameworks that are relevant to this work, said Soep. The first is a framework she has developed with a colleague for critical data expression (or critical computational expression; Figure 3-3). Critical data expression, which is at the intersection of technology, justice, and art, involves using data as storytelling material, and discovering, analyzing, and sharing data in expansive ways that engage young people and people in positions of power to bring change.
Another key framework, said Soep, is “humanizing collegial pedagogy,” which requires rich collaborations between young people and veterans; through these collaborations, humanity is made central in an effort to hold young people accountable to themselves, their peers, their partners, and to the broader public. Digital afterlife is another framework that accounts for what the work will do once it is released into the world, said Soep. Unlike a traditional classroom assignment that is turned in to the teacher, the work done at YR Media is published and becomes part of the national conversation. This requires a framework like digital afterlife that not only centers on findings but also moves forward toward dialogue and engagement.
13 A simulator that is given to users to experience what it’s like to have your every move monitored under the suspicion that you may or may not be cheating on tests that are being taken remotely.
In closing, Soep shared a quote from an upcoming publication by herself and a colleague: “Through the digital media products they create, young people can reimagine what’s possible in their lives and futures, and work with others to pursue that vision for change. Critical [Data] Expression synthesizes vibrant traditions from cultural studies, STEM, and the arts, honoring the interdisciplinarity and deep humanity of a society that depends on, but cannot be reduced to, its near-constant use of technology” (Lee and Soep, 2023).
Moderated Discussion and Audience Q&A
Following the panelists’ remarks, Matuk led a Q&A session with panelists and workshop participants.
Looking at Sensitive Issues Through a Data Lens
Matuk began by recalling a comment in Louie’s presentation (Chapter 2) about how there is a risk to vulnerable students when teachers aren’t prepared to address social and political issues with awareness and sensitivity. She asked panelists to comment on what lessons they have learned through their work about how to best approach socially and politically sensitive issues through a data lens. Schanzer responded that at the same time that he and his colleagues were working in New York City to implement data science education, there was a separate effort to help teachers engage in difficult conversations with students. Some teachers felt as though they’d “been thrown off a cliff,” he said, but they found that data science actually could provide a useful framework for these conversations. Schanzer explained that difficult conversations often evoke a strong emotional response, and data can help ground the conversation in facts. Calabrese Barton added her perspective that it is critical to question what is considered productive, legitimate conversation around tensions. She explained that youth are experts of their lives and stories, and the tensions reflected in difficult conversations are not abstract but are grounded in the lives of real people. When teachers introduce difficult topics, it is essential that the conversation reflects the students’ lived experiences and the complex and multilayered nature of the topics. Radinsky observed that the concept of “difficult conversations” can mean different things to different people, and conversations can be difficult for both White teachers and non-White teachers. 
He said that when teachers aim to avoid difficult conversations by, for example, avoiding the topic of race, this makes “all kinds of conversations difficult.” He observed that one way of avoiding difficult conversations is by using code words such as “diversity” rather than clearly communicating the name of the racial, ethnic, or other group (e.g., calling a 98% Black school “diverse”). By naming race or other issues clearly and explicitly, we can better see and reach the learning objectives. Bhargava agreed with Radinsky’s perspective and said, “If we can’t say the word Black…we can’t talk about these issues.”
Bhargava added that when taking risks in the classroom, he likes to use the analogy of building a playground. On a playground, students might be willing to take risks they wouldn’t normally take, willing to swing higher or jump farther. Similarly, if the right environment is present in an educational setting, students may be willing to take risks and have conversations that they are otherwise reluctant to have. For example, he said, he has been working on an approach to data theater that builds on the populist tradition of participatory theater. One of the findings from this work is that people are more willing to question the epistemic power of data in the theater setting than in other more traditional data science settings. The media that we use to examine and explore issues, he said, shape the environment and the conversations that can be held. Soep shared her perspective on difficult topics and difficult conversations, saying that we often lean too quickly toward technology solutions, but it is important to be flexible when leaning into sensitive, emotional, personal, and cultural issues. She shared the example of a story on AI tools that were being used to reanimate still photographs of people who had died; as the reporter was investigating, the story turned away from the details of the technology and more toward an exploration of grief and societal support for those who are grieving.
Applying Lessons to K–12 Classrooms
Given the broad range of experiences discussed in this session, Matuk asked panelists to consider what they had learned from their work that could be applied to data science education in the K–12 system. Specifically, she said, where are opportunities to build on synergies between disciplines and stakeholders?
Calabrese Barton began by emphasizing the importance of considering “who benefits and who gets hurt and what that means for who’s at the table” when designing and implementing data science projects. Calabrese Barton and her colleagues have used a co-design approach that gives youth, families, and communities the chance to decide when, how, and why data are produced. Co-design is not something that happens overnight, she said, but instead requires intentional, relational practices and vulnerability. She shared an example of an engineering project that started with two years of work co-designing materials and tools using a data justice lens in community contexts, and then moved into a classroom setting in the third year. Setting this work in the community allowed youth and families to have different forms of authority than they might have if it took place in schools, she said. One of the insights that grew out of the project was a framework called “pedagogies of community ethnography”; this framework centers community knowledge as a valuable part of STEM knowing, creates spaces for co-generating and integrating community knowledge into STEM learning, and provides tools for navigating data from different epistemological origins. This approach, said Calabrese Barton, supports youth in making meaning and engaging in critique and liberatory goals. As youth consider technical specifications for how an engineering design might work, they learn to balance these with the needs and wants of their communities that they have collected through surveys, interviews, and observations. For example, one unit challenged students to develop a prototype that involved multiple energy transformations and a renewable energy source. One group designed a light-up limbo stick in response to their ethnographic findings that “lack of fun” was a big concern among youth in the community. The students, said Calabrese Barton, developed a nuanced and critical view of what “lack of fun” meant for youth; for example, they observed that kids who are bored often get in trouble and lose future opportunities for fun. Calabrese Barton said that this project gave the students the opportunity to both make meaning with and embody the data.
Bhargava shared a story about a fellow teacher’s experience with teaching data science in a high school math class. Students were given the assignment of collecting and producing data about an issue they were passionate about, and then giving a presentation on their findings. Students were given multiple options for how to present, including making data sculptures. One student, a Black girl whose mother was facing breast cancer, chose to explore data on breast cancer and the disparate outcomes for women of color. She created a three-dimensional sculpture with two breasts that represented women of color and other women; each breast was layered based on the frequency of incidents or positive outcomes. The sculpture, said Bhargava, communicated data on multiple levels; it spoke to the actual physical condition, the disparities, and the idea of mastectomy. The student won the “students’ choice award” for the sculpture; Bhargava noted that the student had not previously been creating work of this quality. This story, he said, speaks to the power of “finding windows and opportunities” to be creative and to challenge students in new ways.
How to Work with NGSS Standards
Andee Rubin (TERC) said that if a science teacher is following the Next Generation Science Standards (NGSS), there is a tension between fidelity to the standards and allowing students to be creative with using their own data. For example, she said, a unit might be designed around students finding specific patterns in a dataset, which requires the teacher to provide data that are specifically curated for this purpose. She asked panelists to comment on this tension and how to balance the needs of student-led data science and NGSS requirements. Radinsky replied that using student-generated data is not always appropriate or ideal, and he often tinkers with datasets to make them work for a particular activity. Mapping to the standards is critical for science classes, he said, and it is possible to get students engaged in datasets that are designed for that purpose by considering “where are the hooks in this dataset?” Getting students engaged and interested is more a matter of the “culture of the classroom,” he said, than the particular dataset. Schanzer acknowledged that this tension is real and said that different learning goals require different ways of approaching data science. He agreed with Radinsky that looking for a “hook” in the data is essential and said that there is a need for datasets that have depth and allow students to explore the mysteries within them.
Erika Shugart (National Science Teaching Association) observed that many of the projects discussed in this session were “one-offs” that require a very well-prepared teacher. She asked panelists how these types of data science activities could be scaled for implementation into classrooms. Bhargava began by noting that the Data Culture Project14 website has a suite of 12 activities with curricular guides for teachers; the activities are small and easy to weave into the work of the classroom. Radinsky said that clarity of learning objectives and clarity of assessments are essential for teachers when they are implementing new materials. A site like Social Explorer is useful because it has prepackaged vignettes, but the lack of teaching supports makes it hard for teachers to use. Teachers talking to teachers is the only way that data science innovations will make it into the classroom, he said; teachers need to see examples of how other teachers have made it work.