Teaching Data That Matters: History and Practice


Cite this chapter:
Bhargava, R. (2023). Teaching Data That Matters: History and Practice. In: Raffaghelli, J.E.,
Sangrà, A. (eds) Data Cultures in Higher Education. Higher Education Dynamics, vol 59. Springer,
Rahul Bhargava, Northeastern University
Short Bio: Rahul Bhargava is an educator, researcher, designer, and facilitator who builds
collaborative projects to interrogate our datafied society with a focus on rethinking
participation and power in data processes. He has created big data research tools to investigate
media attention, built hands-on interactive museum exhibits that delight learners of all ages,
and run over 100 workshops to build data culture in newsrooms, non-profits, and libraries.
With Catherine D’Ignazio, he built a suite of tools and activities that introduce
learners from various domains to working with data. Rahul has collaborated with a wide range
of groups, from the state of Minas Gerais in Brazil to the St. Paul library system and the World
Food Program. His academic work on data literacy, technology, and civic media has been
published in journals such as the International Journal of Communication and the Journal of
Community Informatics, and presented at conferences such as IEEE VIS and ICWSM. His
museum installations have appeared at the Boston Museum of Science, Eyebeam in New York
City, and the Tech Interactive in San Jose. Rahul is an Assistant Professor in Journalism and Art +
Design at Northeastern University, where he directs the Data Culture Group.
The massive growth in data learning offerings in higher education is mainly focused on technical
skill and tool training. There is a growing movement to educate with “data that matters,”
introducing students to the social structure and processes that have produced data, and in
which it can have the most impact. This chapter introduces case studies of some of these
efforts and summarizes four guiding principles to support them. These examples encourage
creating playgrounds in which to learn, connecting students to real data and communities,
balancing learning goals with student interests, and letting learners take risks. We close with a
“call to arms,” supporting data educators in challenging the historical structures of power
embedded in data, diving into the ethical complexities of the real work, and teaching how to
use data for the overall social good.
Keywords: higher education, social justice, data storytelling
Data science is one of the most popular careers of the past decade (Manyika et al., 2011).
Computer scientists are enrolling in statistical training programs to enhance their data analysis
skills; social scientists are taking up courses in RStudio and Jupyter Notebooks; statisticians are
studying algorithms in order to apply computational quantitative methods to their work. Yet
what is this all in service of? Many in the corporate sector see data as a new resource to be
mined for efficiency, opportunity, and profit. Others view data as a potentially neutral reporting
of the conditions of some situation, driving more informed decision making. A third group
centers data as a tool to reveal historical inequalities and work to mitigate their impacts from a
social justice lens.
Data has become increasingly central to civic and corporate life. Data-driven decision making in
civic settings has become a global norm. The Sustainable Development Goals, a set of 17 goals
adopted by the UN General Assembly in 2015, were introduced with an associated set of data
indicators intended to aid in the monitoring and assessment of their implementation.
Departments and Secretaries of Statistics across the globe now employ these data-driven
assessments. Every two years they gather at the UN World Data Forum to connect and discuss
these approaches. At the local level, the "smart cities" movement has led to significantly
increased adoption of data as central to all types of decision making processes (Batty, 2013).
This adoption is exceeded in the corporate sector, where business after business has engaged
the tools and processes of big data. Data science in particular is a field of growing importance in
the job market. Entire industries have shifted to embrace data - capturing reams of it to support
process measurement, building data dashboards that drive more informed decision making,
aggregating and marketing data as a product in and of itself.
Amidst this backdrop of surging interest, and divergent goals, institutions of higher education
have created numerous pathways for students that want to build their data science skills.
Paralleling historic responses to the emergence of new technologies, many educators began
by focusing on skills-based acquisition strategies (Ryan et al., 2019). In the domain
of data this has typically involved computation-based tools like spreadsheets, programming
environments, and more. As the mythology of data science as a solution to every problem gives
way to a more nuanced engagement of the potential and peril of its application, university
educators are being pushed to embrace the questions of data origin, responsible and ethical
data use, and who is invited in and given permission to work on data problems. These more
fundamental issues connect to questions of interest for many academic researchers, and are
spilling into their teaching approaches and classroom offerings. Data is more than
computationally mediated math; it is socially situated and impacts real lives.
Universities have taken many approaches to building offerings related to data science. Initial
steps in many educational institutions were taken by librarians, the traditional bastions of
cross-disciplinary resource offerings in many settings (Calzada & Marzal, 2013). As is often the
way in universities, various departments initially described data science as a specialized
application of their core domain. Statistics departments argued data science is a specialized
form of numeracy and/or statistical literacy. Computer science departments argued the
computational methodologies situate it as an applied digital literacy within their purview.
Business management schools focused on the decision support applications and argued data
science was a domain they had long-standing expertise in already. In fact these are all
applications of the large set of skills nebulously described as "data science".
In higher education settings this leaves us with a set of questions about how to best teach data
science to learners in multiple disciplines. In this chapter I showcase, and argue for, educational
approaches embracing data that matters. What matters is of course up to the particular cohort
and their instructor, but predisposing oneself towards even asking that question brings in
relevant history, practice, and focus on the social good. I begin with a short critical history of
data and its relation to power dynamics, move on to document ongoing trends towards more
thoughtful and engaged application of data science, and close by detailing a number of
approaches in higher education that embrace a more holistic and just pedagogy in relation to
teaching different types of data science work. Teaching data that matters is a growing practice
with numerous methodologies, all well suited to attract learners, inspire innovative
applications, and impact the communities we live in.
A History of Power and Data
A Tool for Oppression
Large sets of quantitative data are often discussed as a rather new phenomenon, emerging
from the digital practices that enabled computation-based replication, reproduction, and
storage. Digging further back in history, one finds a set of data practices that long pre-date
current record-keeping practices. Myers West introduces this concept as "data capitalism",
arguing that data as a commodity "enables an asymmetric redistribution of power that is
weighted toward the actors who have access and the capability to make sense of information."
(West, 2019) History is littered with examples of numerical assessment of populations wielded
by those in power. Information is often discussed as power in and of itself. In South American
Andean cultures, Khipus are elaborate assemblages of knotted string used for millennia to
record extracted numerical data such as tax records and military obligations of the populace
(Medrano & Urton, 2018). On the other side of the world, from 2500 BC ancient Egyptian
rulers were creating census datasets in order to determine how much labor could be
conscripted into the construction of pyramids for their pharaohs (Census-Taking in the Ancient
World, 2016). Fast forward to more recent times and you find examples from the global horrors
of the past few hundred years, whose ramifications we continue to suffer. This history remains
embedded in our data practices and technologies, and it must be acknowledged in order to
create more just applications of data.
Figure 1: A page from the records of the Slave Compensation Commission.
We can begin with an example from British history, where data spreadsheets played a crucial
role in compensating slave owners for their "loss" after slavery was abolished there (Olusoga,
2015). While the slave trade was outlawed in the British Empire by the Abolition of the Slave
Trade Act in 1807, it wasn't until 1833 that the Slavery Abolition Act freed the almost 1 million
Africans who until then were legal property of British citizens. Included within that act was the
establishment of the Slave Compensation Commission, which was tasked with adjudicating how
to pay former slave owners for the "property" the government was taking from them. You read
that correctly - the slave owners were paid for each slave emancipated! What was the
Commission's solution? They created a point-in-time census of the British slave population in
1834, logging in meticulous detail the location, name, and slave holdings of 45,000 British
citizens making claims to the commission. This data was used to decide how much to
compensate slave owners, based on how many slaves they had owned. This massive transfer of
wealth from the commons to these private, formerly slave-owning citizens amounted to
roughly 40% of the government's total expenditure for 1834, further inflating a concentration
of private wealth whose effects have lasted until today. Paper spreadsheets, logged in the
familiar columns-and-rows format, drove all of this; the data they contained was the official
tally of the horrors of the practice (see Figure 1). The dehumanizing manner in which enslaved
people were recorded extends into today's data practices, as we log and count and distance so
many datasets from the people they represent.
Fast forward to the early part of the 20th century and we find another example of data driving
horrific subjugation in IBM's data counting machines, which were used to identify, process, and
track many of the millions killed by the Nazis in the era of WWII (Black, 2012; Delio, 2001). This
history is still contested, both in courts of law and in public understanding. However, much of it
is indisputable:
IBM was a market leader in data processing and tabulation machines during that time;
IBM's German subsidiary, Dehomag, sold machines that the Third Reich used during the
1933 and 1939 censuses (Dehomag was nationalized before the start of the war);
IBM's chairman at the time, Thomas J. Watson, was awarded the Merit Cross of the
German Eagle with Star by Hitler.
The Reich took a data-driven approach to the Holocaust; reducing people to numbers on a card
pales in comparison to the dehumanization they practiced daily. We can be fairly certain that
automation technologies made their unthinkable work more efficient and targeted. All of this
was known and approved by IBM's leadership, a point of contention that came back to
light recently when IBM decided to brand their cloud-based AI offerings under the moniker of
"Watson" technology solutions. Watson was, of course, named after the corporate chairman
who oversaw these sales during that period.
Returning to our current times, typical data-gathering efforts in the non-profit domain continue
to be extractive processes that seldom afford oversight or ownership to the people
actually represented by the data. Assessments run by large NGOs in the communities they work
in typically utilize third-party staff and non-native technologies; there is a whole economy of
companies offering precisely that service. Once these datasets about a community are
collected, the raw information is typically handed over to another organization outside of the
community for analysis, the results of which are again seldom shared with the community. This
cycle of data violence plays out across the globe, all in the name of improving living conditions
in the targeted settings, yet it seldom acknowledges the built-in power dynamic it creates. Without
oversight and control of the datasets collected about them, communities in aid-supported
settings cannot be authentically invited to the table to make decisions about how aid intended
for them is structured.
I choose these particularly evocative examples intentionally, to provoke feelings of outrage and
horror. They stand in stark contrast to the ongoing fetishization of data science practices and
techno-optimistic predictions of impact that ignore negative intent and unintended secondary
or tertiary effects. Data has historically been used by those in power to amass and further
centralize that power. We must acknowledge this history and its legacy if we hope to use data for more
emancipatory purposes.
A Growing Critique of Data Practices
There is a growing critique of this history; an increasing contestation of the norms of data use in
our society. As "data analyst" gave way to "data scientist", the perceived prestige and value of
the position changed significantly. "Data scientist" is often spoken of as a high-paying job
requiring significant expertise, while "data analyst" conjures for many the image of a low-skilled
worker who copies numbers from one spreadsheet to another. This newfound popularity has,
appropriately, brought increased attention to the impacts of those who work in the sector. An
engaged critique built around the impacts of data science practices has emerged, influencing
the techniques and norms of how it is performed and how it is taught.
Our current data practices and technologies embed the historical power dynamics introduced
in the previous section. Data scientists were the drivers of the "Big Data" revolution; they
argued that with enough data they could neatly provide data-driven insights for decision
support, or predict practically anything. The industry thus created extracts data from the
populace and monetizes it without their stewardship, engagement, or awareness. Zuboff's
description of this as "surveillance capitalism" has gained significant traction - “an expropriation
of critical human rights that is best understood as a coup from above” (Zuboff, 2019). For our
purposes we can focus the critique of data science along four specific axes (D’Ignazio &
Bhargava, 2015):
Lack of Transparency: The data about people’s interactions with the world is generally
collected with only token approval, if any at all, from the user. This denies the subject
awareness that their actions are being recorded at the time the actions occur.
Extractive Collection: The data is collected by third parties and is not meant for
observation or consumption by the people it is collected from (or about). This denies the
subject agency in the data collection mechanism and interaction opportunities with the
data itself.
Technological Complexity: The data is analyzed with a variety of advanced algorithmic
techniques, and discussed with highly technical jargon. This denies the subject an
understanding of how any results were achieved, and how they might be critiqued.
Control of Impact: The data is used by the collector to make decisions that have
consequences for the subject(s). This denies the subject participation and agency in
decisions that affect them.
Measured along these axes, the early phase of "Big Data" mythology has come and passed for
many. The popularity and impact of critical books such as Noble's Algorithms of Oppression and
O'Neil's Weapons of Math Destruction are hallmarks of this shift (Noble, 2018; O'Neil, 2016). A
second phase of books, such as Criado-Perez's Invisible Women and D'Ignazio and Klein's Data
Feminism suggest ours is a time of data critics fighting to create a more reflective practice of
data science, one grounded in the lived experience of the peoples and behaviors the data
represents (D'Ignazio & Klein, 2020; Criado-Perez, 2021). It is no accident that the authors of all
four of those books identify as female, a population that Criado-Perez's work demonstrates over
and over again is left out of the data-driven design decisions all around us. Data-driven harms
most often exacerbate existing inequalities, further marginalizing the marginalized and further
oppressing the oppressed. Beyond the thorough critiques contained in these books lies an even
larger set of texts emerging from activists, organizers, and academics.
One response to the critiques has been the growth of the global open data movement. As data
came to be seen as a resource, the egalitarian idea of sharing it freely to empower innovation
took firm hold. Advocates argue that releasing data freely drives innovation and creates
markets, increasing economic growth and liberating the information to impact society
positively. This optimistic view is not without critique or unintentional harms. Two recent
incidents with negative impacts serve as illustrative examples. As a first, take Strava, a fitness
tracking mobile app. In 2018 Strava decided to release a map of the geographic routes of all
their users' exercise routines. This data, while not associated with a particular user, is hardly
anonymized. When combined with other data sources, it becomes quite easy to identify and
weaponize - news reports soon followed about how the maps gave away the location of secret
US army bases (Hern, 2018). A second example comes from 2014, when in response to a
Freedom of Information Act request the New York City Taxi Commission released metadata
about more than 150 million taxi trips. Though they attempted to anonymize the trip
information, online data experts were able to identify individual rides and, when combined with
other information sources, extract details such as the home addresses of movie stars who had
taken taxi rides (Gayomali, 2014). An example more focused on those typically at risk,
not the military or movie stars, can be found in an open data project from India. In 2001 the
state of Karnataka created the computerized "Bhoomi" system to digitize their land ownership
records (Benjamin et al., 2007). Instead of increasing efficiency and transparency, this attempt
to standardize and recognize property rights created new opportunities for bribery and
corruption. In addition, by centralizing and creating a data store of land ownership it made it
easier for predatory planners to seize large swathes of land and sell them off to developers.
This open data system directly led to livelihood loss and land seizures from small landowners in
the state; the digitized land dataset was weaponized against the people whose possessions it
recorded. These harms and risks parallel many of the stories shared in the previously
mentioned books, yet they emerged unintentionally from poorly thought-out data sharing
practices attached to noble efforts to open datasets as a public resource. Releasing data in the
name of social good did not necessarily produce it.
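The Strava and taxi cases illustrate a general technical failure mode: hashing or stripping names does not anonymize an identifier drawn from a small, enumerable space. The sketch below is a hypothetical reconstruction in Python, not the analysis from the news reports; the medallion format used is one of the real New York patterns, while the specific ID, fare, and function names are invented for illustration.

```python
import hashlib

# Hypothetical reconstruction of the taxi-data re-identification.
# NYC medallion numbers follow a few short, known formats; the one used
# here (digit, letter, digit, digit) yields only 10 * 26 * 10 * 10 = 26,000
# possible IDs, so every MD5 hash can be reversed by brute-force enumeration.

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode()).hexdigest()

def build_reverse_table() -> dict:
    """Precompute the hash of every possible medallion ID."""
    digits = "0123456789"
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    return {
        md5_hex(f"{d1}{le}{d2}{d3}"): f"{d1}{le}{d2}{d3}"
        for d1 in digits for le in letters for d2 in digits for d3 in digits
    }

# An invented "de-identified" record, as it might appear in a public release.
anonymized_record = {"medallion_hash": md5_hex("5D63"), "fare": 11.50}

table = build_reverse_table()  # 26,000 hashes: trivial to compute
recovered = table[anonymized_record["medallion_hash"]]
print(recovered)  # -> 5D63
```

Because the space of valid IDs is tiny, precomputing every hash takes well under a second, and each "anonymized" record maps back to a physical cab, which can then be joined against photographs and timestamps to identify individual riders.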
More broadly, these critiques of data science practices have spurred a movement often called
"data for good." Practitioners in this space have been laying the groundwork for a pro-social
data science practice, with its own set of ethics, norms, and case studies. One recent example is
an organization created in 2020 by a joint US$50 million grant from the MasterCard Center for
Inclusive Growth and the Rockefeller Foundation; it primarily convenes the field and funds
award challenges to push the sector forward. Non-profit DataKind is another example in this
sector, working to bring volunteers together with nonprofits to work on
example in this sector, working to bring volunteers together with nonprofits to work on
problems where data science skills can push forward their mission-driven work. DataKind
recruits data science experts who work in commercial settings and in parallel helps non-profits
define their data problems in a way the data scientists can understand. Their joint projects
create novel data-driven solutions that align with the missions of the organizations. Non-profit
Data4Change operates with almost the same model. DrivenData provides an online setting for
groups to list social good-related data challenges, and for experts or learners in the field of data
science to take those challenges up. The problems revolve around creating predictive models,
with teams competing for the highest score (i.e. the best predictions). Other groups, such as
Data 4 Black Lives (D4BL), are focused on those most impacted by the negative harms of data
science programs in public settings. In the US those burdens are historically borne by the
communities of Black, indigenous, and people of color (BIPOC). D4BL hosts multi-sector
convenings, runs programs to increase positive impact on their member communities, and
works in media channels to shift the narrative around data science work that has social impact.
Their recent work with Demos links current day big data practices to a long legacy of capitalist
technology put in use to suppress and subjugate populations (Milner & Traub, 2021). The
Algorithmic Justice League similarly works across multiple media platforms to impact data
related policies and products. In response to the Sustainable Development Goals described
above, the organization Data2X argues for disaggregation in order to identify specific
subpopulations that might be left behind despite aggregate improvements overall. They
specifically call for collecting and analyzing data metrics broken down by gender and other
attributes, without which one cannot target interventions at the most at-risk subgroups who
are in dire need of help. These groups inspire, ignite, and unite the burgeoning "data for good"
sector around questioning who data is about, who owns it, and whose interests data projects
truly serve.
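The call for disaggregation can be made concrete with a small sketch. The numbers below are invented for illustration: an indicator improves in aggregate while one subgroup is actually worse off, a pattern that only the disaggregated view reveals.

```python
from statistics import mean

# Invented survey results for a hypothetical indicator (say, access to
# financial services), measured before and after an intervention.
# Each record: (group, value_before, value_after)
records = [
    ("men",   0.60, 0.75),
    ("men",   0.62, 0.78),
    ("women", 0.40, 0.38),
    ("women", 0.42, 0.37),
]

def aggregate(rows, idx):
    """Mean of one column (1 = before, 2 = after) across all rows."""
    return mean(r[idx] for r in rows)

def disaggregate(rows, idx):
    """Mean of one column, broken down by group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row[idx])
    return {g: mean(vals) for g, vals in groups.items()}

# Aggregate view: the indicator appears to improve overall.
print(round(aggregate(records, 1), 2), "->", round(aggregate(records, 2), 2))  # 0.51 -> 0.57

# Disaggregated view: the gain is driven entirely by one group,
# while the other group is actually worse off after the intervention.
print(disaggregate(records, 1))
print(disaggregate(records, 2))
```

The aggregate mean rises, yet the disaggregated means show the improvement accrues to only one group; an intervention evaluated on the aggregate alone would miss those being left behind.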
The "data for good" label has been put to use so often recently it is hard to come to agreement
on a central definition and core goals (Hooker, 2018; National Academies of Sciences,
Engineering, and Medicine, 2020). There are many counter-critiques being made, including
concerns like:
volunteer-driven efforts are hard to sustain over time;
the problems most attractive to volunteers are often the "cutting edge" ones, rather than
the mundane-but-impactful ones;
companies donating technologies can market this "good" service in a way that absolves
them of having to question their central work and its harms;
partnering doesn't help build capacity in the underserved areas that receive the benefits
of the projects;
separating data used for "good" suggests it is acceptable to use data not for good.
It is helpful to acknowledge these negative aspects while still pursuing the positive impacts of
the larger movement to put data in the service of social good. The movement is building
capacity for the typical subjects of data to own and analyze it themselves, and providing
pathways for data science learners that fully embrace responsible and ethical uses of data.
The "data for good" response has been picked up, and in some cases driven, by academic
practitioners seeking to focus on the social impacts of data science practices. "Data science for
social good" (or DSSG) is becoming a commonly known term in higher ed. Convenings like
Bloomberg Data for Good Exchange have brought together those in industry, non-profits, and
academia to build the movement. Members of prestigious academic groups such as the
Association for Computing Machinery (ACM) have created tracks at ongoing events, as well as
completely new ones such as the Conference on Fairness, Accountability, and Transparency
(FAccT). These
gatherings have built community and validity for the idea that working on data science in
service of the social good is an acknowledged and accredited pathway within teaching and
learning at the university level.
Teaching Data in Higher Ed
The core mission of higher education, to create informed students who can innovate and
create a better world, is not well served by simply reproducing in classroom settings the
disempowering historical dynamics I describe above. Universities are built on a foundation of
learning, growth, and betterment; these goals connect directly to introducing data science in
service of the social good. Merging these goals is a challenge many have taken up, creating
models for instruction that challenge the status quo of data work, engage learners in hard
questions about ethics and impact, and question how much we understand about the rhetorical
intent and power of data itself.
Few sectors are immune from this data "revolution", as it is often called in the press. This has
led to a raging debate on whether data literacy is a critical literacy that needs to be learned by
all, or a specialized application of statistical literacy and computational methods that need not
be broadly introduced. Either way, I argue that in educational settings those who look to enter
the field as practitioners should embrace the double meaning of "critical" employed here: both
strikingly important, and regularly reflected upon with doubt and inquiry. For the purposes of
this chapter, I will employ a 4-part definition of learning goals related to data (Bhargava &
D’Ignazio, 2015), encompassing:
1. Reading data: Understanding what data is, and what aspects of the world it represents.
2. Working with data: Creating, acquiring, cleaning, and managing data.
3. Analyzing data: Filtering, sorting, aggregating, comparing, and performing other analytic
operations on data.
4. Arguing with data: Using data to support larger narratives intended to communicate
some message to a particular audience in service of some goal.
To define it simply, data education should build the ability to engage constructively with society
through and with data.
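To make the third learning goal concrete, the analytic operations it names (filtering, sorting, aggregating, comparing) can each be shown in a few lines of Python; the wait-time dataset below is invented for illustration.

```python
# A toy illustration of the "analyzing data" learning goal:
# filtering, sorting, aggregating, and comparing a small dataset.
rides = [
    {"neighborhood": "Roxbury",  "wait_min": 12},
    {"neighborhood": "Back Bay", "wait_min": 4},
    {"neighborhood": "Roxbury",  "wait_min": 15},
    {"neighborhood": "Back Bay", "wait_min": 5},
]

# Filtering: keep only the long waits.
long_waits = [r for r in rides if r["wait_min"] > 10]

# Sorting: order rides by wait time, longest first.
ranked = sorted(rides, key=lambda r: r["wait_min"], reverse=True)

# Aggregating: average wait per neighborhood.
by_hood = {}
for r in rides:
    by_hood.setdefault(r["neighborhood"], []).append(r["wait_min"])
averages = {h: sum(w) / len(w) for h, w in by_hood.items()}

# Comparing: which neighborhood waits longer on average?
print(averages)  # {'Roxbury': 13.5, 'Back Bay': 4.5}
```

Even this toy comparison hints at why the social context matters: a disparity in average wait times across neighborhoods is exactly the kind of pattern a "data that matters" course would ask students to interrogate rather than merely compute.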
When discussing working with data as a new "literacy" made up of these components, it
behooves us to learn from debates throughout history about literacies and whom they
empower and disempower. Lévi-Strauss argued that early reading literacy promotion was
actually in service of the power elites, not focused on empowering the subjugated masses (Lévi-
Strauss & Wilcken, 2012). Another perspective on the underlying idea of literacy builds on the
empowerment-focused pedagogy of Freire, whose "popular education" work demonstrated
how reading literacy learning could be structured intentionally to build learners' ability to
challenge the power structures around them (Freire, 1968). Within our current times, most
literacies are thought of in service of learning, empowerment, and growth - most would agree
that an increase in any measure of literacy is a positive improvement. Institutions of higher
education have embraced that sentiment, leading to a number of educational approaches I wish
to highlight that showcase challenges and responses to teaching data that matters.
In the classroom, university educators are implementing the socially engaged methods
showcased at these convenings. They are building on rich traditions of service learning, an
approach that puts student classroom work in service of broader social efforts. Many
university-level data science programs engage datasets about the public good, partner with
groups to create impact from student work, and expose students to real work settings where
data can have impact. This stands in stark contrast, but not in opposition, to course offerings
that focus on skills acquisition for the jobs market. Methods of questioning impact, ownership,
and representation in data science projects are widely applicable in industry as well as the non-
profit sector. Significant business risk has come to be attached to unintended effects of poorly
thought through data projects. In short, this new path tries to pivot from the previously
described history, focusing on data projects that create real impact driving educational goals,
social goals, and business goals in service of the social good.
A Short Interlude on Pedagogy
A number of established pedagogies inform the design of courses and learning activities that
introduce students to creating impact on real world settings while learning data science. A
classical behaviorist pedagogy models the learner as an empty vessel, awaiting knowledge to be
poured in. Piaget's "constructivism" responds to this, arguing through work in child psychology
and epistemology that students are active beings that bring their pre-existing understanding of
the world to any learning setting (Piaget, 1952). He offers the concepts of "accommodation" to
describe how learners adjust their existing mental models to build new structures for
understanding while learning, and "assimilation" to describe the process of new information
being adapted to fit a learner's existing mental models. These well-understood theories laid the
groundwork for Papert's "constructionism", which argues that learning happens best when
creating projects that are driven by student passions and externalized outside their mind's eye
to allow for reflection and shared understanding (Papert, 1980).
Beyond these approaches, we can look to more socially-situated theories to inform bringing
students into the community of practice within the nascent field of data science. Vygotsky's
"zone of proximal development" (ZPD) posits that the presence of a domain expert, and
scaffolded social interactions with them, can help boost a learner through gaps in their learning
in order to increase their skill (Vygotsky, 1980). Lave and Wenger's "legitimate peripheral
participation" (LPP) further centers authentic learning environments and values the
apprenticeship-like relationships that can be formed and drive learning (Lave & Wenger, 1991).
The previously mentioned idea of service learning is well established within American
educational institutions. A service learning approach values experiential learning in which
students' collaborative work with external partners is the pathway to achieving both
learning objectives and community goals. This can include volunteering, community service,
internships within companies or organizations, and field education.
These methods of teaching underlie many of the examples that follow, playing out above and
below the surface in the design of activities, courses, and collaborations. Constructivism,
constructionism, ZPD, LPP, and service learning are common threads cited by many of the
projects that follow.
Introducing Ethics and Responsibility
A first challenge taken up by instructors in higher education is that the still-nascent field of data
science has poorly defined ethical norms. This long history of data in the hands of the powerful
hasn't lent itself to easy introspection, despite the groups pushing back on it described above.
Numbers and charts convey authority, rigidity, and correctness in a fluid world; yet data itself is
a reductive capture of real-world phenomena. Against this background, data science seems
to have borrowed its ethical norms from computer science, and university-level computer
science courses seldom focus on applied ethics, responsible use, and industry norms (Quinn,
2006). The prevailing attitude in the domain is that technology is neutral; that responsibility lies
in the hands of the user of any technology, not its creator. In our current era, the "tech
won't build it" movement pushes back against this. An example of their advocacy can be found
in a 2018 letter to the CEO of cloud service software giant Salesforce arguing that they should
rethink their work with the US Customs and Border Protection agency (O'Donovan, 2018). The
rationale of the activists? They argue that "we cannot cede responsibility for the use of the
technology we create, particularly when we have reason to believe that it is being used to aid
practices so irreconcilable to our values." These workers disagree with the idea that tech
creators are not responsible for use, because they saw the tools they built being used to
separate parents and children along the US-Mexico border during the Trump presidency in the
US. This is one emerging example of pushback against the dominant perspective on
responsibility for the use of technology in the domain of computer science, in general that
counter to the norm doesn't gain traction.
Figure 2: The Tacoma Narrows bridge collapsing (Source: Wikipedia Commons)
Looking to other fields offers more numerous and engaged models for responsible use
of their work. Civil engineering undergraduate education commonly showcases dramatic and
memorable failures of their field's work - shocking video footage of the Tacoma Narrows Bridge
oscillating and falling in high winds is not soon to be forgotten by young students in the field
(see Figure 2). The medical field goes even further, with a longstanding oath to "do no harm"
front and center in their educational journey. In computer science, the story of a clock drift bug
on the US military's Patriot Missile System leading to 28 soldier deaths isn't routinely known by
young computer science students (Wong et al., 2017). Nor is the story of an incorrect software
check on X-ray strength in the Therac-25 leading to patient deaths. Borrowing
educational norms on ethics from computer science has left data science learners poorly
served, with a scarcity of examples introduced in classroom settings despite so many being available.
Fig 3: Stowage of the British Slave Ship Brookes Under the Regulated Slave Trade Act of 1788
(Source: Wikipedia Commons)
It doesn’t have to be this way - historical case studies of data being used for good do exist, and
set the stage for socially engaged work with impact. There is a set of historical examples that
could be brought into data science classrooms to help introduce a more responsible norm for
data science work in multiple fields. Returning to the time of the fight for the abolition of
slavery in Britain, we find one example in a pamphlet depicting slave transport aboard the
slave ship Brookes. British abolitionists in the late 1700s created and spread this image of the
Brookes to support their advocacy to abolish the slave trade of the time (Conway, 2012) (see
Figure 3). This disturbing, evocative, and ultimately effective visualization begins in a
surprisingly familiar place - open data (Visualizing Information for Advocacy, 2014). A
parliamentary survey in 1788 established the accurate dimensions of ships used to transport
goods for trade across the Atlantic. Abhorrent laws of the same era dictated how much space
was available to each person transported as a slave in this trade (the Regulated Slave Trade
Act of 1788); in the case of the Brookes this would allow for a maximum of 454 slaves on the
ship. The artist engraving the piece could only fit in 400, and other open government records
showed it had previously carried more than 600 people stolen from their African homes. The
image, what we might now describe as an "infographic", was printed and circulated as part of
larger campaigns against the scourge of slavery. One imagines it was particularly designed to
conjure images of rule-breaking traders packing slaves on ships in inhumane conditions,
perhaps stirring even the hearts of those who chose to avoid thinking about the horrors of
slavery that underpinned much of their daily lives. This story is one of advocates creating an
infographic driven by open data in the 18th century! It would no doubt resonate with students
in data science classrooms today. In fact, in my own teaching I use it repeatedly, and it
consistently inspires students and draws them into the history of data work on issues that
significantly impact lives. The story of the Brookes pamphlet motivates my students to work on
data that matters.
Fig 4: Diagram of the Causes of Mortality in the Armies of the East (Source: Wikipedia Commons)
A second case study of positive impact can be found in the ground-breaking visualizations of
Florence Nightingale. In fact her story stands out as one that is quite often highlighted by
popular sources in the field of data visualization. The recently established, and quickly growing,
Data Visualization Society went so far as to name their online journal "Nightingale" after her.
Alberto Cairo, a leader in the field of data journalism, retells the story of her work in the
conclusion of a recent book (Cairo, 2020). Nightingale created her chart, "the Wedges" as she
called it, to persuade those in power to adopt the policies of the public health faction she
championed; namely to increase expenditure on sanitation policies such as sewers, clean water
supplies, and more. She was arguing with data. Showcasing Nightingale's work is important
along another dimension as well: using her data visualizations introduces a more balanced set
of examples to a field dominated by drawings made by European white men. The seminal book
"The Visual Display of Quantitative Information" by Edward Tufte, used as the bible for many a
data visualization course, includes 113 examples that were created before the year 1900. Just 4
of them were created by women (Borneman, 2020). Building a more representative set of
examples for teaching data science is an important benefit of showcasing Nightingale's work.
These examples showcase just two historical stories of data being used for social good. The lack
of adoption of ethics case studies in data visualization teaching in higher education is not for
lack of examples. These case studies, and the previous history, are critical for crafting an
informed and situated set of data ethics. There are projects actively working on this need. In
the non-profit and civic space, Data4Democracy has proposed the Global Data Ethics Pledge as
one response. Their primary commitments include "fairness, openness, reliability, trust, and
social benefit." (Data4Democracy/Ethics-Resources, 2017/2021) The Responsible Data
community is a convening in the humanitarian sector that seeks to "prioritise and respond to
the ethical, legal, social and privacy-related challenges that come from using data in new and
different ways in advocacy and social change". Their guidelines and activism are attempting to
build new norms within that sector for responsible data use.
Data science introduced outside of the context of impact does students and the broader field a
disservice. The cultural, economic, and social power of data science has grown so quickly that
learners are often unaware of the potential negative impacts of their work. Algorithms created
without any understanding of the historical contexts of their application are incredibly
dangerous - a crime prediction algorithm built on US policing data must engage the historically
racist nature of policing. Datasets used to inform civic policy statements and priority setting
cause secondary harms when their limits go unacknowledged - a survey of housing needs that
doesn't include the voices of homeless residents of a city ignores their plight. Ethics and
responsibility must be front and center when teaching data in service of the social good. Case
studies and examples like those I’ve introduced serve as strong counterpoints to the historically
dominant uses of data.
Classrooms with Community Impact
The classroom is the most common setting for higher education degree programs. The idea of
the semester-long course is the singular unit of thinking within the
educational realm; many topics are boiled down to the bare essentials based on the question
of "what can I fit into a semester?". Within these constraints, varying
pedagogies and institutional policies govern how any particular course is structured.
Accordingly, many have adapted the idea of the university-level course to include various
approaches to teaching data that matters. In this section I introduce a set of concrete examples
from university classroom settings, connecting them to the previous historical and pedagogical foundations.
A fairly basic example can be found in how instructors select themes and sample datasets when
introducing the key elements of data literacy. Far too often introducing data analysis and
visualization utilizes generic sample data that don’t engage student interests, don’t have
impactful possibilities in the real work, and don’t respond to larger movements in society
outside the classroom walls. My own work teaching a "Data Storytelling" undergraduate and
graduate level course in various university settings to non-technical students takes a different
approach. Each semester I select a theme related to the social context of the times, striving to
create enough space for students to find a piece that resonates for them while also
constraining all their work to be related. Any experienced educator will be familiar with this
kind of approach to making sure the student doesn't start with the barrier of an "empty page".
This theme then plays out in the sample data, readings I select, community partnerships, and
student assignments. To date those have included themes such as food security, civic data,
climate change, and justice. Each of these is broad enough to support my role as an instructor
in finding data and related readings, and also allows students to identify sub sections that might
stir their own personal interests. This is built on Freire's idea of activating literacy learning
around the power struggles in the society around us, and Papert's drive to create spaces that
speak to student passions.
Fig 5: Mural created by students for Food for Free (Source: author)
Another basic example can be found by looking at lightweight partnerships with community
and advocacy groups. Returning to my own teaching, I have often initiated small projects with
external groups to poke holes in the wall of my classrooms. Early in the semester I typically
introduce students to a process for moving from raw data, to asking questions, to finding
stories, to visually telling those stories for some purpose. To put that in practice one year I
partnered with Food for Free, a local group that distributes food to those in need. With a few of
their annual data reports in hand, I facilitated students analyzing the data about food sources
and distributions, sketching a visual design of a story they found in the data, and painting that
as a community mural on canvas (see Figure 5). This mural was presented to our partner group,
and subsequently displayed at public events for years afterwards (this put into practice a "data
mural" process I had previously developed) (Bhargava et al., 2016). The students applied their
just developed process knowledge, moving from data to story, and they saw firsthand the
impact that their newfound skills could have. This is one application of the service learning approach described earlier.
Fig 6: Screenshot of a CRI spreadsheet filled in by a student (Source: author)
In a different semester I more actively partnered with the Northeastern University Health in
Justice Action Lab and the Massachusetts chapter of the American Civil Liberties Union (ACLU)
on another in-classroom lightweight collaboration. Health in Justice is a laboratory that works
to "advance criminal justice reform through a public health lens". In response to the ongoing
reckoning with structural racism in the United States, and specifically the "defund the police"
movement, they created the Carceral Resource Index (CRI), a measure of a city's fiscal
commitment to carceral systems versus health and social support systems. This index provides
a score that compares the relative amount of funding each of those sectors receives in a city's
official budget. In a module on data gathering and cleaning within my semester focused on the
theme of "justice", I assigned students to follow the CRI coding protocol and guide in order to
dig through official budget documents and fill in the provided spreadsheet that computes the
CRI (see Figure 6). Each student processed two city budgets, and then reviewed two
spreadsheets of a partner in the class. These were then submitted to the Health in Justice lab
for review, and ultimately provided to the MA ACLU to use in their advocacy work. This process
mirrors one used in newsrooms on investigative data projects, providing real-world experience
to students in a classroom setting. In addition, the collaboration with the Health in Justice lab is
an example of the pedagogy of LPP in action; students were side-by-side with researchers and
advocates working on the same project.
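The index itself is a simple comparison of budget lines. The following is a minimal sketch of how such a score might be computed; the function name, signature, and budget figures are hypothetical illustrations, not the Health in Justice lab's actual coding protocol:

```python
def carceral_resource_index(carceral_total: float, health_total: float) -> float:
    """Sketch of a CRI-style score comparing a city's carceral spending
    (police, courts, corrections) against its health and social-support
    spending. Returns a value in [-1, 1]: negative when carceral systems
    receive more funding, positive when health systems do."""
    total = carceral_total + health_total
    if total == 0:
        raise ValueError("budget totals cannot both be zero")
    return (health_total - carceral_total) / total

# Hypothetical city budget figures, in millions of dollars
score = carceral_resource_index(carceral_total=250.0, health_total=150.0)
print(round(score, 2))  # -0.25: carceral systems receive more funding
```

Normalizing by the combined total makes scores comparable across cities of very different sizes, which is what allows an advocacy group to rank or contrast municipal budgets.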
Another, more integrated, model for having impact while learning data science is to treat the
classroom as a boutique information visualization design firm with community clients. This can
take a number of different forms, from community group members joining the class as embedded
participants, to students working on consultant-like teams that pitch and implement designs
against a project specification or statement of needs. These models of deeper integration
require extensive legwork in advance to recruit and screen partners, and to matchmake with
student groups effectively. They also require significant scaffolding for students to learn how
to do business process management and client communication. Often service learning
departments within the university can help design, scaffold, and sustain those partnerships.
The University of Chicago runs such a course, called the "Civic Data & Technology Clinic". It
brings students learning data science and social good organizations together, specifically
focusing on groups working on questions of social and economic justice, sustainability, and
climate change. This course is offered as part of the Master of Science in Computational
Analysis and Public Policy degree, engaging around 15 students in that program per quarter.
Course projects have included apps to track deforestation, dashboards to predict sea level rise,
and alerting systems related to local air quality. The course was created to "partner our
incredible students and programs here with public interest organizations to leverage data
science, skills, and technology research, but with a real mission point of view, to press change
for good in social and environmental challenges" (New Clinic Leverages Data Science for Social
and Environmental Causes, n.d.). These students worked with messy data, collaborated with
mission-driven organizations, and created projects that drove their goals forward in real-world settings.
Another example within the confines of a semester-long course can be found at Northeastern
University, where Professor Michelle Borkin runs a data visualization course that is a prime
model of how to expose students to community impact while learning data science. The team
of instructors and student teaching assistants, together with the on-campus Service Learning
office, has created a Design Study "Lite" Methodology that builds on well-accepted data visualization
design practice (Syeda et al., 2020). Based on existing partnerships the Service Learning office
maintains, they bring in community groups with data science problems in the domains of
education, neighborhood improvement, volunteering, and more. These groups contribute lived
experience, domain knowledge, and a set of relatively ready-to-use, "clean", datasets. Students
volunteer with the groups in order to get to know them better, and over the course of the
semester design and iterate on data visualization projects related to the community groups'
goals and mission.
This collaborative community learning process stands in contrast to the traditional techno-
centric introductions to data visualization in higher education. Typical graduate and
undergraduate level courses cover technical topics one by one over the course of a semester,
evaluating students against a checklist of features, rigor of analytics techniques employed, and
via studio-style critique sessions. Student outcomes differed accordingly - one student noted
that "my service has made me interested in the idea of using data science skills to prompt social
good". Similarly, the partner community organizations commented on how projects "helped
build the capacity to accomplish organizational goals." The course organizers argue that the
service learning methodology was particularly well suited to their design study "lite"
methodology due to three main criteria:
It provided real-world data science experience - "I was able to learn how to work with
messy, real data that serves a real purpose";
It provided professional experience for students - "I understand how to communicate with
people in my community more";
It impacted students on a personal level - "being connected to an individual's needs was...".
These testimonials show the power of turning a data science classroom's focus to the external
world, outside the walls of academia. The impacts for the learners, community partners, and
the educational institution itself are all significant.
These examples from in-classroom settings show how various approaches can be put into place
to create impact within traditional models of university education. Even lightweight modules,
like those I describe from my own teaching, can create positive feedback loops within student
learning and provide useful products for collaborating organizations. More deeply integrated
courses like the two described showcase how intertwined the outcomes for students and
partner groups can be. Student work in the classroom setting can significantly impact the
community, thereby introducing them to doing data work with impact as a positive approach to
be normed within the industry. These concrete examples push back on the historical background
described previously, restructuring data processes, ownership, and impact to serve those in
need.
Alternate Learning Paths
While central to most student experiences in higher education, classrooms are not the only
context available. Fellowships, teaching assistantships, postdoctoral research positions,
summer boot-camps, student clubs - these are all typical types of programs housed within
university settings that are integrated into undergraduate or graduate learning programs.
Instructors are leveraging these alternate learning paths in multiple domains, including data
science. Many of these types of programs, because of their non-classroom setting, are in fact
better suited towards engaging the more equitable and community-engaged definition of data
that matters described previously in this chapter. Here I share a number of those examples to
paint a better picture of the innovative programs addressing the subjugating history of data and
flipping its processes to be in service of the social good.
A number of academic institutes of data science have created fellowship and scholarship
programs to support work in service of the social good. The model that appears to have taken
hold is based on the Data Science for Social Good fellowship started in 2013 at the University of
Chicago (subsequently moved to Carnegie Mellon University). Ten to twenty advanced
undergraduate or graduate students in technical domains are offered sponsored fellowship
positions for an 8-12 week summer course, with an associated stipend. They are paired with technical and
faculty mentors to work on pre-vetted research projects in teams, typically based on data from
an organization that works in some social good setting. The project domains often touch on
education, health, public safety, transportation, international development and more. The
University of Washington, University of Virginia, and Stanford offer very similar fellowship
programs. Many of these efforts are now brought together by the Data Science for Social Good
non-profit organization founded in 2019.
Describing one of these fellowship programs in more detail can be illustrative for understanding
how the goals, logistics and outcomes differ from the in-class experience described previously.
Faidra Monachou, the current student-director of the Stanford DSSG program, provided some
insights and background in an interview to support this chapter. The Stanford program solicits
faculty projects long before the summer fellowship program begins. These faculty often bring
existing collaborations with external community groups, along with datasets and research questions. A
team of PhD students, supervised by a postdoctoral researcher, serves as technical mentors who
meet with program fellows daily. The eight-week program primarily recruits college
juniors, seniors, and graduate students who have a passion for putting data in service of the
social good. Monachou specifically pointed to her surprise at seeing just how many applicants
had experience in this domain via student clubs or existing research projects at their own
institutions. This suggests that using data science for good is a growing trend among young learners.
The Stanford DSSG fellowship is structured to provide authentic and impactful research
experiences for the participants, with a specific focus on building their ability to think through
questions of impact and responsibility. Monachou noted that "we want to make sure they think
about these issues as much as possible." One aspect of the program involves inviting in guest
speakers from university and industry to expose participants to professional practices in the
field. This serves to validate putting these skills in service of the public good, and to create a
network of connections that can be leveraged as students enter the job
market. In addition, they bring the group together at regular intervals to discuss questions of
ethics, responsibility, and what "good" means. These efforts create a norm of discussing
questions of impact and morals, a critical piece of engaging the historical harms of data-driven work.
The goals of the DSSG Fellowship at Stanford extend further, to drive more data scientists into
working in the social good sector and having sustained impact for the community partners.
Educationally, many of the participants are being introduced to research processes for the first
time, and join skills-based sessions to introduce them to approaches connected to their
projects. The large set of mentors is intentionally recruited to support this learning. On the
impact side, the partner organizations are supported via the faculty mentors to implement the
technologies built in sustainable ways. Without this support, projects developed by
participants would likely sit on a shelf unused.
Focused fellowship programs like Stanford's DSSG offer an alternate learning path for
experiences building data projects that matter. Their intentional decisions in the design of the
program provide opportunities that look a lot more like the model of apprenticeship,
something not available in classroom settings. Offering students research experiences in the
summer fellowship format creates authentic learning in the setting of a real project that has
concrete goals; these are not lab-based simulations of data science projects.
Another approach to teaching data that matters in higher education settings can be found in
the idea of a "clinic" - a group that provides free expert assistance to community groups. This
model is most prevalent in the field of law, where clinics serve as a critical space for students to
practice their trade in service of those in need on a "pro bono" basis (literally "for the
good" in Latin). In the US this practice dates back to the nation's founding and is fairly central
to the field of law. The American Bar Association explains the norm, writing that "when society
confers the privilege to practice law on an individual, he or she accepts the responsibility to
promote justice and to make justice equally accessible to all people" (American Bar Association
Standing Committee on Pro Bono and Public Service, n.d.). This leads many who study law to
perform clinic-based services as an elective during their studies, and many institutions require
it. Within the larger field of law, this type of clinical work is celebrated with "Pro Bono Week"
every October.
The pro-bono clinic model offers another path for engaging students to perform data work with
impact. One example inspired by the model can be found at the Data Innovation Project,
offered at the University of Southern Maine's Muskie School of Public Service since 2016. Their
mission statement parallels our working definition of "impact" quite well - founded "in
response to the growing need for local organizations and government entities to use data in
meaningful ways" (Wurwarg, 2016). The program provides:
Free "Data Clinics" for local non-profits where they receive technical assistance;
A series of free workshops to build organizational capacity for finding and collecting
data, finding data-informed insights, and telling compelling data stories;
Public talks with domain experts;
Contract-based technical assistance on the topics of evaluation, data, and outcomes.
This mix of free and contract-based services is designed to improve community organizations'
ability to utilize data-informed decision-making processes in support of their overall impact.
Staff members have a variety of domain-specific and technology expertise, ranging from data
analysis, to visualization, to evaluation and research methodology.
University of Southern Maine students are offered a funded Community Research Assistantship within
the Data Innovation Project. This engages Master of Public Health and Master of Public Policy
Management students in work that has concrete community benefits as part of their master's
level degree program. They are trained to support community groups and partnered with local
organizations to work on specific projects over a yearlong fellowship. Focusing mostly on
applied research and evaluation projects, they produce concrete reports and insights for the
partner organizations. The program currently supports 4 graduate level students per year,
developing projects around performance measurement, evaluation processes, community
needs assessments, and analyzing program data with local non-profits.
A different model of bridging higher education and community settings can be found in the
Community Data Science project based at the Department of Communication at the University
of Washington (Hill et al., 2017). Project creators designed workshops and classes to increase
capacity for community members and students to do data work, informed and guided by a
pedagogy of inclusion and the goal of democratizing data science. They specifically position
their work as informed by a feminist pedagogy, and made numerous design decisions with the
goal of creating authentic settings that did not look like traditional classrooms. This embedded
pedagogy attempted to bridge from the learning setting to real-world applications. In the
four-weekend workshop setting, they invited community members to apply for a program that
would introduce programming in Python, basic data analysis libraries and processes, and data
visualization. The 10-week university graduate course covered roughly the same material, but
with more rigor and thoroughness given the greater amount of time. The coordinated co-
development of the community workshop and course offerings points to the underlying
potential of an approach of teaching data that matters; classroom-based and
community-based instruction have much to learn from each other. The authors specifically
cite the previously described socially-situated pedagogies such as LPP as inspiration for their work.
The alternate learning paths laid out here suggest that many who teach data science in settings
of higher education are looking beyond the norms of in-class instruction. From the clinic model
we see how students can take on longer collaborative community data projects in partnership
with local groups as a central part of their degree program. From the community workshop
model we learn how empowerment and democratization of expertise can be central
educational goals embedded deep within a field, mirroring the underlying emancipatory
goal of education itself. These pathways need sustained support and further exploration.
Teach Data That Matters
Students in higher education, and society at large, are poorly served by introductions to data
literacy that privilege tools and technologies over process and impact. Scattered around us is a
litany of projects, collaborations, and companies showcasing data science that disempowers,
further marginalizes, and actively harms real people. A growing practice of teaching "data for
good" in universities provides us with an aspirational alternative; creating courses and projects
where students can cultivate a critical practice within the data sciences. These emerging
professionals will be well-primed to thoughtfully engage the history of extractive practices that
produce much of our raw data, and to understand that any dataset is just a point-in-time
snapshot of a larger phenomenon. This snapshot can still be used to drive productive and
impactful projects that matter in the world, but shouldn't be mistaken for some kind of
objective truth.
The best way to liberate our tools and processes from the history of data as a driver
of oppression is to intentionally build new ones. Educators are building new structures
within higher ed to poke holes in the walls, connecting data science students to global events
and local struggles. The historical examples of data in service of justice are a critical piece of
inspiring teaching data that matters. With these examples and inspirations in hand, I argue that
we should all work harder to teach data that impacts the social good; it is a critical piece of
ensuring the data-driven harms of the past don't grow and come to fruition yet again.
American Bar Association Standing Committee on Pro Bono and Public Service. (n.d.). A Guide
and Explanation of Pro Bono Services. American Bar Association. Retrieved May 18,
2021, from
Batty, M. (2013). Big data, smart cities and city planning. Dialogues in Human Geography, 3(3).
Benjamin, S., Bhuvaneswari, R., Rajan, P., & Manjunatha. (2007). Bhoomi: E-Governance, Or,
An Anti-Politics Machine Necessary to Globalize Bangalore? Collaborative for the
Advancement of Studies in Urbanism through Mixed Media.
Bhargava, R., & D'Ignazio, C. (2015, April). Designing Tools and Activities for Data Literacy
Learners. Data Literacy Workshop. ACM Conference on Web Science, Oxford, UK.
Bhargava, R., Kadouaki, R., Bhargava, E., Castro, G., & D'Ignazio, C. (2016). Data Murals: Using
the Arts to Build Data Literacy. The Journal of Community Informatics, 12(3).
Black, E. (2012). IBM and the Holocaust. Dialog Press.
Borneman, E. (2020). Data Visualizations for Perspective Shifts and Communal Cohesion. MIT.
Cairo, A. (2020). How Charts Lie: Getting Smarter about Visual Information. W. W. Norton &
Company.
Calzada, P. J., & Marzal, M. Á. (2013). Incorporating Data Literacy into Information Literacy
Programs: Core Competencies and Contents. Libri, 63(2), 123–134.
Census-taking in the ancient world. (2016, January 18). Office for National Statistics.
Conway, A.-M. (2012, June 13). Charts change minds. Eye Magazine: Blog.
Data4Democracy/ethics-resources. (2021). Data for Democracy. (Original work published 2017)
Delio, M. (2001, February 12). Did IBM Help Nazis in WWII? Wired.
D'Ignazio, C., & Bhargava, R. (2015, September 28). Approaches to Building Big Data Literacy.
Bloomberg Data for Good Exchange 2015, New York, NY, USA.
D'Ignazio, C., & Klein, L. F. (2020). Data Feminism. The MIT Press.
Freire, P. (1968). Pedagogy of the Oppressed.
Gayomali, C. (2014, October 2). NYC Taxi Data Blunder Reveals Which Celebs Don't Tip – And
Who Frequents Strip Clubs. Fast Company.
Hern, A. (2018, January 28). Fitness tracking app Strava gives away location of secret US army
bases. The Guardian.
Hill, B. M., Dailey, D., Guy, R. T., Lewis, B., Matsuzaki, M., & Morgan, J. T. (2017). Democratizing
Data Science: The Community Data Science Workshops and Classes. In S. A. Matei, N.
Jullien, & S. P. Goggins (Eds.), Big Data Factories (pp. 115–135). Springer International
Publishing.
Hooker, S. (2018, July 22). Why data for good lacks precision. Towards Data Science.
Lave, J., & Wenger, E. (1991). Situated Learning: Legitimate Peripheral Participation. Cambridge
University Press.
Levi-Strauss, C., & Wilcken, P. (2012). Tristes Tropiques (J. Weightman & D. Weightman, Trans.;
Revised ed. edition). Penguin Classics.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Hung-Byers, A. (2011).
Big data: The next frontier for innovation, competition, and productivity. McKinsey
Global Institute.
Medrano, M., & Urton, G. (2018). Toward the Decipherment of a Set of Mid-Colonial Khipus
from the Santa Valley, Coastal Peru. Ethnohistory, 65(1), 1–23.
Milner, Y., & Traub, A. (2021). Data Capitalism and Algorithmic Racism (p. 43).
National Academies of Sciences, Engineering, and Medicine. (2020). Meeting #9: Motivating
Data Science Education Through Social Good. In Roundtable on Data Science
Postsecondary Education: A Compilation of Meeting Highlights (p. 223). National
Academies Press.
New Clinic Leverages Data Science for Social and Environmental Causes. (n.d.). Center for Data
and Computing at the University of Chicago. Retrieved May 18, 2021, from
Noble, S. U. (2018). Algorithms of Oppression: How Search Engines Reinforce Racism (Illustrated
edition). NYU Press.
O'Donovan, C. (2018, June 26). Employees Of Another Major Tech Company Are Petitioning
Government Contracts. BuzzFeed News.
Olusoga, D. (2015, July 11). The history of British slave ownership has been buried: Now its
scale can be revealed. The Guardian.
O'Neil, C. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and
Threatens Democracy (1st ed.). Crown.
Papert, S. (1980). Mindstorms: Children, computers, and powerful ideas. Basic Books.
Perez, C. C. (2021). Invisible Women: Data Bias in a World Designed for Men. Harry N. Abrams.
Piaget, J. (1952). The origins of intelligence in children (Vol. 8, No. 5). International
Universities Press.
Quinn, M. J. (2006). On teaching computer ethics within a computer science department.
Science and Engineering Ethics, 12(2), 335343.
Ryan, L., Silver, D., Laramee, R. S., & Ebert, D. (2019). Teaching Data Visualization as a Skill. IEEE
Computer Graphics and Applications, 39(2), 95–103.
Syeda, U. H., Murali, P., Roe, L., Berkey, B., & Borkin, M. A. (2020). Design Study Lite
Methodology: Expediting Design Studies and Enabling the Synergy of Visualization
Pedagogy and Social Good. Proceedings of the 2020 CHI Conference on Human Factors in
Computing Systems, 1–13.
Visualizing Information for Advocacy (Second). (2014). Tactical Technology Collective.
Vygotsky, L. (1980). Mind in society: The development of higher psychological processes.
Harvard University Press.
West, S. M. (2019). Data Capitalism: Redefining the Logics of Surveillance and Privacy. Business
& Society, 58(1), 20–41.
Wong, W. E., Li, X., & Laplante, P. A. (2017). Be more familiar with our enemies and pave the
way forward: A review of the roles bugs played in software failures. Journal of Systems
and Software, 133, 68–94.
Wurwarg, B. (2016). Data Innovation Project. University of Southern Maine.
Zuboff, S. (2019). The Age of Surveillance Capitalism: The Fight for a Human Future at the New
Frontier of Power (1st edition). PublicAffairs.