Can big data tame a “naughty” world?
Jennifer Ann Salmond
School of Environment, University of Auckland
Marc Tadaki
Department of Geography, University of British Columbia
Mark Dickson
School of Environment, University of Auckland
Key Messages
• The implications of the big data revolution for the environmental sciences are potentially significant and require critical interrogation.
• Thematic examination of big data definitions can encourage scientists to consider how big environmental data may alter environmental scientific priorities.
• In the environmental sciences, big data are most valuable when complementary to (and conversing with) traditional data and approaches.
The big data revolution is changing the way data is produced, analyzed, and valued. In the environmental
sciences, big data has made it onto the agenda through calls to utilize the current data “deluge” more
effectively and a desire for more complete measurement. However, a wider philosophical and ethical critique
of big data is needed to assess its utility for environmental explanation. We distil three definitions relevant to
the environmental sciences, focusing on the characteristics that make data “big,” the methods of analysis
used, and the models of explanation favoured by big data analysts. We critically interrogate the new
priorities implicit within big environmental data, and for a historical analogue we compare the big data
moment in the environmental sciences to the period in the 1970s when systems theory was being invoked as
a paradigmatic shift. Like systems theory, big data is poised to become the new lingua franca of many fields
of scientific inquiry. Here we echo Barbara Kennedy’s caution that whilst new methods of analysis seem
fascinating and promissory, scientists must always be accountable to the “naughty” world in which we live,
rather than the clean abstractions that we seek to generate.
Keywords: Environmental science, big data, methodology, critical physical geography
Can big data tame a “recalcitrant” world?
The big data revolution is changing the way knowledge is produced, analyzed, and valued. The environmental
sciences are giving big data priority attention by promoting more effective use of the data “deluge” and by
seeking more exhaustive measurement. However, the usefulness of big data for environmental matters can be
established only on the basis of a broader philosophical and ethical critique. Three definitions relevant to
the environmental sciences are developed from the elements that distinguish big data, the methods of analysis
used, and the models of explanation favoured by researchers who use big data. We critically examine the new
priorities arising from big environmental data and, drawing a historical parallel, compare the big data era
in the environmental sciences to that of the 1970s, when systems theory was claimed to bring about a
paradigm shift. Like systems theory, big data is on the verge of becoming the new lingua franca of many
fields of scientific activity. Here we repeat the warning issued by Barbara Kennedy, who stressed that
although new methods of analysis may appear fascinating and promising, this does not exempt scientists from
always accounting for the “recalcitrant” world in which we live, rather than the pure abstractions to which
we aspire.
Keywords: environmental sciences, big data, methodology, critical physical geography
Correspondence to/Adresse de correspondance: Dr. Jennifer Salmond, School of Environment, University of Auckland, Private Bag 92019,
Auckland 1142, New Zealand. Email/Courriel: j.salmond@auckland.ac.nz
The Canadian Geographer / Le Géographe canadien 2017, xx(xx): 1–12
DOI: 10.1111/cag.12338
© 2017 Canadian Association of Geographers / L'Association canadienne des géographes
Introduction: Big data, big claims?
The exponential increase in our ability to acquire,
store, transmit, and analyze data has led various
commentators to suggest that a world of “big data”[i]
has arrived, a world in which research questions can
be answered by data directly, without reference to
theoretical frameworks (Miller and Goodchild 2015).
By covering massive spatial scales and diverse
scientific domains, big data and accompanying
analytical techniques offer the possibility of identi-
fying new patterns and predictors from the chaos
and complexity of human and environmental processes
(Death 2015; O’Sullivan and Manson 2015). Big
data, it is claimed, has the potential to provide
unprecedented insights into environmental systems
and human behaviour and offer an improved basis
for decision making in a new era of data-informed
policy (White et al. 2015). Big data has also been
touted as key to “solving the city” (Lehrer 2010) and
“revolutionizing” understandings of climate change
vulnerability (Ford et al. 2016), potentially leading to
“miraculous solutions to well-worn problems”
(Crang 2015, 351). With the data deluge upon us,
some claim that machine learning analyses of big
data may be able to succeed where “science has
failed” in solving complex environmental problems
(Death 2015, 595). In this emerging infrastructure
and analytics of big data, a fourth paradigm in
scientific thought has been proposed (Hey et al. 2009;
Kitchin 2014), heralding “the end of theory” (Anderson
2008) and requiring an across-the-board revaluing
of scientific practices (Elliott et al. 2016).
Amid this optimism, environmental scientists
have proceeded with collecting and analyzing big
data. Such work often takes the form of case studies
that highlight the potential value of combining
social and biophysical data to enhance our understanding
of complex environmental problems. A
range of different topics and approaches have
been explored, using passively and actively
acquired social and environmental datasets, ap-
plied to problems at a variety of different temporal
and spatial scales. For example, Fleming et al.
(2014) combine climate and health data to propose
new models to evaluate the impacts of climate
change on human populations, and
Chariton et al. (2016) show how big data can be
used to analyze multiple environmental stressors
in aquatic environments. Other studies have
combined data from social media and biophysical
elements to generate information about how
humans interact with the environment, presenting
this information in ways that can directly support
decision making—for example, in conservation
(Levin et al. 2015; Verma et al. 2016); climate
change adaptation (Ford et al. 2016); air pollution
abatement (Steinle et al. 2013); and planning
(Dunkel 2015). More commonly, however, we
either see authors proposing new analytical tools
to organize environmental big datasets more
efficiently (e.g., Baumann et al. 2016; Chariton
et al. 2016), or proposing rules for organizing data
into categories that facilitate both wider use and
application (e.g., La Salle et al. 2016). Agenda-
setting discussions on the value of big data and
future research directions are also increasingly
common (e.g., Pimm et al. 2015; White et al. 2015;
Laurance et al. 2016 in ecology; Viles 2016 in
geomorphology). Across all of these applications
of big data in the environmental sciences, the
question needs to be asked: what makes these
applications distinctive (or new) approaches to
scientific inquiry? If big data is to become more
than just a buzzword with which to channel and
secure research funding, it demands careful
intellectual scrutiny.
[i] We refer to big data as both a singular noun (big data as a
discourse), and as a descriptive plural (data that exhibit big
qualities). Sometimes the sentence subject is “big data” whereas in
other situations the (big) “data” themselves are the subject.
In this paper, we synthesize and explore what big
data means (or might mean) for physical geography
and the environmental sciences. In some ways,
environmental data has already been “big” for some
time and there is no clear single moment in time or
space at which a step transition from the use of large
datasets to big datasets can be identified. Large
global climate datasets and reanalyses, for example,
have been used to make headlines since the 1980s
and have provided a major impetus for political
action on climate change. From this perspective, a
sceptical scientist might wonder whether big data
represents anything new at all, or whether it is
just another “fashion” in environmental science
(Sherman 1996). Therefore, in this paper, we aim to
distil what is really new and distinct about big data
as it relates to the environmental sciences.
We proceed by distinguishing three thematic
definitions of big data that have been developed
in the (mostly social science) literature. We then
apply and explore these themes as they manifest in
the environmental sciences, in order to identify
and critically interrogate the value commitments,
assumptions, and challenges emerging with big
environmental data. We synthesize relevant cri-
tiques of big data from human geography, connect-
ing these insights to examples and concerns in the
environmental sciences. Then, we step back from
the big data revolution to situate ourselves as
conscious actors in power-laden processes that
are actively reconfiguring environmental science.
We excavate some disciplinary wisdom about para-
digm shifts in physical geography by drawing a
historical comparison with the putative shift in the
1970s towards systems theory as a unifying frame-
work for physical geography. By revisiting the
critique of systems theory posed by Barbara
Kennedy (1979), we consider how physical geogra-
phers might ground their responses to such para-
digm shifts.
What is big data?
Despite widespread usage of the term “big data”
across the sciences and humanities, industry, and
government, there appears to be little consensus as
to what constitutes “big” data (Graham and Shelton
2013; O’Sullivan and Manson 2015; Kitchin and
McArdle 2016). In the earth and environmental
sciences, big datasets often loosely refer to
automated data acquisition methods that are
incredibly fast and voluminous compared to tradi-
tional techniques, and which are analyzed using
data-driven methodologies (e.g., Li et al. 2015;
Moosavi et al. 2015; Gabrys 2016). Technological
advances in measurement and telemetry have made
automated mass data acquisition possible even in
the most dynamic and remote environments.
Unprecedented amounts of data are available to
describe the earth-atmosphere system and its
interaction with human activities across previously
unimaginable temporal and spatial scales (Hsu et al.
2015; Krause et al. 2015; Schroeder and Taylor 2015;
Ziegler et al. 2015; Viles 2016).
In the midst of this expansion in data gathering
activities the concept of big data appeared, yet
even in hindsight it is not obvious when this
happened or what makes data “big” rather than
simply “many.” For example, the huge volumes of
remote sensing data from satellites, seismic data,
turbulence data, or ecological data are not in any
sense “small” in scale, volume, or scope, but
neither could they be usefully considered “big
data.” Therefore, to understand what is at stake
when scientists and funding bodies invoke “big
data,” we should begin by clarifying its difference
from what we term “ordinary data.”
While big data does not enjoy a consensus
definition, it is often characterized by (i) the unique
qualities and inherent structural characteristics of
the data itself; (ii) the (new) processes and techni-
ques required to transform numbers into knowl-
edge or “actionable science”; and (iii) a particular
way of making claims about the world—a way of
doing science (Kitchin 2014). We discuss each in
turn.
i) Inherent characteristics of big data
Big data is frequently differentiated from ordinary
data by three V’s: volume, variety, and velocity
(Miller and Goodchild 2015). Big data is usually
characterized by continuous, often real-time, flows
of multiple data sources that are varied in complex-
ity, type, and origin, and represent multiple scales in
time and space. The increasing volumes of automat-
ically generated social data are now frequently being
harnessed and combined with physical data in
scientific studies. For example, Levin et al. (2015)
use satellite data and social media (in the form of
photos from Flickr) to inform conservation studies,
and Fleming et al. (2014) use social, economic, and
health data to analyze the relations between climate
change and human health. The inclusion of both
social and physical parameters often distinguishes
many “big” datasets. Further V’s have also been
suggested (Baumann et al. 2016; Kitchin and McArdle
2016), including viability, value, and veracity. These
pertain to the raw, unchecked, often uncalibrated
nature of data flows that prohibit standard quality
control mechanisms. These additional V’s are relevant
for environmental applications; for example, big
datasets that include air quality parameters may
include crowd-sourced data or modelled parameters
which have larger uncertainties than ordinary data
(Gura 2013; Mayer-Schönberger and Cukier 2013). In
summary, big data usually (a) contains information
about populations rather than samples, and (b) is
messy (unverified, uncalibrated, lacking quality
control) and lacks metadata (Mayer-Schönberger
and Cukier 2013).
Big datasets can also be characterized by their
source. Often big data is collected opportunistically
by third parties using automated data gathering
techniques and can therefore be multi-purpose,
non-disciplinary, multi-institutional, and multi-
national. Kitchin (2013) distinguishes between
specific sources of big data: directed sources
(measured digitally by a human operator); auto-
mated sources (inherent function of a device or
system such as a mobile phone); and volunteered
sources (gifted by users such as citizen scientists).
Environmental science applications have primarily
focused on directed sources, but the potential of
automated, volunteered, and crowd-sourced data is on
the rise, such that environmental scientists require
an awareness of the nature and limitations of social
science datasets.
ii) How data are handled
Some definitions of big data emphasize the tools
and techniques required to put the data to work.
This approach draws attention to the (changing)
balance of decision making between human ana-
lysts and non-human technologies. In the face of
the sensor-driven data “deluge” it has become
apparent that traditional analytical tools and
techniques are insufficient (Death 2015; Elliott
et al. 2016). In their place, new methods of data
computation, storage, management, representa-
tion, and statistical analysis have emerged. One
set of computational technologies that distin-
guishes big data analysis is the increasing reliance
on statistics and non-linear systems identification
(e.g., genetic algorithms and machine learning) to
reveal relations and patterns, infer dependencies,
and predict outcomes (Goldberg and Holland
1988). Big datasets must be handled carefully as
the data may violate the underlying assumptions
of statistical tests. For example, many big datasets
are based on self-selected populations (e.g., social
media or cell phone users) and thus are not truly
random samples. New protocols for quantifying
the bias in information derived from user-gener-
ated and volunteered geographic data are required
to manage this resource effectively (Dickinson
et al. 2010; Goodchild and Li 2012).
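The self-selection problem described above can be illustrated with a small simulation. This is a minimal sketch under invented assumptions, not a protocol from the cited literature: a hypothetical population holds a value between 0 and 1, and the probability that a person volunteers a report is assumed to equal that value. The function name, the selection rule, and all numbers are illustrative only.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate_self_selection(n_population, seed=0):
    """Return (population mean, random-sample mean, self-selected mean)."""
    rng = random.Random(seed)
    # Hypothetical quantity of interest scaled to 0-1 (e.g., perceived pollution).
    population = [rng.random() for _ in range(n_population)]

    # Assumed self-selection rule: the probability of volunteering a report
    # equals the value itself, so high values are over-represented.
    volunteered = [x for x in population if rng.random() < x]

    # A simple random sample of the same size, for comparison.
    random_sample = rng.sample(population, len(volunteered))

    return mean(population), mean(random_sample), mean(volunteered)


for n in (1_000, 100_000):
    truth, randomized, self_selected = simulate_self_selection(n)
    print(f"n={n}: population={truth:.3f} random={randomized:.3f} "
          f"self-selected={self_selected:.3f}")
```

Under these assumptions the self-selected mean settles near 2/3 rather than the population mean of 1/2, and enlarging the population does not close the gap: volume alone does not repair a non-random sample.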
iii) Big data as a way of doing science
While the above definitions of big data are useful
guides, they are not determinative of what counts as
big data, because the term itself possesses a kind of
social capital. What are the conditions that might
motivate the use of the “big data” label?
Human geographers have highlighted that big
data is often characterized by a mindset that values
certain types of knowledge claims (Graham and
Shelton 2013; Miller and Goodchild 2015). Such a
mindset prioritizes: (i) positivistic approaches to
data analysis unbiased by prior assumptions, expe-
rience, or theory; (ii) pseudo-empirical approaches
that assume that datascapes accurately and
completely represent the world; and (iii) a generalist
view of the world that can be reduced to large-scale
comparisons of multiple parameters and issues
across multiple scales. If we place big data into a
historical and disciplinary context, it could be
argued that it represents a paradigmatic shift in
scientific explanation analogous to systems theory
of the 1960s and 1970s (e.g., Kennedy and Chorley
1971); big data conjures images of a “world of data,”
presenting the promise of a global master dataset
where anything can be drawn out and compared to
anything else. When Anderson (2008) provocatively
declared “the end of theory,” he was referring to the
possibility of testing any conceivable scientific
hypothesis with big data, thus ending the need for
subject specialisms. In the environmental sciences,
Elliott et al. (2016) suggest that the era of theory-
driven hypothesis testing should give way to a
more iterative relationship with massive datasets.
Whereas the systems theory of the 1970s called for
all experts to become systems experts, big data
suggests that we will no longer need environmental
scientists, only data scientists.
For the environmental sciences, this universalist
mindset may have particular appeal for areas of the
discipline suffering from “physics envy” (O’Sullivan
and Manson 2015). Prestigious science journals
appear to favour publications presenting global
datasets, where the emphasis is on “big” empirical
claims about as many geographical regions and as
many subject domains as possible. Funding agen-
cies are being urged to promote the collection of
datasets that can be shared, used for multiple
purposes, or (re)analyzed in multiple ways to ensure
value for money (Costello and Wieczorek 2014;
Specht et al. 2015). According to this third, socio-
logical definition of big data, the value of these
perceived differences in “big” datasets for geographical
and environmental sciences lies in a
social, cultural, and political framework that values
objective, quantitative, controlled, replicable data
leading to universal claims and globalized knowl-
edge, over subjective and descriptive accounts and
local knowledge (Clifford 2009).
How might big data shape the study of
earth-atmosphere systems and their
human interactions?
How might these three thematic definitions of big
data affect the theoretical, methodological, and
institutional priorities of environmental science?
In this section, we engage in a critical physical
geographical analysis of big data, by describing and
interrogating how environmental big data may
be beginning to initiate significant changes to the
priorities and conduct of environmental science
(e.g., see Lane 2016, this issue). We invoke a broad
definition of critical physical geography as a
commitment to expose and interrogate the relation-
ships between values, science, and environmental
outcomes (Tadaki et al. 2015). This definition
extends the sub-disciplinary approach proposed
by Lave et al. (2014), which emphasizes explicit
linkages between social theory and biophysical
approaches. Our objectives are to identify and
critically interrogate the scientific practices associated
with big data, and examine how “big data” as a
research framing may be involved in producing
particular societal values and outcomes. We con-
tribute from a scientific perspective to emerging
conversations about the politics of big data in the
environmental sciences (Gabrys 2016).
First, we consider how big data is poised to affect
the observational priorities of environmental sci-
ence, and what this means for the analysis and
understanding of environmental systems. Second,
we examine how a turn to big data is likely to
emphasize the value of (if not require) new analyti-
cal techniques, and we reflect on what the centrali-
zation of expertise implied by big data means for
subject area experts. Third, we consider how the
current focus on big data mimics the movement
towards systems theory in physical geography in
the 1970s. If big data represents a way of doing
science, historical reflection can help us situate and
understand what is at stake as we move towards a
new paradigm in science.
Changing observational priorities: Organizing
the world for big data
Most environmental science projects have tradition-
ally been question-driven, with data collected in
order to answer a specific place-based environmen-
tal question. However, with new technologies it is
tempting to measure everywhere and anywhere,
everything and anything, because it is cheap, easy,
possible, and might be useful for someone or at
some point. This transition may seem generic, and
perhaps harmless, but it raises the prospect of
(i) embedding a rupture between the scientist and
the environment they study; (ii) making the con-
ditions of data collection invisible or unavailable for
interrogation; and (iii) concealing the theoretical
choices underlying data collection behind questions
of technology and measurement. Instead of heralding
the “end of theory” (Anderson 2008), big data
could promote an ignorance of theory.
Instituting a rupture between scientist and
environment. The prospect of big data-driven
field campaigns has the potential to alter the
who/what/where/when/why of environmental
observation. For the earth and environmental
sciences these concerns are most tractable in
relation to fieldwork (Church 2013). Big data
collection in the field conveys an image of a team
of technicians entering the field with the mandate to
collect data about everything (or more realistically,
as much as possible given time and technological
constraints), or perhaps an army of citizen
scientists and volunteers providing a continuous
flow of data from low-cost sensors. Here, big data
collectors may not need to understand the system
they are studying, or make decisions about when,
where, or what to report; instead decisions might be
constrained by expert instruction (e.g., guidebooks),
available secondary sources (e.g., cell phones), or
standardized observational technologies.
However, Church (2013, 184) cautions that although
“[r]ecent technological developments have
enhanced our ability to comprehend the landscape
system,” the scientist’s efforts to understand complex
and emergent environmental processes “will
surely require comprehensive field experience if we
are to regain the whole landscape view of the early
field workers.” Changes in financial, temporal, and
spatial constraints on data collection imply that the
experiences and theoretical frameworks traditionally
employed by experts to choose the study site,
timing of the field campaign, and parameters
measured will no longer be directly connected to,
or in iterative conversation with, the point of data
collection. The personal experiences of scientists
have previously acted to pre-emptively filter the data
collected, and while such experiences bias the dataset,
at least the bias is knowable and explicit as a part
of the inductive process.
A turn towards big data also implies an increasing
shift away from question-driven data collection
towards a situation where questions are modified
to suit the data. Shearmur (2015) observes that big
data are rarely suited to addressing a specific
environmental question. Instead, researchers may
have to ask different questions and adapt their
research to the data available. That is, “big” databases
effectively subsidize research that can make use of
existing data. This could lead to scientific methods
increasingly being aligned with the data available,
rather than data being collected to satisfy methodological
requirements or advance theoretical understanding
(Miller and Goodchild 2015). Research becomes more
opportunistic and responsive rather than planned. In
this way, the data can shape research and might be
constrained by (non-expert) data collectors in ways
not previously experienced (Shearmur 2015).
Concealing context of data acquisition. Unlike
datasets that are collected personally or by a group
of close colleagues, end-users of big datasets may
know very little about the data—who collected it,
how it was collected, when, where, why. If metadata
are lacking, a user might assume that all data within
a set are created equally and have the same
uncertainty. Big data approaches often emphasize
that uncertainty can be reduced by over-sampling,
that is, by collecting a sample of a large enough size
(Shearmur 2015). This can lead to the assumption
that big data lack observational bias, or that
observational bias is removed by over-sampling.
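Why over-sampling cannot remove observational bias can be made explicit with a standard error decomposition. As a minimal sketch, assume (hypothetically) that each of n observations is y_i = μ + b + ε_i, where μ is the true value, b is a systematic offset shared by all observations (for example, an uncalibrated sensor type), and the ε_i are independent errors with zero mean and variance σ². These symbols are introduced here for illustration and do not appear in the cited works.

```latex
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i ,
\qquad
\mathbb{E}\!\left[(\bar{y}-\mu)^2\right] = b^2 + \frac{\sigma^2}{n}
\;\xrightarrow{\; n \to \infty \;}\; b^2 .
```

Increasing n shrinks only the random term σ²/n; the bias term b² is untouched, so a very large sample can be very precise about the wrong value.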
A consequence of the rupture between scientist
and environmental big data is that the consistency
of the data may be unknown, due to difficulties of
calibrating large numbers of instruments and/or
controlling for differences in data collection methodologies.
Whilst such concerns are not new to large
data projects with ordinary data (see, for example,
Specht et al. 2015 in ecology; Soranno et al. 2015
in lake ecosystems; and Hsu et al. 2015 in
experimental geomorphology), the
issues are magnified in big datasets where physical
environments and social environments become
intertwined. Whilst developing a culture of more
transparent exchange and aggregation could en-
hance environmental science, we can also debate
whether a trend towards the homogenization of
environmental observation to suit big data formats
is even desirable.
It could be argued that a big data scattergun
approach to data collection improves the likelihood
of a positive outcome from the field experiment, by
reducing sampling bias in time and space and
reducing the probability that a critical parameter
was not measured. However, biases are not simply
removed, so much as shifted around. For instance,
new sampling biases are created by technological
limitations on data collection—for example, GPS
and cell phone coverage, flight paths for drones, and
assumptions about what can and should be
counted.
The illusion of theory-free data. Data
acquisition is always undertaken within some
theoretical framework (Odoni and Lane 2010;
Rhoads and Thorn 2011). Even “big” datasets
contain value-based assumptions that are made by
someone (e.g., environmental scientists, industry,
commercial analysts, operators) about what kinds
of data are valuable. Advances in low-cost sensors
and high-resolution observational technologies
reduce the need for the field scientist to choose
between time and space, but this displaces rather
than removes decisions regarding the quality and
representativeness of data (e.g., Krause et al. 2015).
Changes in observational priorities implied by big
data can be described as a shift from strategic,
theory-justified measurement (data scarcity)
towards conditions of a massive flow of data (data
deluge) where scientists become receivers and
manipulators of data, rather than producers of
data. However, because big data render the theory of
observation and measurement invisible or unavailable to
analysts and users, the external validity of big data
analyses becomes difficult to establish, which throws
into doubt whether big data really can (or indeed should)
herald “the end of theory.”
Changing analytical priorities: From landscapes
to datascapes
The discourse about big data emphasizes particular
kinds of large scale, quantitative, positivistic anal-
yses (Kitchin 2014), and this re-valuing has the
potential to change perceptions about what con-
stitutes rigorous analysis in the environmental
sciences (Elliott et al. 2016). As such, big data may
(i) push explanation towards particular modes;
(ii) frame analysis as a task for reductionist
computation rather than human judgement,
thereby reducing landscapes to datascapes; and
(iii) lead to a reconfiguring of what constitutes
environmental expertise.
Big data for environmental analysis suggests a
shift (return) to positivist forms of explanation.
Users of big data would probably accept that big
data prioritizes large-n datasets as well as the
generation of broad claims about diverse popu-
lations, domains, and socio-biophysical contexts
(Kitchin 2014). Such a positivist empirical
approach enables new, perhaps unexpected, pre-
dictors to be identified from the apparent chaos
and complexity of a vast number of measure-
ments and possible relationships among variables
(e.g., Krause et al. 2015; Ziegler et al. 2015; Elliott
et al. 2016). Here, everything that can be codified
as data can be compared with everything else.
Instead of biasing our understanding through the
explicit use of pre-existing theories, all possible
correlations between variables can be tested and
quantified (Anderson 2008). However, the short-
comings of reductionist and positivist ap-
proaches have been identified and critiqued by
many scientists, including physical geographers
(Rhoads and Thorn 2011; Slaymaker 2016). Cor-
relation is not the same as causation, and most
physical geographers would agree that automated
pattern recognition run on environmental data is
no substitute for an embodied and theoretically
reflexive engagement with the biophysical envi-
ronment (Church 2013).
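The risk of testing every possible correlation can be illustrated with a short simulation. This is a minimal sketch under stated assumptions rather than a method from the cited studies: every variable is independent Gaussian noise, so no real relationship exists, and the threshold of about 0.361 is the approximate two-sided 5% critical value of Pearson's r for 30 observations. The function names and sizes are illustrative only.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_spurious(n_variables=100, n_observations=30, r_critical=0.361, seed=1):
    """Count variable pairs whose |r| exceeds the threshold in pure noise."""
    rng = random.Random(seed)
    # Every variable is independent noise: there is nothing real to find.
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_observations)]
            for _ in range(n_variables)]
    pairs = list(itertools.combinations(range(n_variables), 2))
    hits = sum(1 for i, j in pairs
               if abs(pearson(data[i], data[j])) > r_critical)
    return hits, len(pairs)


hits, n_pairs = count_spurious()
print(f"{hits} of {n_pairs} pairs exceed |r| > 0.361 despite pure noise")
```

With 100 variables there are 4,950 pairs, so roughly 5% of them (a few hundred) would be expected to pass the threshold by chance alone. Without theory to decide which of those "predictors" make physical sense, exhaustive comparison produces patterns faster than it produces explanations.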
A second shift invoked by big data involves a
movement in environmental scientists’ roles towards
the analysis of correlations automatically
highlighted by algorithms, the aim being to
identify patterns that make physical sense and
develop theoretical arguments and empirical hy-
potheses to test these (Peters et al. 2014).
Unguided, automated exploration of big datasets
is perceived as a productive way to analyze big
data and compare multiple variables across space
and time (e.g., Death 2015; Krause et al. 2015;
Pagano et al. 2016). However, results from auto-
mated techniques can be misleading if they are not
interpreted within the context of existing knowl-
edge frameworks and the limitations of the dataset
(O’Sullivan and Manson 2015).
Dickson and Perry (2016) make this point in the
context of a coastal landslide dataset. In their
example, a comparatively small dataset on coastal
landslides and potential landslide drivers was
extracted from digital elevation models, aerial
photographs, and fieldwork. Three machine-
learning approaches were then used to automati-
cally detect the likely controls on landsliding
failure. All methods agreed and together suggested the
potential to correctly predict a high proportion
(>85%) of landslides. This approach has considerable
potential for coastal management, and
similar analyses of big datasets are likely to yield
similar (apparent) success, but analysts need to
look more deeply when reporting results. Dickson
and Perry (2016) caution that important ques-
tions need to be asked about error sources,
including absent or missing data. For instance,
in coastal landslide studies, sites prone to
landsliding (e.g., cliffs undercut by strong wave
action) may also be the sites of most rapid
evidence removal, meaning that preservation
bias influences the dataset. This type of issue
could easily be overlooked in a burst of big data
collection and automated analysis.
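The preservation-bias problem can be sketched with a small simulation. This is a hypothetical illustration, not the analysis of Dickson and Perry (2016): the function names and every coefficient below are assumptions chosen only to show the mechanism. Landslides are made more likely on wave-exposed cliffs, but evidence of them is also made less likely to survive there.

```python
import random


def simulate_sites(n_sites=20000, seed=42):
    """Simulate landslide occurrence and whether evidence of it survives.

    All coefficients are illustrative assumptions, not field estimates.
    """
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        exposure = rng.random()            # wave exposure: 0 sheltered, 1 exposed
        p_slide = 0.1 + 0.6 * exposure     # exposed cliffs fail more often
        p_survive = 1.0 - 0.8 * exposure   # but waves remove the evidence faster
        occurred = rng.random() < p_slide
        recorded = occurred and rng.random() < p_survive
        sites.append((exposure, occurred, recorded))
    return sites


def summarise(sites):
    """Compare true and recorded landslide rates for low and high exposure."""
    summary = {}
    for label, is_high in (("low", False), ("high", True)):
        group = [s for s in sites if (s[0] > 0.5) == is_high]
        summary[f"true_{label}"] = sum(s[1] for s in group) / len(group)
        summary[f"recorded_{label}"] = sum(s[2] for s in group) / len(group)
    return summary


if __name__ == "__main__":
    for name, value in summarise(simulate_sites()).items():
        print(f"{name}: {value:.2f}")
```

Under these assumed coefficients the true landslide rate is much higher on exposed cliffs, yet the recorded rates for the two exposure classes are nearly equal. Any method trained only on the recorded data would therefore understate the role of wave exposure.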
A third shift in analytical priorities implied by big
data involves a reconfiguring of what constitutes an
The Canadian Geographer / Le Géographe canadien 2017, xx(xx): 1–12
Environmental big data 7
environmental expert and what a valid and authori-
tative analysis looks like. If landscapes can be
converted into datascapes, then what is needed is
not analysis of landscapes but analysis of data-
scapes. Further, these changes in observational and
methodological priorities could lead towards the
prioritization of reanalysis of open source/public
datasets over field science, triggering the “rise of the
data scientist” (Levy 2015) and the death of the
environmental expert (Death 2015). There is no
doubt that technicians, volunteers, and citizen
scientists, together with industrial and commercial
data analysts, can each play valuable and important
roles in collection and analysis of big datasets, but
we agree with Pagano et al. (2016) that the human
judgment of the environmental scientist remains
critical to meaningful interpretation.
As a potential consequence of this reconfigura-
tion of environmental expertise, it is important to
consider how big data will affect research fund-
ing, data infrastructure (with associated
path-dependencies), and the centralization of
knowledge. Funding bodies such as governments,
industries, and civil society actors are demanding
information that is temporally and spatially
specific to particular problems but also “predictive,
prescriptive, and scalable” (Hampton et al.
2013, 156). Here the big data revolution could see
a channelling of investment into those projects,
disciplines, and investigators who convince funders
that their approach is “big” and that “big is
good” (see Gabrys 2016). Already, some environmental
scientists are championing a move towards
big data taking particular (mandated)
forms (Baumann et al. 2016). Further, big
data is already shaping the terrain of science
funding, and this process is creating new inequalities
in environmental science at the global and local
scales. Any putative shift from one way of doing
things to another is going to have winners and
losers; the questions that scientists may need to
ask themselves are: “who is gaining scientific
authority from the big data revolution, who is
losing, and what does this mean?” Not only does
big data shift prestige and authority towards
generalist data experts, but this landscape of
expertise is uneven across the globe. Data
experts, and the environmental experts they
may displace, may not be equally distributed,
nor are all data experts able to make equally
powerful or recognized claims with big data.
Big data as a new paradigm: A
disciplinary perspective
Sherman (1996, 89) contends that for scientists,
“disciplinary self-examination is especially critical
in times of fundamental uncertainty and change.” In
the face of the data deluge (Kitchin 2014), what
values might we espouse as geographers as we
encounter, join, resist, and transform the big data
revolution?
In search of disciplinary wisdom on this matter,
we find it instructive to consider how physical
geographers have responded to previous shifts in
priorities relating to the study of earth-atmosphere
systems and their interactions with human and non-
human life.
Perhaps the closest analogy in physical geography
occurred in the 1960s and 1970s as physical
geographers championed systems theory as a
universal approach to structure environmental
inquiry (Kennedy and Chorley 1971; Terjung
1976). In this period, debates raged about the merits
of reductionist (law-finding) environmental inquiry
as opposed to traditional descriptive accounts of
particular places and environments. Technological
advances in computational power promised to
replace the traditional modes of geographical
description with mathematical tools oriented
towards the control and prediction of environmen-
tal change through systems science (Kennedy and
Chorley 1971). Commenting on these developments
in 1979, Oxford geographer Barbara Kennedy
observed that in mainstream accounts of systems
science,
the geographer ... is urged to move at once into the
conversion of existing information [about landscapes]
into the mathematics thought to be most appropriate
for control and prediction. The object of interest is to
become a system of symbols; the tools of the new
trade are to be those of information theory and
control engineering. (Kennedy 1979, 550)
Despite her early involvement promoting a sys-
tems approach in physical geography (Kennedy and
Chorley 1971), Kennedy became critical of how
systems analysis had been subsequently framed
and pursued. She perceived an overemphasis on
abstraction and mathematical formalization, noting
that “our subject matter [i.e., landscape] almost
inevitably has a history and that history will frequently
prove very important indeed in determining future
8 Jennifer A. Salmond, Marc Tadaki, and Mark Dickson
developments and non-developments” (Kennedy
1979, 551). The mathematical abstractions of systems
theory tended to treat landscapes as if they did not
have histories; they were merely mechanistic entities
that could be represented and controlled through
calculation. Instead, Kennedy maintained that the real
world was historical, multi-scaled, and fundamentally
complex. Real world environments were “naughty,”
and they did not behave in line with the abstractions of
human observers. She concluded, “By all means let the
mathematical modelling of the naughty world con-
tinue apace, but let us not confuse those models with
reality” (Kennedy 1979, 558).
Kennedy’s cautionary note resonates with our
current position, as internal and external influen-
ces push environmental scientists towards be-
coming data analysts more focused on datascapes
than actual environments. Big data are part of a
wider set of (often ungrounded) assumptions that
more data and more computing prowess
will “reveal all” and “solve all our problems”
(Shearmur 2015).
Wyly (2014) calls for geographers to become
critically engaged with the “new quantitative
revolution.” In this light, it is crucial that the
context of big datasets be acknowledged:
who collected the data, how they were collected,
and why data were collected, or not, at those
points in time and space. At the same time,
geographers must continue to acknowledge the
existence and implications of “naughty” worlds,
both in principle and practice (Clifford 2001). Big
data may present eloquent and seemingly compre-
hensive accounts of our social and biophysical
environments, but let us not confuse these
representations with reality.
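One way to keep these contextual questions attached to a dataset is to store them alongside the data. The sketch below is a hypothetical provenance record, not an established metadata standard: the class name, field names, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetContext:
    """Hypothetical provenance record for an environmental dataset."""
    collected_by: str = ""
    collection_method: str = ""
    purpose: str = ""
    known_gaps: list[str] = field(default_factory=list)

    def missing_context(self) -> list[str]:
        """Return the context questions that have been left unanswered."""
        answers = {
            "who collected the data": self.collected_by,
            "how the data were collected": self.collection_method,
            "why the data were collected": self.purpose,
            "where or when data were not collected": self.known_gaps,
        }
        return [question for question, answer in answers.items() if not answer]


if __name__ == "__main__":
    # Example values are invented for illustration only.
    record = DatasetContext(
        collected_by="regional survey team",
        collection_method="aerial photograph interpretation",
    )
    for question in record.missing_context():
        print("Unanswered:", question)
```

A record like this does not resolve questions of context, but it makes visible which of them an analyst has not yet answered before the dataset is reused.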
Big data for a “naughty” world
This article has synthesized some definitional
concepts of big data to help physical geographers
and environmental scientists understand what is at
stake as big data become ever more prominent in the
way we study earth-atmosphere systems and their
interaction with human activity. We have sought to
identify and interrogate the ways in which big
data may influence how environmental science is
conducted. Big data will open new and legitimate
scientific questions, but we live in a naughty world
of geographically specific and irreducibly complex
biophysical and human environments. So, how
should we approach big data?
It is worth re-emphasizing, as a starting point,
that there is significant value in generating and
using big datasets. New questions can be asked, new
patterns recognized, and new linkages identified
and tested (Elliott et al. 2016). New technologies
have transformed the temporal and spatial resolu-
tion of datasets and provided coverage in terrain
that was previously inaccessible owing to varying
social and physical constraints. Indeed, big data can
address some of the fundamental problems in
environmental science and geographical enquiry:
the “transcendence of scale, the complexity of
geographical phenomena and the origins of
complexity” (Clifford 2001, 387). However, there will
always be digital divides, uneven data shadows, and
bias in how technology is used and data reported.
Even if it were possible to measure everything
moving forward, data from the past can never be
recaptured in the same way. The answer to the old
question of “how much data is enough” is continually
changing as it is negotiated through cultural
contexts. The jury is still out on whether it makes
sense to measure everything, on how to avoid missing
key events, and on how to account for the influence
of confounding variables.
Big data is a mode of producing data or a quasi-objective
collective “truth” about the environment
by representing the world as datascapes. To become
useful in the form of knowledge or actionable
science, big data must be stored, processed, and
analyzed within the context of the social, economic,
and political frameworks that created it. Critical
physical geography presents a pertinent (sub)disci-
plinary identification, as well as a “conversation
space” which can support reflexive physical geographers
in identifying the human values and
institutions shaping the collection of the data. It is
as important to understand these institutional and
ethical aspects of big data research as it is to
understand the theoretical bias of the lone field
scientist observing a single case study (see Tadaki
2016).
As our collective storehouse and flow of data
continue to expand, there is a need to change the
types of questions we ask about the world and our
approaches to answering them; this requires more
than new tools and observations. Before jumping on
the bandwagon of ever more data acquisition it is
worth pausing to ask whether our new abilities to
measure take us towards our perceived goals (e.g.,
classification, understanding, control, prediction,
management) as scientists, academics, environmental
planners, and decision makers. Arguably, being
able to describe an environment in more layers of
increased temporal and spatial detail does not
necessarily translate into new ways of solving the
problems at hand (Church 2013).
“Geography has had long-running conversations
and arguments about the role of quantitative
methods” (O’Sullivan and Manson 2015, 715).
Whilst it is not fair to assume that computational
methods have invalid epistemologies or are some-
how antithetical to critical research (Wyly 2014),
there is a danger that big data supports a
deterministic view of the world where the “pro-
posed solution to residual uncertainty is more data
and better computers” (Shearmur 2015, 966). It is
sobering to note that, despite the advances in
observation and computational power that have
resulted in an exponential increase in the amount
of meteorological data from observations and
models over the past few decades, the
human meteorologist still adds value to the
forecast, improving the prediction by 10–25%
(Kreinovich and Ouncharoen 2015). This ratio
has not changed significantly with time, money,
improved theoretical understanding, computer
models, or data availability. Clearly, environmental
scientists add value through experience and
wisdom when interpreting the outputs of auto-
mated routines (Pagano et al. 2016). Explanation
may not be required for progress in predictive
performance, but it certainly enhances it, and
data cannot replace the value of decision making
and sceptical human understanding (Miller and
Goodchild 2015).
Big data provide a valuable perspective on
human-environment processes and interactions,
but “big” perspectives benefit from standing in
complement to (and conversation with) traditional
data and approaches (Dunkel 2015). It is worth
considering how big data might be used in non-
reductionist ways (Shelton et al. 2015); the forms
this could take, however, remain loosely imagined
and there are not yet any simple or obvious
examples of how this might be achieved. Is big
data destined to be constrained by the limitations
of its roots in positivism? Can big data be
mobilized in ways that enrich (rather than sim-
plify) understanding, that level (rather than further
stratify) the global playing field of science and
knowledge, and that acknowledge (rather than
ignore) the naughtiness of our human and ecologi-
cal communities? Perhaps one important step
towards this goal might include re-centring the
agency of human actors (scientists) in generating,
collecting, and collating big data. By humanizing
big data, we might better account for the choices
underpinning our analyses, which can help us to
more meaningfully understand and distinguish
our representations from the naughty world.
References
Anderson, C. 2008. The end of theory. Wired Magazine 16(7): 2–6.
Baumann, P., P. Mazzetti, J. Ungar, R. Barbera, D. Barboni,
A. Beccati, L. Bigagli, et al. 2016. Big data analytics for earth
sciences: The EarthServer approach. International Journal of
Digital Earth 9(1): 3–29.
Chariton, A. A., M. Sun, J. Gibson, J. A. Webb, K. M. Y. Leung,
C. W. Hickey, and G. C. Hose. 2016. Emergent technologies
and analytical approaches for understanding the effects of
multiple stressors in aquatic environments. Marine and
Freshwater Research 67: 414–428.
Church, M. 2013. Refocusing geomorphology: Field work in
four acts. Geomorphology 200: 184–192.
Clifford, N. J. 2001. Editorial: Physical geography—The naughty
world revisited. Transactions of the Institute of British
Geographers 26(4): 387–389.
——. 2009. Globalization: A Physical Geography perspective.
Progress in Physical Geography 33(1): 5–16.
Costello, M. J., and J. Wieczorek. 2014. Best practice for
biodiversity data management and publication. Biological
Conservation 173: 68–73.
Crang, M. 2015. The promises and perils of a digital
geo-humanities. Cultural Geographies 22(2): 351–360.
Death, R. G. 2015. An environmental crisis: Science has failed; let
us send in the machines. Wiley Interdisciplinary Reviews: Water
2(6): 595–600.
Dickinson, J. L., B. Zuckerberg, and D. N. Bonter. 2010. Citizen
science as an ecological research tool: Challenges and benefits.
Annual Review of Ecology, Evolution, and Systematics 41:
149–72.
Dickson, M. E., and G. L. W. Perry. 2016. Identifying the
controls on coastal cliff landslides using machine-learning
approaches. Environmental Modelling and Software 76:
117–27.
Dunkel, A. 2015. Visualizing the perceived environment using
crowdsourced photo geodata. Landscape and Urban Planning
142: 173–186.
Elliott, K. C., K. S. Cheruvelil, G. M. Montgomery, and P. M.
Sorrano. 2016. Conceptions of good science in our data-rich
world. BioScience 66(10): 880–889.
Fleming, L. E., A. Haines, B. Golding, A. Kessel, A. Cichowska,
C. E. Sabel, M. H. Depledge, et al. 2014. Data mashups:
Potential contribution to decision support on climate change
and health. International Journal of Environmental Research
and Public Health 11(2): 1725–1746.
Ford, J. D., S. E. Tilleard, L. Berrang-Ford, M. Araos, R. Biesbroek,
A. C. Lesnikowski, G. K. MacDonald, et al. 2016. Big data has big
potential for applications to climate change adaptation.
Proceedings of the National Academy of Sciences USA 113(39):
10729–10732.
Gabrys, J. 2016. Practicing, materialising and contesting
environmental data. Big Data & Society.
doi: 10.1177/2053951716673391.
Goldberg, D. E., and J. H. Holland. 1988. Genetic algorithms and
machine learning. Machine Learning 3(2): 95–99.
Goodchild, M. F., and L. Li. 2012. Assuring the quality of
volunteered geographic information. Spatial Statistics 1:
110–120.
Graham, M., and T. Shelton. 2013. Geography and the future of big
data, big data and the future of geography. Dialogues in
Human Geography 3(3): 255–261.
Gura, T. 2013. Citizen science: Amateur experts. Nature 496(7444):
259–261.
Hampton, S. E., C. A. Strasser, J. T. Tewksbury, W. K. Gram,
A. E. Budden, A. L. Batcheller, C. D. Duke, and J. H. Porter. 2013.
Big data and the future of ecology. Frontiers in Ecology and the
Environment (11): 156–162.
Hey, A. J. G., S. Tansley, and K. M. Tolle. 2009. The fourth
paradigm: Data-intensive scientific discovery. Redmond, WA:
Microsoft Research.
Hsu, L., R. L. Martin, B. McElroy, K. Litwin-Miller, and W. Kim. 2015.
Data management, sharing, and reuse in experimental
geomorphology: Challenges, strategies, and scientific
opportunities. Geomorphology 244: 180–189.
Kennedy, B. A. 1979. A naughty world. Transactions of the
Institute of British Geographers 4(4): 550–558.
Kennedy, B. A., and R. J. Chorley. 1971. Physical geography:
A systems approach. London: Prentice-Hall.
Kitchin, R. 2013. Big data and human geography: Opportunities,
challenges and risks. Dialogues in Human Geography 3(3):
262–267.
——. 2014. Big Data, new epistemologies and paradigm shifts. Big
Data & Society April–June: 1–12.
Kitchin, R., and G. McArdle. 2016. What makes Big Data, Big Data?
Exploring the ontological characteristics of 26 datasets. Big
Data & Society January–June: 1–10.
Krause, S., J. Lewandowski, C. N. Dahm, and K. Tockne. 2015.
Frontiers in real-time ecohydrology—A paradigm shift in
understanding complex environmental systems. Ecohydrology
8(4): 529–537.
Kreinovich, V., and R. Ouncharoen. 2015. Fuzzy (and interval)
techniques in the age of Big Data: An overview with
applications to environmental science, geosciences,
engineering, and medicine. International Journal of Uncertainty
Fuzziness and Knowledge-Based Systems 23(Suppl. 1): 75–89.
Lane, S. N. 2016. Slow science, the geographical expedition
and critical physical geography. The Canadian Geographer.
doi: 10.1111/cag.12329.
La Salle, J., K. J. Williams, and C. Moritz. 2016. Biodiversity
analysis in the digital era. Philosophical Transactions of the
Royal Society B 371: 20150337.
Laurance, W. F., F. Achard, S. Peedell, and S. Schmitt. 2016. Big
data, big opportunities. Frontiers in Ecology and the
Environment 14(7): 347.
Lave, R., M. W. Wilson, E. Barron, C. Biermann, M. Carey, M. Doyle,
C. Duvall, et al. 2014. Intervention: Critical physical geography.
The Canadian Geographer 58(1): 1–10.
Lehrer, J. 2010. A physicist solves the city. New York Times
Magazine, 19 December, MM46.
Levin, N., S. Kark, and D. Crandall. 2015. Where have all the
people gone? Enhancing global conservation using night
lights and social media. Ecological Applications 25(8):
2153–2167.
Levy, J. 2015. The rise of the data scientist. Business 2 Community:
Technology & Innovation.
http://www.business2community.com/big-data/rise-data-scientist-01282878#jiv7JBYivPDOAKoX.99.
Li, Y., Y. Zhu, W. Yin, Y. Liu, G. Shi, and Z. Han. 2015. Prediction of
high resolution spatial-temporal air pollutant map from Big
Data sources. In Big Data computing and communications,
ed. Y. Wang, H. Xiong, S. Argamon, X. Y. Li, and J. Z. Li. Basel,
Switzerland: Springer International Publishing, 273–282.
Mayer-Schonberger, V., and K. Cukier. 2013. Big Data: A
revolution that will change how we live, work and think.
London, UK: John Murray.
Miller, H. J., and M. F. Goodchild. 2015. Data-driven geography.
Geojournal 80(4): 449–461.
Moosavi, V., G. Aschwanden, and E. Velasco. 2015. Finding
candidate locations for aerosol pollution monitoring at street
level using a data-driven methodology. Atmospheric
Measurement Techniques 8(9): 3563–3575.
O’Sullivan, D., and S. M. Manson. 2015. Do physicists have
geography envy? And what can geographers learn from it?
Annals of the Association of American Geographers 105(4):
704–722.
Odoni, N. A., and S. N. Lane. 2010. Knowledge-theoretic models in
hydrology. Progress in Physical Geography 34(2): 151–171.
Pagano, T. C., F. Pappenberger, A. W. Wood, M.-H. Ramos, A.
Persson, and B. Anderson. 2016. Automation and human
expertise in operational river forecasting. WIREs Water.
doi: 10.1002/wat2.163V.
Peters, D. P. C., K. M. Havstad, J. Cushing, C. Tweedie, O. Fuentes,
and N. Villanueva-Rosales. 2014. Harnessing the power of big
data: Infusing the scientific method with machine learning to
transform ecology. Ecosphere 5(6): 1–15.
Pimm, S. L., S. Alibhai, R. Bergl, A. Dehgan, C. Giri, Z. Jewell,
L. Joppa, R. Kays, and S. Loarie. 2015. Emerging technologies to
conserve biodiversity. Trends in Ecology & Evolution 30(11):
685–696.
Rhoads, B. L., and C. E. Thorn. 2011. The role and character of
theory in geomorphology. In The SAGE Handbook of
Geomorphology, ed. K. J. Gregory and A. S. Goudie. London,
UK: Sage, 59–77.
Schroeder, R., and L. Taylor. 2015. Big data and Wikipedia
research: Social science knowledge across disciplinary divides.
Information Communication & Society 18(9): 1039–1056.
Shearmur, R. 2015. Dazzled by data: Big Data, the census and
urban geography. Urban Geography 36(7): 965–968.
Shelton, T., A. Poorthuis, and M. Zook. 2015. Social media and the
city: Rethinking urban socio-spatial inequality using
user-generated geographic information. Landscape and Urban
Planning 142: 198–211.
Sherman, D. J. 1996. Fashion in geomorphology. In The scientific
nature of geomorphology: Proceedings of the 27th Binghamton
Symposium in Geomorphology, ed. B. L. Rhoads and C. E.
Thorn. Chichester, UK: John Wiley & Sons Ltd., 87–114.
Slaymaker, O. 2016. Physical geographers’ understanding of
the real world. The Canadian Geographer.
doi: 10.1111/cag.12334.
Soranno, P. A., E. G. Bissell, K. S. Cheruvelil, S. T. Christel,
S. M. Collins, C. E. Fergus, C. T. Filstrup, et al. 2015. Building a
multi-scaled geospatial temporal ecology database from
disparate data sources: Fostering open science and data reuse.
Gigascience 4. doi: 10.1186/s13742-015-0067-4.
Specht, A., S. Guru, L. Houghton, L. Keniger, P. Driver, E. G. Ritchie,
K. Lai, and A. Treloar. 2015. Data management challenges in
analysis and synthesis in the ecosystem sciences. Science of
the Total Environment 534: 144–158.
Steinle, S., S. Reis, and C. E. Sabel. 2013. Quantifying human
exposure to air pollution—Moving from static monitoring to
spatio-temporally resolved personal exposure assessment.
Science of the Total Environment 443: 184–193.
Tadaki, M. 2016. Rethinking the role of critique in physical
geography. The Canadian Geographer. doi: 10.1111/cag.12299.
Tadaki, M., G. Brierley, M. Dickson, R. Le Heron, and J. A. Salmond.
2015. Cultivating critical practices in physical geography. The
Geographical Journal 181(2): 160–171.
Terjung, W. H. 1976. Climatology for geographers. Annals of the
Association of American Geographers 66(2): 199–220.
Verma, A., R. van der Wal, and A. Fischer. 2016. Imagining
wildlife: New technologies and animal censuses, maps and
museums. Geoforum 75: 75–86.
Viles, H. 2016. Technology and geomorphology: Are
improvements in data collection techniques transforming
geomorphic science? Geomorphology 270(1): 121–133.
White, R. L., A. E. Sutton, R. Salguero-Gómez, T. C. Bray,
H. Campbell, E. Cieraadn, N. Geekiyanage, et al. 2015.
The next generation of action ecology: Novel approaches
towards global ecological research. Ecosphere 6(8): 134.
Wyly, E. 2014. The new quantitative revolution. Dialogues in
Human Geography 4(1): 26–38.
Ziegler, C. R., J. A. Webb, S. B. Norton, A. S. Pullin, and A. H.
Melcher. 2015. Digital repository of associations between
environmental variables: A new resource to facilitate
knowledge synthesis. Ecological Indicators 53: 61–69.