Publisher: Routledge
Urban Geography
Dazzled by data: Big Data, the census
and urban geography
Richard Shearmur
School of Urban Planning, McGill University, Room 400
Macdonald-Harrington Building, 815 rue Sherbrooke Ouest,
Montréal, Québec H3A 0C2, Canada
Published online: 10 Jun 2015.
To cite this article: Richard Shearmur (2015): Dazzled by data: Big Data, the census and urban
geography, Urban Geography, DOI: 10.1080/02723638.2015.1050922
About five years ago I wrote an editorial describing the role that the census (and other
data sources that are authoritative, open to scrutiny, representative of the entire population,
and resting on slowly evolving and relatively consensual definitions) play in building a
shared imaginary and vocabulary that can be used to describe society, highlight changes,
point out injustices, and identify geographic regularities, exceptions, and peculiarities
(Shearmur, 2010). At the time the Canadian long-form census, the data-gathering exercise
which had allowed Canadian social scientists to track changes over the years, had been
abandoned. Five years later, the results of the 2011 National Household Survey are in, and
they are virtually unusable to describe Canadian society, or to document trends (such as
the suspected spatial and social polarization of incomes) which have (probably) been
I, and many other social science researchers, have voiced these concerns on and off
over the last five years. A common response has been to consider our concerns anachronistic:
why on earth worry about the loss of census data when Big Data are here? The
marvels, infinite possibilities and sheer newness of Big Data are contrasted with the staid
and limited information that, it is thought, can be gleaned from the census. For example,
Facebook can use its information to track the formation and dissolution of networks
in real time, and cell phone companies can map the movements of their customers: can the
census do that? New York City analysts, by searching their massive databases for
correlations, can predict which drain covers are likely to blow up (Mayer-Schönberger
& Cukier, 2014): can the census do that? Our security agencies are probably better
informed than we are about where we will be tomorrow: can the census do that? A
group of students has analysed Foursquare check-ins to establish neighbourhood zones
within cities: can the census do that? Furthermore, we are told that things will only get
better: researchers at Google are building computers based on neural networks that can
digest almost infinite masses of Big Data and use them to learn new things (i.e. the
computers learn how to learn, following the probabilistic and hierarchical logic of neural
networks) and to think. A brave new world is upon us: more data and more computing
capacity are going to reveal all and solve our problems, hubris reminiscent of the post-war
cyberneticists (Mirowski, 2002), data-driven regional scientists of the 1960s and some
GIS analysts of the 1990s. The Big Data vision takes us right back to Laplace's
positivistic demon, the imaginary (but now, we are told, realisable) entity which,
armed with perfect information, will be able to predict the future, taking all humanness,
imperfection and doubt out of our lives (Laplace, 1814). The Big Data view of the world
is one of absolute determinism, and the proposed solution to residual uncertainty is more
data and better computers.
Of course, these new data and computational techniques are powerful tools, and they
allow us to better apprehend and analyse the “thing” world. Businesses can use them to
analyse, influence and predict (in a limited way) the market behaviour of their customers.
Transport agencies can better manage flows and congestion, and combinations of topographical,
vegetation and other features can be analysed and visualised far better than
before. Most of these technologies have useful applications, but also have limitations: they
are just as error-prone as any other system that bases its decisions on past trends and
correlations, and, being complex and inscrutable, errors are often difficult to pinpoint.
Individuals (such as innocent travellers algorithmically selected to be on no-fly lists) get
crushed, and whilst administrative systems have always done this, Big Data will only
exacerbate, magnify and accelerate the problem. Likewise, no amount of Big Data can
predict entirely novel phenomena, something that humans with imagination have sometimes
been able to do, and, indeed, to create.
But do Big Data and neural computers tell us much about society? To what extent can
the most powerful neural network imaginable, which possesses all the Big Data in the
world, really grapple with human desires, with issues of justice, and with deliberative and
conflictual processes where there is no correct answer, no rabbit to pull out of a hat, but
clashes of will, persuasion, emotions, life and death? Furthermore, apart from learning
about these things second-hand, what can a computer (even one that has taught itself to
learn from Big Data) really know about fear, joy, hate or even, more prosaically, about
lying in the grass with the sun shining on its face?
There are two major flaws to the Big Data vision, flaws that have been discussed for
the last 200 years, but which are worth recalling given the role now assigned to Big Data.
First, Big Data can only be about codifiable and digitised information. Big Data and
powerful computers, even those capable of “learning”, abstract totally from the human
corporeal and emotional experience which is alien to them. Since society is not a machine
(pace Laplace's demon), no amount of information and computing can be equated with
understanding. Understanding, unless one alters its definition to mean correlations and
predictions based on codified past experience, is intrinsically human and calls upon both
remembering and forgetting, calls upon choices, values and theories which carry meaning
for people. There are many understandings of social phenomena, not a single understanding,
and these differing perspectives are irreconcilable with the determinism implicit
behind Big Data rhetoric. It is only if Big Data become society, peripheralising the human,
that their claim to know all will have some (inhuman) validity.
Second, and in line with critiques of the surveillance society, Big Data may appear to
successfully understand and predict things, but only because powerful interests behind
them are shaping the world. Thus, for instance, if Big Data are used to identify terrorists,
but if the notion of “terrorist” is defined by the algorithms and ideologies fed into Big
Data computers, then, tautologically, Big Data will appear successful at this task.
Likewise, if Big Data pick up trends, and are then used to facilitate those trends, then
the data cease to be a tool for understanding the world but rather one for shaping it.
Given these two fundamental points, I will now return to the more limited question
which motivates this editorial: can Big Data (the type currently being generated and used)
tell us more about society than census-type data? There are no doubt divergent opinions
on this, but I suggest that Big Data can reveal different things, can generate some useful
ideas, but cannot replace census-type data (see Graham & Shelton, 2013).
The reasons for this are straightforward: however big the data, Big Data are not about
society, but about users and markets. They are therefore inherently biased in that they do
not track people who fall outside the particular markets or activities being tracked. This is
why these data are incredibly useful to operators, but only in narrow areas despite their
size. Furthermore, Big Data, collected passively, cannot provide complex cross-tabulations
linking individuals to a wide variety of attributes, to families, households, neighbourhoods
and jobs. They can often be used to infer relationships, gathering imperfect
information from a wide variety of sources, but such inferences run into common
statistical problems such as the ecological fallacy, non-representative sampling bias and
self-selection bias.
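The sampling point can be illustrated with a minimal simulation sketch. Everything in it is an assumption made for illustration: the population is synthetic, the income-like attribute is drawn from an arbitrary lognormal distribution, and the self-selection rule (people above the median are four times more likely to appear in the passively collected data) is invented. It is not an analysis of any real data set.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 people with an income-like attribute.
population = [random.lognormvariate(10.5, 0.6) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# "Big" passive data set: inclusion probability depends on the attribute itself
# (assumed self-selection rule: 0.8 above the median, 0.2 below it).
cutoff = statistics.median(population)
big_sample = [x for x in population
              if random.random() < (0.8 if x > cutoff else 0.2)]

# Small probability sample: 1,000 people drawn uniformly at random.
small_sample = random.sample(population, 1_000)

# Relative error of each sample mean with respect to the population mean.
big_error = abs(statistics.fmean(big_sample) - true_mean) / true_mean
small_error = abs(statistics.fmean(small_sample) - true_mean) / true_mean

print(f"self-selected sample: n={len(big_sample)}, relative error={big_error:.1%}")
print(f"random sample:        n={len(small_sample)}, relative error={small_error:.1%}")
```

Under these assumptions the self-selected data set is dozens of times larger than the random sample, yet its mean sits much further from the population mean: adding more self-selected records does not remove the bias.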
Another limitation of Big Data is fluidity of categories and of definitions: the
concepts that underpin Big Data collection make short-term operational sense to the
data gatherers, but do not reflect categories, classifications and concepts developed
through slow public deliberation and dialogue over a number of years, centuries in
some cases when it comes to censuses (Shearmur, 2010). Thus, the concepts and
definitions that structure Big Data are rarely what researchers need: rather, researchers
may adapt their concepts to the data available, an opportunistic way of conducting
research that may lead to interesting observations but that will often bypass ideas
important to academic and social debate. The concepts will have been implicitly
shaped and constrained by the data gatherers and providers, a further way in which
Big Data shape, rather than reflect, society.
Having said this, the power of Big Data should not be underestimated. They can
reveal new dynamics, can allow for the study of certain processes in real time and can
highlight relationships and correlations that may pass unnoticed using classical methods
and data. As such, for the purposes of social science (and urban geography in particular),
they can serve to generate hypotheses inductively: but however useful inductive
reasoning may be, theory and imagination are necessary to understand and interpret
observations. Furthermore, even inferential reasoning requires unbiased data: for inferences
that extend beyond the particular markets or operations for which data are
gathered, well-selected samples of the target population remain necessary. To understand
how society as a whole is structured and evolves, census-type data are inescapably
essential: and, of course, census data provide a sampling framework against which the
biases of Big Data can be estimated. Without the census there is no such thing as a
population against which sampling and representativity can be assessed.
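As a rough sketch of what it means to assess representativity against a census, the example below compares the composition of a passively collected data set with census counts and derives simple post-stratification weights. All figures, the age groups and the names `census`, `observed` and `representation_ratio` are hypothetical; real census tables and real user data would be needed for any actual estimate.

```python
# Hypothetical census counts by age group (the benchmark population).
census = {"15-34": 300_000, "35-64": 450_000, "65+": 250_000}
# Hypothetical counts of users observed in a passively collected data set.
observed = {"15-34": 60_000, "35-64": 35_000, "65+": 5_000}

census_total = sum(census.values())
observed_total = sum(observed.values())


def representation_ratio(group):
    """Share of the group in the data divided by its share in the census.

    A value above 1 means the group is over-represented in the data,
    a value below 1 means it is under-represented.
    """
    return (observed[group] / observed_total) / (census[group] / census_total)


ratios = {g: representation_ratio(g) for g in census}

# Post-stratification weight: number of census residents that each
# observed user in the group is taken to stand for.
weights = {g: census[g] / observed[g] for g in census}

for g in census:
    print(f"{g}: ratio={ratios[g]:.2f}, weight={weights[g]:.1f}")
```

Weights of this kind only correct for attributes that are measured in both sources. They cannot correct for self-selection within a group, and without the census counts neither the ratios nor the weights could be computed at all.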
Big Data are a welcome and interesting new tool which will provide new insights into
urban geographic and other social processes, if they become widely accessible to
researchers. However, given that they are generated by and for businesses, transport
authorities and similar bodies for operational purposes, their principal applications will
remain within the confines of those operations. The danger is that they are dazzling, they
are big and they look (to governments such as the Canadian one, whispered to by Big
Business, which understands the power of Big Data) pretty much like other data except
there are more of them and they are cheaper to collect.
The potentiality and limits of Big Data (and of associated advances in computing)
need to be critically explored and understood. Likewise, the continued importance of
gathering census-type data in order for society to be imagined, tracked and apprehended
as a whole (not as a series of superimposed markets and operations) needs to be
reaffirmed. And finally, the messy, human, emotion-driven, political nature of social
processes, which neither census, Big Data nor computers of any stripe can capture, should
be remembered every time hubristic projects such as Google's DeepMind, the aim of
which is to “solve intelligence” (Simonite, 2014), are invoked.
2. Set against this hubristic discourse, there is a growing body of work that looks critically at Big
Data. For instance, Boyd and Crawford (2011) present six provocations for (or limitations of)
Big Data which overlap some of the points being made in this editorial.
3. Mahrt and Scharkow (2013) discuss many of the limitations of using Big Data in social science
research, and Graham and Shelton (2013) provide a thoughtful discussion on the potentials and
pitfalls of Big Data in human geography.
Boyd, Danah, & Crawford, Kate (2011). Provocations for a cultural, technological and scholarly
phenomenon. Information, Communication and Society, 15(2), 662–679.
Graham, Mark, & Shelton, Taylor (2013). Geography and the future of Big Data, Big Data and the
future of geography. Dialogues in Human Geography, 3(3), 255–261.
Laplace, Pierre-Simon (1814). Essai philosophique sur les probabilités. Paris: Courcier. Retrieved
Mahrt, Merja, & Scharkow, Michael (2013). The value of Big Data in digital media research.
Journal of Broadcasting & Electronic Media, 57(1), 20–33.
Mayer-Schönberger, Victor, & Cukier, Kenneth (2014). Big Data. New York, NY: Mariner Books.
Mirowski, Philip (2002). Machine dreams: Economics becomes a cyborg science. Cambridge:
Cambridge University Press.
Shearmur, Richard (2010). Editorial: A world without data? The unintended consequences of
fashion in geography. Urban Geography, 31(8), 1009–1017.
Simonite, Tom (2014, December 2). Google's intelligence designer. MIT Technology Review.
Retrieved from