Demography in the Digital Era: New Data Sources for Population Research

Diego Alburez-Gutierrez, Samin Aref, Sofia Gil-Clavel, André Grow, Daniela V. Negraia,
Emilio Zagheni
Abstract The spread of digital technologies and the increased access to the internet
have contributed to the production and accumulation of unprecedented quantities of
data about human behavior. Demographers, who have a long-standing interest in
issues related to data and data quality, are in an ideal position to make sense of
this new information. This paper discusses three ways in which the Data Revolution
has created novel sources of data for demographic research. It discusses the unique
technical and ethical challenges posed by these data sources and the opportunities
they provide for understanding historical and contemporary demographic dynamics
around the world.
Key words: demography, digital data, social media, data revolution
Diego Alburez-Gutierrez, Max Planck Institute for Demographic Research (MPIDR), e-mail:
alburezgutierrez@demogr.mpg.de
Samin Aref, MPIDR, e-mail: aref@demogr.mpg.de
Sofia Gil-Clavel, MPIDR, e-mail: gil@demogr.mpg.de
André Grow, MPIDR, e-mail: grow@demogr.mpg.de
Daniela V. Negraia, MPIDR, e-mail: negraia@demogr.mpg.de
Emilio Zagheni, MPIDR, e-mail: zagheni@demogr.mpg.de
D. Alburez-Gutierrez, S. Aref, S. Gil-Clavel, A. Grow, D.V. Negraia, E. Zagheni. (2019).
'Demography in the Digital Era: New Data Sources for Population Research' in Arbia G.,
Peluso S., Pini A., Rivellini G. (eds.) Book of Short Papers SIS2019. Pearson.
1 The Data Revolution: A New Data Paradigm in Demography?
Demography, the systematic study of population dynamics and the causes and
consequences of compositional changes in populations, has always been a data-driven
discipline. Administrators have used censuses to count (and tax) populations since
ancient times. In modern societies, an interest in data characterized the development
of the discipline of demography. For example, John Graunt identified London’s 16th
century ‘Bills of Mortality’ as a potential source of data for demographic analysis,
ultimately resulting in the creation of life tables. We argue that demography is at
the gates of a new data paradigm defined by the increased availability of population
data produced or made available by digital technologies and the internet [6, 8]. The
shift is part of the Data Revolution, the process through which the transition from
analogue to digital electronic technologies has resulted in the accumulation of vast
amounts of individual-level data (see Figure 1 for an illustration). The spread of
the internet, the World Wide Web, and the Internet of Things has accelerated this
process, producing unprecedented data on society and human behavior [29].
This paper presents three innovative sources of data that have been made possible
by the Data Revolution and explores their potential for conducting ground-breaking
demographic research. First, digitization has helped improve access to existing
data, such as censuses and population registers [20, 33], and bibliometric
databases [12, 27]. Similarly, the advent of online peer-to-peer collaboration has
created new resources, such as massive online genealogical databases [21], that can
be used for studying intergenerational demographic processes. Second, demographers
can now analyze digital traces left by internet users on platforms like Twitter [35]
and Facebook (FB) [15] to study population dynamics. Finally, the Data Revolution
has created new opportunities for collecting primary data using devices connected
to the internet. Examples discussed in this paper include online surveys [7, 1], apps
for registering time-use data [28], and internet advertising platforms [37, 9].
[Figure 1: flow diagram. Node labels: The Data Revolution; spread of new technologies
and platforms (Internet & WWW, Internet of Things, online adverts, social media sites);
accumulation of individual-level data; Demography, Economics, Environment, Society.]
Fig. 1 The Data Revolution and new sources of data for demographic analysis.
1.1 Digitized and Crowd-sourced Data
Demographers and statistical agencies were quick to recognize the importance of
digitizing paper-based demographic data. The digitization of censuses and population
registers was pioneered by the Integrated Public Use Microdata Series (IPUMS), which
now hosts the world’s largest collection of demographic microdata.¹ In time,
digitization enabled the creation of crucial data repositories for demographic
research (e.g. the Human Mortality and the Human Fertility Databases² or digital
national population registers). Nordic registers, for example, have been used to
study intergenerational processes in fertility [22], health [5], mortality [4], and
migration [33]. Most of the existing research focuses on Europe, but researchers
increasingly acknowledge the potential of other population registers for conducting
demographic research (e.g. East Asia [13] or North America [18]).
Bibliometric databases, such as Scopus [2], Web of Science [27], and Dimensions [32],
are other examples of digitized sources with potential for demographic research.
These databases contain data on millions of scientific publications produced each
year, including author affiliations and addresses. Affiliation data can be used for
analyzing scientific collaboration and the mobility of researchers across countries
[24, 3, 12]. Yet using these data sources for migration research has limitations
that require careful interpretation of the results [2, 27]. Changes in author
affiliation, for example, are not a perfect proxy for mobility, since conducting and
publishing research can be a lengthy process. Migration of researchers is also likely
to be underestimated because some movements are not represented in publications
indexed in bibliometric databases. This calls for future research integrating
bibliometric data with complementary data sources to resolve some of the
methodological issues. Despite these limitations, bibliometric data sources offer
substantial benefits [12, 27] compared to traditional data sources like surveys.
These resources make research on the migration of research-active scientists more
cross-disciplinary, scalable, longitudinal, contemporary, and comprehensive.
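The idea of reading mobility from affiliation data can be illustrated with a minimal
sketch. It assumes a simplified record layout (one row per author and publication,
with a publication year and an affiliation country) that is hypothetical and does not
correspond to the schema of Scopus, Web of Science, or Dimensions. A change of
affiliation country between two consecutive publications of the same author is
counted as one inferred move. As noted above, this is only a proxy: the move is dated
by publication year, and moves that leave no trace in indexed publications are missed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    author_id: str  # hypothetical disambiguated author identifier
    year: int       # publication year
    country: str    # country of the affiliation listed on the publication


def infer_moves(publications):
    """Return (author_id, year, origin, destination) for each change of country."""
    by_author = {}
    # Sort by author and year so that consecutive records are chronological.
    for pub in sorted(publications, key=lambda p: (p.author_id, p.year)):
        by_author.setdefault(pub.author_id, []).append(pub)

    moves = []
    for author_id, pubs in by_author.items():
        for prev, curr in zip(pubs, pubs[1:]):
            if curr.country != prev.country:
                moves.append((author_id, curr.year, prev.country, curr.country))
    return moves


# Invented example records, for illustration only.
records = [
    Publication("a1", 2010, "NZ"),
    Publication("a1", 2012, "NZ"),
    Publication("a1", 2015, "DE"),
    Publication("a2", 2011, "IT"),
    Publication("a2", 2016, "IT"),
]
moves = infer_moves(records)
print(moves)  # [('a1', 2015, 'NZ', 'DE')]
```

Authors with several affiliations in the same year would need an explicit rule (for
example, a modal country per year) before this comparison is meaningful.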
Demographic data can also be crowd-sourced. Platforms like Geni.com and WikiTree
have allowed thousands of amateur genealogists to collaborate in building large-scale
online genealogical databases such as the Familinx database, which includes 86 million
individual records from around the globe, with data that go back as far as the 17th
century [21]. This particular database was scraped from Geni.com, a collaborative
social network that allows users to find and verify family relations. Online
genealogies are a promising resource because they cover long historical periods and
are not restricted by national boundaries. On the downside, they are not
representative samples and underrepresent low- and middle-income countries (LMIC).
Despite their potential, sound demographic research using these data is still
missing, including methodologies for addressing systematic biases and generalizing
the findings to larger populations [16].
¹ www.ipums.org; www.international.ipums.org, accessed 28.02.2019.
² www.mortality.org; www.humanfertility.org, accessed 28.02.2019.
1.2 Digital Traces from Social Media
About half of the world’s population are active internet users and many use social
media platforms like FB and Twitter.³ Demographic information on the users of these
platforms can be used to perform demographic research in a timely manner. Social
media data can also be used to study populations that would otherwise remain entirely
out of reach [26]. Researchers can access FB and Twitter data using the platforms’
Application Programming Interfaces (APIs), some of which have been designed for
advertising purposes. The FB Marketing API gives access to aggregated population
data (e.g., the number of FB users by sex and age in a given country who share
certain interests), but not individual-level user data. Unfortunately, FB does not
provide much detail about how these aggregate figures are estimated (e.g., how users
are classified according to their interests, behavior, and demographic
characteristics). Twitter, in addition to a Marketing API for aggregate-level summary
information, allows researchers to query individual-level data from ‘public tweets’
(i.e. tweets not protected by the user).⁴ Researchers can thus access information
that users have agreed to share, including text and images from tweets, user names,
and tweet locations. Having access to individual-level Twitter posts gives
researchers the freedom to design and test different models and algorithms using
primary data.
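To make the notion of aggregated audience data concrete, the following sketch shows
one way such counts might be handled once they have been obtained. It makes no API
calls. The dictionary layout, the function names, and all figures are assumptions
invented for this illustration, not platform output or census data. The sketch
divides user counts by a reference population to obtain penetration rates by sex and
age group, and then compares female and male rates within each age group.

```python
# Illustrative aggregate counts keyed by (sex, age group).
# All numbers are invented for this example; they are not platform or census figures.
platform_users = {
    ("F", "18-24"): 120_000,
    ("M", "18-24"): 200_000,
    ("F", "25-34"): 90_000,
    ("M", "25-34"): 150_000,
}
reference_population = {
    ("F", "18-24"): 400_000,
    ("M", "18-24"): 400_000,
    ("F", "25-34"): 300_000,
    ("M", "25-34"): 300_000,
}


def penetration_rates(users, population):
    """Share of each demographic group that appears in the platform's audience."""
    return {group: users[group] / population[group] for group in users}


def female_to_male_ratio(rates):
    """Ratio of female to male penetration within each age group."""
    ages = sorted({age for _, age in rates})
    return {age: rates[("F", age)] / rates[("M", age)] for age in ages}


rates = penetration_rates(platform_users, reference_population)
ratios = female_to_male_ratio(rates)
for age, ratio in ratios.items():
    print(age, round(ratio, 2))
```

Because the platform does not document how its aggregates are estimated, rates
computed this way inherit that uncertainty and are best read as indicative.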
Previous studies have collected data using APIs to study contemporary social and
demographic processes. FB data have been used to study access to digital technologies
[15, 17] and immigrant cultural assimilation [14], and to estimate migrant stocks
[37]. Twitter data have been used to study migration flows [35] and to monitor
population health [10] and natural disasters [19]. The use of FB and Twitter data
has clear advantages, but also important drawbacks. A notable limitation is that
social media data are generally not representative of the entire population. Recent
studies have attempted to overcome this limitation by combining social media data,
statistical models, and representative surveys [36, 37]. Another limitation is the
lack of individual-level demographic data for Twitter users. Studies have addressed
this limitation by using pattern recognition techniques to infer the demographic
characteristics of users [34]. Nevertheless, there are clear benefits in using this
new source of data. For instance, demographers and sociologists have been able to
reach and study new populations, while statisticians and computer scientists have
had the opportunity to test new models and algorithms. These examples show how the
internet has created research opportunities that were unimaginable when social
networking platforms were initially conceived, over 20 years ago.
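One generic way of combining non-representative online data with representative
sources is post-stratification: group-level estimates from the online sample are
reweighted by each group's share in a reference population. The sketch below is a
textbook version of that idea under simplifying assumptions. It is not the specific
model used in [36, 37], and the group labels, sample sizes, and shares are invented
for illustration.

```python
def naive_mean(cells):
    """Unweighted sample mean, dominated by over-represented groups."""
    total_n = sum(cell["n"] for cell in cells.values())
    return sum(cell["n"] * cell["mean"] for cell in cells.values()) / total_n


def poststratified_mean(cells, population_shares):
    """Reweight group means by each group's share in the reference population."""
    if abs(sum(population_shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(population_shares[group] * cells[group]["mean"] for group in cells)


# Invented online sample in which younger users are heavily over-represented.
online_sample = {
    "young": {"n": 800, "mean": 0.5},
    "old": {"n": 200, "mean": 0.2},
}
# Invented shares from a representative source such as a census or survey.
census_shares = {"young": 0.4, "old": 0.6}

print(round(naive_mean(online_sample), 2))                          # 0.44
print(round(poststratified_mean(online_sample, census_shares), 2))  # 0.32
```

The correction only removes bias that operates through the grouping variables. If
users differ from non-users within a group, the adjusted estimate remains biased.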
³ www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx, accessed 28.02.2019.
⁴ help.twitter.com/en/safety-and-security/public-and-protected-tweets, accessed 28.02.2019.
1.3 New Opportunities for Collecting Primary Data
The Data Revolution has also created new opportunities for collecting primary data
via the internet. Several studies have recruited participants for online surveys
using social networking sites (e.g., FB and Twitter) and online labor markets (e.g.,
Amazon Mechanical Turk and Craigslist) [7]. Such platforms tend to have wide reach
and often allow the targeting of individuals based on specific demographic
characteristics, interests, and behaviors. This makes them attractive for both
drawing convenience samples and recruiting members of hard-to-reach populations,
usually at a lower cost than would be possible with traditional probability samples
[1]. Of the existing platforms, FB has been the most popular, arguably because it
(currently) has the largest number of users and the widest international coverage,
and because it provides detailed information about user characteristics that can be
used for targeting participants [7, 9]. In the existing studies using FB, recruitment
usually takes place via ads that can be shown to users in various places on a
webpage. Such ads consist of one or more pictures accompanied by a short study
description and a link to an external site hosting the survey (see [26] for an
example).
In addition to new platforms for survey research, internet-enabled devices (e.g.,
mobile phones and activity trackers) can revolutionize current research practice.
One example comes from the area of time-use research. The ways in which people use
their time (i.e., the quantity and quality of time; whether they spend it alone or
interacting with other people or with machines) have implications for their health
and wellbeing. Information about people’s time use and wellbeing has mainly been
collected using (1) recall or (2) real-time techniques. In recall techniques, data
are typically collected through phone interviews or paper-and-pencil
self-administered diaries, where respondents report what they did during that day or
the previous day, the duration of each activity, where they were, and who they
interacted with. This method affords coverage and detail of what participants did
during the previous day and the sequence in which activities took place, but it is
not very precise in estimating the hour and the minute in which an activity took
place [25]. Furthermore, such recall diaries can be lengthy and burdensome for the
respondent, which is why most nationally representative recall-diary surveys are
cross-sectional and only cover one or two diary-days for each respondent. ‘Real-time’
techniques that rely on applications or instant messages received on personal mobile
phones have the potential to capture what people are doing ‘right now’, and are
likely to provide a more precise picture of the exact activity [11, 28]. Such
techniques also have the advantage of being faster and less fatiguing for the
respondent, allowing longitudinal or repeated measures. Additionally, data collection
via cellphone applications could provide much needed insight into what people from
various socioeconomic contexts and political regimes do during day-to-day life and
how those activities are then linked to various measures of wellbeing. Nevertheless,
assessing time use and subjective wellbeing in real time has its own reliability and
validity challenges, stemming particularly from the fact that asking respondents to
evaluate their current behavior and/or emotional wellbeing ‘right now’ may change the
very behavior and/or emotions we are trying to measure [23, 31].
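Real-time designs need a rule for deciding when a respondent's phone should prompt
them. The sketch below shows one common, simple option: the waking day is split into
equal blocks and one prompt is drawn at a random moment within each block, so that
prompts are unpredictable but spread across the day. The function name, the waking
window, and the number of prompts are assumptions made for illustration and are not
taken from the studies cited above.

```python
import random
from datetime import datetime, timedelta


def esm_schedule(day_start, day_end, n_prompts, seed=None):
    """Draw one random prompt time inside each of n_prompts equal blocks."""
    if n_prompts <= 0 or day_end <= day_start:
        raise ValueError("need a positive number of prompts and a valid window")
    rng = random.Random(seed)  # a seed makes the schedule reproducible
    block = (day_end - day_start) / n_prompts
    prompts = []
    for i in range(n_prompts):
        offset = timedelta(seconds=rng.uniform(0, block.total_seconds()))
        prompts.append(day_start + i * block + offset)
    return prompts


# Hypothetical waking window from 08:00 to 22:00 with seven prompts.
start = datetime(2019, 3, 1, 8, 0)
end = datetime(2019, 3, 1, 22, 0)
schedule = esm_schedule(start, end, n_prompts=7, seed=42)
for prompt in schedule:
    print(prompt.strftime("%H:%M"))
```

Drawing within blocks avoids clusters of prompts at one time of day, at the cost of
making the gap between consecutive prompts somewhat more predictable than fully
random timing would.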
2 What’s Next for Demography?
This paper highlighted new opportunities for demographic research created by the
Data Revolution. The review of new data sources, however, is not exhaustive and
researchers will continue to find new ways of making sense of our social world with
the help of the internet and electronic devices. This concluding section considers the
unique technical and ethical challenges of digital data and discusses how addressing
them can contribute to the advancement of the demographic discipline.
Demographers using digital data face particular issues related to access,
representativeness, and ethics. Researchers often ‘depend on the kindness of
strangers’ for accessing data since internet companies, unlike governments, are not
obliged to share data from their platforms. This creates uncertainty as the
conditions of access may change in the future. There are important attempts to
address this issue. The Opal Project, for example, has proposed protocols for private
companies to willingly share anonymized data on a regular basis to inform public
policy and academic research.⁵ Furthermore, digital sources are rarely representative
of larger populations in the way that surveys based on random samples are (even if,
as this paper has shown, digital technology can enhance the collection of primary
survey data). Coverage can also be an issue, as access to the internet is more
restricted in LMIC. Nevertheless, digital trace data can be used to show some of
these global inequalities in access to digital technologies [15, 17].⁶ The lack of
representativeness has motivated research on generalizing from non-representative
samples to larger populations [36, 37]. This is a promising area of methodological
development with wide applications, especially as survey response rates continue to
decline around the world. The availability of online data has also led researchers to
think long and hard about data security, privacy, and informed consent in the digital
era [30]. Ethical considerations must be a primary concern when designing demographic
studies using digital or internet data. Social scientists need to adhere to ethical
and transparent research practices, particularly as the privacy of users is
constantly threatened in the online world [38].
Finally, it is important to note that while innovative sources of data provide
exciting opportunities for new research, they are unlikely to make ‘traditional’
demographic sources (e.g., surveys, censuses) obsolete in the near future. Rather,
the Data Revolution has the potential to complement and augment these existing data
sources. Traditional population data, for example, are crucial for identifying
systematic bias in online sources and calibrating estimates made from these data
[37]. Conversely, social media data can be used to estimate important demographic
measures in contexts where traditional survey data are not available. The Data
Revolution has already changed the way we do demography, as evidenced by the
digitization of historical censuses and population registers, and the creation of
large-scale and open-access repositories of demographic data. The pace of this change
is likely to increase in the future as more researchers engage in ground-breaking
research using digital data sources.
⁵ www.opalproject.org/, accessed 28.02.2019.
⁶ This work has resulted in efforts to ‘nowcast’ the digital gender gap in internet and mobile access
using real-time big data: https://www.digitalgendergaps.org/, accessed 28.02.2019.
References
1. Antoun, C., Zhang, C., Conrad, F.G., Schober, M.F.: Comparisons of online recruitment
strategies for convenience samples: Craigslist, Google AdWords, Facebook, and Amazon
Mechanical Turk. Field Methods 28(3), 231–246 (2016)
2. Appelt, S., van Beuzekom, B., Galindo-Rueda, F., de Pinho, R.: Which factors influence the
international mobility of research scientists? In: A. Geuna (ed.) Global Mobility of Research
Scientists, pp. 177–213. Academic Press, San Diego (2015)
3. Aref, S., Friggens, D., Hendy, S.: Analysing scientific collaborations of New Zealand
institutions using Scopus bibliometric data. In: Proceedings of the Australasian Computer
Science Week Multiconference, p. 49. ACM (2018)
4. Baranowska-Rataj, A., Barclay, K., Kolk, M.: The effect of number of siblings on adult
mortality: Evidence from Swedish registers for cohorts born between 1938 and 1972.
Population Studies 71(1), 43–63 (2017)
5. Barclay, K.J., Kolk, M.: The long-term cognitive and socioeconomic consequences of birth
intervals: a within-family sibling comparison using Swedish register data. Demography 54(2),
459–484 (2017)
6. Billari, F.C., Zagheni, E.: Big data and population processes: A revolution? In: A. Petrucci,
R. Verde (eds.) Proceedings of the Conference of the Italian Statistical Society, pp. 167–178.
Firenze University Press (2017)
7. Boas, T.C., Christenson, D.P., Glick, D.M.: Recruiting large online samples in the United
States and India: Facebook, Mechanical Turk, and Qualtrics. Political Science Research and
Methods pp. 1–19 (2018)
8. Cesare, N., Lee, H., McCormick, T., Spiro, E., Zagheni, E.: Promises and pitfalls of using
digital traces for demographic research. Demography 55(5), 1979–1999 (2018)
9. Chu, J.L., Snider, C.E.: Use of a social networking web site for recruiting Canadian youth for
medical research. Journal of Adolescent Health 52(6), 792–794 (2013)
10. Cocos, A., Fiks, A.G., Masino, A.J.: Deep learning for pharmacovigilance: Recurrent neural
network architectures for labeling adverse drug reactions in Twitter posts. Journal of the
American Medical Informatics Association 24(4), 813–821 (2017)
11. Csikszentmihalyi, M., Larson, R.: Validity and reliability of the experience-sampling method.
In: Flow and the Foundations of Positive Psychology: The Collected Works of Mihaly
Csikszentmihalyi, pp. 35–54. Springer, Dordrecht (2014)
12. Czaika, M., Orazbayev, S.: The globalisation of scientific mobility, 1970–2014. Applied
Geography 96, 1–10 (2018)
13. Dong, H., Campbell, C., Kurosu, S., Yang, W., Lee, J.Z.: New sources for comparative social
science: Historical population panel data from East Asia. Demography 52(3), 1061–1088
(2015)
14. Dubois, A., Zagheni, E., Garimella, K., Weber, I.: Studying migrant assimilation through
Facebook interests. In: S. Staab, O. Koltsova, D.I. Ignatov (eds.) Social Informatics, Lecture
Notes in Computer Science, pp. 51–60. Springer International Publishing (2018)
15. Fatehkia, M., Kashyap, R., Weber, I.: Using Facebook ad data to track the global digital gender
gap. World Development 107, 189–209 (2018)
16. Fire, M., Elovici, Y.: Data mining of online genealogy datasets for revealing lifespan patterns
in human population. ACM Trans. Intell. Syst. Technol. 6(2), 28:1–28:22 (2015)
17. Garcia, D., Kassa, Y.M., Cuevas, A., Cebrian, M., Moro, E., Rahwan, I., Cuevas, R.: Analyzing
gender inequality through large-scale Facebook advertising data. Proceedings of the National
Academy of Sciences 115(27), 6958–6963 (2018)
18. Gauvin, H., Moreau, C., Lefebvre, J.F., Laprise, C., Vézina, H., Labuda, D., Roy-Gagnon,
M.H.: Genome-wide patterns of identity-by-descent sharing in the French Canadian founder
population. European Journal of Human Genetics 22(6), 814–821 (2014)
19. Ghahremanlou, L., Sherchan, W., Thom, J.A.: Geotagging Twitter messages in crisis
management. The Computer Journal 58(9), 1937–1954 (2015)
20. Hall, P.K., McCaa, R., Thorvaldsen, G., International Microdata Access Group (eds.):
Handbook of international historical microdata for population research: A project of IMAG,
The International Microdata Access Group. Minnesota Population Center, Minneapolis, Minn (2000)
21. Kaplanis, J., Gordon, A., Shor, T., Weissbrod, O., Geiger, D., Wahl, M., Gershovits, M.,
Markus, B., Sheikh, M., Gymrek, M., Bhatia, G., MacArthur, D.G., Price, A.L., Erlich, Y.:
Quantitative analysis of population-scale family trees with millions of relatives. Science
360(6385), 171–175 (2018)
22. Kolk, M.: Multigenerational transmission of family size in contemporary Sweden. Population
Studies 68(1), 111–129 (2014)
23. Ludwigs, K., Lucas, R., Burger, M., Veenhoven, R., Arends, L.: How does more attention to
subjective well-being affect subjective well-being? Applied Research in Quality of Life 13(4),
1055–1080 (2018)
24. Moed, H.F., Halevi, G.: A bibliometric approach to tracking international scientific migration.
Scientometrics 101(3), 1987–2001 (2014)
25. Phipps, P.A., Vernon, M.K.: Twenty-four hours: an overview of the recall diary method and
data quality in the American time use survey. In: R.F. Belli, F.P. Stafford, D.F. Alwin (eds.)
Calendar and Time Diary: Methods in Life Course Research, pp. 109–128. Sage Publications,
Thousand Oaks (2009)
26. Pötzschke, S., Braun, M.: Migrant sampling using Facebook advertisements: A case study of
Polish migrants in four European countries. Social Science Computer Review 35(5), 633–653
(2017)
27. Robinson-Garcia, N., Sugimoto, C.R., Murray, D., Yegros-Yegros, A., Larivière, V., Costas,
R.: The many faces of mobility: Using bibliometric data to measure the movement of
scientists. Journal of Informetrics 13(1), 50–63 (2019)
28. Roessger, K.M., Greenleaf, A., Hoggan, C.: Using data collection apps and single-case designs
to research transformative learning in adults. Journal of Adult and Continuing Education
23(2), 206–225 (2017)
29. Salathé, M., Bengtsson, L., Bodnar, T.J., Brewer, D.D., Brownstein, J.S., Buckee, C.,
Campbell, E.M., Cattuto, C., Khandelwal, S., Mabry, P.L., Vespignani, A.: Digital Epidemiology.
PLoS Computational Biology 8(7), e1002616 (2012)
30. Salganik, M.J.: Bit by bit: Social research in the digital age. Princeton University Press,
Princeton (2018)
31. Sasaki, W., Nakazawa, J., Okoshi, T.: Comparing ESM timings for emotional estimation
model with fine temporal granularity. In: Proceedings of the 2018 ACM International Joint
Conference and 2018 International Symposium on Pervasive and Ubiquitous Computing and
Wearable Computers, pp. 722–725. ACM (2018)
32. Thelwall, M.: Dimensions: A competitor to Scopus and the Web of Science? Journal of
Informetrics 12(2), 430–435 (2018)
33. Thorvaldsen, G., Østrem, N.O.: Migration and the historical population register of Norway.
Journal of Migration History 4(2), 237–248 (2018)
34. Yildiz, D., Munson, J., Vitali, A., Tinati, R., Holland, J.A.: Using Twitter data for demographic
research. Demographic Research 37(46), 1477–1514 (2017)
35. Zagheni, E., Garimella, V.R.K., Weber, I., State, B.: Inferring international and internal
migration patterns from Twitter data. In: Proceedings of the 23rd International Conference on
World Wide Web - WWW ’14 Companion, pp. 439–444. ACM Press, Seoul, Korea (2014)
36. Zagheni, E., Weber, I.: Demographic research with non-representative internet data.
International Journal of Manpower 36(1), 13–25 (2015)
37. Zagheni, E., Weber, I., Gummadi, K.: Leveraging Facebook’s advertising platform to monitor
stocks of migrants. Population and Development Review 43(4), 721–734 (2017)
38. Zuboff, S.: Big other: Surveillance capitalism and the prospects of an information civilization.
Journal of Information Technology 30(1), 75–89 (2015)
... Demographic estimates based on nontraditional data have been the object of recent but sustained interest to address the shortcomings of traditional data sources (Alburez-Gutierrez et al. 2019;Alexander, Polimis, and Zagheni 2020;Billari and Zagheni 2017;Cesare et al. 2018;Lee and Van der Klauw 2010;Penner and Dodge 2019;Williams et al. 2015;Zagheni et al. 2018;Zagheni et al. 2014;Zagheni and Weber 2012). These nontraditional data sources are generally not collected through surveys for the purpose of creating population estimates. ...
... However, consumer data -like other nontraditional sources -are not designed to be representative of populations and have known and presumed coverage biases. 3 The development of demographic measures based on nontraditional data has been limited by the biases inherent to these data sources and concerns about the validity of the estimates produced (Alburez-Gutierrez et al. 2019;Cesare et al. 2018;Phillips 2020;Zagheni et al. 2018). As a result, the potential of these data to generate demographic estimates to complement existing survey-based measures has not been fully realized. ...
... Technological changes, such as improvements in information storage and processing, have not only improved access and granularity of traditional sources of demographic data (e.g. individual-level census data across historical and geographical contexts), but have also increasingly provided researchers with new forms of data not originally meant for research but which nevertheless speak to important demographic outcomes, including data sources as wide-ranging as bibliometric and genealogical databases, social media data, and archived newspapers (Alburez-Gutierrez et al., 2019;Kashyap, 2021). Many of these new data sources are created as by-products of the use of digital technologies, such as web and social media, which are increasingly salient spaces for social interaction and expression. ...
... There is a long history of studies using different sources of data to study scholars as a population, including surveys of scholars (Cañibano et al., 2020;Franzoni et al., 2014), interviews (Cole & Zuckerman, 1987;Schaer et al., 2020), administrative and census data (Fenton et al., 2000;Shauman & Xie, 1996), and, more recently, online sources (e.g., LinkedIn data or website of universities) (Park et al., 2019;Yuret, 2017). With the widespread digitization of scholarly databases, bibliometric data are increasingly used as a new source (Alburez-Gutierrez et al., 2019). Bibliometric data include information that is extracted from scholarly publications in scientific journals. ...
Preprint
Digital and computational demography explores demography in relation to the digital revolution – the rapid technological improvements in digitized information storage, computational power and the spread of the internet and mobile technologies since the turn of the new millennium. We cover three ways in which the digital revolution touches upon demography. First, we discuss how digital technologies, through their impacts on daily lives and in shifting how individuals access information, communicate and access services, have implications for demographic outcomes linked to health and mortality, fertility and family, and migration. Second, we discuss how the digital revolution has created a wide range of new data sources such as digital trace and geospatial data that can be repurposed for demographic research, and enabled respondent recruitment across the world via the internet and social media. Third, we discuss how improvements in computational power have facilitated the use of computational methods such as microsimulation and agent-based modelling as well as machine learning techniques for demographic applications. We conclude by discussing future opportunities and challenges for digital demography.
... Часто исследовательские вопросы сформулированы так, что исправление данных вообще не требуется, например, когда объектом изучения является виртуальное, а не реальное население. Разрабатываются методы по согласованию данных из цифровых следов с другими источниками данных, в том числе микроданными переписей (Alburez-Gutierrez et al. 2019). Цифровые исследования сталкиваются и с большим числом этических проблем (Taylor, Floridi, van der Sloot 2017). ...
Article
Благодаря цифровизации экономики и общественной жизни расширяются возможности изучения населения с использованием цифровых следов – информации, которая накапливается в цифровой среде. В статье с помощью цифровых следов исследуется миграция населения российской Арктики – огромного макрорегиона, который на протяжении последних десятилетий испытывает значительный отток населения. Обобщен опыт применения цифровых следов в демографических исследованиях, названы их преимущества и ограничения. Для изучения населения российской Арктики использовали данные нескольких цифровых платформ. Анализ профилей пользователей социальной сети «ВКонтакте» позволил изучить миграционное движение, а данные сервиса по продаже билетов Туту.ру дали сведения об авиационных и железнодорожных перемещениях. С применением методов сетевого анализа изучены миграционные и транспортные потоки в российской Арктике на муниципальном уровне, выявлены хабы и кластеры в сетях перемещений. Определены особенности миграционных и транспортных сетей Арктики: низкая плотность, большие расстояния между узлами, высокая относительная подвижность при малом числе перемещений в абсолютном выражении, высокий удельный вес хабов в миграционном обмене. Миграционные потоки классифицированы по направлениям перемещений и типам муниципальных образований. Показано, что связность арктических территорий остается низкой, а положительный миграционный баланс имеют в основном региональные столицы, либо города за пределами Арктики. Полученные результаты позволят улучшить понимание миграционных процессов на Севере и в Арктике, повысить качество демографических прогнозов благодаря более точному моделированию миграционных потоков.
Conference Paper
The article considers migration and transport flows in the Russian Arctic using the tools of network science. Based on data from digital platforms, networks of migration, air and railroad passenger flows were constructed. The paper reveals the characteristics of these networks and identifies migration and transport hubs in the Russian Arctic. It assesses the scale of movements by direction and the connectivity of the Arctic territories. The results can be used in forecasting demographic processes and in territorial planning.
Chapter
Situated at the intersection of the computational and demographic sciences, digital and computational demography explores how new digital data streams and computational methods advance the understanding of population dynamics, along with the impacts of digital technologies on population outcomes, e.g. linked to health, fertility and migration. Encompassing the data, methodological and social impacts of digital technologies, we outline key opportunities provided by digital and computational demography for generating policy insights. Within methodological opportunities, individual-level simulation approaches, such as microsimulation and agent-based modelling, infused with different data, provide tools to create empirically informed synthetic populations that can serve as virtual laboratories to test the impact of different social policies (e.g. fertility policies, support for the elderly or bereaved people). Individual-level simulation approaches also make it possible to assess policy-relevant questions about the impacts of demographic changes linked to ageing, climate change and migration. Within data opportunities, digital trace data provide a system for early warning with detailed spatial and temporal granularity, which is useful for monitoring demographic quantities in real time or for understanding societal responses to demographic change. The demographic perspective highlights the importance of understanding population heterogeneity in the use and impacts of different types of digital technologies, which is crucial for building more inclusive digital spaces.
Article
CSmoothing allows an analyst to use the so-called Controlled Smoothing technique to estimate trends in a time series framework. In this Web tool (Shiny), the analyst may apply the methodology to at most 3 mortality time series simultaneously, as well as to other kinds of time series individually. Likewise, this smoothing approach allows the analyst to establish one, two or three segments in order to take into account possible changes in variance regimes. For estimating trends it uses different amounts of smoothness, both globally for the total data set and through some partial indices for each selected segment. It is also possible to endogenously fix the points where the segments start and end (the cutoff points) with continuous joints. Additionally, intervals of different standard deviations for their respective trends are given. Particular emphasis is placed on a big data set of log mortality rates, log(qx), taken from period life tables of the Human Mortality Database (HMD; University of California, Berkeley (USA) and Max Planck Institute for Demographic Research (Germany), 2021; www.mortality.org; www.humanmortality.de; data downloaded on 10/10/21). In all cases, dynamic graphs and several statistics related to the Controlled Smoothing technique are illustrated.
Article
Ending poverty in all its forms everywhere is the number one Sustainable Development Goal of the UN 2030 Agenda. To monitor the progress toward such an ambitious target, reliable, up-to-date and fine-grained measurements of socioeconomic indicators are necessary. When it comes to socioeconomic development, novel digital traces can provide a complementary data source to overcome the limits of traditional data collection methods, which are often not regularly updated and lack adequate spatial resolution. In this study, we collect publicly available and anonymous advertising audience estimates from Facebook to predict socioeconomic conditions of urban residents, at a fine spatial granularity, in four large urban areas: Atlanta (USA), Bogotá (Colombia), Santiago (Chile), and Casablanca (Morocco). We find that behavioral attributes inferred from the Facebook marketing platform can accurately map the socioeconomic status of residential areas within cities, and that predictive performance is comparable in both high and low-resource settings. Our work provides additional evidence of the value of social advertising media data to measure human development and it also shows the limitations in generalizing the use of these data to make predictions across countries.
Article
Social scientists increasingly use Facebook's advertising platform for research, either in the form of conducting digital censuses of the general population, or for recruiting participants for survey research. Both approaches depend on the accuracy of the data that Facebook provides about its users, but little is known about how accurate these data are. We address this gap in a large-scale, cross-national online survey (N = 137,224), in which we compare self-reported and Facebook-classified demographic information (sex, age and region of residence). Our results suggest that Facebook's advertising platform can be fruitfully used for conducting social science research if additional steps are taken to assess the accuracy of the characteristics under consideration.
Article
Poor air quality is a pressing global challenge contributing to adverse health impacts around the world. In the past decade, there has been a rapid proliferation of air quality information delivered via sensors, apps, websites or other media channels in near real-time and at increasingly localized geographic scales. This paper explores the growing emphasis on self-monitoring and digital platforms to supply informational interventions for reducing pollution exposures and improving health outcomes at the individual level. It presents a technological case study that characterizes emerging air quality information communication mechanisms, or ‘AQ channels’, while drawing upon examples throughout the literature. The questions are posed: which air quality channels are ‘freely’ available to individuals in London, UK, and when and where are they accessed? Digital trace data and metadata associated with 54 air quality channels are synthesized narratively and graphically. Results reveal air quality channels derive air pollution estimates using common data sources, display disparate messaging, adopt variable geographic scales for reporting ‘readings’ and maintain psychosocial barriers to access and adoption of exposure-reducing behaviours. The results also point to a clear association between the publication of a high-profile news article about air pollution and increased air quality channel access. These findings illuminate a need for greater transparency around how air quality channels generate personalized air pollution exposure estimates and tailor messaging. The paper concludes by calling for air quality channel developers to exercise co-creative methods that can support sustainable, democratic data and knowledge production around air quality, while critically approaching disproportionate patterns of both pollution and information exposure.
Article
The nature and intensity of migration processes are constantly changing. Demographic statistics are not suitable for obtaining up-to-date information and making timely decisions in the field of demographic and social policy. Thus, digital demography is becoming increasingly important, as this area of population research uses new methods and data sources resulting from the Internet expansion and the digitalisation of society. Using digital demography methods and emerging data sources, the study aims to identify current migration trends in Russia at the municipal level. The duality of the object (real and virtual population) and methods (demographic and data science methods) of digital demography is demonstrated. Digital data sources for studying migration and relevant processing methods were considered. Further, it was proposed to assess migration flows by examining social network information and graphs of migration routes. The analysis of data obtained from the “Virtual population of Russia” project for 2356 urban and municipal regions revealed the features of intermunicipal migration and the centres of migration attraction in the country. An indicator for assessing the potential of future migrations based on the graphs of migration routes was presented. The analysis results show that balanced spatial development of Russia requires the stimulation of human capital development in local centres characterised by high migration potential. These include regional capitals, “second” cities in terms of population, and some research and industrial centres. The study findings can be used to consider demographic processes at the municipal level and elaborate strategic documents in the field of regional spatial development. Further digitalisation and the implementation of digital economy projects will lead to the accumulation of population data at the national level. Therefore, future research should focus on improving digital demography methods for studying and forecasting demographic processes.
Conference Paper
Towards wellbeing-awareness in computing, research on estimating users' emotions from smartphone sensor data has been actively conducted as smartphones become more and more ubiquitous. Most studies have constructed emotion estimation models based on machine learning, using contextual data from smartphones and users' self-reported ground-truth labels, often collected via the Experience Sampling Method (ESM). However, since our emotions change frequently in daily life, trying to collect the ground truth of such volatile emotions leads to a storm of ESM prompts, which could be a burden on users. In order to find better ESM methods, we propose and compare 3 ESMs, namely Randomized ESM, which executes at random times, Trigger ESM, which executes when the user's behavior changes, and Unlocking ESM, which presents the ESM on the unlock screen. We constructed various emotion estimation models with four types of time granularity (1 day, 1/3 day, 3 hours, 1 hour) in a four-week experiment with eight participants. Unlocking ESM had the highest response rate. In addition, Unlocking ESM had the highest estimation accuracy in most cases.
Chapter
Migrant assimilation is a major challenge for European societies, in part because of the sudden surge of refugees in recent years and in part because of long-term demographic trends. In this paper, we use Facebook data for advertisers to study the levels of assimilation of Arabic-speaking migrants in Germany, as seen through the interests they express online. Our results indicate a gradient of assimilation along demographic lines, language spoken and country of origin. Given the difficulty of collecting timely migration data, in particular for traits related to cultural assimilation, the methods that we develop and the results that we provide open new lines of research that computational social scientists are well-positioned to address.
Article
Gender equality in access to the internet and mobile phones has become increasingly recognised as a development goal. Monitoring progress towards this goal however is challenging due to the limited availability of gender-disaggregated data, particularly in low-income countries. In this data sparse context, we examine the potential of a source of digital trace ‘big data’ – Facebook’s advertisement audience estimates – that provides aggregate data on Facebook users by demographic characteristics covering the platform’s over 2 billion users to measure and ‘nowcast’ digital gender gaps. We generate a unique country-level dataset combining ‘online’ indicators of Facebook users by gender, age and device type, ‘offline’ indicators related to a country’s overall development and gender gaps, and official data on gender gaps in internet and mobile access where available. Using this dataset, we predict internet and mobile phone gender gaps from official data using online indicators, as well as online and offline indicators. We find that the online Facebook gender gap indicators are highly correlated with official statistics on internet and mobile phone gender gaps. For internet gender gaps, models using Facebook data do better than those using offline indicators alone. Models combining online and offline variables however have the highest predictive power. Our approach demonstrates the feasibility of using Facebook data for real-time tracking of digital gender gaps. It enables us to improve geographical coverage for an important development indicator, with the biggest gains made for low-income countries for which existing data are most limited.
Article
Significance: We present the Facebook Gender Divide, an inexpensive, real-time instrument for measuring gender differences in Facebook access and activity in 217 countries. The Facebook Gender Divide captures standard indicators of Internet penetration and gender equality indices in education, health, and economic opportunity. We find that the tendency of countries to approach economic gender equality is negatively associated with a high Facebook Gender Divide. Our results suggest that online social networks, while suffering gender imbalance, may lower information access barriers for women and narrow the economic gender gap.
Article
This paper presents a methodological framework for developing scientific mobility indicators based on bibliometric data. We identify nearly 16 million individual authors from publications covered in the Web of Science for the 2008-2015 period. Based on the information provided across individuals' publication records, we propose a general classification for analyzing scientific mobility using institutional affiliation changes. We distinguish between migrants - authors who have ruptures with their country of origin - and travelers - authors who gain additional affiliations while maintaining affiliation with their country of origin. We find that 3.7% of researchers who have published at least one paper over the period are mobile. Travelers represent 72.7% of all mobile scholars, but migrants have higher scientific impact. We apply this classification at the country level, expanding the classification to incorporate the directionality of scientists' mobility (i.e., incoming and outgoing). We provide a brief analysis to highlight the utility of the proposed taxonomy to study scholarly mobility and discuss the implications for science policy.
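The migrant/traveler distinction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `classify_author`, the input layout (a chronological list of sets of affiliation countries, one set per publication) and the rule that any record lacking the origin country counts as a "rupture" are all assumptions made for illustration.

```python
from typing import List, Set


def classify_author(origin: str, records: List[Set[str]]) -> str:
    """Classify one author as 'non-mobile', 'traveler' or 'migrant'.

    origin  -- assumed country of origin (hypothetical input; the abstract
               does not say how origin is determined)
    records -- affiliation countries per publication, in chronological order
    """
    # Mobile at all? At least one record lists a country other than origin.
    affiliated_abroad = any(countries - {origin} for countries in records)
    if not affiliated_abroad:
        return "non-mobile"

    # Assumed rupture rule: some record no longer lists the origin country.
    has_rupture = any(origin not in countries for countries in records)

    # Migrants break with the origin country; travelers keep it while
    # adding affiliations elsewhere.
    return "migrant" if has_rupture else "traveler"


if __name__ == "__main__":
    print(classify_author("DE", [{"DE"}, {"DE", "US"}]))  # keeps DE throughout
    print(classify_author("DE", [{"DE"}, {"US"}]))        # drops DE
```

Under these assumptions, an author who keeps the origin affiliation on every publication while adding others is a traveler, and an author with at least one publication that omits the origin country is a migrant.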
Article
The digital traces that we leave online are increasingly fruitful sources of data for social scientists, including those interested in demographic research. The collection and use of digital data also presents numerous statistical, computational, and ethical challenges, motivating the development of new research approaches to address these burgeoning issues. In this article, we argue that researchers with formal training in demography—those who have a history of developing innovative approaches to using challenging data—are well positioned to contribute to this area of work. We discuss the benefits and challenges of using digital trace data for social and demographic research, and we review examples of current demographic literature that creatively use digital trace data to study processes related to fertility, mortality, and migration. Focusing on Facebook data for advertisers—a novel “digital census” that has largely been untapped by demographers—we provide illustrative and empirical examples of how demographic researchers can manage issues such as bias and representation when using digital trace data. We conclude by offering our perspective on the road ahead regarding demography and its role in the data revolution.
Article
The Historical Population Register (HPR) of Norway gives rise to new research opportunities on a large array of topics spanning medicine, social sciences and humanities. This introductory article outlines the contents of the register, the periods it covers, and its use, particularly with respect to the study of geographic mobility. It also introduces the articles in this issue, which concentrate on emigration to the US and on returning emigrants.
Article
This article examines online recruitment via Facebook, Mechanical Turk (MTurk), and Qualtrics panels in India and the United States. It compares over 7300 respondents—1000 or more from each source and country—to nationally representative benchmarks in terms of demographics, political attitudes and knowledge, cooperation, and experimental replication. In the United States, MTurk offers the cheapest and fastest recruitment, Qualtrics is most demographically and politically representative, and Facebook facilitates targeted sampling. The India samples look much less like the population, though Facebook offers broad geographical coverage. We find online convenience samples often provide valid inferences into how partisanship moderates treatment effects. Yet they are typically unrepresentative on such political variables, which has implications for the external validity of sample average treatment effects.
Article
This article provides an empirical assessment of global scientific mobility over the past four decades, based on bibliometric data. We find (i) an increasing diversity of origin and destination countries integrated in global scientific mobility, with (ii) the centre of gravity of scientific knowledge production and migration destinations moving continuously eastwards by about 1300 km per decade, (iii) an increase in average migration distances of scientists reflecting integration of global peripheries into the global science system, (iv) significantly lower mobility frictions for internationally mobile scientists compared to non-scientist migrants, (v) with visa restrictions establishing a statistically significant barrier affecting international mobility of scientists hampering the global diffusion of scientific knowledge.