Estimating individual employment status using mobile phone network data

Pål Sundsøy¹, Johannes Bjelland¹, Bjørn-Atle Reme¹, Eaman Jahani², Erik Wetter³,⁴, Linus Bengtsson³
¹Telenor Group Research, ²MIT Media Lab, ³Flowminder Foundation, ⁴Stockholm School of Economics
ABSTRACT
This study provides the first confirmation that individual
employment status can be predicted from standard mobile phone
network logs externally validated with household survey data.
Individual welfare and households’ vulnerability to shocks are
intimately connected to employment status and professions of
household breadwinners. At a societal level unemployment is an
important indicator of the performance of an economy. By
deriving a broad set of novel mobile phone network indicators
reflecting users’ financial, social and mobility patterns we show
how machine learning models can be used to predict 18 categories
of profession in a South-Asian developing country. The model
predicts individual unemployment status with 70.4% accuracy.
We further show how unemployment can be aggregated from
individual level and mapped geographically at cell tower
resolution, providing a promising approach to map labor market
economic indicators, and the distribution of economic
productivity and vulnerability between censuses, especially in
heterogeneous urban areas. The method also provides a promising
approach to support data collection on vulnerable populations,
which are frequently under-represented in official surveys.
Keywords
Big-Data Development, machine learning, unemployment, socio-
economic indicators, mobile phone metadata, profession
1. INTRODUCTION
Unemployment is a key indicator of labor market performance
[1,2]. When workers are unemployed, their families are also
affected, while the nation as a whole loses their contribution to the
economy in terms of the goods and services that could have been
produced [3]. Unemployed workers also lose their purchasing
power, which can lead to unemployment for other workers,
creating a cascading effect that ripples through the economy [4].
Additionally, unemployment has been shown to be a driver of
interregional migration patterns [5].
Counting each and every unemployed person on a monthly basis
would be a very expensive, time-consuming and impractical
exercise. In many countries, such as the US, a monthly population
survey is run to measure the extent of unemployment in the nation
[6]. In developing countries such surveys often have a low
spatial and temporal frequency [7]. Lacking statistics may lead to
higher uncertainties in economic outlook, lower purchasing
capacity and higher burden of debt. The problems of
unemployment and poverty have always been major obstacles to
economic development [8], and proper background statistics are
important to change this trend.
New, reliable data sources that can meet the growing demand
for comprehensive, up-to-date international employment data
are therefore of high priority. Specifically,
privately held data sources have been shown to hold great promise
and opportunity for economic research, due to both high spatial
and temporal granularity [9].
One of the most promising rich new data sources is mobile phone
network logs [10], which have the potential to deliver near real-
time information of human behavior on individual and societal
scale [11]. The potential for prediction from mobile phone metadata
is vast, given that more than half of the world’s population now own a
mobile phone. Several research studies have used large-scale
mobile phone metadata, in the form of call detail records (CDR)
and airtime purchases (top-up) to quantify various socio-economic
dimensions. At the aggregate level, mobile phone data have been shown
to provide proxy indicators for assessing regional poverty levels
[12,13], illiteracy [14], population estimates [15], human
migration [16,17] and epidemic spreading [18]. At the individual
level, mobile phone data have been used to predict, among others,
socio-economic status [19,20], demographics [21,22] and
personality [23].
Two previous papers analyze employment trends through cell
phone data. The work by [24] argues that unemployment rates may
be predicted two-to-eight weeks prior to the release of traditional
estimates, and that future rates can be predicted up to four months
ahead of official reports more accurately than using historical data alone. The
other study [25] shows that mobile phone indicators are associated
with unemployment and this relationship is robust when
controlling for district area, population and mobile penetration
rate. The results of these analyses highlight the importance of
investigating the relationship between mobile phone data and
employment data further.
Our work differs from [24,25] in several ways:
(1) Bottom-up approach: we focus on predicting employment
status at the individual level to get a clean view of
the main drivers and to discover global predictors that
are useful across various employment groups. This is of
relevance also as previous research has uncovered non-linear
relationships between worker flows and job flows at the
micro level, indicating a more complex relationship between
the micro and macro levels of employment statistics than
simple aggregation [26].
(2) Geography: We focus on a low-HDI South Asian
developing country where employment statistics are almost
non-existent and in high demand.
(3) Method: Individually matched mobile phone data and large-
scale survey data allow us to carefully optimize machine-
learning models for multiple professions.
(4) Data: In addition to CDR data we also include airtime
purchases, as financial proxy, for our analysis.
The rest of this paper is organized as follows: In section 2 we
describe the methodological approach, including the features and
models. In section 3 we describe the results. In
section 4 we discuss the limitations from a holistic perspective
and draw our conclusions.
2. APPROACH
2.1 Data
Household survey data: Data from two nationally representative
cross-sectional household surveys of 200,000 individuals in a
low-income South Asian country were analyzed. The data were
collected for business intelligence purposes in Q1 and Q2 2014
by an external survey company commissioned by the
operator. The survey discriminated between 18 types of
professions for the head of household, including currently being
retired and unemployed. The head of household was asked for
his or her most frequently used phone number. 87% of households
in the country have at least one mobile phone.
Mobile phone data: Mobile phone logs for 76,000 of the
surveyed 200,000 individuals, those subscribing to the leading
operator, were retrieved for a period of six months and de-identified by
the operator. Individual level features were built from the raw
mobile phone data and were subsequently coupled with the
corresponding de-identified phone numbers from the survey. The
social features were subsetted from a graph consisting of in total
113 million subscribers and 2.7 billion social ties. No content of
messages or calls was accessible and all individual level data
remained with the operator.
The following sub-sections describe the features and the machine
learning algorithm used for our prediction.
2.2 Features
Independent variables: The independent variables are built
entirely from the cell phone datasets. A structured dataset
consisting of 160 novel mobile phone features is built from the
raw CDRs and airtime purchases, and categorized into three
dimensions: (1) financial (2) mobility and (3) social features
(Table 1). The features are customized to be predictive of
employment status and include various parameters of the
corresponding distributions such as weekly or monthly median,
mean and variance. In addition to basic features such as incoming
and outgoing MMS, voice, SMS, internet and video calls we
investigate more customized features such as the consumption rate
of airtime purchases (spending speed), the amount of time spent at
each base station, the size of the social circle, the time spent on
different contacts and features related to the phone type owned by
the customer.
Dependent variables: The dependent variables were built from
the survey data. Since our aim is to separate one specific
profession/employment status from the others we build 18 binary
classifiers, one for each profession (student/non-student,
employed/unemployed and so forth). These classifiers are then
trained separately.
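As an illustration of the one-versus-rest set-up described above, the following minimal sketch turns a list of surveyed professions into one binary target vector per profession. The function name and the example survey answers are hypothetical and are not taken from the study.

```python
def one_vs_rest_labels(professions, target):
    """Return 1 where the surveyed profession equals `target`, else 0."""
    return [1 if p == target else 0 for p in professions]


# Hypothetical survey answers for five heads of household (illustrative only).
survey = ["student", "clerk", "unemployed", "student", "retired"]

# One binary target vector per profession; each would feed a separate classifier.
targets = {name: one_vs_rest_labels(survey, name) for name in sorted(set(survey))}
```

With 18 surveyed professions this yields 18 target vectors, each of which would be paired with the same feature matrix.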
Table 1. Sample of independent features from mobile phone
metadata used in model

Dimension   Features
Financial   Airtime purchases: Recharge amount per transaction,
            spending speed, fraction of lowest/highest recharge
            amount, coefficient of variation of recharge amount etc.
            Revenue: Charge of outgoing/incoming SMS, MMS, voice,
            video, value-added services, roaming, internet etc.
            Handset: Manufacturer, brand, camera enabled,
            smart/feature/basic phone etc.
Mobility    Home district/tower, radius of gyration, entropy of
            places, number of places visited etc.
Social      Social network: Interaction per contact, degree,
            entropy of contacts etc.
            General phone usage: Out/In voice duration, SMS count,
            Internet volume/count, MMS count, video count/duration,
            value-added services duration/count etc.
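Two of the mobility features in Table 1 can be made concrete with a short sketch. It assumes that tower positions are already projected to planar coordinates in kilometres, that the radius of gyration is measured around the centroid of the observed positions, and that the entropy of places is the Shannon entropy of the visit frequencies. These are common definitions, but the paper does not spell out its exact formulas, and the function names are hypothetical.

```python
import math
from collections import Counter


def radius_of_gyration(points):
    """Root-mean-square distance of the observed positions from their centroid.

    `points` holds one (x, y) tower position in km per logged event (assumed input).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)


def entropy_of_places(tower_ids):
    """Shannon entropy (natural log) of the distribution of visits over towers."""
    counts = Counter(tower_ids)
    total = len(tower_ids)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def number_of_places(tower_ids):
    """Number of distinct towers seen in the log."""
    return len(set(tower_ids))
```

A user who is only ever seen at one tower gets a radius of gyration and an entropy of zero; both values grow as activity spreads over more, and more distant, towers.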
2.3 Deep Learning Models
We tested our features against several algorithms such as gradient
boosted machines (GBM), random forest (RF), support vector
machines (SVM) and K-nearest neighbors (kNN). Based on the
performance of individual algorithms we propose a standard
multi-layer feedforward neural network architecture where the
weighted combination of the n input signals is aggregated, and an
output signal f(α) is transmitted by the connected neuron. The
function f used for the nonlinear activation is the (smooth) rectifier
$f(\alpha) = \log(1 + e^{\alpha})$. To minimize the loss function we apply
standard stochastic gradient descent with the gradient computed via
back-propagation. We use dropout as a regularization technique to
prevent over-fitting [27]. Dropout ensures that each training
example is used in a different model, all of which share the same
global parameters. We use a dropout ratio of 0.1 for the input layer
and 0.2 for the hidden layers. In total 18 models are built, one for
each pre-classified profession type.
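The activation and dropout steps described above can be sketched in a few lines of plain Python. This is an illustrative forward pass only, not the authors' implementation: the rescaling of surviving units ("inverted dropout") is an assumption, and the function names are hypothetical.

```python
import math
import random


def softplus(a):
    """f(a) = log(1 + e^a), written so that large |a| does not overflow."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def dropout(values, ratio, rng):
    """Zero each value with probability `ratio` and rescale the survivors.

    The 1 / (1 - ratio) rescaling is an assumption, not stated in the paper.
    """
    keep = 1.0 - ratio
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def dense_layer(inputs, weights, biases):
    """Weighted combination of the inputs followed by the activation f."""
    return [
        softplus(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


rng = random.Random(0)
x = dropout([0.5, -1.0, 2.0], ratio=0.1, rng=rng)      # input-layer dropout
h = dense_layer(x, [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]], [0.0, 0.1])
h = dropout(h, ratio=0.2, rng=rng)                      # hidden-layer dropout
```

At prediction time dropout is switched off, which corresponds to calling `dense_layer` without the `dropout` calls.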
To compensate for class imbalance, the minority class in the training
set is up-sampled: it is randomly sampled,
with replacement, to be of the same size as the majority class. In
our set-up, each model is trained and tested using a 75%/25%
(training/testing) split. Commonly used performance metrics for
classification problems [28], including overall accuracy,
sensitivity and specificity, are reported for the test-set.
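The evaluation protocol above (75%/25% split, up-sampling of the minority class in the training set only, then accuracy, sensitivity and specificity on the test set) can be sketched as follows. The helper names are hypothetical, rows are assumed to be `(features, label)` pairs with labels 0 or 1, and both classes are assumed to be present in the training set.

```python
import random


def train_test_split(rows, train_fraction, rng):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    rows = rows[:]
    rng.shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def upsample_minority(rows, rng):
    """Resample the minority class with replacement to the majority class size."""
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    resampled = [rng.choice(minority) for _ in range(len(majority))]
    return majority + resampled


def confusion_metrics(y_true, y_pred):
    """Overall accuracy, sensitivity (TPR) and specificity (TNR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

Up-sampling is applied after the split so that the test set keeps the original class proportions and no duplicated row can appear on both sides of the split.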
3. RESULTS
3.1 Individual employment status
The average prediction accuracy across all 18 profession groups was
67.5%, with clerk being the easiest to predict (accuracy:
73.5%) and skilled worker the most difficult (accuracy: 61.9%)
(Fig. 1a).
Our unemployment model predicted whether phone users were
unemployed with an accuracy of 70.4% (95% CI: 70.1-70.6%).
The accuracy difference between the training and test-set was
3.6%, which indicates that our trained model has good generalization
power. The true positive rate (sensitivity/recall) was 67% and true
negative rate (specificity) 70.4%. Given the original baseline of
2.1%, the model predicts unemployment on average 30 times
better than random.
Each cross-validated model was subsequently restricted to use its
20 most important predictors (applying the Gedeon method [29]). An
investigation of the five most important predictors for each
profession is given in Fig. 1b. This network shows how the
professions are linked together via common predictors. We
observe that several features are predictive across multiple
professions indicated by high in-degree in the network.
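The in-degree of a predictor in this network is simply the number of professions that list it among their top predictors. The sketch below counts it for a small, partly hypothetical example: the predictors listed for the unemployed follow the Figure 1 caption, while the lists for the other two professions are illustrative placeholders.

```python
from collections import Counter


def predictor_in_degree(top_predictors):
    """Count, for each predictor, how many professions link to it."""
    return Counter(p for preds in top_predictors.values() for p in set(preds))


# Profession -> top predictors (illustrative, not the paper's full lists).
top_predictors = {
    "unemployed": ["most used tower", "interaction per contact",
                   "nocturnal voice share", "number of voice calls"],
    "student": ["SMS count", "internet volume", "most used tower"],
    "clerk": ["radius of gyration", "most used tower", "voice duration"],
}

in_degree = predictor_in_degree(top_predictors)
shared = [p for p, d in in_degree.items() if d > 1]  # cross-profession predictors
```

Predictors with a high in-degree are the cross-profession signals; predictors with an in-degree of one are specific to a single profession.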
Predictors that rank highly across multiple professions include the
most frequently used cell tower (longitude and latitude): in the
case of the unemployed (red node) this signal indicates that the model
may capture regions of low economic development status, e.g. slum
areas where unemployment is high. Other cross-profession
predictors include number of visited places, the radius of gyration
(how far the person usually travels from his home tower) and
recharge amount per transaction. These indicators have earlier
been shown to be important financial proxy indicators for
household income in underdeveloped Asian markets [20].
Unemployed people further tend to have few interactions with
their friends, generate more voice calls at night (when calls are
cheaper) and make fewer voice calls overall. They also tend to top up
with the lowest recharge amount per transaction (sixth predictor), a
feature that also occurs as a predictor of low household income
[20].
Figure 1: A) Test-set performance of predicting individual profession from mobile phone data. For each profession accuracy,
sensitivity and specificity are reported. Top behavioral predictors are indicated with colors (excluding most used tower (lon,lat)).
B) Network of professions linked via common predictors. Each profession links to its 5 most important predictors. The link width
is proportional to the scaled relative importance of the current predictor. For example, the most important predictors for being
unemployed (highlighted in red node color) are most used cell tower (lon,lat), interaction per contact, nocturnal voice (%) and number
of voice calls.
As seen in the figure, students have the most unique predictive
signal, in the sense of having few predictors that overlap with other
groups. Interestingly they are not the easiest group to predict,
which indicates that the deep learning models have found stronger
non-linear relationships in other categories of profession.
3.2 Geographical employment mapping
A natural next step is to move from individual employment to
geographical distribution of employment rates and professions.
Large Asian cities are typically covered by thousands of mobile
phone towers, opening up the possibility of providing a detailed
spatial understanding of differences in employment rates and
profession types. In Figure 2 we have mapped out the predicted
geographical distribution of employment rate per home tower, in
one of the large cities, for six different groups. The individual
employment rates are here calculated using the test set, aggregated
and averaged to the tower from which most calls were made
between 7pm and 5am (defined as the home tower). Fig. 2a shows that
the unemployed population is spatially spread out across the city,
indicating many pockets of unemployment that traditional surveys
would not easily pick up. The retired population (Fig. 2d) and
landlords (Fig. 2c) are spatially more concentrated.
Figure 2: Geographical distributions of employment status and profession categories per base station in one of the larger Asian cities
with over 1,500 cell towers and 18 million people. Employment rates are calculated by using the out-of-sample test set, aggregated and
averaged to their respective home tower. Individual prediction accuracy and top three predictors are given to the right for A) Unemployed
B) Teachers and students C) Landlords D) Retired E) Clerks.
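The home-tower rule and the per-tower aggregation described in this section can be sketched as follows. The input layout (call records as `(hour, tower_id)` pairs, users as `(home_tower, predicted_label)` pairs) and the function names are assumptions made for the example.

```python
from collections import Counter, defaultdict


def is_night(hour):
    """True for hours inside the 7pm-5am window used to define the home tower."""
    return hour >= 19 or hour < 5


def home_tower(calls):
    """Tower with the most night-time calls; None if no night-time calls exist.

    `calls` is a list of (hour_of_day, tower_id) pairs for one user.
    """
    counts = Counter(tower for hour, tower in calls if is_night(hour))
    return counts.most_common(1)[0][0] if counts else None


def rate_per_tower(users):
    """Average the predicted 0/1 labels of all users sharing a home tower.

    `users` is a list of (home_tower, predicted_label) pairs.
    """
    sums = defaultdict(lambda: [0, 0])
    for tower, label in users:
        if tower is None:
            continue  # users without night-time activity cannot be placed
        sums[tower][0] += label
        sums[tower][1] += 1
    return {tower: s / n for tower, (s, n) in sums.items()}
```

Towers with very few assigned users give noisy rates, so in practice a minimum user count per tower would likely be needed before mapping.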
By investigating the top 3 most important behavioral predictors
we interestingly observe that the physical width of the handset is the
most important for retired people. The elderly in this market
tend to have older and smaller phones than the younger
generation. This is in contrast to Scandinavian markets, where larger
handsets are marketed especially towards elderly people. We also
notice that SMS and mobile Internet consumption are the best
indicators for being a student (Fig. 2b). The most important
predictor of clerks (Fig. 2e) is a low radius of gyration (a small
mobility radius, reflecting static office jobs). They also tend to
have more advanced consumption patterns (more variation in top-up
refill types) and longer voice duration. We also notice that
airtime purchase information is among the top predictors for several
professions, stressing the importance of including such datasets in
future research.
4. DISCUSSION
This study shows how employment status can be predicted from
mobile phone logs, purely by investigating users’ metadata. By
deriving economic, social and mobility features for each mobile
user we predict individual employment status with up to 73.5%
accuracy. We show how various profession groups relate in a
network via industry standard mobile network indicators, and
further show how individual employment can be aggregated and
mapped geographically with high spatial resolution at cell tower
level. The geospatial indicators available in mobile network data
additionally open promising avenues for research on two major
topics related to the economic impacts of employment status: the
increased productivity effects that have been found to be an
outcome of increased spatial density of employment [30], and the
interregional migration patterns that are related to unemployment
[5].
One general concern in such studies is sample
selection bias. A large data set may make the sampling rate
irrelevant, but it doesn’t necessarily make it representative:
people who use mobile phones are not necessarily a
representative sample of the larger population considered. This
issue is especially of high relevance when considering how
mobile phone data may be used for monitoring, economic
forecasting and development. Research studies are often based
purely on data from one mobile operator, and depending on the
type of data one can expect individuals to be represented
disproportionally with respect to certain characteristics. These
problems persist even if data from all operators in a country were
available, nearing the total population. In our study we consider
data from one large operator, where customers have to own a
mobile phone to be counted. We argue, however, that there should
be a high correlation between population employment and sample
employment, since our sample is large, and most people in the
country own a mobile phone (more than 85% penetration rate).
Additional sources for external validation were not easily
obtainable.
An important policy application of this work is the prediction of
regional and individual employment rates in developing countries
where official statistics are limited or non-existent.
5. REFERENCES
1.
Lovati, J.: The unemployment rate as an economic indicator.
Federal Reserve Bank of St. Louis (1976)
2.
Keynes, Maynard: The General Theory of Employment,
Interest and Money. Basingstoke, Hampshire: Palgrave
Macmillan. ISBN 0-230-00476-8. (2009)
3.
International Labour Organization: Global Unemployment
Trends. (2013)
4.
Garegnani: Heterogeneous Capital, the Production Function
and the Theory of Distribution. Review of Economic Studies.
37 (3): 407-436 (1970)
5.
Pissarides, C., Wadsworth, J.: The Flow Approach to Labor
Markets: New Data Sources and Micro-Macro links. The
Journal of Economic Perspectives, Vol. 20, No. 3, pp. 3-
26(24) (1989)
6.
U.S. Bureau of Labor Statistics: How the Government
Measures Unemployment. (2014)
7.
ILO: World Employment Social Outlook. (2015)
8.
Economy Watch: Unemployment and Poverty.
http://www.economywatch.com/unemployment/poverty.html
(2010)
9.
Einav, L., Levin, J.: Economics in the age of big data.
Science, Vol 346(6210) DOI: 10.1126/science.1243089
(2014)
10.
Lokanathan, Gunaratne: Behavioral insights for development
from Mobile Network Big Data: enlightening policy makers
on the State of the Art. Available at:
http://dx.doi.org/10.2139/ssrn.2522814 (2014)
11.
Lazer, Pentland, Adamic, Aral, Barabasi, Brewer, Christakis:
Computational Social Science. Science. Vol. 323, Issue 5915,
pp. 721-723 (2009)
12.
Blumenstock, Cadamuro: Predicting poverty and wealth from
mobile phone metadata. Science VOL 350 ISSUE 6264 1
(2015)
13.
Steele, J.: Predicting poverty using cell phone and satellite
data. In submission (2016)
14.
Sundsøy, P.: Can Mobile Usage predict illiteracy in a
developing country? arXiv Preprint: 1607.01337 (2016)
15.
Deville, Linard, Martin, Gilbert, Stevens, Gaughan, Blondel,
Tatem: Dynamic population mapping using mobile phone
data. PNAS 15888-15893, doi: 10.1073/pnas.1408439111
(2014)
16.
Lu, X.: Detecting climate adaptation with mobile network
data in Bangladesh: anomalies in communication, mobility
and consumption patterns during cyclone Mahasen. Climatic
Change Volume 138, Issue 3, pp 505-519 (2016)
17.
Lu, X.: Unveiling hidden migration and mobility patterns in
climate stressed regions: A longitudinal study of six million
anonymous mobile phone users in Bangladesh. Global
Environmental Change Volume 38, May 2016, Pages 1-7
(2016)
18.
19.
20.
21.
22.
23.
24.
25.
Signatures of Unemployment. arXiv:1609.01778 [cs.SI]
(2016)
26.
Faberman, D., Haltiwanger, J.: The Flow Approach to Labor
Markets: New Data Sources and Micro-Macro links. The
Journal of Economic Perspectives, Vol. 20, No. 3, pp. 3-
26(24) (2006)
27.
Dahl, G.: Improving Deep Neural Networks for LVCSR using
Rectified Linear Units and Dropout. ICASSP, 8609-8613.
(2013)
28.
Koyejo, O.: Consistent Binary Classification with Generalized
Performance Metrics. NIPS (2014)
29.
Gedeon, T.: Data Mining of inputs: analysing magnitude and
functional measures. International Journal of Neural Systems,
vol 8, no. 2, pp. 209-218. (1997)
30.
Ciccone, A., Hall, R.: Productivity and density of economic
activity. American Economic Review, Vol. 86, No. 1, pp. 54-
70. (1996)
31.
OECD: Main Economic Indicators. (2016)
... While there is a broad literature on remittances, little is published in the literature about airtime top-up transfers, due to the difficulty of procuring data on the subject. Individual airtime purchases can be obtained from a mobile telecommunications company, and were used for developing fine-grained indicators of wealth [22], socioeconomic segregation [15], food consumption [18] and employment [48]. In some countries, such as North Korea or Nigeria, airtime topups are used for settling small sums and treated as a proxy for cash within the county [26,49]. ...
Article
Full-text available
International airtime top-up transfers enable prepaid mobile phone users to send top-ups and data bundles to users in other countries, as well as make payments, in real time. These are heavily used by migrants to financially assist their families in their home countries and consequently could be a valuable source of information for migration and mobility analysis. However, top-up transfers are understudied as a form of money remittance in migration. In this paper, we explore the determinants and the potential of top-up transactions to complement remittance and migration statistics. Our results show that such data can provide insights into migrant groups, particularly for irregular migration and for estimating the real-time distribution of migrant groups for a given country.
... Another growing body of research suggests that more fine-grained behavioural and contextual data that can be collected with of-the-shelf smartphones allow for similarly accurate predictions of psychological phenomena with much smaller samples (Stachl, Au, Schoedel, Gosling, Harari, Buschek, Völkel, Schuwerk, Oldemeier, Ullmann, Hussmann, Bischl, & Bühner, 2020;Panicheva, Mararitsa, Sorokin, Koltsova, & Rosso, 2022). Computational inferences from mobile sensing data include a number of psychologicallyrelevant individual differences including demographics (Koch, Romero, & Stachl, 2022;Malmi & Weber, 2016;Sundsøy, Bjelland, Reme, Jahani, Wetter, & Bengtsson, 2016), moral values (Kalimeri, Delfino, Matteo Raleigh, & Cattuto, 2019), and personality traits . Latest work, has started to explore the feasibility to predict clinical depression levels using messaging texts and sensor readings from smartphones (Liu et al., 2022;Müller et al., 2021). ...
Preprint
Full-text available
Decisions such as which movie to watch next, which song to listen to, or which product to buy online, are increasingly influenced by recommender systems and user models that incorporate information on users' past behaviours, preferences, and digitally created content. Machine learning models that enable recommendations and that are trained on user data may unintentionally leverage information on human characteristics that are considered vulnerabilities, such as depression, young age, or gambling addiction. The use of algorithmic decisions based on latent vulnerable state representations could be considered manipulative and could have a deteriorating impact on the condition of vulnerable individuals. In this paper, we are concerned with the problem of machine learning models inadvertently modelling vulnerabilities, and want to raise awareness for this issue to be considered in legislation and AI ethics. Hence, we define and describe common vulnerabilities, and illustrate cases where they are likely to play a role in algorithmic decision-making. We propose a set of requirements for methods to detect the potential for vulnerability modelling, detect whether vulnerable groups are treated differently by a model, and detect whether a model has created an internal representation of vulnerability. We conclude that explainable artificial intelligence methods may be necessary for detecting vulnerability exploitation by machine learning-based recommendation systems.
... Toole et al. showed that mobile phone activity patterns revealed valuable indicators of the socio-economical status of geographical regions (Toole et al. 2015). At the same time, changes in the calling behaviours were also found useful when forecasting macro unemployment rates (Sundsøy et al. 2016). ...
Article
Full-text available
Historically, policymakers and practitioners relied exclusively on survey and census data to design and plan for assistive interventions; now, social media offer a timely and cost-effective way to reach out to populations otherwise unobserved. This study was designed to address the needs of a non-for-profit organisation to reach out to the young unemployed individuals in Italy with educational and job opportunities via communication channels that are more likely to appeal to younger generations. To this extend, we developed an ad-hoc Facebook application which administers questionnaires while gathering data about the Likes on Facebook Pages. Then, we developed a machine learning framework that successfully predicts the unemployment status of an unseen individual (.74 AUC). However, blindly delegating to the machine learning model the communication intervention may lead to digital discrimination on the basis of socio-demographic characteristics. Here, we propose a framework that aims to optimising both for the prediction performance as well as the most adequate fairness metric. Our framework is based on an adaptive threshold for gender, while we show that it can be expanded for other socio-demographic attributes and generalised for other interventions of assistive character. We present a doubly cross-validated setting that achieves out-of-sample stability and generalisability of results. We compare the behaviour of models that infer on different sets of data and provide an indepth discussion on the most predictive features, demonstrating that the “fairness through unawareness” approach does not suffice to achieve a fair classification since sensitive demographic information can be inferred not only via other sociodemographic attributes but also from behavioural digital patterns. 
Finally, we thoroughly assess the behaviour of the adaptive threshold approach and provide an in-depth discussion on the advantages but also the implications of such models offering actionable insights. Our results show that careful assessment of fairness metrics should be considered, primarily when AI models are employed for policymaking.
... With surveys and covariates derived from CDR data available for multiple time intervals, there is the potential for multitemporal mapping to measure progresses towards meeting the SDGs at fine spatial disaggregation. Recent work investigating whether indicators extracted from mobile phone usage can reveal information about mobile phone users, focused on deriving gender (Jahani et al., 2017;Bosco et al., 2019), employment status (Almaatouq et al., 2016;Sundsøy et al., 2016), education (Sundsøy, 2016), household wealth (Šćepanović et al., 2015), and individual income (Blumenstock et al., 2015;Sundsøy et al., 2016a). ...
Preprint
Full-text available
With the consolidation of the culture of evidence-based policymaking, the availability of data has become central to policymakers. Nowadays, innovative data sources offer an opportunity to describe demographic, mobility, and migratory phenomena more accurately by making available large volumes of real-time and spatially detailed data. At the same time, however, data innovation has led to new challenges (ethics, privacy, data governance models, data quality) for citizens, statistical offices, policymakers and the private sector. Focusing on the fields of demography, mobility, and migration studies, the aim of this report is to assess the current state of data innovation in the scientific literature as well as to identify areas in which data innovation has the most concrete potential for policymaking. Consequently, this study has reviewed more than 300 articles and scientific reports, as well as numerous tools, that employed non-traditional data sources to measure vital population events (mortality, fertility), migration and human mobility, and the population change and population distribution. The specific findings of our report form the basis of a discussion on a) how innovative data is used compared to traditional data sources; b) domains in which innovative data have the greatest potential to contribute to policymaking; c) the prospects of innovative data transition towards systematically contributing to official statistics and policymaking.
... With surveys and covariates derived from CDR data available for multiple time intervals, there is the potential for multitemporal mapping to measure progresses towards meeting the SDGs at fine spatial disaggregation. Recent work investigating whether indicators extracted from mobile phone usage can reveal information about mobile phone users, focused on deriving gender (Jahani et al., 2017;Bosco et al., 2019), employment status (Almaatouq et al., 2016;Sundsøy et al., 2016), education (Sundsøy, 2016), household wealth (Šćepanović et al., 2015), and individual income (Blumenstock et al., 2015;Sundsøy et al., 2016a). ...
Book
With the consolidation of the culture of evidence-based policymaking, the availability of data has become central for policymakers. Nowadays, innovative data sources offer the opportunity to describe demographic, mobility- and migration-related phenomena more accurately by making available large volumes of real-time and spatially detailed data. At the same time, however, data innovation has brought new challenges (ethics, privacy, data governance models, data quality) for citizens, statistical offices, policymakers and the private sector. Focusing on the fields of demography, mobility and migration studies, the aim of this report is to assess the current state of utilisation of data innovation in the scientific literature as well as to identify areas in which data innovation has the most concrete potential for policymaking. For that purpose, this study has reviewed more than 300 articles and scientific reports, as well as numerous tools, that employed non-traditional data sources for demographic, human mobility or migration research. The specific findings of our report contribute to a discussion on a) how innovative data is used with respect to traditional data sources; b) domains in which innovative data have the highest potential to contribute to policymaking; c) prospects for an innovative data transition towards systematic contribution to official statistics and policymaking.
... Other studies (M et al., 2021) have also used mobile phone data to provide a timely and large-scale picture of how the mobility restrictions imposed by the Italian authorities to fight the COVID-19 pandemic affected human mobility, and of their economic impact. Moreover, the literature has shown that the use of mobile phone data can be extended to studying and predicting employment levels (Almaatouq et al., 2016; Sundsøy et al., 2016; Toole et al., 2015). ...
Preprint
The COVID-19 pandemic has created a sudden need for a wider uptake of home-based telework as a means of sustaining production. Generally, teleworking arrangements have a direct effect on workers' efficiency and motivation. The direction of this impact, however, depends on the balance between the positive effects of teleworking (e.g. increased flexibility and autonomy) and its downsides (e.g. blurring boundaries between private and work life). Moreover, these effects of teleworking can be amplified in the case of vulnerable groups of workers, such as women. The first step in understanding the implications of teleworking for women is to have timely information on the extent of teleworking by age and gender. In the absence of timely official statistics, in this paper we propose a method for nowcasting teleworking trends by age and gender for 20 Italian regions using mobile network operator (MNO) data. The method is developed and validated using MNO data together with the Italian quarterly Labour Force Survey. Our results confirm that MNO data have the potential to be used as a tool for monitoring gender differences in teleworking patterns. This tool becomes even more important today as it could support adequate gender mainstreaming in the 'Next Generation EU' recovery plan and help to manage the related social impacts of COVID-19 through policymaking.
... Individual movement is closely related to the attributes of the visited locations and individual profiles (Siła-Nowicka et al., 2016; Wang et al., 2018; Yang et al., 2018; Demissie et al., 2019; Spyratos et al., 2019). To date, detailed human mobility analyses are still relatively sparse, and due to data ownership and privacy issues, there is a lack of comprehensive datasets toward labeling individual profiles (Sundsøy et al., 2016; Kim et al., 2018; Zufiria et al., 2018). Nevertheless, the exploration of human mobility from different views has grown rapidly with the fast growth of various Information and Communication Technologies (ICTs). ...
Article
Human mobility patterns have been investigated on a macroscale ranging from intra-city and intercity to intra-country based on mobile phone data. However, few studies have been conducted from a micro-view to characterize group-level human mobility behavior with respect to a point of interest (POI). In this paper, we intend to explore the differences in mobility patterns across those groups of community members at a specific POI. First, an appearance probability estimation algorithm is proposed to detect individual frequent locations for each user, and thereafter mobile users are classified into POI-related categories for further analysis of group-level mobility behavior. A hospital experiment is described based on a mobile phone dataset collected from Hangzhou City, China. An evaluation of this model illustrates the good performance of our scheme. Moreover, the mobility pattern analysis exhibits differences between groups with respect to frequent locations, radius of gyration, and population spatial distribution. The results of the radius of gyration distributions show that medical workers, out-patients, and passersby all follow an exponentially truncated power-law distribution, while in-patients present an exponential-law distribution.
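The radius of gyration mentioned in the abstract above is a standard summary of how far a user's observed positions spread around their centre of mass. The following is a minimal sketch, assuming positions have already been projected to planar coordinates (for example kilometres); the function name and input layout are illustrative assumptions and are not taken from the cited paper.

```python
import math


def radius_of_gyration(points):
    """Root-mean-square distance of observed positions from their centroid.

    `points` is a list of (x, y) tuples in a planar coordinate system
    (an assumption; raw latitude/longitude would need projecting first).
    """
    n = len(points)
    if n == 0:
        raise ValueError("need at least one observed position")
    # Centre of mass of all observed positions.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Mean squared distance from the centre of mass.
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)
```

A user seen only at one location has a radius of gyration of zero; a user who alternates between two towers 2 km apart has a radius of 1 km.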
Technical Report
Across the globe, sex-disaggregated data to track gender equality and women’s empowerment remain scarce as they cover few countries and are collected irregularly. There has been a growing interest in identifying alternative data sources that are common across countries and can provide higher spatio-temporal coverage to measure and monitor progress on women’s empowerment and gender equality. This study explores one such data source: mobile phone usage data, also called call detail records (CDRs). We use CDRs of mobile phone users in Uganda combined with data from a phone survey to train machine-learning models to predict the sex of the mobile phone user and several indicators of economic empowerment such as ownership of a house and land, occupation, and decision-making over household income. The most accurate of the models predicts the sex of the mobile phone user with 78% accuracy. The different indicators of economic empowerment are predicted with accuracies ranging from 57% to 61%. We also predict users’ sex and economic empowerment jointly. When we first predict the sex of the user and then economic empowerment, no noticeable improvements occur in the predictive accuracies over the separate predictions for the five indicators. However, when we predict economic empowerment and then the sex of the user, we achieve high accuracy rates ranging from 81% to 87%. Mobile phone usage data hold potential for gender research although they are not without limitations.
Technical Report
While the social status and position of women and men, girls and boys in Nepal - as elsewhere - is cut through by geography, social class, race, ethnicity, and age (life-stage), historically women and girls have been disproportionately subject to gender-based disadvantages, both legally enshrined and institutionalised as social norms and expectations (Matinga et al., 2019). In recent years, the Government of Nepal has sought to address major sites of gender-based disadvantage, introducing a series of legal and regulatory provisions to strengthen women’s position in society and advance gender equality. The 2015 Constitution mandated that women occupy a third of parliamentary seats, and introduced a raft of new rights previously withheld from women. Newly available rights include: rights to inheritance (lineage), to reproductive and maternal health provision, and equal rights in property and family matters (Government of Nepal, 2015). There followed a series of measures to address gender-based inequalities in educational attainment and in legally recognised use-rights over land (at a time when under 20% of women had land registered in their name (IOM, 2016)). Despite these recent moves to diminish gender-based inequalities, women and girls in Nepal - as elsewhere - continue to be disproportionately subject to gender-based disadvantages, both legally enshrined and institutionalised as social norms and expectations (Care, 2015). Against this backdrop, this study investigated the potential for novel digital data sources to support gender-equitable development across Nepal. The study was organised around two work packages. In the first, we combined nationally representative, geo-located survey data with satellite imagery and mobile phone data, to model and map spatial variations and gender-based inequalities for three key development indicators (literacy, agriculture-based occupations, and births in health facilities) across Nepal.
The results obtained for work package one demonstrate the power of modern and robust statistical methods to exploit geolocated survey data in new and innovative ways, so permitting the geographical scale of survey estimates to be greatly refined. We discuss the data requirements underpinning good model performance, contrasting, for example, the weaker results obtained for male literacy rates with results for the best-performing indicators. Notwithstanding the potential for results to be improved through the inclusion of additional information, we suggest that the showcased techniques can (potentially) be applied to a wide variety of development indicators. We outline the practical relevance of the study outputs for the design, implementation, and monitoring of gender-equitable development in Nepal. The second work package sought to leverage de-identified mobile phone data to produce robust, frequently updatable, information on gendered mobility and migration patterns, trajectories, and dynamics within Nepal. This entailed the development of methods to predict gender for a ‘population’ of mobile phone subscribers. As part of this workstream, we administered a primary survey to validate gender for a representative sample of subscribers. To our knowledge, this study is the first time that a rigorous assessment of SIM-card (Subscriber Identification Module-card) sharing has been undertaken and incorporated into model architectures for demographic prediction. The study findings indicate that it is common for individuals to use one another’s SIM-cards, despite (overall) high rates of individual mobile phone ownership in Nepal. Our results suggest that the ‘single-SIM/single subscriber’ assumption (which has, to date, underpinned demographic prediction models) is untenable in the study setting. The uncertainty introduced by widespread SIM sharing in this setting is higher than traditionally allowed for by ‘classic methods’. 
The extent to which the pattern observed for Nepal holds in different settings is an empirical question. Ultimately, it may be necessary to reassess the performance of ‘classic methods’ to predict demographics from CDR data in light of previously undetected sources of uncertainty. This will depend on further research to assess the extent of (unacceptable) uncertainty posed by SIM use and sharing in different settings. Seeking to compensate for the uncertainty introduced by reported widespread SIM-sharing, we applied state-of-the-art semantic array programming - a robust, modular modelling approach - to model women’s and men’s mobility and migration patterns. While the model results are encouraging, indicating that analysis of individual CDR data can enhance our understanding of the spatial variation and temporal dynamics of sex and gender-based inequalities, more work is needed to unravel the implications of SIM sharing for gender (and more broadly, demographic) prediction models. We make a number of recommendations in this regard.
How to cite: Bosco, C., Watson, S., Game, A., Brooks, C., de Rigo, D., Qader, S., Greenhalgh, J., Nilsen, K., Ninneman, A., Wood, R., Bengtsson, L., 2019. Towards high-resolution sex-disaggregated dynamic mapping. Flowminder Foundation, Stockholm, Sweden. https://doi.org/10.13140/RG.2.2.12800.79360
Article
Mobile phones are one of the fastest growing technologies in the developing world, with global penetration rates reaching 90%. Mobile phone data, also called CDR, are generated every time phones are used and are recorded by carriers at scale. CDR have generated groundbreaking insights in public health, official statistics, and logistics. However, the fact that most phones in developing countries are prepaid means that the data lack key information about the user, including gender and other demographic variables. This precludes numerous uses of these data in social science and development economics research. It furthermore severely hinders the development of humanitarian applications, such as the use of mobile phone data to target aid towards the most vulnerable groups during crises. We developed a framework to extract more than 1400 features from standard mobile phone data and used them to predict useful individual characteristics and group estimates. We here present a systematic cross-country study of the applicability of machine learning for dataset augmentation at low cost. We validate our framework by showing how it can be used to reliably predict gender and other information for more than half a million people in two countries. We show how standard machine learning algorithms trained on only 10,000 users are sufficient to predict individuals' gender with an accuracy ranging from 74.3 to 88.4% in a developed country and from 74.5 to 79.7% in a developing country using only metadata. This is significantly higher than previous approaches and, once calibrated, gives highly accurate estimates of gender balance in groups. Performance suffers only marginally if we reduce the training size to 5,000, but decreases significantly with smaller training sets. We finally show, using factor analysis, that our indicators capture a large range of behavioral traits and that the framework can be used to predict other indicators of vulnerability such as age or socio-economic status. Mobile phone data have great potential for good, and our framework allows these data to be augmented with vulnerability and other information at a fraction of the cost.
Article
Large-scale data from digital infrastructure, like mobile phone networks, provides rich information on the behavior of millions of people in areas affected by climate stress. Using anonymized data on mobility and calling behavior from 5.1 million Grameenphone users in Barisal Division and Chittagong District, Bangladesh, we investigate the effect of Cyclone Mahasen, which struck Barisal and Chittagong in May 2013. We characterize spatiotemporal patterns and anomalies in calling frequency, mobile recharges, and population movements before, during and after the cyclone. While it was originally anticipated that the analysis might detect mass evacuations and displacement from coastal areas in the weeks following the storm, no evidence was found to suggest any permanent changes in population distributions. We detect anomalous patterns of mobility both around the time of early warning messages and the storm’s landfall, showing where and when mobility occurred as well as its characteristics. We find that anomalous patterns of mobility and calling frequency correlate with rainfall intensity (r = .75, p < 0.05) and use calling frequency to construct a spatiotemporal distribution of cyclone impact as the storm moves across the affected region. Likewise, from mobile recharge purchases we show the spatiotemporal patterns in people’s preparation for the storm in vulnerable areas. In addition to demonstrating how anomaly detection can be useful for modeling human adaptation to climate extremes, we also identify several promising avenues for future improvement of disaster planning and response activities.
Article
The present study provides the first evidence that illiteracy can be reliably predicted from standard mobile phone logs. By deriving a broad set of mobile phone indicators reflecting users' financial, social and mobility patterns, we show how supervised machine learning can be used to predict individual illiteracy in an Asian developing country, externally validated against a large-scale survey. On average the model performs 10 times better than random guessing, with 70% accuracy. Further, we show how individual illiteracy can be aggregated and mapped geographically at cell tower resolution. Geographical mapping of illiteracy is crucial for knowing where illiterate people are and where to direct resources. In underdeveloped countries such mappings are often based on outdated household surveys with low spatial and temporal resolution. One in five people worldwide struggles with illiteracy, and it is estimated that illiteracy costs the global economy more than 1 trillion dollars each year. These results potentially enable cost-effective, questionnaire-free investigation of illiteracy-related questions on an unprecedented scale.
Article
Climate change is likely to drive migration from environmentally stressed areas. However quantifying short and long-term movements across large areas is challenging due to difficulties in the collection of highly spatially and temporally resolved human mobility data. In this study we use two datasets of individual mobility trajectories from six million de-identified mobile phone users in Bangladesh over three months and two years respectively. Using data collected during Cyclone Mahasen, which struck Bangladesh in May 2013, we show first how analyses based on mobile network data can describe important short-term features (hours–weeks) of human mobility during and after extreme weather events, which are extremely hard to quantify using standard survey based research. We then demonstrate how mobile data for the first time allow us to study the relationship between fundamental parameters of migration patterns on a national scale. We concurrently quantify incidence, direction, duration and seasonality of migration episodes in Bangladesh. While we show that changes in the incidence of migration episodes are highly correlated with changes in the duration of migration episodes, the correlation between in- and out-migration between areas is unexpectedly weak. The methodological framework described here provides an important addition to current methods in studies of human migration and climate change.
Conference Paper
Deep learning has in recent years brought breakthroughs in several domains, most notably voice and image recognition. In this work we extend deep learning into a new application domain - namely classification on mobile phone datasets. Classic machine learning methods have produced good results in telecom prediction tasks, but are underutilized due to resource-intensive and domain-specific feature engineering. Moreover, traditional machine learning algorithms require separate feature engineering in different countries. In this work, we show how socio-economic status in large de-identified mobile phone datasets can be accurately classified using deep learning, thus avoiding the cumbersome and manual feature engineering process. We implement a simple deep learning architecture and compare it with traditional data mining models as our benchmarks. On average our model achieves 77% AUC on test data using location traces as the sole input. In contrast, the benchmarked state-of-the-art data mining models include various feature categories such as basic phone usage, top-up pattern, handset type, social network structure and individual mobility. The traditional machine learning models achieve 72% AUC in the best-case scenario. We believe these results are encouraging since average regional household income is an important input to a wide range of economic policies. In underdeveloped countries reliable income statistics are often lacking, infrequently updated, and rarely fine-grained to sub-regions of the country. Making income prediction simpler and more efficient can be of great help to policy makers and charity organizations, which will ultimately benefit the poor.
Book
This book was originally published by Macmillan in 1936. It was voted the top Academic Book that Shaped Modern Britain by Academic Book Week (UK) in 2017, and in 2011 was placed on Time Magazine's top 100 non-fiction books written in English since 1923. Reissued with a fresh Introduction by the Nobel-prize winner Paul Krugman and a new Afterword by Keynes’ biographer Robert Skidelsky, this important work is made available to a new generation. The General Theory of Employment, Interest and Money transformed economics and changed the face of modern macroeconomics. Keynes’ argument is based on the idea that the level of employment is not determined by the price of labour, but by the spending of money. It gave way to an entirely new approach where employment, inflation and the market economy are concerned. Highly provocative at its time of publication, this book and Keynes’ theories continue to remain the subject of much support and praise, criticism and debate. Economists at any stage in their career will enjoy revisiting this treatise and observing the relevance of Keynes’ work in today’s contemporary climate.
Conference Paper
The mapping of populations' socio-economic well-being is highly constrained by the logistics of censuses and surveys. Consequently, spatially detailed changes across scales of days, weeks, or months, or even year to year, are difficult to assess; thus the speed at which policies can be designed and evaluated is limited. However, recent studies have shown the value of mobile phone data as an enabling methodology for demographic modeling and measurement. In this work, we investigate whether indicators extracted from mobile phone usage can reveal information about the socio-economic status of microregions such as districts (i.e., average spatial resolution < 2.7km). For this we examine anonymized mobile phone metadata combined with beneficiary records from an unemployment benefit program. We find that aggregated activity, social, and mobility patterns strongly correlate with unemployment. Furthermore, we construct a simple model to produce an accurate reconstruction of district-level unemployment from mobile communication patterns alone. Our results suggest that reliable and cost-effective economic indicators could be built based on passively collected and anonymized mobile phone data. With similar data being collected every day by telecommunication services across the world, survey-based methods of measuring community socioeconomic status could potentially be augmented or replaced by such passive sensing methods in the future.
Article
Performance metrics for binary classification are designed to capture tradeoffs between four fundamental population quantities: true positives, false positives, true negatives and false negatives. Despite significant interest from theoretical and applied communities, little is known about either optimal classifiers or consistent algorithms for optimizing binary classification performance metrics beyond a few special cases. We consider a fairly large family of performance metrics given by ratios of linear combinations of the four fundamental population quantities. This family includes many well known binary classification metrics such as classification accuracy, AM measure, F-measure and the Jaccard similarity coefficient as special cases. Our analysis identifies the optimal classifiers as the sign of the thresholded conditional probability of the positive class, with a performance metric-dependent threshold. The optimal threshold can be constructed using simple plug-in estimators when the performance metric is a linear combination of the population quantities, but alternative techniques are required for the general case. We propose two algorithms for estimating the optimal classifiers, and prove their statistical consistency. Both algorithms are straightforward modifications of standard approaches to address the key challenge of optimal threshold selection, thus are simple to implement in practice. The first algorithm combines a plug-in estimate of the conditional probability of the positive class with optimal threshold selection. The second algorithm leverages recent work on calibrated asymmetric surrogate losses to construct candidate classifiers. We present empirical comparisons between these algorithms on benchmark datasets.
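The plug-in idea described in the abstract above (estimate the conditional probability of the positive class, then pick a metric-dependent threshold) can be illustrated with a small sketch. This is a hedged illustration only: it assumes probability estimates are already available from some model, uses the F-measure as the target metric, and searches candidate thresholds on held-out data. The function names `f1_score` and `best_threshold` are hypothetical, and this is not the cited authors' exact algorithm.

```python
def f1_score(y_true, y_pred):
    """F-measure computed from true positives, false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, probs):
    """Return (threshold, f1) maximising F1 over the observed probability values.

    `probs` are assumed to be estimates of P(y = 1 | x) from any fitted model.
    Ties are resolved in favour of the smallest threshold.
    """
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(probs)):
        # Predict positive whenever the estimated probability reaches the threshold.
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1
```

For labels `[0, 0, 1, 1]` with estimated probabilities `[0.1, 0.4, 0.35, 0.8]`, the default cut-off of 0.5 gives an F-measure of about 0.67, while the search selects 0.35 with an F-measure of 0.8, which shows why the threshold should depend on the metric being optimised.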
Article
Predicting unmeasurable wealth. In developing countries, collecting data on basic economic quantities, such as wealth and income, is costly, time-consuming, and unreliable. Taking advantage of the ubiquity of mobile phones in Rwanda, Blumenstock et al. mapped mobile phone metadata inputs to individual phone subscriber wealth. They applied the model to predict wealth throughout Rwanda and show that the predictions matched well with those from detailed boots-on-the-ground surveys of the population. Science, this issue p. 1073