Data Analytics without Borders: Multi-Layered Insights
for Syrian Refugee Crisis
Özgün Ozan Kılıç1, Mehmet Ali Akyol1, Oğuz Işık1, Banu Günel Kılıç1, Arsev Umur
Aydınoğlu1, Elif Surer1, Hafize Şebnem Düzgün2, Sibel Kalaycıoğlu1, Tuğba Taşkaya
1Middle East Technical University, 06800, Çankaya, Ankara
2Colorado School of Mines, Brown Hall 268, CO 80401, USA
Abstract. This study aims to shed light on various aspects of refugees’ lives in
Turkey using mobile call data records of Türk Telekom, which is enriched with
numerous local data sets. To achieve this, we made use of several techniques in
addition to a novel methodology we developed for this particular domain. Our
results showed that refugees are highly mobile as a survival strategy, a significant
number of whom work as seasonal workers. Most prefer to live in relatively
cheap neighborhoods, close to city transport links and fellow refugees. The ones
living in low-status neighborhoods appear to be introverted, living in a closed
neighborhood. However, the middle and upper class refugees appear to be the
opposite. Fatih, İstanbul was found to be an important hub for refugees. Finally, the
officially registered refugee numbers do not reflect the real refugee population in
Turkey. Due to their high mobility, refugees lag behind in keeping up-to-date
information about their residential address, resulting in a significant discrepancy
between the official numbers and the real numbers. We believe that policy
makers can benefit from the proposed methods in this study to develop real-time
solutions for the well-being of refugees.
Keywords: health · education · unemployment · social integration · safety and
1 Introduction
The civil war in Syria has caused one of the largest forcibly displaced populations in
human history [1]. Turkey has become the main destination for Syrian refugees, with
around 5 million. Although there are camps built for the refugees with better living
conditions than urban areas [2], more than 90% of the Syrian population in Turkey live
outside formal camps within host communities, the reasons for which are given as
overcrowded camps, individuals who entered illegally not being allowed to register at a
camp, family ties, and financial independence [3]. The status of Syrian refugees under
temporary protection is shaped within the framework of “Temporary Protection
Regulation.” This regulation is stated to address problems such as education,
health, work permits, and access to social services and assistance. Refugees are
also treated the same as Turkish citizens in accessing such rights, provided that they are
registered with the Ministry of Interior Directorate General of Migration Management.
At the beginning, Syrian refugees were mainly located in the Southeast Anatolian
region bordering Syria. However, over time and with the influx of arriving refugees,
they expanded to other regions as well, covering the Mediterranean, Aegean, Central
Anatolia, and Marmara regions, with İstanbul having the highest number of refugees. So
far, Turkey has provided exceptional support to Syrian refugees [4]; however, the
problems are mounting. They can be summarized as income, unemployment, education,
health, housing, and social tensions [3, 5, 6].
The Syrian refugees have impacted the economy [7]. For instance, “around 1.8
million of the Syrian refugees are of working age” [8]. Although some entrepreneurial
efforts have been observed and some of the refugees are skilled, most of the refugees
are employed as unqualified labor. By supplying inexpensive informal labor to
labor-intensive sectors, refugees displaced native workers, and both formal and informal
unemployment rates increased; furthermore, prices in these sectors were observed
to have fallen by around 4% [9]. At the beginning of the refugee crisis, the
Turkish economy had been experiencing a transition from being a low-wage country to
one based on skilled labor. With the arrival of Syrian refugees, this transition has
started to decelerate as they have offered cheap low-skilled labor to the job market,
which took advantage of their vulnerabilities. In particular, several refugees found jobs
as seasonal workers or in small industrial areas (“sanayi siteleri”) [10].
At the time of refugees’ arrival, Turkish cities were undergoing a profound
transformation in terms of housing. Illegal settlements (“gecekondu”) had started to
be demolished, and TOKI (Governmental Mass Housing Administration) aimed to
regulate the housing market [11], which provided partial solutions to the problem.
Refugees arrived when Turkey was still struggling with its urbanization problems.
Therefore, refugees easily found safer places in the fragmented cities. Good evidence
of this is that they settled into still-untransformed, poor, and environmentally low-quality
districts, which are very close to city centers. These districts provided life-saving
pockets for refugees where they could survive easily.
Big data have recently started to be used to address big social and environmental
challenges in developing countries [12]. With ethical and privacy issues in mind,
humanitarian use of private data such as mobile call data records has a great potential
in improving society [13]. Data for Refugees, which is a good example of “big data for
good”, is a research challenge aiming to provide better living conditions to Syrian
refugees in Turkey [14]. In this research challenge, we investigated the mobility
patterns of refugees from different points of view in order to provide multi-layered
insights for the Syrian refugee crisis. We found out that refugees are highly mobile in
Turkey as a survival strategy. We also carried out detailed analyses based on three
different districts and cities, which we chose according to our previous results. When
enriched with our secondary data sets, we saw that those living in low-status
neighborhoods are introverted, unlike refugees living in middle and high-status neighborhoods.
As a result, the repeatability and the reproducibility of the proposed methods can be
beneficial to policy makers to obtain real-time insights about seasonal workers and to
arrange services such as mobile health and education services on time. We also put
some of our interactive visualization tools online. The project
website also provides detailed information about each step we have carried out in our
analyses with several examples.
2 Technical Description
The D4R Challenge provided three main data sets (DS1, DS2, and DS3) along with some
helper files, which were collected in 2017 from 992,457 customers of Türk Telekom, of which
184,949 are tagged as “refugees” and 807,508 as Turkish citizens. As the challenge paper
describes all features, sampling strategies, and anonymization methods
in depth [15], we skip those here and describe the other data sets (hereafter
called “secondary data sets”) we used.
Primary data sets
Data Set 1 (DS1) comprises annual antenna traffic between each site.
Data Set 2 (DS2) includes cell-tower identifiers of randomly chosen active users’
hourly two-week call detail records. Either the incoming or the outgoing
call traffic of a user is provided, but not both.
Data Set 3 (DS3) consists of randomly chosen refugee and non-refugee annual call
traffic but with reduced spatial resolution (district level).
BTS Locations (BL) comprises cell tower locations in latitude and longitude.
District Mapping (DM) maps the district IDs used in DS3 to district names.
City Mapping (CM) maps the city IDs used in DS3 to city names.
Secondary data sets
Neighborhood-level, district-level, and city-level geospatial data sets indicating the
administrative borders in Turkey (GSD) obtained from various official sources and
government agencies and used for any information that we needed to filter by a
geographic region.
Neighborhood-level population data and various statistics from 2013 to 2017 for
various cities (PD) obtained from Turkish Statistical Institute (TurkStat).
Coordinates of houses and workplaces for rent and for sale in various
neighborhoods in İstanbul (FRE), scraped from Hürriyet Emlak [16].
Rental fees and other relevant information such as the area of use and building age
for various neighborhoods in İstanbul districts (RF), scraped from Hürriyet Emlak [16].
The results of 2017 Address-based Population Registry System where the
education levels of residents are given at neighborhood level (APRS2017).
Some of the terms defined in this study are borrowed from Ahas et al. [17]:
Mobile Operator User (MOU): A subscriber of the mobile operator who is
present in DS2 and DS3.
Home-Time Anchor Point (HAP): An everyday anchor point, at which the probable
home location of a person is identified based on the model.
Work-Time Anchor Point (WAP): An everyday anchor point, at which the probable
work-time location of a person is identified based on the model. As the demographics
of MOU are not known, it is not possible to determine whether that person is indeed
working, studying or unemployed. Therefore, we called it work-time anchor point to
refer to the most probable working time of day of that specific MOU.
Hereafter, we used the abbreviations in the parentheses provided for the data set
descriptions in the forthcoming sections.
We mainly used R to manipulate, analyze, query, and visualize data. We also utilized
GIS applications such as ArcMap and QGIS to match and enrich the data sets with our
secondary data sets, some of which were collected through web scraping and/or Google
Places API through Python/R. We used MySQL and NoSQL approaches such
as Hive, which work inside big-data-oriented distributed frameworks such as Hadoop,
to store, organize, and query data. The network
analyses were carried out in Pajek.
3 Pre-processing
Data quality issues with respect to base transceiver stations (BTS) were handled. Each
BTS was assigned to neighborhood-level, district-level, and city-level administrative
units in Turkey by its coordinates, using GSD. Those that do not fall inside the
administrative borders of Turkey or that do not have coordinates in the first
place were discarded. This way, a total of 98854 BTS were matched with city-district
pairs. In addition, it was noticed that the city and district columns in BL are not
entirely correct. As can be seen in Fig. 1, several BTS coordinates labeled
as located in İstanbul were not within the administrative district. Those were corrected
using GSD.
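The matching of BTS coordinates to administrative units can be illustrated with a minimal point-in-polygon sketch. This is not the authors' GIS workflow (they report using ArcMap and QGIS); it is a plain ray-casting test over simple polygons in longitude/latitude, and the function names, unit names, and coordinates below are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices (assumed simple, no holes)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_units(bts_coords, units):
    """Map each BTS id to the first unit whose polygon contains it.

    bts_coords: {bts_id: (lon, lat) or None}; units: {unit_name: polygon}.
    BTS without coordinates or outside every polygon are discarded.
    """
    matched = {}
    for bts_id, coord in bts_coords.items():
        if coord is None:
            continue  # no coordinates in the first place
        for name, polygon in units.items():
            if point_in_polygon(coord[0], coord[1], polygon):
                matched[bts_id] = name
                break
    return matched


# Hypothetical toy data: two square "districts" and four BTS records.
units = {"A": [(0, 0), (2, 0), (2, 2), (0, 2)],
         "B": [(2, 0), (4, 0), (4, 2), (2, 2)]}
bts_coords = {1: (1, 1), 2: (3, 1), 3: (10, 10), 4: None}
matched = assign_units(bts_coords, units)
```

In this toy input, BTS 3 lies outside every polygon and BTS 4 has no coordinates, so both are dropped, mirroring the discard rule described above.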
Fig. 1. BTS coordinates provided in BL marked as located in İstanbul but not within the
administrative district of İstanbul are highlighted on the map.
DS1 includes BTS that have zero calls or no entries at all on certain
days or at certain times of day. Fig. 2 shows the call numbers of one such BTS, some of
which are not present in DS2. Apart from the BTS that have no data at all (more than half
of the whole BTS population), a BTS has 82 missing days at best and 364 missing days
at worst, while the median number of missing days is 92. Considering that
some of the BTS with the most missing days belong to urban and dense urban areas (as also noted by
the data set itself), at least some of the missing days might be related to
data quality issues rather than a lack of mobile traffic. This led us to work with
average call numbers per BTS rather than cumulative sums, with the days with no
data excluded from the calculations.
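A minimal sketch of this averaging follows, assuming the daily totals of one BTS are held in a dictionary and that a missing day appears either as an absent key, as `None`, or as a zero count. The function name and the sample values are hypothetical.

```python
def mean_daily_calls(daily_calls):
    """Average calls per day for one BTS, ignoring days without data.

    daily_calls: {day_label: count or None}. Days with None or zero are
    treated as missing (an assumption about how gaps are encoded).
    """
    observed = [c for c in daily_calls.values() if c is not None and c > 0]
    if not observed:
        return None  # the BTS has no usable data at all
    return sum(observed) / len(observed)


# Hypothetical BTS with two observed days and two missing days.
example = {"2017-07-01": 10, "2017-07-02": None, "2017-07-03": 20, "2017-07-04": 0}
average = mean_daily_calls(example)
```

Dividing by the number of observed days rather than by the calendar length keeps a BTS with many gaps from looking artificially quiet.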
Fig. 2. The number of calls per day in July 2017 in Çaykara, Trabzon, measured at BTS number
5066930. The red line shows the total outgoing refugee call and SMS traffic and the blue line
shows the total outgoing non-refugee call and SMS traffic. There are no instances in the data set
for a number of days.
4 Methodology
The mobility of refugees is analyzed using five approaches. In the first approach, we aim to
understand how comparable the registered Syrian population in each city of Turkey is
with that derived from call records at city level, obtained from BTS call statistics. With the second
approach, we analyze city networks connected by refugee mobility. With the third
approach, we investigate the monthly refugee movement per district to be able to
understand the influx of refugees over time. In particular, we address which districts
are increasingly attracting refugees and whether there are any specific districts
attracting refugees at a specific time of the year. In the fourth approach, we identify the
most probable work and home-time anchor points from two-week data on selected rural
and urban locations using mobility statistics and clustering algorithms. This output
enabled us to identify the popular places for work and home of refugees. Here, we
developed a new method to identify possible work and home time locations. Finally,
we explore three different locations to understand the mobility patterns of refugees in
depth using several statistics.
4.1 Background: Approaches for Determining Meaningful Places
Meaningful places or meaningful locations are defined as regularly visited places,
which have a meaning for a person [18]. Mobile positioning data have been increasingly
used for determining meaningful locations. In particular, finding work-time and home-time
anchor points has been extensively studied in the literature [17, 19]. The common
assumption in finding these places relies on a frequentist approach. First, a specific time
interval is defined for work and home time periods. Second, the number of call days
and the total number of calls are considered within these time intervals to identify
significant anchor points. For example, if mobile calls are made from a specific BTS
repeatedly during the home time, that BTS is marked as a potential home-time anchor
point. In urban areas, several BTS can be located at the same site. Such BTS locations
can be clustered using Hartigan’s algorithm or k-medoid, and site information can be
utilized when calculating mobile call metrics. Networks are also constructed including
these frequent nodes (a node corresponds to a cluster comprising one or more BTS
locations) to correctly identify home and work locations [19]. Finally, several metrics
such as regularity, entropy, or radius of gyration (RoG) can be computed to understand
the behavior of citizens [20].
Mobile traffic signatures, defined as the typical activity pattern of the mobile demand
at one specific geographic zone, have been recently used to investigate the relationship
between urban fabrics such as touristic and leisure places, and mobile network usage
[21]. Different metrics were proposed based on voice and text traffic volume over
weekends and weekdays, some of which take seasonality into consideration.
4.2 Comparison of Registered Syrian Refugee Population with the Number of
The Directorate General of Migration Management of Turkey published the registered
Syrian population in each city of Turkey for 2017 [22]. According to the statistics, the
highest numbers of refugees are located in İstanbul, Şanlıurfa, Hatay, and Gaziantep
with 479555, 420532, 384024, and 329670, respectively. In this part of the study, we
aggregated BTS call statistics at city level for refugees as follows: First, we calculated
the total number of refugee calls and such for each day per BTS. Then, we calculated
the monthly average from the daily totals per BTS in each month while discarding the
days with no calls due to the aforementioned data quality problem. Then, we computed
the sum of each BTS over a year to obtain the cumulative annual use of each BTS,
which is denoted as BRi. Each BTS is geospatially associated with a district and city
using GSD. Finally, for each city, we summed up all BTS aggregated call data, which
is denoted as $C_i$. The vertical percentage of refugee calls for each city is found using
$VPCR_i = C_i / \sum_{j=1}^{81} C_j$, where 81 is the total number of cities in Turkey. Then, we
made use of the registered Syrian population $PR_i$ and the total population $P_i$ of each city
in Turkey to calculate the vertical percentage of the registered refugee population
with $VPR_i = PR_i / P_i$. Finally, we divided $VPCR_i$ by $VPR_i$ to obtain the magnitude of the
difference between the two statistics, which is denoted as MD. The map plotted in Fig.
3 shows the results. Although the officially reported number of registered Syrian refugees is low
in Antalya, the total number of refugee calls in Antalya is considerably higher than expected.
The second highest city is Kilis, which shares a border with Syria. There also seems to
be an undocumented influx to East and West Black Sea regions, although it is not as
significant as the previously mentioned ones, which will be examined closely in the
following sections.
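The ratio described in this section can be sketched in a few lines. The sketch assumes per-city dictionaries for the aggregated refugee calls, the registered refugee population, and the total population; the function name and all numbers are hypothetical and do not come from the study's data.

```python
def magnitude_of_difference(calls, registered, population):
    """Return MD per city: (share of all refugee calls) / (registered refugees / total population)."""
    total_calls = sum(calls.values())
    md = {}
    for city, city_calls in calls.items():
        vpcr = city_calls / total_calls            # share of refugee calls made in this city
        vpr = registered[city] / population[city]  # registered refugees relative to city population
        md[city] = vpcr / vpr
    return md


# Toy numbers for two hypothetical cities, for illustration only.
calls = {"A": 30, "B": 70}
registered = {"A": 10, "B": 100}
population = {"A": 1000, "B": 1000}
md = magnitude_of_difference(calls, registered, population)
```

Here city A carries 30% of the calls with only 1% of its population registered as refugees, so its MD is much higher than that of city B, which is the kind of discrepancy the map is meant to highlight.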
Fig. 3. The ratio of the total refugee calls per city in 2017 to the official residency records, higher
numbers (represented with lighter colors) indicating a higher refugee influx compared to official
figures (higher than expected)
The results show that there is a discrepancy between the number of registered users and
the call data in certain cities. They indicate that although some refugees change their
residential addresses, they do not inform the state agencies about this change on time.
An expert we consulted, working at the Refugees and Migrants Solidarity Association, also
stated that they suspected that numerous undocumented refugees were living in Antalya
but were not sure how big the discrepancy is.
4.3 Analysis of City Networks Connected by Refugee Mobility
In order to understand how Syrian refugees use space in Turkey, incoming and outgoing
call data in DS3 were used to form 1-mode and 2-mode networks of refugees and the
cities in which they have made phone calls. Initial basic statistics showed that refugees
are highly mobile. Out of 37300 refugee MOUs, 53.8% visited only one, 19.9% two,
9.3% three, 5.2% four and 11.8% five or more cities.
In network analysis, initially multiple lines were summed in the 2-mode network of
refugees and the cities to obtain the total number of calls a refugee has made in a city.
In order to determine the cities in which refugees reside, rather than visit briefly during
a trip, the ties that indicate fewer than 100 phone calls in a city in one year were removed.
Then, all line values were replaced with the value 1, since the focus of attention is the
presence of a refugee in a city, not how many phone calls they have made. This 2-mode
network was then used to obtain the 1-mode network of cities. In this network, a pair
of cities is connected by refugees who have been to both cities. In the 1-mode network,
the value of the line that connects any two cities is the total number of refugees who
have been in those cities (aka “network traffic” in Fig. 4).
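The projection from the 2-mode refugee-city network to the 1-mode city network can be sketched as follows. The authors carried out this step in Pajek; the plain-Python version below is only an illustration, with a hypothetical function name, hypothetical city codes, and an invented call list.

```python
from collections import defaultdict
from itertools import combinations


def project_to_cities(call_records, min_calls=100):
    """Build the 1-mode city network from (refugee_id, city) call records.

    Returns {(city_a, city_b): number of refugees present in both cities}.
    """
    # Sum multiple lines: total calls per refugee-city tie.
    totals = defaultdict(int)
    for refugee, city in call_records:
        totals[(refugee, city)] += 1
    # Drop ties with fewer than min_calls calls, then keep presence only (value 1).
    cities_of = defaultdict(set)
    for (refugee, city), n_calls in totals.items():
        if n_calls >= min_calls:
            cities_of[refugee].add(city)
    # Two cities are linked once for every refugee present in both.
    weights = defaultdict(int)
    for cities in cities_of.values():
        for a, b in combinations(sorted(cities), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Invented records: r1 and r2 both reside in A and B; r1 only passes through C.
records = ([("r1", "A")] * 120 + [("r1", "B")] * 100 + [("r1", "C")] * 5
           + [("r2", "A")] * 150 + [("r2", "B")] * 101)
city_links = project_to_cities(records)
```

Because ties are binarized before the projection, a refugee who makes thousands of calls in two cities contributes exactly as much to the link as one who just passes the 100-call threshold.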
In order to quantify the most important cities for refugee mobility, weighted degree
centralities were calculated. The top 10 cities that receive the highest refugee traffic, in
descending order, were found to be İstanbul, Gaziantep, Ankara, İzmir, Mersin, Adana,
Hatay, Antalya, Şanlıurfa, and Kocaeli. Then, the lines with values lower than 50 were
removed to simplify the graph and the largest connected component was determined,
which yielded 33 cities. Fig. 4 shows the resulting ties between these cities. The sizes
of the vertices show the registered Syrian refugee population [22] in the corresponding
cities with a minimum of 155 (Giresun), a maximum of 479,555 (İstanbul), and a
median of 8120. The widths of the lines indicate the total number of refugees linking
the cities, with a minimum of 50 (between İstanbul and Kayseri), a maximum of 318
(between İstanbul and Kocaeli), and a median of 79.5. Some cities, such as Edirne,
Tekirdağ, Çanakkale, Aydın, Muğla, Antalya, Kastamonu, Samsun, Trabzon, Ordu,
Giresun, Tokat, and Sivas have a small number of registered refugees. Yet, the analysis
shows that refugees visit and stay in these cities. The majority of these travels are
to/from İstanbul only. The strongest single link a city has (compared to its registered
refugee population) belongs to Antalya, tied to İstanbul.
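The centrality and simplification steps of this section can be sketched as below, assuming the city network is stored as a dictionary of weighted, undirected links. The function names and the link values are hypothetical (only the threshold of 50 comes from the text), and the original analysis was performed in Pajek.

```python
from collections import defaultdict


def weighted_degree(weights):
    """Sum of link values incident to each city."""
    degree = defaultdict(int)
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    return dict(degree)


def largest_component(weights, min_weight=50):
    """Remove links below min_weight and return the largest connected set of cities."""
    adjacency = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical link values between city codes.
links = {("A", "B"): 318, ("A", "C"): 50, ("C", "D"): 49, ("E", "F"): 60}
degrees = weighted_degree(links)
core = largest_component(links)
```

In this toy network, city D drops out because its only link falls below the threshold, and the E-F pair survives the filter but is not part of the largest component.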
Fig. 4. Network map indicating the refugee links between cities (lines) along with their registered
Syrian refugee populations (vertices)
4.4 Investigating Monthly Refugee Movement per District
In this section, monthly refugee mobility is studied. Some refugees are known to be
highly mobile and work as seasonal agricultural workers in Turkey. To be able to
identify districts receiving the highest refugee influx, we calculated the sum of BRi for
each district, which is denoted as BRDi. As we will study the density and distribution
of the outgoing refugee calls in DS1, we need to consider districts that have a significant
amount of call data. This is because it is questionable whether the mobility pattern of subscribers
without heavy phone usage can be properly characterized [23].
That study reported that 17% and 38% of subscribers in its CDR data set had two or
fewer records and fewer than seven records, respectively. In addition, some BTS records
are not present in DS1. As it is not possible to determine the reason for the omissions
(whether it is a measurement problem or an indication of zero call entries), we performed
filtering, which resulted in discarding two-thirds of the district data. As a rough cut-off
point, the districts with fewer than 3650 total outgoing refugee calls in 2017 were
filtered out to improve the calculation time and the reliability of the analysis results, since
inferences made from only a few calls can produce biased estimates.
This threshold was determined using the mobile phone usage distribution statistics
provided in [23].
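The district filter, together with the month-to-month differences used in this section to spot changes, can be sketched as follows. The sketch assumes each district has a list of twelve monthly outgoing refugee call totals; the function names, district labels, and numbers are hypothetical.

```python
def filter_districts(monthly_calls, min_annual=3650):
    """Keep districts whose total annual outgoing refugee calls reach the cut-off."""
    return {d: months for d, months in monthly_calls.items() if sum(months) >= min_annual}


def first_differences(series):
    """Change from each month to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical districts: one with a summer peak, one with too few calls.
monthly_calls = {
    "district_x": [300, 300, 300, 300, 300, 300, 900, 700, 300, 300, 300, 300],
    "district_y": [10] * 12,
}
kept = filter_districts(monthly_calls)
changes = first_differences(kept["district_x"])
```

A large positive difference followed by a negative one, as in the invented `district_x` series, is the kind of seasonal pattern that would then be taken to domain experts for interpretation.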
Fig. 5. Some of the districts with significantly higher outgoing refugee calls and/or monthly
changes that can be explained
By looking at the monthly SCS values and their first order differences, we
highlighted some of the districts that feature significant changes as seen in Fig. 5. Later,
we consulted experts to understand the reasons for the influx patterns. Fatih-İstanbul
was reported as an attraction center for refugees, the exact reasons for which are not
clear. However, it is advocated that the low prices in accommodation, Fatih’s easy
access to other districts and the existing local community’s religious background might
have attracted them. This district is studied in depth in the forthcoming section. The
agriculture expert we consulted informed us that many of the changes in the districts coincide
with harvest times. To be specific, the harvests in Çaykara-Trabzon in July and
August are hazelnut and tea, and those in Kastamonu in September and October are sugar beet,
garlic, and paddy. In addition, the harvests in Iğdır in August/September are
sugar beet and cotton, while the harvest in Osmaniye in October is peanut. Wealthy
refugees visit summer locations as frequently as non-refugees. We see an interesting
peak in Kemalpaşa-İzmir in July and a smaller peak in August. In addition, we observed
another peak in Iğdır in February/March. However, at the time of
writing this report, we still could not reach an authority who could explain the exact
reason for this refugee influx in July for Kemalpaşa and in February/March for Iğdır.
On the other hand, for the smaller peak in Kemalpaşa in August, we were told by the
expert that the increase could be attributed to the religious
festival for Alevis called Hamzababa Anma Törenleri (Hamzababa
Commemoration), which takes place on the 28th and 29th of August each year. For the peak
in July in Kemalpaşa, the experts we consulted from numerous local municipality services
speculated that it is a high season for harvesting and refugees might have
stayed in this district for temporary accommodation. The agriculture expert also
suggested that the refugee influx in Mezitli is based on tourism; the fact that the peak
happens in October does not contradict this, since Mersin is known for its scorching
heat and humidity, which shift the tourism season towards fall. Kilis, which has a border with
Syria, constantly receives refugees. We have provided the results of some districts in
Fig. 5. Many others can be viewed on our website using an interactive tool.
4.5 A New Method to Understand Possible Work-Time and Home-Time
In this section, we investigate refugees’ meaningful places, specifically work-time and
home-time mobility patterns, which are quantified using well-known measures used in
the literature [24]. In particular, different aspects of irregularity are studied. We
extended a well-known method in the literature, which was described in Section 4.1.
The main contribution of our proposed method is that, rather than using a pre-determined
time to identify work and home time locations, we identify important anchor points
automatically using an algorithm. Many people, such as white-collar workers, have very
structured daily lives. Their work hours are usually from 9:00 A.M. to 6:00 P.M.
However, a garbage collector visits several districts to collect recyclable materials such
as glass, paper, or plastics, while a seasonal agricultural worker might change
location within a week and move to another district. Some people might work
at different times of day, which we came across frequently in the data sets.
Therefore, it may not be appropriate to use a pre-determined time to find WAP and HAP
locations. In addition, a static threshold for the number of calls or the number of call days is
generally used to filter out MOUs whose HAP and WAP information cannot be
obtained due to a limited amount of call data. Instead, we used a clustering algorithm
in this study. Later, we made use of DS2 to test our methodology.
The pseudocode of the algorithm is provided in Table 1. The first step of the
algorithm involves finding the idle hours of a given user. As there might be idle hours
during the working-time, we made use of a median filter, where each hour is replaced
with the median value of a moving filter. Finding the first quartile enables us to identify
the start and end points of a continuous time block. For the home-time period, we
assume that the MOU is highly likely to be at home just before the idle time period, so
we chose a time interval starting three hours before the starting point of the idle time
and ending at the end point of the idle time. Likewise, we considered four hours after
the idle time as the starting time of the work-time period, as we assumed that these four
hours are highly likely to include home and commute locations. This does not mean that the real
work time starts at that point; rather, we aim to discard noisy data. These numbers
were obtained empirically based on DS2. Finally, nearby BTS locations are clustered.
As mentioned by Isaacman et al. [25], in urban areas BTS can be as close as 200
meters apart, and in suburban areas they can be 3-5 kilometers apart. Therefore, we tried
different values for the radius, such as 200 meters, 500 meters, and 1 kilometer, and the
radius of 1 km gave more meaningful results.
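The clustering of nearby BTS locations can be sketched as a single-pass leader algorithm with a 1 km radius. This is a minimal illustration under stated assumptions rather than the authors' implementation: distances are great-circle (haversine) distances on a sphere of radius 6371 km, points are processed in input order, and the function names and coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(p, q):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def leader_clusters(points, radius_km=1.0):
    """Leader clustering: join the first leader within radius_km, else become a new leader."""
    leaders, clusters = [], []
    for point in points:
        for index, leader in enumerate(leaders):
            if haversine_km(point, leader) <= radius_km:
                clusters[index].append(point)
                break
        else:  # no leader is close enough
            leaders.append(point)
            clusters.append([point])
    return clusters


# Invented BTS coordinates: two towers a few hundred meters apart, one several km away.
towers = [(41.0100, 28.9500), (41.0120, 28.9510), (41.0500, 29.0000)]
clusters = leader_clusters(towers)
```

The result of a leader pass depends on the order of the points, which is one reason the radius has to be tuned empirically, as the text describes.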
Table 1. Algorithm
Algorithm to detect work-time and home-time patterns
input: caller_id CID in DS2
output: WAP and HAP
1. Retrieve CDR of CID
2. Calculate hourly call counts hourly_calls_i from CDR, where 0 <= i < 24.
3. If there is no CDR for an hour i, set hourly_calls_i to zero.
4. Apply a median filter with a window size of three to hourly_calls_i to obtain the filtered data,
denoted as filtered_hourly_calls_i.
5. Sort filtered_hourly_calls_i in descending order.
6. Obtain the first quartile of filtered_hourly_calls_i, which is denoted as fq.
7. Find the minimum and maximum hours in fq, denoted as fqmin and fqmax, respectively.
8. Determine the start and end points of WAP and HAP as follows:
Let wtp_start be the work-time period start time, where wtp_start = fqmax + 4
Let wtp_end be the work-time period end time, where wtp_end = wtp_start + 6
Let h_start be the home-time period start time, where h_start = fqmin - 3
Let h_end be the home-time period end time, where h_end = fqmax
9. Find the most used BTS in terms of call days between wtp_start and wtp_end, which is
denoted as BTSWmax.
10. Apply Hartigan’s leader algorithm [26] to all BTS between wtp_start and wtp_end.
11. Select the cluster in which the most used BTS resides, which is denoted as WAP.
12. Find all BTS on the same call days as WAP between h_start and h_end, denoted as
BTSHomeAll.
13. Find the most used BTS in BTSHomeAll, denoted as BTSHmax.
14. Apply Hartigan’s leader algorithm to all BTS between h_start and h_end.
15. Select the cluster in which the most used BTS resides, which is denoted as HAP.
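Steps 2 to 8 of Table 1 can be sketched as follows. Several points are assumptions rather than details confirmed by the paper: `fq` is interpreted as the hours whose filtered call count is at or below the first-quartile value (the idle hours), the median filter wraps around midnight, hour arithmetic is taken modulo 24, and the idle block is assumed not to straddle midnight, as the use of a minimum and maximum hour implies. The function names and the example input are hypothetical.

```python
from statistics import median, quantiles


def median_filter(values, window=3):
    """Median filter over a circular sequence (wrap-around at the edges is an assumption)."""
    n = len(values)
    half = window // 2
    return [median(values[(i + k) % n] for k in range(-half, half + 1)) for i in range(n)]


def time_windows(call_hours):
    """Derive work-time and home-time periods from the hours (0-23) of a user's calls."""
    hourly_calls = [0] * 24                   # steps 2-3: hours without records stay zero
    for hour in call_hours:
        hourly_calls[hour] += 1
    filtered = median_filter(hourly_calls)    # step 4
    q1 = quantiles(filtered, n=4)[0]          # step 6: first-quartile value
    idle_hours = [h for h, c in enumerate(filtered) if c <= q1]  # assumed meaning of fq
    fq_min, fq_max = min(idle_hours), max(idle_hours)            # step 7
    wtp_start = (fq_max + 4) % 24             # step 8
    return {
        "wtp_start": wtp_start,
        "wtp_end": (wtp_start + 6) % 24,
        "h_start": (fq_min - 3) % 24,
        "h_end": fq_max,
    }


# Invented user: two calls in every hour from 08:00 to 23:00, silent from 00:00 to 07:00.
calls = [hour for hour in range(8, 24) for _ in range(2)]
windows = time_windows(calls)
```

For this invented user the idle block is 00:00 to 07:00, so the work-time period runs from 11:00 to 17:00 and the home-time period from 21:00 to 07:00. A user whose idle hours wrap past midnight would need the idle block to be located circularly, which this sketch does not attempt.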
Then, we extracted 31 features from the two-week CDR data of each
MOU. Some of these features are borrowed from Soto et al. [27]. The complete list of
features can be found on our website. The significant features selected by the process
described below and our consequent statistical analyses are as follows:
N_call_days/N_call_days_home_time/N_call_days_work_time: The number of
unique days, unique days recorded within HAP, unique days recorded within WAP
N_calls: The number of calls
N_city: The total number of cities in which the MOU is seen
N_district: The total number of districts in which the MOU is seen
RoG: Radius of gyration [27]
Entropy_bts: Entropy of MOU based on the BTS footprint
Entropy_district: Entropy based on the district footprint
Entropy_district_home_time: Entropy calculated within HAP (based on district footprint)
Entropy_district_work_time: Entropy calculated within WAP (based on district footprint)
Entropy_cluster_home_time: Entropy calculated within HAP (based on the clusters
formed after Hartigan’s leader algorithm)
Entropy_cluster_work_time: Entropy calculated within WAP (based on the clusters
formed after Hartigan’s leader algorithm)
Ref_nonref_ratio: Ratio of calls made/received to/from refugees to non-refugees
Total_movement: Total distance of travel made by MOU, calculated by summing
the distance between each following BTS used by the MOU
Work_home_dist: Haversine distance between the WAP and HAP
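A few of the features listed above can be sketched in code. The exact definitions used in the study are not spelled out here, so the following are common textbook forms and should be read as assumptions: entropy is Shannon entropy in bits over visit frequencies, the radius of gyration is the root-mean-square distance from the arithmetic mean of the coordinates, and distances are haversine distances on a sphere of radius 6371 km. All function names and sample values are hypothetical.

```python
from collections import Counter
from math import asin, cos, log2, radians, sin, sqrt


def haversine_km(p, q):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def entropy(labels):
    """Shannon entropy (bits) of a footprint, e.g. the BTS or district label of each call."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def total_movement(track):
    """Sum of distances between consecutive BTS locations used by the MOU."""
    return sum(haversine_km(a, b) for a, b in zip(track, track[1:]))


def radius_of_gyration(track):
    """Root-mean-square distance of the visited locations from their mean position."""
    center = (sum(p[0] for p in track) / len(track), sum(p[1] for p in track) / len(track))
    return sqrt(sum(haversine_km(p, center) ** 2 for p in track) / len(track))


# Invented footprint: calls alternate evenly between two districts.
district_entropy = entropy(["d1", "d2", "d1", "d2"])
# Invented track: a user who never leaves one tower.
static_track = [(41.01, 28.95)] * 3
```

Work_home_dist then corresponds to a single `haversine_km(wap, hap)` call once the two anchor points have been identified.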
After deriving these features, we first applied the sparse K-means (SK-means) algorithm
using R’s RSKC package [28] with different parameters to obtain the most relevant,
uncorrelated, and non-redundant features. As a result, we ended up with the following
features for clustering the MOUs (sorted in descending order of significance):
n_call_days, n_calls, n_call_days_work_time, n_call_days_home_time,
entropy_cluster_home_time, entropy_cluster_work_time, entropy_district,
entropy_district_home_time, rog_work_time, rog_home_time, n_city, and
work_home_dist. These features were then used as inputs to a Self-
Organizing Map (SOM) [29], a type of artificial neural network that projects
high-dimensional data onto a low-dimensional grid for visualization. The SOM produced different mobility
clusters, and we selected the instances falling into the cluster nodes with a
sufficient number of call days and calls, which are important for detecting HAP
and WAP accurately. Finally, we identified the regions most preferred by refugees for
work and living. Note that two weeks of data is quite limited for
estimating the WAP and HAP of an individual accurately. Hence, if there are not enough
data points for a MOU, the algorithm cannot successfully identify these
locations. The detailed steps of the algorithm are provided in the Appendix section
including how to interpret clusters with examples as well.
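To make the SOM step more concrete, the sketch below trains a small rectangular map in plain Python. It is an illustrative toy under stated assumptions, not the setup used in the study: the function names, the linear decay of the learning rate and neighborhood width, the Gaussian neighborhood, and the initialization from random data points are all choices made here for brevity. Feature vectors are assumed to be scaled to comparable ranges beforehand.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM; returns one weight vector per node (row-major)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Assumption: initialise each node from a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in range(rows * cols)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):      # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighborhood width
            b_row, b_col = divmod(best_matching_unit(weights, x), cols)
            for node, w in enumerate(weights):
                r, c = divmod(node, cols)
                grid_d2 = (r - b_row) ** 2 + (c - b_col) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    # Pull the node towards x, more strongly near the winner.
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights
```

After training, each MOU can be assigned to the node returned by `best_matching_unit`, and the per-node weight vectors play the role of the codes shown in a codes plot.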
Since we did not have ground truth for evaluation, we scraped what appears to be
the largest online real estate website in Turkey, Hürriyet Emlak (2018), and obtained
a total of 701 house/workplace ads for rent or for sale in the
neighborhoods of Fatih, İstanbul (FRE). The sampling was stratified by neighborhood
(to ensure that each neighborhood is represented) and systematic by ad (to
ensure that the price range and distribution are reflected): every $n$th ad was recorded
from the list of ads ordered by price, where
$n = \lfloor \mathit{ad\_size} / \mathit{sample\_size} \rfloor$. The
sampling of the ads for a specific neighborhood was only applied if the neighborhood
had more than 10 ads for houses or workplaces, separately. After collecting 338 houses
and 363 workplaces, we sampled the house and workplace locations that fall inside
Fatih, İstanbul (by also using GSD) from the cluster of refugees about which we are most
certain. Again, we used stratified sampling to sample exactly 15 coordinates (houses and workplaces
in total) from each neighborhood, and we obtained 338 house and 517
workplace locations (855 points in total). The difference in numbers of points (701 vs.
855) is due to different sample sizes, Hürriyet Emlak not having all the neighborhoods
available for filtering, and lack of ads for certain neighborhoods. Then, we compared
the locations of the ads with refugees’ predicted work/home locations. As can be seen
in Fig. 6, some locations, such as the İskenderpaşa neighborhood (central region), are both
residential and working places, with the shops mainly located along the sides of the
main street. Other locations are mostly business related, as in the Tahtakale neighborhood (the northeastern region of
Fatih), or mostly residential,
as in the Şehremini and Silivrikapı neighborhoods (southwestern region). Our results
coincide with the ad types.
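The ad sampling scheme described above (stratified by neighborhood, systematic within each neighborhood) could be sketched as follows. This is a hedged illustration: the function names and the `price` key are hypothetical, starting from the first ad in the price-ordered list is an assumption, and keeping all ads for neighborhoods with 10 or fewer ads is one possible reading of the text.

```python
def systematic_sample(ads_sorted_by_price, sample_size):
    """Record every n-th ad, with n = floor(ad_size / sample_size)."""
    n = len(ads_sorted_by_price) // sample_size
    if n < 1:
        # Fewer ads than the requested sample: keep everything.
        return list(ads_sorted_by_price)
    # Assumption: start from the first (cheapest) ad.
    return ads_sorted_by_price[::n][:sample_size]


def stratified_systematic_sample(ads_by_neighborhood, sample_size, min_ads=10):
    """Apply systematic sampling separately within each neighborhood."""
    sample = {}
    for name, ads in ads_by_neighborhood.items():
        ordered = sorted(ads, key=lambda ad: ad["price"])
        if len(ordered) <= min_ads:
            # Assumption: small neighborhoods are kept whole, not sampled.
            sample[name] = ordered
        else:
            sample[name] = systematic_sample(ordered, sample_size)
    return sample
```

Because the ads are ordered by price before every n-th one is taken, the sample spans the whole price range of each neighborhood rather than only its cheapest or most expensive listings.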
Fig. 6. HAP and WAP locations obtained from refugees’ call data (indicated with light orange
and light gray) along with the houses and workplaces obtained from Hürriyet Emlak (indicated
with orange and dark gray)
5 Investigating Regions in Depth: Case Studies for
Çaykara/Trabzon, Fatih/İstanbul, and Mezitli/Mersin
In this section, based on our previous findings, we explore three districts and cities in
detail. We look into the characteristics of these locations and attempt to understand
refugees’ living conditions.
5.1 Findings for Çaykara/Trabzon
Our initial analysis showed that Çaykara, Trabzon has a significant peak in August. An
agriculture expert we consulted suggested that it is most probably related to the harvest
of tea (which also gives its name to the district) and hazelnut. We also confirmed it with
“Development Workshop” reports [10]. The report states that the Black Sea region in
Turkey receives seasonal Syrian agricultural workers between August and September,
who first work in gardens in the coastal regions and then move inland.
We analyzed DS2 and DS3 to find the refugees that were present around the specific
BTS (#5066930) in August and understand where they come from. We found that
people present in the area were also highly present in İstanbul and Mersin. The
coordinates where the phone calls occurred also come right on top of the usual intercity
bus route between İstanbul and the Black Sea region. The seasonal workers here might
be later switching to Mezitli, Mersin region (which can explain its peaks in October)
once the harvest is done. When we looked at DS3 to find the refugees that were in
Çaykara in August, we found 306 callers (people with outgoing call data in DS3) and
275 callees (people with incoming call data in DS3). Interestingly, the most common
district in which these callers and callees were present in 2017 was Fatih, İstanbul (68.4% and
60%, respectively). However, based on their most frequent call locations, it seems that
these refugees do not live in Fatih, İstanbul. They either live in Ortahisar, Trabzon (a
coastal district) or various districts in İstanbul, such as Kağıthane (a district known until
recently for its low living standards but undergoing a rapid transformation in some
5.2 Findings for Fatih/İstanbul
İstanbul appeared to be the top refugee location according to Directorate General of
Migration Management of Turkey (DGMM 2017) and in DS1. The first thing to
note about the distribution of refugee calls in İstanbul, where almost 40% of registered
refugees live, is the high concentration of such calls on the European side (see Fig. 7).
Given the fact that most of the city’s population, central business activities, job
opportunities, and urban amenities are on the European side, this concentration of
refugee calls is in harmony with the basic characteristics of the human geography of
the city.
Fig. 7. Dot density map of refugee calls (each dot represents 1000 calls) obtained using DS1
On the European side, refugees seem to prefer (from west to east)
Esenyurt, parts of Güngören, Bağcılar, Zeytinburnu, Fatih, Beyoğlu, Şişli, and Beşiktaş
districts. Of these districts, Şişli and Beşiktaş are known to be parts of the modern
city center. The neighborhoods preferred by the refugees are close to the main transport
routes and provide easy access to some important urban amenities such as city center,
business districts, and entertainment facilities. These locations are also mostly low-
income areas of the city. To support this claim, we use the APRS2017 data set.
The map in Fig. 8 shows the distribution of university graduates by neighborhoods as
a percentage of total population above 6 years of age together with the dot density map
of refugee calls on the European part of the city. We know from previous studies that
education level is almost perfectly correlated with income level, meaning that the
higher the education level in a given neighborhood, the higher the income of its
residents [30]. A comparison of the maps in Fig. 8 makes it clear that refugees tend to
concentrate in areas where the residents belong to low-income groups.
Fig. 8. The percentage of university graduates compared to the dot density map
Table 2. Monthly rents for each square meter of housing units in selected neighborhoods
The zones in İstanbul where refugees are concentrated offer their residents relatively cheap
housing, with generally low rents. In order to
reach a better understanding of refugee concentration areas in İstanbul, we compiled
the monthly rents for 3186 housing units advertised on 6 September 2018 in Hürriyet
Emlak (RF), for selected neighborhoods of İstanbul (see Table 2). The results show that
the areas preferred by refugees for housing purposes are low-housing cost areas, with
rents in some cases 4 to 5 times lower than those in middle and upper class areas
characterized by the absence of refugees. We can also add from the existing literature
on Syrian refugees in Turkey [31] that even this is a partial and misleading picture of
the housing conditions of refugees both in İstanbul and in Turkey as a whole: the figures
published on real-estate agents’ websites cover only housing units on the
“formal” housing market, whereas refugees have to live in sub-standard housing units not
preferred by the local population.
To complement this study, we also investigated Fatih at street level using the
proposed algorithm to understand in which parts of Fatih refugees usually live and work.
The results show that the HAP of non-refugees is clustered in the more expensive areas of
Fatih district, whereas the HAP of refugees is clustered in relatively poorer areas.
As for the distribution of refugees in İstanbul, the following conclusions can be drawn:
Refugees tend to live close to fellow refugees (evidenced by the exceptionally high
concentrations of refugee calls in some parts of İstanbul);
Refugees prefer those areas of the city where the poor and low-income groups live
(evidenced by the comparatively low-education levels of refugee concentration
zones in İstanbul);
Refugees tend to concentrate in low property value areas of the city (evidenced by
housing rent values of refugee concentration zones in İstanbul);
Refugees live in inner city areas, in close proximity to main transport lines for easy
access to job opportunities and urban amenities (evidenced by the general
distribution of refugees in İstanbul).
Fatih, known for its rather traditional and religious residents since the Early
Republican Period, is the most refugee-saturated district not only in İstanbul but in the
whole country. It is believed that many Syrian refugees started their new lives here,
thanks to relatively cheap rental prices of basement floor flats, with hopes to hold on to
İstanbul where there are more jobs and the city life is dominant.
5.3 Findings for Mezitli/Mersin
Mezitli, a district of Mersin, appeared to have received a significant refugee influx
according to our chi-square analysis results. Therefore, in this section, we aim to study
Mersin in depth. Using our algorithm, we obtained HAP and WAP locations for each
MOU seen at least twice in Mersin according to their call records.
We selected the MOUs after clustering them using SOM. The results of
the SOM clustering can be seen in Fig. 9.
Fig. 9. Codes plot of the SOM clustering of refugees in Mersin
While selecting the nodes, we considered the ones having a sufficient number of
call days and calls, as discussed before. As a result, we have selected
MOUs contained by the 1st, 5th, and 10th nodes (enumeration starts from the bottom
left and ends at the top right).
Using the GSD data set, we mapped each HAP and WAP location to neighborhoods
in Mersin. Furthermore, using the PD data set, we labeled each quarter as low-status
(LSN), middle-status (MSN), and high-status neighborhoods (HSN) in seaside districts
of Mersin, which are Mezitli, Akdeniz, Yenişehir, and Toroslar, according to their
education level and youth population ratio. Among these districts, Mezitli is one of the
high-status neighborhoods in Mersin. The locations of the HAP and WAP of the
selected MOUs can be seen in Fig. 10.
Fig. 10. HAP (marked with orange color) and WAP (marked with gray color) locations of
selected MOUs in LSN (marked with light orange), MSN (marked with peach color), and HSN
(marked with purple)
Considering all 31 features, we investigated whether there is a statistically
significant difference between the neighborhoods using a pairwise Mann-Whitney U
test, as it is a non-parametric test that can be used for non-gaussian distributions. We
have presented the number of MOUs in each neighborhood (N), followed by the mean
and median values of each feature per neighborhood in Table 3.
The results presented in Table 3 only show the features found to be
significant (p-value < 0.05) in the comparisons presented in Table 4.
Additionally, when the Bonferroni adjustment is applied, the significance level drops to
1.66e-2, since we made 3 different comparisons for each feature. Table 4 shows the
comparison results that were found significant. We make two-sided comparisons with
the Mann-Whitney U test, and the p-values are two-tailed exact significance values.
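The pairwise testing procedure with a Bonferroni-adjusted threshold can be illustrated as below. Note the swap: the study reports exact two-tailed significance, whereas this sketch uses the large-sample normal approximation without tie correction to stay dependency-free, so its p-values will differ slightly. All function names are hypothetical.

```python
import math
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x: each pair with x > y counts 1, each tie counts 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def two_sided_p_normal(x, y):
    """Two-sided p-value via the normal approximation (not the exact test)."""
    n1, n2 = len(x), len(y)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = abs(mann_whitney_u(x, y) - mean) / sd
    return math.erfc(z / math.sqrt(2))


def pairwise_comparisons(groups, alpha=0.05):
    """Compare every pair of groups against a Bonferroni-adjusted threshold."""
    pairs = list(combinations(sorted(groups), 2))
    threshold = alpha / len(pairs)          # 0.05 / 3 for LSN, MSN, HSN
    results = {}
    for a, b in pairs:
        p = two_sided_p_normal(groups[a], groups[b])
        results[(a, b)] = (p, p < threshold)
    return threshold, results
```

With three neighborhood groups there are three pairwise comparisons per feature, so the adjusted threshold is 0.05 / 3, which matches the 1.66e-2 level quoted above.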
As a result, the test indicated that the distance between HAP and WAP locations is
the highest for HSN indicating that MOUs living in HSN probably work in farther
neighborhoods. Additionally, the entropy calculated based on different district visits
(Entropy_district) for LSN is the lowest, which is also confirmed by the entropy based
on the BTS visits (Entropy_bts), and for MSN it is the highest among others. Also,
MOUs living in HSN appear to be more regular and MOUs living in the LSN appear to
be more irregular than the others in their home-time periods based on their entropy
based on BTS visits (Entropy_bts_home_time). One feature that can shed light on the
introversion of the MOUs is the refugee to non-refugee call ratio (Ref_nonref_ratio)
and in our tests, we have seen that MOUs in LSN have the lowest ratio whereas MOUs
in HSN have the highest ratio, showing that refugees in LSN have the least interaction
with the non-refugees and refugees in HSN have the most interaction with the non-
refugees. Lastly, we have seen that the total movement (Total_movement) and RoG are
the lowest for LSN, indicating that the total movement and RoG increase as the
education level and youth population ratio increase.
Table 3. Descriptive statistics of LSN, MSN, and HSN for each feature presenting the number
of MOUs (n), mean (µ), standard deviation (σ), min, max, and percentiles of the distributions.
Table 4. Mann-Whitney U Test results for the comparisons of LSN, MSN, and HSN for each
feature (Comparisons read as follows: for all comparisons like “LSN vs HSN”, the one with the
lower median value is written on the left-hand side, that is, the median of LSN is less than the
median of HSN in this example).
6 Summary of Findings (SF)
SF1. A place in the city—Place as a means of survival: Finding an adequate place to
live in is key to success in the urban jungle. A place where refugees can interact with their
peers and have easy access to urban facilities such as work and leisure is vital for their
survival in the city. Using the established networks of solidarity among the refugee
community, newcomers maximize their access to flows of information, which may
from time to time play a life-saving role. Almost as a rule, the newcomers to a city tend
to concentrate in particular parts of urban areas [32].
We have discovered in the second layer of our findings that Syrian refugees in
Turkish cities tend to live in areas where [a] they can have maximum interaction with
Syrian community, which is crucial in access to information flows; [b] the rents are
lower; [c] they can have easy access to urban facilities, namely in inner city areas.
SF2. Networks—Sine qua non for survival: All of the above—i.e. joining the
complex web of relations characterizing seasonal agricultural work and finding the best
place to live in a city—could not have been achieved without networks. Some of the
hotspots we have discovered seem to function like hubs in a network. For example,
Fatih district definitely plays a high-level hub role, as we have detected a non-negligible
number of refugees (whose speculated homes are outside Fatih) coming to Fatih and
also heading to Çaykara where they work for tea harvest.
SF3. Better off Syrians—Internal divisions: We know from other studies that an
important portion of Syrians came to Turkey without a chance to turn their savings in
their homeland into cash [31]. There are also some cases, though not many, where some
Syrians arrive in Turkey with some accumulated wealth. This we believe creates a
division within the Syrian refugee population, with better off Syrians living in
comparatively higher status neighborhoods and having different mobility patterns,
compared to the rest of the refugees. Mersin appeared to be a very interesting city where
refugees from different socioeconomic levels are living. Our analyses showed that
middle- and high-status refugees use a very large space, travel long distances, and show
regularity in their mobility (regular work-home patterns), whereas low-status refugees
appear to be trapped in a small neighborhood: they do not travel to distant
districts and show high irregularity in their mobility.
SF4. Seasonal work—Geographical mobility as a survival strategy: One of the
most important findings of our study is the one that shows the movement of Syrian
refugees among various districts of Turkey. The evidence for this is the unusually
high number of calls made by refugees in certain parts of the country in certain months
of the year. What seems to be an anomaly of the data set is, in fact, a practice that is
very common in Turkish agriculture: the prevalence of seasonal work in agriculture.
We know from a recent study that Syrian refugees have replaced seasonal Kurdish
workers in the last few years especially in cotton and hazelnut harvests, two most
common products utilizing seasonal labor [33]. They have also replaced Georgian
migrants in tea harvest which, compared to other products, requires highly skilled labor.
The unusual peaks in the number of outgoing refugee calls in some regions in certain
months attest to the fact that a significant portion of Syrian refugees are on the move in
search of an adequate job and are in harmony with the harvest seasons of agricultural
products. Their enhanced geographical mobility as a survival strategy shows the
capability of Syrian refugees to adapt to new conditions. Furthermore, the fact that they
have replaced (or are in the process of replacing) Georgian migrants in tea harvest is a
testimony to their ability to alter and penetrate the existing networks.
7 Policy Recommendations
In this section, after a thorough analysis of D4R data, available literature reviews and
investigation of available social protection mechanisms, we list and group the main
vulnerabilities of Syrian refugees. In addition, we propose short-term and long-term
solutions and policy recommendations. We investigate the problems, followed by
proposed recommendations, under the subheadings of 1) Education & Employment, 2)
Safety & Security, 3) Healthcare, and 4) Integration.
Education & Employment: As our findings demonstrate, Syrian refugees use
geographical mobility as a survival strategy (SF4) which means spending short amounts
of time in certain regions. Policies are required for both children and agricultural
laborers. For children, the main problem is access to and participation in education.
Harvesting times often coincide with the school period for some districts. Although we
are not sure whether they are moving with their families (demographics of MOUs and
their call networks are not provided in detail in the data sets), it is highly probable that
there may be Syrian refugee children who are unable to benefit from education services.
Furthermore, having one parent on the move all year long does not provide a healthy environment in which to
raise children. Studies also show that language barriers, insufficient salaries, invisible
costs of education (i.e. costs for course materials, transportation, and lunch costs), and
distance from urban centers may result in education problems. As a remedy,
introductory programs can be expanded in order to take language and cultural
differences into account. Social assistance policies should specifically focus on the
involvement of children in education. In line with our finding (SF4), mobile school
programs such as TÜBİTAK 4007’s science buses and science fairs that bring science
to rural areas, can visit farm areas to help sustain continuity in children’s education. As
for adults, who are mostly seasonal workers in agriculture, problems are many: long
working hours, low wages, health and hygiene problems, makeshift tents they have to
live in, the lack of basic amenities, and so on. From a wider perspective, seasonal labor
cannot be a long-term solution in the sense that it guarantees only (or even falls short
of) a minimum level of subsistence. In addition, seasonal work is a strategy that may
lead to the failure of coming generations, since it almost requires the movement of
young members of the family who cannot attend their school during seasonal work, a
fact that may lead to persistent long-term problems. It was also reported that in
numerous stages of agriculture, different skills are required but refugees do not have
the necessary background skills [10], which can be solved with specialized training
programs. Generations may become stuck in seasonal work due to network lock-in. As
networks are critical (SF2), conditions of the agricultural work network should be
improved and new occupational networks should be introduced. Agricultural
occupations can be made more structured with better pay. For the latter, new job
opportunities aiming to integrate Syrians into the labor force should be created. This
may be achieved by offering vocational and entrepreneurship trainings.
Safety & Security: The results of our analyses showed that Syrian refugees tend to
live close to other fellow refugees (SF2) and they live in poor housing conditions in
underdeveloped areas of the cities (SF1). This creates ghetto-like communities and
becomes a handicap on overall integration in Turkey. To address these vulnerabilities,
“urban transformation” policies should be redesigned taking Syrian refugees into
account and integration to the society should be aimed. Housing is a problem for
seasonal workers. Recently, a project called METİP (Project for Improvement of the
Working and Living Conditions of Seasonal Migratory Agricultural Workers) [34] has
been put into effect by the Ministry of Labor and Social Security with the goal of
improvement of the current living, shelter, transportation, education, health, security,
social relations and social security status of the seasonal migratory agricultural workers
who migrate to other cities with their families to be employed as agricultural workers.
It is important to ensure the continuity of the METİP project in such districts.
Healthcare: As mentioned above, a significant number of Syrians are employed in seasonal
work (SF4). Seasonal workers and women are the most vulnerable groups in terms of
accessing healthcare due to remoteness and language barriers. Seasonal changes in
welfare, difficulties in accessing childcare services, the obligation to work during pregnancy,
and concerns about nutrition quality are the main problems of women. Hygiene is a
problem for everyone. Mobile health units that travel to farms can be established as a
remedy to provide the necessary healthcare services.
Integration: All of our findings are in fact intertwined and part of the bigger
challenge that is going to take a long time for everyone involved to deal with –
integration. For instance, merely being employed in seasonal work will not be
enough for Syrians to integrate into Turkish society. It also brings about
competition over limited resources with Turkish seasonal workers. Another example is
that if Syrian children do not receive the necessary education, their destiny will be
low-paid jobs and no integration. Ghettos would be another obstacle to integration.
Moreover, there are even factions within the Syrian community itself (SF3). Thus,
even with the limited data provided to researchers in D4R, it is obvious that addressing
the integration is the real challenge ahead of Turkey in the long run. In our opinion, in
addition to NGOs and academics, government bodies and international organizations
should work together to establish a big coalition to come up with an integration strategy
and obtain a buy-in from the public to tackle the integration challenge, because this
challenge will require not only huge amounts of resources and wisdom but also
empathy, compassion, and tolerance.
8 Acknowledgement
We would like to thank Türk Telekomünikasyon A.Ş. for the one-year anonymized
mobile communication data they provided within the D4R Challenge.
References
1. United Nations High Commissioner for Refugees (UNHCR): Global trends 2017: Forced
displacement in 2017. (2017). Accessed 31 August
2. ORSAM (Ortadoğu Stratejik Araştırmalar Merkezi): Suriye’ye komşu ülkelerde Suriyeli
mültecilerin durumu: Bulgular, sonuçlar ve öneriler. (2014). Accessed 31
August 2018
3. İçduygu, A.: Syrian refugees in Turkey: The long road ahead. Washington DC: Migration
Policy Institute (2015)
4. Kaya, A.: Istanbul as a space of cultural affinity for Syrian refugees: “Istanbul is safe despite
everything!” Southeastern Europe 41(3):333–58 (2017)
5. World Bank: Turkey’s response to the Syrian refugee crisis and the road ahead. Washington
DC: World Bank (2015)
6. Stock, I., Aslan, M., Paul, J., Volmer, V., Faist, T.: Beyond humanitarianism: Addressing the
urban, self-settled refugees in Turkey. Bielefeld: COMCAD (2016)
7. Cagaptay, S., Menekse, B.: The impact of Syria's refugees on Southern Turkey. Policy Focus
130. Washington, DC: The Washington Institute For Near East Policy.
evised3s.pdf (2014). Accessed 31 August 2018
8. Asik, G.A.: Turkey badly needs a long-term plan for Syrian refugees. Harvard Business
(2017). Accessed 14 September 2018
9. Balkan, B., Tumen, S.: Immigration and prices: Quasi-experimental evidence from Syrian
refugees in Turkey. Journal of Population Economics 29:3:657–686 (2016)
10. Dedeoğlu, S.: Yoksulluk nöbetinden yoksulların rekabetine: Türkiye’de mevsimlik tarımsal
üretimde yabancı göçmen işçiler mevcut durum raporu.
(2016). Accessed 12 September 2018
11. Isikkaya, A.D.: Housing policies in Turkey: Evolution of TOKI (Governmental Mass
Housing Administration) as an urban design tool. Journal of Civil Engineering and
Architecture 10:316–326 (2016)
12. Chandy, R., Hassan, M., Mukherji, P.: Big data for good: Insights from emerging
markets. Journal of Product Innovation Management 34(5):703–713 (2017)
13. Alemanno, A.: Big data for good: Unlocking privately-held data to the benefit of the
many. European Journal of Risk Regulation 9(2):183–191 (2018)
14. Türk Telekom: Data for refugees. (2018). Accessed: 31
August 2018
15. Salah, A.A., Pentland, A., Lepri, B., Letouzé, E., Vinck, P., de Montjoye, Y.A., Dong, X.,
Dağdelen, Ö.: Data for Refugees: The D4R Challenge on mobility of Syrian refugees in
Turkey. arXiv preprint arXiv:1807.00523 (2018)
16. Hürriyet Emlak: Online real estate ads. (2018). Accessed 8
August 2018
17. Ahas, R., Silm, S., Järv, O., Saluveer, E., Tiru, M.: Using mobile positioning data to model
locations meaningful to users of mobile phones. Journal of Urban Technology 17(1):3–27
18. Nurmi, P., Koolwaaij, J.: Identifying meaningful locations. In: Mobile and Ubiquitous
Systems: Networking & Services, 2006 Third Annual International Conference, pp. 1–8.
IEEE, San Jose (2006)
19. Jiang, S., Ferreira, J., González, M.C.: Activity-based human mobility patterns inferred from
mobile phone data: A case study of Singapore. IEEE Transactions on Big Data 3(2):208–219
20. Graells-Garrido, E., Peredo, O., García, J.: Sensing urban patterns with antenna mappings:
the case of Santiago, Chile. Sensors 16(7):1098 (2016)
21. Furno, A., Fiore, M., Stanica, R., Ziemlicki, C., Smoreda, Z.: A tale of ten cities:
Characterizing signatures of mobile traffic in urban areas. IEEE Transactions on Mobile
Computing 16(10):2682–2696 (2017)
22. Directorate General of Migration Management of Turkey (DGMM): Hangi ilde ne kadar
Suriyeli var? İşte il il liste.
liste/ (2017). Accessed 31 August 2018
23. Zhao, Z., Shaw, S.L., Xu, Y., Lu, F., Chen, J., Yin, L.: Understanding the bias of call detail
records in human mobility research. International Journal of Geographical Information
Science 30(9):1738–1762 (2016)
24. Yang, P., Zhu, T., Wan, X., Wang, X.: Identifying significant places using multi-day call
detail records. In: 2014 IEEE 26th International Conference on Tools with Artificial
Intelligence (ICTAI), pp. 360-366. IEEE, Limassol (2014)
25. Isaacman, S., Becker, R., Cáceres, R., Kobourov, S., Martonosi, M., Rowland, J.,
Varshavsky, A.: Identifying important places in people’s lives from cellular network data.
In: International Conference on Pervasive Computing, pp. 133–151. Springer, Berlin,
Heidelberg (2011)
26. Hartigan, J.A.: Clustering algorithms. New York, NY, USA (1975)
27. Soto, V., Frias-Martinez, V., Virseda, J., Frias-Martinez, E.: Prediction of socioeconomic
levels using cell phone records. In: International Conference on User Modeling, Adaptation,
and Personalization, pp. 377–388. Springer, Berlin, Heidelberg (2011).
28. Kondo, Y., Salibian-Barrera, M., Zamar, R.: RSKC: An R package for a robust and sparse k-
means clustering algorithm. Journal of Statistical Software 72(5):1–26 (2016)
29. Kohonen, T.: The self-organizing map. Neurocomputing 21(1–3):1–6 (1998)
30. Işık, O., Ataç, E.: Yoksulluğa dair: Bildiklerimiz, az bildiklerimiz, bilmediklerimiz. Birikim
269(268):66–86 (2011)
31. Eraydın, G.: Migration, settlement and daily life patterns of Syrian urban refugees through
time geography: A case of Önder neighborhood. Unpublished PhD Thesis, Middle East
Technical University (2017)
32. National Academies of Sciences, Engineering, and Medicine (NASEM): The integration of
immigrants into American society. Washington, DC: The National Academies Press (2015).
33. Kalkınma Atölyesi: Mevsimlik gezici tarım işçiliği izleme: Mevcut durum haritası (2012–
2013) (2013). Accessed 12 September 2018
34. Prime Minister’s Office: Memorandum Circular n. 2017/6. Official Gazette of Turkey, 30043
9 Appendix: Finding HAP and WAP
Fig. a1. The number of calls per hour for a MOU, original is shown with straight line and the
median filter applied is shown with dashed line.
In order to extract the important places of individuals, we study their fine grained
mobility using DS2. More specifically, we try to find WAP and HAP for each MOU.
To find those points, we first aggregate the hourly call counts for each MOU
and apply a median filter with a bandwidth of three to the hourly call signal to smooth
out unexpectedly low and high call counts. As depicted in Fig. a1, after the
median filter is applied, the hourly call counts signal is smoothed.
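A median filter of this kind can be sketched in a few lines. The version below is a minimal illustration in which the "bandwidth of three" is interpreted as a window of three consecutive hours, and the 24-hour signal is treated as circular so that hour 23 and hour 0 are neighbors. Both points are assumptions, and the function name is hypothetical.

```python
from statistics import median


def median_filter(signal, width=3):
    """Replace each value by the median of a centered window of `width` samples.

    Assumption: the signal wraps around (hour 23 is adjacent to hour 0).
    """
    half = width // 2
    n = len(signal)
    return [median(signal[(i + k) % n] for k in range(-half, half + 1))
            for i in range(n)]
```

An isolated spike such as a single hour with unusually many calls is removed by the filter, while plateaus lasting two or more hours are preserved.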
Then, we sort the hours according to the number of calls made during that hour in
ascending order. We select the hours in the first quartile and the three hours before them as
the Home-Time Period (HTP), assuming that this period is spent at the most
probable HAP. After that, we find the Work-Time Period (WTP) by first adding four
hours, which is allocated for preparation and commute, to the end of HTP and selecting
the next six hours, which is assumed to be a safe period for work activities for most of
the people, as the WTP. In Fig. a2, HTP and WTP have been marked on the filtered signal.
Fig. a2. The graph showing the Work-Time Period (WTP) and Home-Time Period (HTP) of a
MOU. HTP included the hours in the first quartile (between 00:00 and 5:00) and three hours
before that (between 21:00 and 23:59). WTP is found by first adding four hours to the end of
HTP (9:00) and selecting the next six hours (end point is 15:00).
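The derivation of HTP and WTP from the smoothed hourly counts can be sketched as follows. This is an interpretation rather than the authors' code: it assumes that the quietest quarter of the day (six of 24 hours) forms one contiguous block on the clock, treats the last hour label of that block as the end of HTP, and uses hypothetical names throughout.

```python
def home_and_work_periods(hourly_counts, lead_hours=3, gap_hours=4, work_hours=6):
    """Return (HTP hours, WTP hours) from 24 smoothed hourly call counts."""
    n_hours = len(hourly_counts)          # expected to be 24
    n_quiet = n_hours // 4                # size of the first quartile
    # Hours sorted by call count, ascending; ties broken by hour of day.
    order = sorted(range(n_hours), key=lambda h: (hourly_counts[h], h))
    quiet = set(order[:n_quiet])
    # Assumption: the quiet hours form a single contiguous block on the clock.
    starts = [h for h in quiet if (h - 1) % n_hours not in quiet]
    if len(starts) != 1:
        raise ValueError("quiet hours are not contiguous")
    start = starts[0]
    end = (start + n_quiet - 1) % n_hours
    # HTP: three hours before the quiet block, plus the block itself.
    htp = [(start - lead_hours + i) % n_hours for i in range(lead_hours + n_quiet)]
    # WTP: skip four hours after the end of HTP, then take six hours.
    wtp_start = (end + gap_hours) % n_hours
    wtp = [(wtp_start + i) % n_hours for i in range(work_hours)]
    return htp, wtp


# Example resembling Fig. a2: the quietest hours are 00:00 to 05:00.
counts = [0] * 6 + [2, 3, 5, 8, 9, 9, 9, 9, 8, 7, 6, 6, 6, 5, 4, 3, 2, 1]
htp, wtp = home_and_work_periods(counts)
```

For this example the sketch yields an HTP running from hour 21 through hour 5 and a WTP covering hours 9 through 14, i.e. ending at 15:00, in line with the figure caption.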
After finding the HTP and WTP, we derive the BTS used in those periods to find HAP
and WAP. First, we find the HAP by sorting the BTS by the number of unique days on which they
were used and selecting the one with the highest number of unique days. In the next
step, we cluster the BTS using Hartigan’s leader clustering algorithm. The advantage
of Hartigan’s leader algorithm is that, unlike clustering algorithms such as K-means, it does
not require the number of clusters to be set at the beginning. We only need to set the radius
used to cluster the BTS based on their proximity. We set the radius to 1 km. After clustering
the BTS, we select the cluster in which the BTS with the most call days resides and
then set the centroid of that cluster as the HAP. To find the WAP, we apply the same steps to the list of
possible BTS for WAP: we run Hartigan’s leader
algorithm and then select the centroid of the cluster containing the BTS with the highest
number of weekday call days. Aside from the hour differences between the
HTP and WTP, to specify the WTP we only look at the calls made or received on the
weekdays within the designated work time hours. In Fig. a3, we present the possible
WAP, which is represented by the green circle, and HAP, which is represented by the
red circle, locations for a MOU living around Siteler, Ankara.
Fig. a3. WAP and HAP for a person living around Siteler, Ankara
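The anchor point derivation above can be illustrated with a short sketch. It is written under several assumptions that are not stated in the paper: BTS positions are given as (latitude, longitude) pairs, distances are great-circle (haversine) distances, each BTS joins the first leader within the radius (the classic single-pass form of the leader algorithm), and the centroid is the plain mean of member coordinates. The names `haversine_km`, `leader_clusters`, and `anchor_point` are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def leader_clusters(points, radius_km=1.0):
    """Single-pass leader clustering: no cluster count is needed, only a radius."""
    leaders, clusters = [], []
    for i, p in enumerate(points):
        for k, leader in enumerate(leaders):
            if haversine_km(p, leader) <= radius_km:
                clusters[k].append(i)  # join the first leader within the radius
                break
        else:
            leaders.append(p)          # no leader close enough: start a new cluster
            clusters.append([i])
    return clusters


def anchor_point(bts_coords, day_counts, radius_km=1.0):
    """Centroid of the cluster containing the BTS with the most unique call days."""
    clusters = leader_clusters(bts_coords, radius_km)
    top = max(range(len(bts_coords)), key=lambda i: day_counts[i])
    members = next(c for c in clusters if top in c)
    lat = sum(bts_coords[i][0] for i in members) / len(members)
    lon = sum(bts_coords[i][1] for i in members) / len(members)
    return lat, lon


# Made-up coordinates: two nearby BTS and one about 6 km away.
bts = [(39.960, 32.900), (39.963, 32.902), (39.920, 32.850)]
days = [5, 9, 7]  # unique call days per BTS within the period of interest
hap = anchor_point(bts, days)
```

For the WAP, the same function would be called with BTS and day counts restricted to weekday calls in the WTP. Note that the leader algorithm depends on the order in which points are visited, so a different ordering of the BTS can yield different clusters.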
10 Appendix: SOM Clustering
In SOM clustering, we trained the neural network until the average distance between
the data points and the node centroids converged to a relatively small value, and then
we obtained the codes plots of the clusters; an example can be seen in Fig. a4. To
analyze some of the clusters: in Node 1 (enumeration starts from the bottom left and
ends at the top right), we observe that MOUs have higher entropy while having a radius
of gyration close to zero and having visited very few different cities. As their HAP
and WAP locations are very close, their daily commute is expected to be short. Even
though they visited varying numbers of BTS and districts, their travel distance is
relatively small: they wandered within a very small area, visiting many places that
are very close to each other. On the other hand, for Node 6, we see that MOUs have low
entropy and a high radius of gyration, indicating that these MOUs make longer
commutes, especially in the work-time period, while visiting fewer different places.
These longer commutes are also confirmed by the larger distance between HAP and WAP
locations. Lastly, we observe that MOUs in Node 10 have higher values only for the
radius of gyration in the home-time period, while other features such as entropy,
radius of gyration in the work-time period, number of different cities visited, and
WAP-HAP distance are small. Hence, we can deduce that these MOUs travel longer
distances in their home-time period, but they do not visit many different places in
terms of BTS, districts, or cities, and their HAP and WAP locations are very close to
each other.
Fig. a4. Codes plot of a SOM clustering
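To make the convergence criterion concrete, the following is a minimal sketch of online SOM training that records the mean distance between data points and their closest node after every epoch. It is an illustration only, not the setup used in the study: the grid size, the linear decay schedules, the initialization near the data mean, and the function name `train_som` are all assumptions, and the input is toy data rather than the mobility features.

```python
import math
import random


def train_som(data, rows=2, cols=2, epochs=30, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM; return (node vectors, mean-distance history per epoch)."""
    rng = random.Random(seed)
    dim = len(data[0])
    mean = [sum(x[d] for x in data) / len(data) for d in range(dim)]
    # Node vectors start near the data mean (an arbitrary choice for this sketch).
    nodes = [[m + rng.uniform(-0.01, 0.01) for m in mean] for _ in range(rows * cols)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]

    def bmu(x):
        # Best matching unit: index of the node closest to x.
        return min(range(len(nodes)), key=lambda k: math.dist(x, nodes[k]))

    def mean_distance():
        return sum(math.dist(x, nodes[bmu(x)]) for x in data) / len(data)

    history = [mean_distance()]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                # learning rate decays linearly
        sigma = sigma0 * (1 - frac) + 0.1    # neighborhood width shrinks over time
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            b = bmu(x)
            for k, w in enumerate(nodes):
                # Gaussian neighborhood on the grid around the BMU.
                d2 = (grid[k][0] - grid[b][0]) ** 2 + (grid[k][1] - grid[b][1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
        history.append(mean_distance())
    return nodes, history


# Toy data with two well-separated groups of two features each.
toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
nodes, history = train_som(toy)
```

Training would be stopped once the values in `history` flatten out. The final node vectors are what a codes plot visualizes: one profile of feature values per node.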