Conference Paper

Personal analytics: Time management using Google Maps

Achyuthuni Sri Harsha
EEP, Business Analytics and Intelligence, Indian Institute of Management,
Bangalore
achyuthuni.sri.harsha@gmail.com
Abstract. The modern world runs on data captured from millions of individuals. Both governments and private companies capture data on individuals, yet only a small percentage of the data captured on a person is accessible to that person. This paper explains a seven-step, data-driven way for any individual to use this data to improve their life. Time management for an individual, using Google Maps and other data sources, is used as an example to explain the process.
Keywords: Personal analytics, regression, personal data, hypothesis testing, time management, Google Maps, CRISP-DM.
1 Introduction
Data is an integral component of today's life. Analytics is used extensively for problem-solving and for assisting decision making across many verticals and companies (Kumar, 2017). Studies have shown that 1.7 MB of data is captured every second on every person on earth (Miller, 2019). From engineering (Aho and Uden, 2014; Chiang et al., 2017) to retail (Dinesh Kumar et al., 2012; Jeeson et al., 2013), from transportation to finance, and from medicine (Pannu et al., 2010) to insurance (Lixia, 2010), data has changed the way businesses work. Behind this revolution is the personal data of every individual, captured for various purposes (Schwartz, 2003). Some of this data is available to the individual (Gurrin et al., 2014), and data science and analytics enthusiasts can use it to make their lives better (Selke, 2014; Sellen and Whittaker, 2010). A time management problem for an individual is taken as an example to explain the process. Data collected by various external agents and organizations on the person in the study (Person A) are aggregated. A CRISP-DM (Wirth and Hipp, 2000) analytical approach is used to identify factors and make recommendations. CRISP-DM is a widely used methodology for solving data science problems (Azevedo and Santos, 2008).
Time management is a significant problem in every person's life. This paper discusses different factors that influence the time at which an employee arrives at the workplace. The goal is to explain as much of the variation in the individual's in-time as possible.
2 Data and Methods
2.1 Set the goals
The goal is to identify and quantify factors that affect the time at which an employee (Person A) arrives at the workplace. This arrival time (in-time) depends on various factors, which are broadly classified into the following groups (Ailabouni et al., 2009; Porter and Steers, 1973):
1. Personal factors
2. Commute-based factors (Olsson et al., 2013; van Hooff, 2015)
3. Work-related factors
4. Time and seasonality-based factors
2.2 Identifying data sources
For each factor, the availability and sources of data were considered. Two data sources cover most of the factors taken into consideration:
1. Google location history
2. Workplace management tools at Person A's workplace
Google captures large amounts of data on every individual, including browser activity, health (Google Fit), bank transactions (Google Pay), photos, emails, and location history (Google Maps).
Data from Google Location History was downloaded for the person in the study. The downloaded file is in JSON format and was converted to a data frame in R. The date and time column, which was in POSIX milliseconds format, was converted to a human-readable format. Similarly, latitude and longitude, which were stored as integers scaled by 10^7, were converted to GPS coordinates in decimal degrees. This data was filtered for the timeframe (of travelling to the workplace) and for the locations of the individual's workplace and home.
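The conversion steps above can be sketched as follows. The paper performed them in R; this sketch uses Python instead and assumes the legacy Google Takeout layout, in which each record under a top-level `locations` key carries `timestampMs`, `latitudeE7` and `longitudeE7` fields. Those field names, the function name `parse_location_history` and the sample record are assumptions for illustration and are not taken from the paper.

```python
import json
from datetime import datetime, timezone


def parse_location_history(raw_json):
    """Convert Takeout-style location records into readable rows.

    Assumes (hypothetically) the fields timestampMs, latitudeE7 and longitudeE7.
    """
    records = json.loads(raw_json)["locations"]
    rows = []
    for rec in records:
        rows.append({
            # POSIX milliseconds -> timezone-aware UTC datetime
            "time": datetime.fromtimestamp(int(rec["timestampMs"]) / 1000, tz=timezone.utc),
            # integers scaled by 10^7 -> decimal degrees
            "lat": rec["latitudeE7"] / 1e7,
            "lon": rec["longitudeE7"] / 1e7,
        })
    return rows


# Made-up sample record, for illustration only.
sample = (
    '{"locations": [{"timestampMs": "1507095000000", '
    '"latitudeE7": 129715987, "longitudeE7": 775945627}]}'
)
rows = parse_location_history(sample)
```

Filtering by commute window and by proximity to home or workplace would then operate on the `time`, `lat` and `lon` values of each row.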
Data from the workplace management systems consisting of in-time and out-time
was also collected from 4th October 2017 to 29th November 2018. This data was
joined with the previous data extracted using Google Location history.
From the data available, the influence of the following factors on the in-time of the employee is considered:
1. Commute-based factors
   a. Travelling time
   b. Vehicle type
   c. Starting place and the route taken
2. Work-related factors
   a. Nature of work
   b. Previous day's out-time
   c. Previous day's hours worked
3. Time-based factors
   a. Deterministic and stochastic trend
   b. Previous day's in-time
   c. Previous day's error of in-time
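The previous-day factors in the list above are lagged versions of the attendance data. A minimal Python sketch of how they could be derived is given below; it assumes one record per working day, already sorted by date, with in-time and out-time expressed in hours. The field names and the function `add_previous_day_features` are hypothetical, and the paper does not say whether "previous day" means the previous calendar day or the previous working day (the sketch uses the previous record).

```python
def add_previous_day_features(days):
    """Attach lagged features to each day's record.

    `days` is a list of dicts sorted by date, each with 'in_time' and
    'out_time' in hours (hypothetical field names). The first day is
    dropped because it has no predecessor.
    """
    enriched = []
    for prev, cur in zip(days, days[1:]):
        row = dict(cur)  # copy so the input records are not modified
        row["prev_in_time"] = prev["in_time"]
        row["prev_out_time"] = prev["out_time"]
        row["prev_hours_worked"] = prev["out_time"] - prev["in_time"]
        enriched.append(row)
    return enriched


# Made-up records, for illustration only.
days = [
    {"date": "2017-10-04", "in_time": 9.5, "out_time": 18.0},
    {"date": "2017-10-05", "in_time": 10.0, "out_time": 19.5},
]
enriched = add_previous_day_features(days)
```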
3 Results
3.1 Exploratory data analysis
Initial EDA was carried out for all the available factors. Fig 1 shows the distribution of the dependent variable, in-time. The distribution does not appear to be normal. Fig 2 shows the variation of in-time across time. There seems to be a logarithmic decrease in the mean across time. Fig 3 shows the relationship between travelling time and in-time. Fig 4 displays the variation of in-time for different transportation methods. On average, the individual arrives at the workplace earlier when walking than when travelling by bicycle or motor vehicle. Fig 5 presents the variation of in-time for two starting positions, A and B. The individual arrives at the workplace earlier from location A.
Fig 6 shows the variation of in-time with different types of work. There seems to be a distinction between C and D types of work when compared to A and B. From Fig 7, a slight decrease in in-time with a decrease in the previous day's out-time can be observed, especially for different types of work.
Fig 1: In-time distribution
Fig 2: Variation of in-time across time
Fig 3: Variation of travelling time across time
Fig 4: Variation in in-time across different modes of transport
Fig 5: Variation of in-time across different starting locations
Fig 6: Variation of in-time across different nature-of-work
Fig 7: Change of in-time with previous day out-time
3.2 Confirmatory data analytics
The next step is to validate the conclusions from the EDA by testing the significance of every factor using hypothesis tests.
In-time distribution: Fig 1 suggests that in-time is not normally distributed. A chi-square goodness-of-fit test can be conducted to check whether the distribution is normal. The null and alternative hypotheses are as follows:
$H_0$: in-time follows a normal distribution
$H_A$: in-time does not follow a normal distribution
The $\chi^2$ test statistic is 258.04, which is higher than the critical value at $\alpha = 5\%$, so the null hypothesis is rejected: in-time is not normally distributed.
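A minimal sketch of how such a goodness-of-fit statistic could be computed is shown below. It is written in Python rather than the R used in the paper and relies only on the standard library. It assumes ten equal-probability bins and a normal distribution fitted by the sample mean and standard deviation. The paper does not state its binning, so the function name, the bin count and the sample data are illustrative assumptions, and the sketch will not reproduce the reported value of 258.04.

```python
from statistics import NormalDist, mean, stdev


def chi_square_normality(values, n_bins=10):
    """Chi-square goodness-of-fit statistic against a fitted normal distribution."""
    fitted = NormalDist(mean(values), stdev(values))
    # Bin edges at equal-probability quantiles, so every bin expects n / n_bins points.
    edges = [fitted.inv_cdf(i / n_bins) for i in range(1, n_bins)]
    observed = [0] * n_bins
    for v in values:
        # The bin index is the number of edges lying strictly below the value.
        observed[sum(v > e for e in edges)] += 1
    expected = len(values) / n_bins
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    # Degrees of freedom: bins - 1, minus 2 parameters (mean, sd) estimated from the data.
    dof = n_bins - 1 - 2
    return statistic, dof


# Made-up, strongly skewed sample, for illustration only.
skewed = [0] * 90 + [100] * 10
statistic, dof = chi_square_normality(skewed)
# 14.067 is the 5% critical value of the chi-square distribution with 7 degrees of freedom.
reject_normality = statistic > 14.067
```

With the default of ten bins the statistic is compared against the critical value for 7 degrees of freedom; a different bin count needs a different critical value.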
Travelling time: From Fig 3, no direct correlation between travelling time and in-time is observed. A correlation test with the null and alternative hypotheses
$H_0: r_{\text{travelling time},\,\text{in-time}} = 0$
$H_A: r_{\text{travelling time},\,\text{in-time}} \neq 0$
gave a p-value of 0.43, indicating that travelling time might not be correlated with in-time.
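The paper does not name the correlation coefficient used. Assuming a Pearson correlation, the test statistic behind such a p-value can be sketched in Python as below: the coefficient r is converted to a t statistic with n - 2 degrees of freedom. The function name and the sample data are illustrative; turning t into a p-value requires the t distribution, which a statistics library such as SciPy (`scipy.stats.pearsonr`) can supply.

```python
import math


def pearson_r_and_t(x, y):
    """Return the Pearson correlation r and its t statistic (n - 2 degrees of freedom)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # Under H0: r = 0, t = r * sqrt((n - 2) / (1 - r^2)) follows a t distribution.
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return r, t


# Made-up data, for illustration only.
r, t = pearson_r_and_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```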
Vehicle type: From Fig 1, it can be observed that in-time is not normally distributed, and from Fig 4, that the variances within the groups are not equal. As the assumptions of ANOVA are violated, multiple pairwise t-tests with a Bonferroni correction for the 3 comparisons are conducted. The null and alternative hypotheses are as follows:
$H_0: \mu_{\text{walking}} = \mu_{\text{cycling}} = \mu_{\text{motor vehicle}}$
$H_A$: not all $\mu$ values are equal
The p-values of the tests are given in Table 1:
Table 1: p-values for t-tests for different vehicle types
                Walking       Cycling   Motor vehicle
Walking         1             1         3 × 10^[…]
Cycling         1             1         0.6103
Motor vehicle   3 × 10^[…]    0.6103    1
From Table 1, it can be inferred that there is a statistically significant difference in in-time between walking and travelling in a motor vehicle. The same can be seen in Fig 4.
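The Bonferroni correction used for the pairwise comparisons multiplies each raw p-value by the number of comparisons and caps the result at 1. A minimal Python sketch is given below; the raw p-values are made up for illustration and are not the paper's values, and the function name is an assumption.

```python
def bonferroni_adjust(raw_p_values):
    """Multiply each p-value by the number of comparisons, capping the result at 1."""
    m = len(raw_p_values)
    return {pair: min(1.0, p * m) for pair, p in raw_p_values.items()}


# Hypothetical raw p-values for the three pairwise comparisons.
raw = {
    ("walking", "cycling"): 0.40,
    ("walking", "motor vehicle"): 0.001,
    ("cycling", "motor vehicle"): 0.20,
}
adjusted = bonferroni_adjust(raw)
# A pair is flagged when its adjusted p-value is below the 5% level.
significant_pairs = [pair for pair, p in adjusted.items() if p < 0.05]
```

The cap at 1 explains how a table of adjusted p-values can contain entries equal to exactly 1.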
Starting location: From Fig 5, there appears to be a difference in in-time between starting location A and starting location B. This can be tested using a t-test with the hypotheses:
$H_0: \mu_A = \mu_B$
$H_A: \mu_A \neq \mu_B$
The p-value for the t-test is 4 × 10^[…], which indicates that there might be a significant difference in in-time between starting locations A and B.
Nature of work: From Fig 6, it can be inferred that A and B types of work might not be significantly different from each other, whereas C and D types of work differ from A in terms of in-time. An ANOVA test is conducted to validate the following hypotheses:
$H_0: \mu_A = \mu_B = \mu_C = \mu_D$
$H_A$: not all $\mu$ values are equal
The p-value for the ANOVA is 2 × 10^[…], indicating that the nature of work is a significant factor in determining in-time.
Previous day out-time: Fig 7 shows that as the previous day out-time increases, in-time decreases. A correlation test is used to validate this hypothesis.
$H_0: r_{\text{previous day out-time},\,\text{in-time}} = 0$
$H_A: r_{\text{previous day out-time},\,\text{in-time}} \neq 0$
The p-value is 0.009918, which is below the 5% significance level. The previous day out-time is therefore a significant factor affecting in-time.
Stochastic and deterministic trend: From Fig 2, the time series does not appear to be stationary. Dickey-Fuller unit root tests (Gujarati, 2009) are used to test whether the series is stationary, with the following null hypotheses:
$Y_t$ is a random walk: $\Delta Y_t = \delta Y_{t-1} + \epsilon_t$
$Y_t$ is a random walk with drift: $\Delta Y_t = \beta_1 + \delta Y_{t-1} + \epsilon_t$
$Y_t$ is a random walk with drift around a deterministic trend: $\Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + \epsilon_t$
The results of the tests are shown in Table 2. A summary of all the tests is given in Table 3.
Table 2: Dickey-Fuller test results
Model                                                 Result
Random walk                                           The series is not stationary
Random walk with drift                                The series is not stationary, and there is drift
Random walk with drift around a deterministic trend   The series is not stationary, there is a trend, and there may or may not be drift
Table 3: Summary of hypothesis tests
Variable                Test                                          Null hypothesis                           p-value       Comments
Travelling time         Correlation                                   r = 0                                     0.4358        Travelling time does not affect in-time
Vehicle type            Multiple t-tests with Bonferroni correction   μ_walking = μ_cycling = μ_motor vehicle   2 × 10^[…]    The difference in in-time between walking and motor vehicle
Starting place          t-test                                        μ_A = μ_B                                 4 × 10^[…]    Starting place affects in-time
Nature of work          ANOVA                                         μ_A = μ_B = μ_C = μ_D                     2 × 10^[…]    The difference in in-time between C and D when compared to A
Previous day out-time   Correlation                                   r = 0                                     0.009918      Previous day out-time is a significant factor which affects in-time
Stationarity            Dickey-Fuller                                 Random walk with drift                                  Series is not stationary and there is a deterministic trend
3.3 Modelling and forecasting
A linear regression model was built using stepwise elimination based on AIC. Travelling time, nature of work, starting location and the deterministic trend were found to be important factors affecting in-time. The model summary statistics can be found in Table 4, Table 5 and Table 6.
Table 4: Model summary
Regression Statistics
R Square            0.5441
Adjusted R Square   0.5337
Table 5: Regression ANOVA
             Df    F       Significance F
Regression   5     52.51   Significant
Residual     220
Total        225
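As a consistency check, the values in Tables 4 and 5 can be related through the standard definitions of adjusted R Square and the overall F statistic. The worked arithmetic below assumes n = 226 observations (total degrees of freedom 225, plus one) and p = 5 regressors, both read off Table 5.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
          = 1 - (1 - 0.5441) \times \frac{225}{220}
          \approx 0.5337

F = \frac{R^2 / p}{(1 - R^2) / (n - p - 1)}
  = \frac{0.5441 / 5}{0.4559 / 220}
  \approx 52.51
```

Both results agree with the reported adjusted R Square of 0.5337 and F statistic of 52.51.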
Table 6: Regression coefficients
                      Coefficients   Standard Error   t-stat   p-value
Intercept             52.052635      6.36202
Travelling time       -0.006913      0.002165         -3.193   0.00162
Nature of work B      20.008394      2.158982         9.268    2.00×10^-16
Nature of work C      15.711124      3.252032         4.831    2.54×10^-6
Starting location B   10.720798      3.272792         3.276    0.00122
log(t)                -10.077145     1.178391         -8.552   2.07×10^[…]
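To show how the coefficients in Table 6 combine into a prediction, a minimal Python sketch follows. The paper does not state the units of in-time or travelling time, how the time index t is counted, or which work types and starting location form the baseline, so the sketch assumes a natural logarithm (the default `log` in R), a positive time index, and that any category without a coefficient is part of the baseline. The function and dictionary names are hypothetical.

```python
import math

# Coefficients copied from Table 6; units are as in the paper (not stated there).
COEFFICIENTS = {
    "intercept": 52.052635,
    "travelling_time": -0.006913,
    "nature_of_work": {"B": 20.008394, "C": 15.711124},
    "starting_location": {"B": 10.720798},
    "log_t": -10.077145,
}


def predict_in_time(travelling_time, nature_of_work, starting_location, t):
    """Linear prediction of in-time; categories without a coefficient count as baseline."""
    value = COEFFICIENTS["intercept"]
    value += COEFFICIENTS["travelling_time"] * travelling_time
    value += COEFFICIENTS["nature_of_work"].get(nature_of_work, 0.0)
    value += COEFFICIENTS["starting_location"].get(starting_location, 0.0)
    # Assumes a natural logarithm of a positive time index t.
    value += COEFFICIENTS["log_t"] * math.log(t)
    return value


# With zero travelling time, baseline categories and t = 1, only the intercept remains.
baseline = predict_in_time(travelling_time=0, nature_of_work="A", starting_location="A", t=1)
```

Holding everything else fixed, switching the starting location from A to B shifts the prediction by the corresponding coefficient, which is how the table entries are read.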
Independently, factors like the previous day out-time or vehicle type were significant, while travelling time was not. However, their significance changes in the presence of other variables. The model is significant, as the p-value of the F statistic of the ANOVA (from Table 5) is less than 5%. A variance inflation factor ($\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ is obtained by regressing variable $j$ on the other independent variables) above 4 indicates multicollinearity between variables (Hair et al., 1998; Kumar, 2017). All VIF values in Table 7 are below 4, so there is no evidence of multicollinearity between the variables. The ideal number of independent variables in the model to prevent overfitting is given by Mallows Cp (Mallows, 1973). Mallows Cp is 4.82, while the number of variables in the current model is 5, indicating no overfitting. From Table 4, the overall variation in in-time explained by the model is 54%. The Durbin-Watson test (Durbin and Watson, 1950; Kumar, 2017) was used to check for autocorrelation in the residuals. The test statistic is 1.87, with a p-value of 0.22, indicating no remaining autocorrelation in the residuals. The residuals in Fig 8 show no remaining systematic pattern.
Fig 8: Unexplained variation
Table 7: VIF among variables
Variable              VIF
Travelling time       1.42
Nature of work B      1.43
Nature of work C      1.31
Starting location B   1.79
log(t)                1.51
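The Durbin-Watson statistic mentioned in the model diagnostics is the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation. The Python sketch below computes it for made-up residuals; the function name and the data are illustrative and are not the paper's residuals.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Alternating residuals (strong negative autocorrelation) push the statistic well above 2.
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
# Identical residuals (strong positive autocorrelation) push it to 0.
constant = durbin_watson([1.0, 1.0, 1.0, 1.0])
```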
4 Discussion
This paper aimed to quantify the different factors affecting the in-time of an individual using the CRISP-DM method. The regression results in Table 6 illustrate how the individual (Person A) can plan their time accordingly. Travelling time, nature of work and starting location are important factors that determine in-time. There is a logarithmic decrease in in-time over the study period. This can cause problems if there is an unexpected delay in the future. As travelling time is an important factor, Person A should manage traffic-related delays better, for example by finding alternate routes.
5 Conclusion
This paper illustrates how an individual can use data collected on them by various companies or agencies. Although the needs and analytics capabilities of every individual are different, the approach used in the in-time example can be generalized. The CRISP-DM procedure, as implemented in this paper, can be summarised as follows:
1. Identify the problem, measurement metrics, and success criterion
2. Identify the factors affecting the problem
3. Find data sources which capture the data for the different features
4. Perform exploratory data analysis to identify the relationships between variables
5. Perform hypothesis tests to confirm the relationships between the variables
6. Build an explainable model that optimizes the success metrics
7. Incorporate learnings from the study
8. Repeat steps 3 to 7 with newer data sources and factors
References
Aho, A.-M., Uden, L., 2014. Developing data analytics to improve services in a mechanical
engineering company, in: International Conference on Knowledge Management
in Organizations. Springer, pp. 99–107.
Ailabouni, N., Gidado, K., Painting, N., 2009. Factors affecting employee productivity in
the UAE construction industry, in: 25th Annual ARCOM Conference,
Nottingham, UK. pp. 7–9.
Azevedo, A.I.R.L., Santos, M.F., 2008. KDD, SEMMA and CRISP-DM: a parallel
overview. IADS-DM.
Chiang, L., Lu, B., Castillo, I., 2017. Big data analytics in chemical engineering. Annual
review of chemical and biomolecular engineering 8, 63–85.
Dinesh Kumar, U., Arun, P., Nachiappan, S.P., 2012. Supply chain optimization at Madurai
Aavin milk dairy.
Durbin, J., Watson, G.S., 1950. Testing for serial correlation in least squares regression: I.
Biometrika 37, 409–428.
Gujarati, D.N., 2009. Basic econometrics. Tata McGraw-Hill Education.
Gurrin, C., Smeaton, A.F., Doherty, A.R., 2014. Lifelogging: Personal big data.
Foundations and Trends® in information retrieval 8, 1–125.
Hair, J.F., Black, W.C., Babin, B.J., Anderson, R.E., Tatham, R.L., 1998. Multivariate data
analysis. Prentice hall Upper Saddle River, NJ.
Jeeson, K.J., Jathar, A., Dinesh Kumar, U., 2013. Consumer choice between house brands
and national brands in detergent purchases at Reliance retail.
Kumar, U.D., 2017. Business Analytics: The Science of Data-driven Decision Making.
Wiley India.
Lixia, Q., 2010. Empirical research on the importance of incentive factors to life insurance
agents, in: 2010 International Conference On Computer Design and Applications.
IEEE, pp. V5-38-V5-41.
Mallows, C.L., 1973. Some comments on Cp. Technometrics 15, 661–675.
Miller, P.D., 2019. Introduction to Focus: The App Issue. American Book Review 40, 3–4.
Olsson, L.E., Gärling, T., Ettema, D., Friman, M., Fujii, S., 2013. Happiness and
satisfaction with work commute. Social indicators research 111, 255–263.
Pannu, H.S., Kumar, U.D., Farooquie, J.A., 2010. Impact of innovation on the performance
of Indian pharmaceutical industry using Data Envelopment Analysis. IIM
Bangalore Research Paper.
Porter, L.W., Steers, R.M., 1973. Organizational, work, and personal factors in employee
turnover and absenteeism. Psychological bulletin 80, 151.
Schwartz, P.M., 2003. Property, privacy, and personal data. Harv. L. Rev. 117, 2056.
Selke, S., 2014. Lifelogging. Wie die digitale Selbstvermessung unsere Gesellschaft
verändert. Berlin: ECON.
Sellen, A., Whittaker, S., 2010. Beyond total capture: a constructive critique of lifelogging.
Communications of the ACM.
van Hooff, M.L., 2015. The daily commute from work to home: examining employees'
experiences in relation to their recovery status. Stress and Health 31, 124–137.
Wirth, R., Hipp, J., 2000. CRISP-DM: Towards a standard process model for data mining,
in: Proceedings of the 4th International Conference on the Practical Applications
of Knowledge Discovery and Data Mining. Springer-Verlag London, UK, pp.
29–39.