Planning for sustainable cities by estimating
building occupancy with mobile phones
Edward Barbour1,2,3, Carlos Cerezo Davila4, Siddharth Gupta1, Christoph Reinhart4, Jasleen Kaur5&
Marta C. González 1,3,6
Accurate occupancy is crucial for planning for sustainable buildings. Using massive,
passively-collected mobile phone data, we introduce a novel framework to estimate building
occupancy at unprecedented scale. We show that, at urban-scale, occupancy differs widely
from current estimates based on building types. For commercial buildings, we ﬁnd typical
occupancy rates are 5 times lower than current assumptions imply, while for residential
buildings occupancy rates vary widely by neighborhood. Our mobile phone based occupancy
estimates are integrated with a state-of-the-art urban building energy model to understand
their impact on energy use predictions. Depending on the assumed relationship between
occupancy and internal building loads, we ﬁnd energy consumption which differs by +1% to
−15% for residential buildings and by −4% to −21% for commercial buildings, compared to
standard methods. This highlights a need for new occupancy-to-load models which can be
applied at urban-scale to the diverse set of city building types.
1Department of Civil and Environmental Engineering, MIT, Cambridge, MA, USA. 2Centre for Renewable Energy Systems Technology, Loughborough
University, Loughborough, LE, UK. 3Lawrence Berkeley National Laboratory, Berkeley, CA, USA. 4Sustainable Design Lab, MIT, Cambridge, MA, USA.
5Signify Research North America (formerly Philips Lighting), Cambridge, MA, USA. 6Department of City and Regional Planning, UC, Berkeley, CA, USA.
Correspondence and requests for materials should be addressed to M.C.G. (email: firstname.lastname@example.org)
NATURE COMMUNICATIONS | (2019) 10:3736 | https://doi.org/10.1038/s41467-019-11685-w | www.nature.com/naturecommunications 1
Energy use in buildings accounts for over 40% of total pri-
mary energy use in the U.S and E.U.1,2, and is particularly
concentrated in urban areas, which are set to rapidly expand
in the near future3.Efﬁciency measures applied to urban build-
ings therefore constitute an unparalleled opportunity for global
energy use and emissions reduction4,5. Recognizing this, many
national and local governments have implemented or are con-
sidering policies to promote building efﬁciency, for example, by
incentivizing increased insulation levels or high efﬁciency appli-
ances. However, these initiatives typically operate with limited
resources, both in terms of budget and technical assistance.
Urban-scale building energy models have therefore been devel-
oped to support strategic decision-making by pinpointing policies
with the greatest saving potential.
Urban-scale energy models can be broadly split into two cate-
gories; those that rely on data-driven statistical or machine
learning models (black-box models) and those that include a
physics-based building simulation engine (white-box models). In
the ﬁrst category, statistical models typically reduce the energy
consumption of a large number of buildings to a small number of
explanatory variables6–8. These models can subsequently be used
for energy benchmarking and were the ﬁrst to estimate urban-
scale impacts of potential Energy Conservation Measures (ECMs).
However, while useful, these models have two notable limitations.
Firstly, the lack of detail regarding individual buildings can result
in poor out-of-sample predictions, particularly related to the
widespread impacts of many ECMs and feedback between ECMs
and building operation8, and secondly, they cannot be used in the
design of new, or densiﬁcation of existing, urban districts required
to facilitate booming urban population growth9.
In the second category, a new class of Urban Building Energy
Models (UBEMs) has recently been developed, which includes
detailed physics-based simulations of thousands of individual
buildings in cities or districts10–16. In an UBEM, each individual
building is represented by a dedicated physics-based engineering
model, the likes of which are commonly used worldwide for
engineering design, code-compliance demonstration, and
improved operation9. UBEMs rely on highly automated work-
ﬂows to generate the detailed individual building models without
requiring time-intensive work by building modeling experts9,17,18,
often combining several datasets, including Geographic Infor-
mation Systems (GIS) databases, LiDAR and stock building
archetypes. The models are then calibrated to match real data
samples11,19. However, despite high versatility20, inaccuracies are
introduced in these models due to a lack of understanding
regarding building occupancy at an urban scale20–22.
For most buildings, with the exception of certain industrial
facilities, occupant presence and behavior have a decisive impact
on building energy use. In recognition of this well-known fact, the
International Energy Agency (IEA) has funded two Annexes
dedicated to understanding occupant presence and behavior in
buildings and improving models, Annex 66 and its follow-up,
Annex 7923,24. While at the individual building level, several
occupancy models exist (i.e., refs. 25,26.), only a few studies have
considered occupancy in an urban context27. In absence of better
data, standardized deterministic space-based occupant presence
and behavior models are the only viable approach for UBEMs of
urban areas with mixed-use buildings20.
To understand building occupancy at an urban-scale requires
either a vast array of sensing infrastructure or knowledge of popu-
lation movements over an entire metropolitan regions–capturing the
daily movements of citizens between different buildings. To that end,
in this work, we propose using massive, passively-collected mobile
phone data to infer building occupancy on a city-level. This data has
already been used to extract locations where individuals stay (stay
points), as well as to infer their location-based activities28 and for
highly-aggregated electricity load predictions29,30. A recent modeling
framework (TimeGeo) synthesizes previous ﬁndings in human
mobility31,32, demonstrating that sparse mobile phone data can be
used to model individual trajectories for entire urban populations33.
Several other passive data sources have already been used to
understand city dynamics34 and infer building occupancy35–39,in
particular bluetooth, wiﬁ, cameras and electricity data, as well as
their combinations. However, in contrast with mobile phone data,
these sources are not available at sufﬁcient scale for predicting
simultaneous occupancy for thousands of different buildings.
In this work, we develop a method for estimating building
occupancy at urban-scale by extending the TimeGeo frame-
work33. Our statistical method assigns occupants to buildings and
we demonstrate the proposed framework for 83,000 buildings in
the city of Boston, using the individual trajectories of 3.5 million
urban inhabitants. The building assignment is probabilistic and
analogous to route assignment models, successfully developed
with mobile phone data in vehicular trafﬁc40,41. In order to
demonstrate the signiﬁcance of our results for energy modelling,
we develop an UBEM of a central, mixed-use neighborhood and
compare occupancy estimates between standard reference meth-
ods and the mobile-inferred building occupancy. Since occupancy
is a principal driver of building energy use but no model relating
occupancy and building loads on this scale exists, we develop
approximate upper and lower bound scenarios for the impact of
occupancy. Firstly, a low-impact of occupancy scenario wherein
we adjust the occupant-driven reference building loads according
to the relative occupancy, primarily changing the timing of loads
rather than the magnitude. Secondly, a high-impact scenario,
wherein we adjust the occupant-driven loads according to the
absolute mobile-inferred occupancy, signiﬁcantly changing both
load magnitude and timing. We ﬁnd that these scenarios imply
energy consumption that differs by +1to−15% in residential
buildings and by −4to−21% in commercial buildings. Finally,
we illustrate that the mobile occupancy implies the impact of
energy efﬁciency measures may be mispredicted, ﬁnding savings
from insulation improvements up to 25% higher than predicted
with standard occupancy while savings from improving equip-
ment efﬁciency were predicted up to 40% lower in our modelled
region. Our method therefore provides an improved alternative
estimate of the number of occupants in buildings, in contrast to
indirect space-based estimates which represent standard practice
in city-scale energy planning today.
Reference building occupancy. In the United States, the
Department of Energy (DOE) reference commercial buildings
dataset42 provides the standard set of templates for the US
building stock for the EnergyPlus simulation engine43,44. These
include sixteen different building categories in all US climate
zones with different construction periods. Each building category
has its own associated fractional occupancy and load schedules
based on previous studies by the DOE, ASHRAE and several US
National Research Labs. The schedules include estimates for
occupant density based on building type (i.e., small retail occu-
pancy: 0.141 occupants per m2of ﬂoor area) and peak load values
per unit ﬂoor area for different load types (i.e., ofﬁce equipment
density: 17 Wm−2)45. When used for individual buildings, the
space-based templates can be adjusted using local knowledge of
the occupants; however, this is not possible at an urban-scale20.
Hence, without any empirically based alternatives, city-scale
energy models typically use the reference building occupancy,
introducing uncertainty into energy-use predictions and, as a
result, into predicted savings from efﬁciency measures20. This
barrier for urban energy policy development is an international
ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11685-w
2NATURE COMMUNICATIONS | (2019) 10:3736 | https://doi.org/10.1038/s41467-019-11685-w | www.nature.com/naturecommunications
issue, since all national building stock models suffer from the
same lack of empirical occupancy data.
From mobile phone stay points to building occupancy. Using
the TimeGeo framework33 we simulate the individual mobility of
Boston’s entire metropolitan area population at high resolution.
The framework starts with the Call Detail Records (CDRs) of 1.92
million anonymous mobile phone users of for the period 20
February till 30 March 2010 in the Greater Boston area (see
Methods). Stay points for these users are identiﬁed based on
consecutive mobile phone records within certain temporal and
spatial thresholds, namely 10 min and 300 m. Each stay point is
characterized as either home, work or other, depending on
whether it is estimated to occur at the individual’s place of resi-
dence, work or some other location33, and includes a start time,
duration, latitude/longitude coordinate pair and user id. We ﬁnd
that nearly 200,000 users have more than 50 total stays and at
least 10 home stays (home stays only occur at an individual’s
speciﬁc home-designated building) during the observation period.
These are designated active users and their phone records are
used to extract the mobility parameters for an empirically based
population-wide mobility model, reliant only on measurable
parameters in the data. In each census tract in Boston’s metro
area, TimeGeo expands the active phone users to the population
(i.e., simulating 3.54 million people including 2.10 million
workers and 1.44 million non-workers). The model encompasses
individuals with homes in the region shown in Fig. 1a. The colors
in Fig. 1a indicate the average expansion factor of commuters and
non-commuters for each tract, calculated as the ratio of the total
tract population (as deﬁned by the 2010 census) to the number of
active mobile users with homes within that tract. The model has
been proven to be accurate at the census tract level33 in com-
parison with both the 2009 National Household Travel Survey46
and the 2010–2011 Massachusetts Travel Survey47 (see Supple-
mentary Fig. 1 and Supplementary Note 1). TimeGeo offers sig-
niﬁcant improvements on current urban mobility models,
which typically involve expensive surveys and have very low
sampling rates. It should be noted that the model is primarily
limited by the extent, resolution and duration of the data. As
such, as more large-scale data with higher frequency (i.e., GPS
traces) and longer observation periods (i.e., years) become
available, the model resolution could be increased and hetero-
The ﬁne-scale population mobility model has been shown to
agree with existing methods of quantifying population mobility at
the census tract level33. Hence, each stay can be thought of as a
person-visit to a tract, as implied by the empirical mobile phone
data and the Boston metro census population, and in agreement
with current best-in-class travel demand models. Figure 2a
illustrates the journeys for the people who visit the ﬁve tracts
shown in Fig. 1c. The predicted tract-level occupancy is revealing,
along with the expected ﬂux of people in-and-out-of the area, as
6 km0 20 60 km
Residence + Retail
Office + Retail
Hotel + Retail
Fig. 1 Extent of the data. aThe extent of the TimeGeo simulation in the greater Boston metropolitan area. Each census tract is colored by its expansion
factor. bThe extent of the Boston Buildings dataset—tracts are marked as either majority residential or majority commercial depending on the most
common building use. cThe buildings used in the Urban Building Energy Model, colored by type. These buildings occupy ﬁve census tracts. Black boxes in
Fig. 1a. and Fig. 1b. illustrate the extent of the next panel. Maps created using Google Maps, 2019 Google
NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11685-w ARTICLE
NATURE COMMUNICATIONS | (2019) 10:3736 | https://doi.org/10.1038/s41467-019-11685-w | www.nature.com/naturecommunications 3
shown in Fig. 2b, c. Here, we see that the implied occupancy of
the ﬁve tracts over the course of a typical weekday using DOE
reference building models ranges from 71,300–122,500, whereas
the mobile-informed model ranges from 11,800–20,700. Given
the census population for these ﬁve tracts is 13,600, the mobile
model certainly seems more credible and the DOE reference
occupancy standard may be overestimating actual occupancy
numbers by over 600%.
Each tract typically encompasses hundreds to thousands of
buildings, for example, the ﬁve tracts used in the energy model
(Fig. 1c) contain 1330 buildings. Therefore, to assign each stay to
a building we assume that each stay point represents a visit to a
building in the same tract and probabilistically map each stay to a
building. Buildings data is provided by the city of Boston and
includes footprint, geometry, height and a tax assessment type
(see Supplementary Fig. 2 and Supplementary Note 2). We re-
classify the buildings into three broad classes; residential,
commercial and industrial (see Methods) and stipulate that
home type stays can only be assigned to residences, while work
stays can be assigned to all building types except residential and
other stays can be assigned to commercial buildings only.
Therefore, we assume that an individual stay point can potentially
be located in any building within the same tract, provided the
building is open for the stay duration and the building
functionality matches the stay type. To probabilistically select a
particular building, we use nominal building capacities, and
assume that, in general, buildings with higher capacity attract
more stays. To estimate the nominal capacities we use typical Per
Capita Area (PCA) values for each functional building type (see
Methods, Supplementary Table 1 and Supplementary Note 3).
We assume residential buildings are always open and have a PCA
of 40 m2, i.e. a house with a ﬂoor area of 200 m2would have a
nominal capacity of ﬁve occupants. For non-residential buildings,
we use Places Of Interest (POIs) available in digital maps where
the exact building functional use is unknown (Fig. 3b, see
Methods, Supplementary Table 2 and Supplementary Note 4).
Furthermore, we hypothesize that at a city-block level there
may be a rich-get-richer effect, which results in popular areas
attracting more people per-unit-ﬂoor-area. Therefore, in our
model, stays are preferentially attracted to areas with larger
capacities for that type of stay. This is based on the observation
that shops and industries agglomerate due to clustering of
economic activities48—for example, resulting in the formation of
popular shopping, dining or entertainment districts. To model
this, we adopt a two stage process when assigning a stay to a
building within a particular tract. Firstly, we spatially group the
possible buildings by their centroid coordinates using hierarchical
agglomerative clustering and Wards minimal increase of variance
method49 to form city-blocks. Then, after probabilistically
selecting a block we select an individual building (Fig. 3c). The
preferential attraction at the cluster level is given as follows:
For a given cluster iwith a nominal capacity C
,P(i) is the
probability of that cluster being assigned a particular stay. The
total capacity of each cluster iis the sum of the capacity of the
buildings contained within-cluster i, and the total tract
capacity is the sum of the capacity of the Nclusters. Therefore,
j¼1Cij, where the buildings are indexed by j, and C
are the PCA and total ﬂoor area of building jin
cluster i, respectively. The parameter μvaries the degree of
preferential attraction to areas with a high-capacity for stays of
that type (see next section). Once a cluster ihas been selected,
then within that cluster we assign the stay to a building jwith
probability P(j|i), proportional to its relative within-cluster
capacity, as described by Eq. (2).
Figure 3illustrates the process of assigning stays to the
buildings. Once we have considered all the stay points within a
particular tract for the 24 h period, we assume that we have a
complete picture of building occupancy over the day.
Occupancy for Boston’s buildings. We now study the expected
building occupancy for 0 ≤μ≤1. The rationale for adopting these
bounds are, (1) we do not expect a rich-get-richer effect where the
0 20 60 km40
DOE reference Census pop.
4 a.m. 8 a.m. noon 4 p.m. 8 p.m.
Fig. 2 Residents and visitors in the Back Bay. aPredicted journeys from the TimeGeo for users who visit the ﬁve tracts shown in Fig. 1c. Each line represents
approximately 50 users. bThe total predicted hourly occupancy for the area shown in Fig. 1c according to the TimeGeo model and as implied by the DOE
reference building model. The census population for the area is also shown. cThe predicted population ﬂux as implied by the reference buildings and
TimeGeo model. Map created using Google Maps, 2019 Google
ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11685-w
4NATURE COMMUNICATIONS | (2019) 10:3736 | https://doi.org/10.1038/s41467-019-11685-w | www.nature.com/naturecommunications
returns are diminishing (i.e., if μ< 0 then areas with more
opportunities (capacity) will attract proportionally less people per
unit ﬂoor space) and (2) we do not anticipate that μ> 1 because
above this value we ﬁnd that high-capacity areas start to dominate
the stay attraction to an unreasonable degree, leaving many
unoccupied buildings. The effect of the non-linear parameter μis
discussed in detail in Supplementary Note 5 (see also Supple-
mentary Fig. 3 and Supplementary Table 3).
The results are consistent with residences having high night
occupancy and commercial buildings having higher daytime
occupancy, as people generally leave their homes during the day
(Fig. 4a). Figure 4b shows that greater μresults in more highly
occupied spaces, shifting occupants from lower occupancy spaces
to higher ones. In the range 0 ≤μ≤1 we see that shift in the
distributions change is slight and the median occupancy-over-
capacity ratio for residential buildings decreases from 1.4 to 1.2
while the median commercial occupancy decreases from 0.3 to
0.2. For the 83,000 buildings in the city of Boston, we ﬁnd that
most commercial buildings have nominal capacities that are
signiﬁcantly higher than their peak occupancy (Fig. 4b shows that
mobile-inferred peak occupancy is higher than the capacity in
only 5% of commercial buildings). Conversely, for residential
buildings, the occupancy-over-capacity distribution has a long tail
and implies a wide range of residential occupant densities. These
are highly neighborhood dependent and we observe high
numbers of occupants in student areas situated in close proximity
to universities, where higher-than-average residential occupancy
persists throughout the day (Fig. 4a and Supplementary Fig. 4).
For commercial buildings, our estimate of peak capacity is rarely
approached. We ﬁnd that when μ=0.5, the most common value
for residential peak-occupancy-over-capacity is 1 while the
median is 1.3. Therefore our assumed residential PCA appears
close to typical values for residential space in Boston.
Occupancy for urban energy predictions. At this point, we
create an urban energy model for the mixed-use district in the
Back-Bay region of Boston, covering ﬁve census tracts with 1,330
buildings as shown in Fig. 1c. Out of these, we were able to model
1,266 buildings using the DOE reference buildings (see Supple-
mentary Fig. 5). The other buildings were considered in the
occupancy-assignment process (since they could still be open and
attract people); however, due to atypical uses (i.e,. ﬁre/police,
churches, etc.), no appropriate reference model existed and
therefore these were not included in the energy model. To gen-
erate the modeled buildings we used data regarding use per ﬂoor,
period of construction and geometry. We use the previously
developed UMI framework10 (see Methods) which is based on
EnergyPlus, a state-of-the-art whole building energy simulation
engine developed by the US DOE43. The building constructions
and systems were speciﬁed through archetype templates based on
the DOE Reference Buildings dataset (see Supplementary Note 6
and Supplementary Figs. 6 and 7). The templates have the same
nominal peak occupancy values as used for the 83,000 building
analysis and also contain 24-h occupancy proﬁles dependent on
the building use. The occupancy is compared to our mobile-
informed occupancy results, as shown in Fig. 4d. Our results
suggest that in individual commercial buildings the reference
building occupancy can over-predict occupancy by an order of
magnitude or more at all times during the day, in our modeled
The reason for the major discrepancy between the reference
building occupancy and mobile-inferred occupancy is that the
mobile occupancy is normalized to the Boston metro area
population, whereas the reference building occupancy proﬁles are
based on a combination of factors including ASHRAE ventilation
requirements, ﬁre safety codes, and generalized building-
manager-surveys. Importantly, these factors have no ability to
account for the city-speciﬁc population and its movements, and
therefore, as shown in Fig. 2b these are inconsistent with the
census population. Accordingly, for the total modeled neighbor-
hood we ﬁnd the mobile occupancy corresponds to only 15–24%
of the reference building occupancy over the course of a typical
weekday. The population ﬂuxes also seem unreasonable whereas
the mobile-inferred people-ﬂuxes between different regions are
consistent with current transportation models33.
Occupant-driven energy loads. Since occupancy is often a large
driver for building energy use8,20,37,50, the differences between the
reference building and the mobile-inferred occupancy imply
different energy consumption patterns. Unfortunately, while
methodologies exist that can predict appliance use for individual
buildings based on the maximum number of occupants26,no
satisfactory method exists that can function at urban-scale20,50,51.
Therefore, we rely on the occupant-driven loads prescribed in the
DOE reference buildings, which are empirically related to the
reference occupancy. We develop two rule-based scenarios, which
broadly represent high-impact-of-occupancy and low-impact-of-
occupancy case studies, and compare the results of each with a
base model using the DOE reference buildings. The resulting
three scenarios are described as follows. First, the DOE reference
0 100 200 300 m
Other tract buildings Stay
0 100 200 300 m
Other cluster buildings
0 100 200 300m
Get possible buildings Cluster into 'walkable' groups
For each stay:
Select a cluster and then a building
Probabilistic occupancy assignment
Update building occupancy
6 a.m. Noon 6 p.m.
Call detail records
Identify active users and extract
Urban population movements:
Stay types, times, locations and
Building outline, height, class
Use google maps API to get POIs
From POIs types estimate PCA and
opening hours for non-residential
Fig. 3 CDR data and building data to building occupancy method. aThe
TimeGeo framework going from call detail records to population-wide stay
locations33.bThe building data includes building type, PCA and opening
hours. cThe set of possible buildings for a stay depends on building type
and opening hours (for the stay type other as shown, possible buildings are
open commercial buildings). Buildings are clustered into localized spatial
groups using Ward’s minimum increase of variance method and the
dendrogram is truncated at a Ward distance of 500 m, which yields city-
block shaped clusters. Then, a cluster is probabilistically selected followed
by a building, and the stay is added to that building’s occupancy proﬁle.
Maps created using Google Maps, 2019 Google
NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11685-w ARTICLE
NATURE COMMUNICATIONS | (2019) 10:3736 | https://doi.org/10.1038/s41467-019-11685-w | www.nature.com/naturecommunications 5
scenario is the control scenario and presents simulations using
occupancy and load schedules unaltered from DOE reference
building templates. Second, the low-impact scenario develops
occupancy-driven load schedules considering the ratio between
the relative mobile occupancy and relative reference building
occupancy, as described in Eq. (3). Third, the high-impact sce-
nario develops occupancy-driven load schedules considering the
ratio between the absolute mobile occupancy and reference
building occupancy, as described in Eq. (4).
In Eqs (3) and (4), Lmob(h) is the fractional schedule for each
load at hour hassociated with the mobile occupancy, Omob(h).
LDRB(h) is the DOE Reference Building load schedule associated
with the reference building occupancy proﬁle ODRB(h) and bis
the baseload—the proportion of load independent of occupancy
—which we assume is given by the minimum value of LDRB(h).
The purpose of the scenarios is to create relationships between
occupancy and the resulting use of lights, equipment and hot
water. In both, we deﬁne a constant baseload which includes non-
occupant related energy use, such as refrigerators etc. For the
remaining loads we assume upper and lower bounds as follows:
For the lower bound (low-impact), we assume that loads vary by
the relative ratios between mobile and reference building
occupancy, so if OmobðhÞ
MAXðODRBÞthe predicted load increases.
This primarily corrects time-of-day effects, i.e., if occupants arrive
earlier loads are shifted to be earlier. In contrast, the upper bound
(high-impact) assumes non-baseload loads change as the absolute
ratio between the mobile and reference building occupancies, i.e.,
if the mobile occupancy predicts half the number of people at a
certain time the (non-baseload) loads will be halved. The high-
impact scenario works best in a compartmentalized building,
such as an apartment block, where light and equipment scale with
the number of occupied rooms at any moment in time.
Conversely, the low-impact scenario is suitable for open plan
ofﬁces or retail, where occupants drive when lighting, fans etc. are
switched on, but the exact consumption is insensitive to the
absolute occupancy. However, Eq. (3) leads to implausible results
if the occupant number predicted by the mobile occupancy
schedule at a particular time is much lower than the reference
building schedule and simultaneously corresponds to a larger
relative mobile occupancy. Therefore, we stipulate that the
predicted mobile load can only be a factor of Omob ðhÞ
the reference building load. Supplementary Fig. 8 illustrates the
occupancy-driven load schedules for different types of building
under each scenario (see also Supplementary Note 7).
Heating and cooling may also be occupant-driven—occupants
may change temperature set-points according to personal thermal
comfort and reduce heating/cooling in unused space. EnergyPlus
speciﬁes heating and cooling set-points and requires a schedule
for ON/OFF operation. In all scenarios, we use the consistent set
of heating and cooling set-points provided in the DOE reference
buildings dataset. The differences in heating and cooling loads
between the scenarios is then driven by the number of occupants
and their appliance usages.
Urban energy prediction with mobile occupancy. Using the
previously described UBEM we now run simulations for the 1,266
modeled buildings under each of the three scenarios (DOE
reference, low-impact and high-impact), each with occupancy
distributions for μ=0, μ=0.5, and μ=1. We simulate one
Noon: residential 9 p.m.: residential
Noon: commercial 9 p.m.: commercial
Estimated number of people by building type
0.5 Residential, = 0.0
Res. = 0.0
Comm. = 0.0
Res. = 0.5
Res. = 1.0
Comm. = 0.5
Comm. = 1.0
Commercial, = 0.0
Mixed, = 0.0
Residential, = 0.5
Commercial, = 0.5
Mixed, = 0.5
Residential, = 1.0
Commercial, = 1.0
Mixed, = 1.0
–10 0 10 100 1000
Mean hourly occ. difference (persons)
P(Mean hourly occ. difference)
Fig. 4 Building Occupancy results. aEstimated number of building occupants by building type for the mobile-inferred model for residential and commercial
buildings. bDistributions of the ratio of maximum observed occupancy to nominal building capacity for ~83,000 buildings in Boston. cDistribution of
average hourly occupancy difference between DOE reference and mobile-inferred occupancy for the 1,266 buildings in the energy model. Positive means
the DOE occupancy is greater while negative implies the opposite. Maps created using Google Maps, 2019 Google
ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11685-w
6NATURE COMMUNICATIONS | (2019) 10:3736 | https://doi.org/10.1038/s41467-019-11685-w | www.nature.com/naturecommunications
representative day in each season using the Typical Meteor-
ological Year (see Methods) to get a predicted daily energy con-
sumption value for each building in each season. For the
reference model we also compare the predicted annual Energy
Use Intensity (EUI)—the energy consumption per unit ﬂoor area
—for each building to the national average value for the
equivalent building types in the Commercial Building Energy
Consumption Survey and the Residential Building Energy Con-
sumption Survey. Here, we ﬁnd the average difference of the
modeled EUI compared to the CBECS/RBECS averages for
the speciﬁc geographic region ranged between 5% and 20% for
the modeled building types with the simulated value always being
higher than the CBECS ref. 19.
Figure 5a, b shows the predicted distributions of daily
Energy Use Intensity (EUI) averaged over all seasons, grouped
into residential-only and commercial buildings, for each μ
value. Consumption is further split into gas (heating and hot
water) and electricity (cooling, equipment, and lighting)
estimates. We ﬁnd that the low-impact scenario predicts
increased gas use and decreased electricity use compared to the
DOE reference scenario, with both effects being larger in
commercial buildings. The high-impact scenario exaggerates
both of these effects in commercial buildings, although for
residential buildings gas use is decreased. The uncertainty from
the different occupancy distributions (as generated by the
range of μvalues) is also illustrated. We see that the differences
between the occupancy distributions are insigniﬁcant com-
pared to the difference between the high-impact and low-
In comparison to the DOE reference scenario, the low-impact
scenario predicts a median EUI increase of 0.6 ± 0.1% for
residential buildings and a decrease of 4.2 ± 0.1% for commercial
buildings. The high-impact scenario predicts median EUI
decreases of 14.9 ± 0.9% and 21.0 ± 0.4% for residential buildings
and commercial buildings, respectively. Figure 5c, d illustrates
hourly electricity consumption predictions for all scenarios
aggregated for all modeled buildings. Electrical load proﬁles and
total winter gas usage are particularly important to grid operators
and utilities to plan peak electrical generation capacity and gas
storage respectively. We see that the prominence of the winter
electric peak is reduced by 4 MW in the low-impact scenario and
almost completely removed in the high-impact scenario. The
summer peak electricity loads are also predicted signiﬁcantly
lower, due both to reduced numbers of occupants and
heterogeneous occupancy. These effects are overemphasized in
the high-impact scenario. Total winter gas use is predicted ~9.9%
greater in the low-impact scenario while it is 10.4% greater in the
high-impact scenario (see Supplementary Tables 4 and 5,
Supplementary Note 8 and Supplementary Figs. 9 and 10).
The impact of energy efﬁciency measures. Finally, we predict
the impact of two different generic energy efﬁciency interven-
tions, that could be encouraged by energy policy (see Supple-
mentary Note 9). Firstly, we improve wall and roof insulation by
10% in comparison to the 2010 building code requirements for all
residential spaces, and secondly, we implement a 10% efﬁciency
gain for equipment in commercial spaces (plug loads not
including light and heating or cooling). Figure 5e, f shows the
cd e f
Daily energy use intensity (kWhm–2)
Energy use (×103 kWh)
Energy use (×103 kWh)
Δ(Energy use) (%)
Δ (Energy use) (%)
0.05 0.15 0.25 0.35 0.45
High-impact: = 0 = 0.5
= 0.5 = 0
0.05 0.15 0.25 0.35 0.45
30 Winter electric
DOE ref. High
Insultation upgrade Equipment upgrade
Elec. Gas Total
06 a.m. Noon 6 p.m. 6 a.m. Noon 6 p.m. Ref Low High Ref Low High
0.15 0.25 0.35 0.45
Daily energy use intensity (kWhm–2)
0.2 0.4 0.6
0.2 0.4 0.6
0.5 1.0 1.5
Fig. 5 Scenario energy consumption predictions. aDistribution of the daily EUI for all residential-only buildings for μ=0, μ=0.5, and μ=1 occupancy.
bDistribution of the daily EUI for all non-residential-only buildings. cHourly electricity use results for winter in each scenario (shading denotes the range
from the different μvalues). dHourly electricity use results for summer in each scenario. eMedian change in energy use (original minus upgrade) from
upgrading the insulation effectiveness in residential spaces under each scenario. Error bars illustrate the range of predictions for different μvalues.
fMedian change in energy use from upgrading the equipment efﬁciency in commercial spaces under each scenario
NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11685-w ARTICLE
NATURE COMMUNICATIONS | (2019) 10:3736 | https://doi.org/10.1038/s41467-019-11685-w | www.nature.com/naturecommunications 7
median reduction for each efﬁciency measure in each scenario.
The DOE reference scenario predicts upgrading insulation
decreases the median gas use by around 3.8% but increases
electricity by 0.1%, which results in a median decrease in total
energy use for residential buildings of ~2.3%. While the effects are
similar for the low-impact scenario, the high-impact gas saving
increases to 4.2%. Interestingly, electricity consumption decreases
by 0.1% in the high-impact scenario, having increased by 0.1% in
the DOE reference and low-impact scenarios. With upgraded
equipment efﬁciency, Fig. 5f shows that electricity use is
decreased while gas use increases to compensate for the reduction
in the heat generated by equipment. The DOE reference scenario
predicts the median commercial building saves approximately
2.9% in electricity consumption, which is improved to 3.2% in the
low-impact scenario and to 4.4% in the high-impact scenario.
However, due to increased gas use, the median total energy
savings for commercial buildings are 1.2, 0.9, and 0.7% in the
DOE reference, low-impact and high-impact scenarios, respec-
tively (see Supplementary Tables 6 and 7). These results show
that knowledge of occupancy is crucial to accurately predict the
results of large-scale efﬁciency improvements in building envel-
opes or systems.
In this paper, we have presented a framework for improving
urban scale building occupancy estimates. Our method extends
proven techniques for modeling urban mobility based on mobile
phone and census data and is the ﬁrst to estimate urban-scale
building occupancy with mobile phone data. We ﬁnd that typical
maximum daily occupancy in individual commercial buildings is
likely to be only 20–30% of assumed capacity by building type.
Conversely, residential occupancy is highly neighborhood
dependent, with certain areas experiencing much higher occu-
pancy per unit of ﬂoor space than others. The differences between
our mobile phone based occupancy estimates and current occu-
pancy assumptions based on building types arise because current
methods treat buildings in isolation whereas our estimates con-
sider that occupants can visit multiple buildings, i.e., people leave
residential buildings to go to ofﬁce buildings to work. Therefore
our estimates are normalized against the total urban population.
While the exact results are speciﬁc to the modeled region, the
method is easily portable to other locations due to the ubiquitous
nature of the data sources used.
These differences between the estimated mobile occupancy and
current building-based assumptions have strong implications for
urban energy modelling and predictions. In a state-of-the-art
UBEM of a mixed-use urban neighborhood in Boston, the mobile-
estimated occupancy predicted energy consumption that differed by
+1to−15% for residential buildings and was reduced by 4–21%
for commercial buildings compared with reference building model.
This is highly consequential for urban energy policy development,
since policy makers rely on urban energy models to inform their
decisions and these decisions have widespread and long-term
impacts. For example, in the US, entire new neighborhoods are
designed with the US Green Building Council’sLEED(Leadership
in Energy and Environmental Design) certiﬁcation, which acts as a
de facto building standard in certain jurisdictions and for which
compliance is demonstrated through a simulated comparison
against a code compliant base version45. Furthermore, urban energy
models will be required to develop future sustainable solutions
speciﬁc to individual cities or neighborhoods52.
Given the large range between the low-impact and high-impact
of occupancy scenarios, our work underscores an urgent need to
understand how occupancy drives different load classes53,54
across the diverse set of building types found in cities. Our results
suggest that building heating and electrical loads may be sys-
tematically mispredicted due to incorrect occupancy assumptions,
especially in large commercial buildings. However, while our
results imply heating loads may be under-predicted in most
buildings due to lower-than-design occupancy, the opposite may
also be true if signiﬁcant portions of unoccupied indoor space are
allowed to cool below comfortable temperature levels. Deter-
mining whether or not this is the case will require new large-scale
datasets relating building occupancy and energy use. Further
uncertainty analysis of building parameters (i.e., range of heating/
cooling set-points, max hot water temperatures etc.) may also
prove informative in this regard, however without empirical data
on both the most important parameters (each building model
contains many thousands of parameters) and the typical ranges
encountered for the set of city building types modelled, it is
unclear how informative this may be. Accordingly, collating data
on the most inﬂuential parameters and their corresponding
ranges for different building types would be a useful exercise. Our
analysis also highlights how the different occupancy patterns and
their assumed relationship with building loads implies different
effectiveness for building efﬁciency measures. This is of particular
importance, since energy efﬁciency incentives are vital in the real
world of sustainable urban planning with limited budgets.
Through the inclusion of heterogeneous mobile phone inferred
occupancy patterns, this work represents a signiﬁcant improve-
ment on current best practice occupancy estimates and an
important step towards bespoke urban building energy models. It
also has its set of limitations, including the lack of available real
occupancy data on a sufﬁciently large scale for comparison.
Therefore, we expect that as higher resolution data relating to
urban mobility, building occupancy and building energy use
become available, the presented modeling framework may be
Finally, while our work has focused on improving urban
building energy models with the mobile-inferred occupancy
proﬁles, our results are also relevant for building utilization. The
large discrepancy between estimated building occupancy and
actual capacity implies that building efﬁciency may also be
improved through better space utilization, for example by tran-
sitioning residential spaces into commercial spaces at times of
low-residential occupancy. This is an interesting area of future
urban planning and is especially important due to the high pre-
dicted expansion rates of global urban centers, although it has
associated legislative and social barriers.
Mobile phone data and TimeGeo. The TimeGeo framework33 starts with the
CDRs of 1.92 million people in the Greater Boston Area. The data were collected by
AirSage for operational purposes for two mobile phone carriers. The location
coordinates are estimated by the data provider using standard triangulation algo-
rithms and have higher resolution (accuracy of 200–300 meters) compared to
TimeGeo extracts stay points from each individual’s sequence of consecutive
mobile phone records, and separates commuters from non-commuters by
identifying users with work stay points. For each tract in the Boston metropolitan
area expansion factors are calculated for commuters and non-commuters using
census data. Based on the distributions of empirical individual mobility parameters
extracted from the active user data (0.78 million active users with homes inside the
census region—Fig. 1a), the entire urban population movement is simulated. See33.
Building data for occupancy model. Building data are obtained from the city of
Boston, for 82,542 buildings. The geographic extent of this data is shown in Fig. 1b.
Buildings are classiﬁed into several tax classes including residential classes, com-
mercial classes, an industrial class and a mixed residential-commercial class (see
Supplementary Fig. 2 and Supplementary Note 2). For mixed-use residential-
commercial buildings, if the building is multi-ﬂoor we assume the ﬁrst ﬂoor is
commercial and if single-ﬂoored then we assume the ﬂoor area is split equally
between residential and commercial classes.
ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11685-w
8NATURE COMMUNICATIONS | (2019) 10:3736 | https://doi.org/10.1038/s41467-019-11685-w | www.nature.com/naturecommunications
Building opening hours and Per Capita Areas. We collect all the stays in each
census tract using outlines provided by the Massachusetts 2010 census (and the
latitude/longitude for each stay). For each building, we use the centroid and collect
all buildings in each tract. We calculate the building total ﬂoor area, assuming each
ﬂoor has equal area and the number of ﬂoors is the greatest integer of (building
height)/(ﬂoor height). We assume all ﬂoor heights are 10 feet. We also add a broad
classiﬁer marking each building as residential, commercial, and industrial, which
we use for stay compatibility.
To get building capacity, we need further information and we rely on a mapping
between the functional use of buildings and the PCA required for that use case—see
Supplementary Notes 2–4 and Supplementary Tables 1 and 2. Unfortunately, the set
of functional uses for which PCAs are available is not coincident with the Boston
buildings classiﬁcation, nor is a credible translation possible, except for residential
uses. Therefore we obtain additional use information at high spatial resolution from
digital maps. We use the Google Places API Service and query for Points of Interest
(POIs) located throughout Boston in the proximity of each non-residential building.
We aggregate the results to a list of POIs near non-residential buildings,
discounting duplicate information. The information associated with each POI is
variable and can include category (i.e., restaurant, bar, bank etc), place name, location,
address, hours of operation, ratings, and reviews amongst others. Not all information
ﬁelds are always available. We then use the building outlines to ﬁnd POIs contained
within each building and subsequently assign opening hours and PCAs.
Residential buildings are always open. For each non-residential building, the
opening hours is the superset of the opening hours for all contained POIs. If there
is no opening hour information then we assign opening hours of 7am–9pm
(commercial) or 9am–5pm (industrial).
To assign PCAs, we map the different POI categories to a functional use for
which we have an estimate of PCA (see Supplementary Note 4). Residential
buildings are assigned 40 m2person−1. For non-residential buildings, we ﬁrst assign
PCAs to the set of buildings containing POIs—assigning the highest PCA for all
contained POIs. Second, for each building without a PCA assigned, a PCA is
randomly selected from the distribution of non-residential PCAs in that tract
(inferred by POIs). In this way, if there are a high number of restaurants in a tract,
an unmarked building would have a relatively higher probability of being allocated
the PCA of a restaurant.
Creating EnergyPlus building templates. EnergyPlus building models were cre-
ated using the UMI framework, as developed in ref. 10. Using this tool building
footprints were imported into Rhinoceros 3D CAD and extruded to the height
speciﬁed by the data. Constructions, systems, and load characteristics were assigned
to buildings using an appropriate template from the DOE reference buildings
dataset, and based on the use and age of the structure as reported in Boston’s tax
assessment dataset (see Supplementary Note 6). EnergyPlus takes hourly occu-
pancy as an input, deﬁned by a combination of a maximum occupant density per
unit ﬂoor area and a fractional hourly schedule. Additionally, inputs for occupant-
driven loads are also required (i.e., lighting, equipment and hot water), again
speciﬁed by peak values per unit area and a fractional schedule.
Manipulating EnergyPlus input ﬁles and running simulations. All adjustments
to the generated building models, including changes to the occupancy patterns
and implementing energy efﬁciency measures, were made in Python using
Eppy56, a python package for scripting EnergyPlus input ﬁles. For the 1,266
modeled buildings EnergyPlus was run for one representative day for each
season. Weather data for the simulations was extracted from Boston’sTypical
Meteorological Year (TMY) data climate ﬁle provided by the US DOE57,and
developed from weather information gathered at Boston’s Logan Airport station
(~3 miles from the simulated site). The TMY data deﬁne a typical year’sweather
in the region and are commonly used for annual energy simulations. However,
since this data is typical rather than extreme, it is not suitable for designing
building systems for worst-case scenarios. Hence, the energyplus weather ﬁles
(EPW ﬁles) also contain designations for the most extreme and most typical
weather-weeks in each season. Since we are interested in understanding the
average effect of occupancy on energy consumption and we only have a single
day of mobile-inferred occupancy, we select the mid-range day from each typical
week of weather data by season for our simulations. Therefore, each building was
simulated for the days 30-Jan (winter), 1-Apr (spring), 30-Jul (summer), 23-Oct
(autumn) under all scenarios.
Data to run the whole analysis for a demonstration census tract are provided along with
the code—see Code Availability. Boston buildings data are available from the city of
Boston’s data portal Analyze Boston (https://data.boston.gov/). Larger samples of the
data are available from the corresponding author upon reasonable request.
Python scripts to run the analysis are available at https://github.com/humnetlab/
Mob_Building_Occ_Energy_Use.git, with stay point and building data provided for a
demonstration census tract. This includes the source code used to generate the analysis
Figures in the manuscript and supplementary material. EnergyPlus (also required) is
freely available from the US DOE (https://energyplus.net/). The UMI urban modelling
interface used to create the energyplus building models is available from the sustainable
design lab at MIT (http://web.mit.edu/sustainabledesignlab/projects/umi/index.html).
Received: 22 October 2018 Accepted: 24 July 2019
1. U.S Energy Information Administration. Monthly energy review. https://www.
2. Cao, X., Dai, X. & Liu, J. Building energy-consumption status worldwide and
the state-of-the-art technologies for zero-energy buildings during the past
decade. Energy Build. 128, 198–213 (2016).
3. Güneralp, B. et al. Global scenarios of urban density and its impacts on
building energy use through 2050. Proc. Natl Acad. Sci. USA 114, 8945–8950
4. International Energy Agency. Transition to sustainable buildings: Strategies
and opportunities to 2050. Tech. Rep., IEA. https://www.iea.org/publications/
5. Kammen, D. M. & Sunter, D. A. City-integrated renewable energy for urban
sustainability. Science 352, 922–928 (2016).
6. Qomi, M. J. A. et al. Data analytics for simplifying thermal efﬁciency planning
in cities. J. R. Soc. Interface 13, 20150971 (2016).
7. Kontokosta, C. E. & Tull, C. A data-driven predictive model of city-scale
energy use in buildings. Appl. Energ. 197, 303–317 (2017).
8. Hsu, D. How much information disclosure of building energy performance is
necessary? Energ. Policy 64, 263–272 (2014).
9. Reinhart, C. F. & Davila, C. C. Urban building energy modeling—a review of a
nascent ﬁeld. Build. Environ. 97, 196–202 (2016).
10. Reinhart, C., Dogan, T., Jakubiec, J. A., Rakha, T. & Sang, A. Umi-an urban
simulation environment for building energy use, daylighting and walkability.
In Proc13th Conference of International Building Performance Simulation
11. Hong, T., Chen, Y., Lee, S. H. & Piette, M. A. Citybes: A web-based platform
to support city-scale building energy efﬁciency. Urban Computing.https://
12. Walter, E., & Kämpf, J. H. A veriﬁcation of CitySim results using the BESTEST
and monitored consumption values. Proceedings of the 2nd Building
Simulation Applications conference (No. CONF, pp. 215–222). (Bozen-Bolzano
University Press 2015).
13. Kazas, G., Fabrizio, E. & Perino, M. Energy demand proﬁle generation with
detailed time resolution at an urban district scale: A reference building
approach and case study. Appl. Energ. 193, 243–262 (2017).
14. Nageler, P. et al. Novel validated method for gis based automated dynamic
urban building energy simulations. Energy 139, 142–154 (2017).
15. Sousa, G., Jones, B. M., Mirzaei, P. A. & Robinson, D. An open-source
simulation platform to support the formulation of housing stock
decarbonisation strategies. Energy Build. 172, 459–477 (2018).
16. Quan, S. J., Li, Q., Augenbroe, G., Brown, J., & Yang, P. P. J. Urban data and
building energy modeling: A GIS-based urban building energy modeling
system using the urban-EPC engine. Planning Support Systems and Smart
Cities (pp. 447–469). (Springer 2015).
17. Chen, Y., Hong, T. & Piette, M. A. Automatic generation and simulation of
urban building energy models based on city datasets for city-scale building
retroﬁt analysis. Appl. Energ. 205, 323–335 (2017).
18. Frayssinet, L. et al. Modeling the heating and cooling energy demand of
urban buildings at city scale. Renew. Sustain. Energy Rev.81, 2318–2327
19. Davila, C. C., Reinhart, C. F. & Bemis, J. L. Modeling boston: A workﬂow for
the efﬁcient generation and maintenance of urban building energy models
from existing geospatial datasets. Energy 117, 237–250 (2016).
20. Happle, G., Fonseca, J. A. & Schlueter, A. A review on occupant behavior in
urban building energy models. Energy Build. 174, 276–292 (2018).
21. Francisco, A., Truong, H., Khosrowpour, A., Taylor, J. E. & Mohammadi, N.
Occupant perceptions of building information model-based energy
visualizations in eco-feedback systems. Appl. Energ. 221, 220–228 (2018).
22. Baetens, R. & Saelens, D. Modelling uncertainty in district energy simulations
by stochastic residential occupant behaviour. J. Build. Perform. Simul. 9,
23. International Energy Agency. IEA-EBC Annex 66: Deﬁnition and Simulation
of Occupant Behavior in Buildings. https://annex66.org/ (2018).
24. International Energy Agency. IEA-EBC Annex 79: Occupant-Centric Building
Design and Operation. http://annex79.iea-ebc.org/ (2018).
NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11685-w ARTICLE
NATURE COMMUNICATIONS | (2019) 10:3736 | https://doi.org/10.1038/s41467-019-11685-w | www.nature.com/naturecommunications 9
25. Page, J., Robinson, D., Morel, N. & Scartezzini, J.-L. A generalised stochastic
model for the simulation of occupant presence. Energy Build. 40,83–98
26. Richardson, I., Thomson, M., Inﬁeld, D. & Clifford, C. Domestic electricity
use: a high-resolution energy demand model. Energy Build. 42, 1878–1887
27. Chen, Y., Hong, T., & Luo, X. An agent-based stochastic occupancy simulator.
Building Simulation (vol. 11, 37–49). (Tsinghua University Press 2018).
28. Jiang, S. et al. review of urban computing for mobile phone traces: current
methods, challenges and opportunities. Proceedings of the 2nd ACM SIGKDD
international workshop on Urban Computing (p. 2). (ACM 2013).
29. Wheatman, B., Noriega, A. & Pentland, A. Electricity Demand and Population
Dynamics Prediction from Mobile Phone Metadata. International Conference
on Social Computing, Behavioral-Cultural Modeling and Prediction and
Behavior Representation in Modeling and Simulation (pp. 196–205). (Springer
30. Bogomolov, A. et al. Energy consumption prediction using people dynamics
derived from cellular network data. EPJ Data Sci. 5, 13 (2016).
31. Toole, J. L. et al. The path most traveled: travel demand estimation using big
data resources. Transp. Res. Part C Emerg. Technol. 58, 162–177 (2015).
32. Alexander, L., Jiang, S., Murga, M. & González, M. C. Origin–destination trips
by purpose and time of day inferred from mobile phone data. Transp. Res.
part c Emerg. Technol. 58, 240–250 (2015).
33. Jiang, S. et al. The timegeo modeling framework for urban mobility without
travel surveys. Proc. Natl Acad. Sci. USA 113, E5370–E5378 (2016).
34. Miranda, F. et al. Urban pulse: capturing the rhythm of cities. IEEE. Trans.
Vis. Comput. Graph. 23, 791–800 (2017).
35. Corna, A., Fontana, L., Nacci, A. A. & Sciuto, D. Occupancy detection via
iBeacon on Android devices for smart building management. Proceedings of
the 2015 Design, Automation & Test in Europe Conference & Exhibition
(pp. 629-632). (EDA Consortium 2015).
36. Akkaya, K., Guvenc, I., Aygun, R., Pala, N. & Kadri, A. IoT-based occupancy
monitoring techniques for energy-efﬁcient smart buildings. Wireless
Communications and Networking Conference Workshops (WCNCW), New
Orleans, LA, 2015, pp. 58-63. (IEEE 2015).
37. Chen, D., Barker, S., Subbaswamy, A., Irwin, D. & Shenoy, P. Non-intrusive
occupancy monitoring using smart meters. Proceedings of the 5th ACM
Workshop on Embedded Systems For Energy-Efﬁcient Buildings (pp. 1-8).
38. Kim, Y.-S., Heidarinejad, M., Dahlhausen, M. & Srebric, J. Building energy
model calibration with schedules derived from electricity use data. Appl.
Energ. 190, 997–1007 (2017).
39. Shen, W. & Newsham, G. Smart phone based occupancy detection in ofﬁce
buildings. 2016 IEEE 20th International Conference on Computer Supported
Cooperative Work in Design (CSCWD), Nanchang, 2016, pp. 632–636. (IEEE
40. Wu, C., Thai, J., Yadlowsky, S., Pozdnoukhov, A. & Bayen, A. Cellpath: Fusion
of cellular and trafﬁc sensor data for route ﬂow estimation via convex
optimization. Transp. Res. Part C: Emerg. Technol. 59, 111–128 (2015).
41. Çolak, S., Lima, A. & González, M. C. Understanding congested travel in
urban areas. Nat. Commun. 7, 10793 (2016).
42. US Department of Energy. Commercial Reference Buildings. https://energy.
43. Crawley, D. B. et al. Energyplus: creating a new-generation building energy
simulation program. Energy Build. 33, 319–331 (2001).
44. US-DOE Ofﬁce of Energy Efﬁciency and Renewable Energy. EnergyPlus V8.4
[software]. https://energyplus.net/weather (2018).
45. ASHRAE. Ansi/ashrae/ies standard 90.1-2016 energy standard for buildings
except low-rise residential buildings. Tech. Rep., ASHRAE. https://www.
46. US Department of Transportation Federal Highway Administration. 2009
National Household Travel Survey. http://nhts.ornl.gov/download.shtml (2009).
47. Massachusetts Department of Transportation. Massachusetts travel survey.
48. Fujita, M. & Thisse, J.-F. Economics of Agglomeration: Cities, Industrial
Location, and Regional Growth, 2nd Edition. (Cambridge, UK: Cambridge
University Press 2013)
49. Ward, J. Hierarchical grouping to optimize an objective function. J. Am. Stat.
Assoc. 58, 144–236 (1963).
50. Hong, T., Yan, D., D’Oca, S. & Chen, C.-f Ten questions concerning
occupant behavior in buildings: the big picture. Build. Environ. 114, 518–530
51. Yan, D. et al. Occupant behavior modeling for building performance
simulation: current state and future challenges. Energy Build. 107, 264–278
52. Ramaswami, A., Russell, A. G., Culligan, P. J., Sharma, K. R. & Kumar, E.
Meta-principles for developing smart, sustainable, and healthy cities. Science
352, 940–943 (2016).
53. Khosrowpour, A. et al. A review of occupant energy feedback research:
opportunities for methodological fusion at the intersection of
experimentation, analytics, surveys and simulation. Appl. Energ. 218, 304–316
54. Kim, Y.-S. & Srebric, J. Impact of occupancy rates on the building electricity
consumption in commercial buildings. Energy Build. 138, 591–600 (2017).
55. Song, C., Koren, T., Wang, P. & Barabási, A.-L. Modelling the scaling
properties of human mobility. Nat. Phys. 6, 818 (2010).
56. Santosh, P. Eppy 0.5.48 [software]. https://pypi.org/project/eppy/ (2018).
57. US Department of Energy. EnergyPlus weather data. https://energyplus.net/
This work was supported by grants from the Centre for Complex Engineering Systems at
MIT-KACST, the MIT Energy Initiative and Philips Lighting.
E.B., M.G., S.G. and J.K. conceived the research. E.B. and S.G. wrote the code for the
occupancy analysis. C.C.D. and C.R. constructed the urban energy model. E.B., C.C.D.
and C.R. developed the occupancy scenarios. E.B. performed the energy analysis. M.C.G.,
J.K. and C.R. supervised the research and provided guidance. E.B., C.C.D., C.R. and
M.C.G. wrote the paper. All authors approved the manuscript.
Supplementary Information accompanies this paper at https://doi.org/10.1038/s41467-
Competing interests: The authors declare no competing interests.
Reprints and permission information is available online at http://npg.nature.com/
Peer review information:Nature Communications thanks Tianzhen Hong, Jelena
Srebric and Deborah A Sunter for their contribution to the peer review of this work. Peer
reviewer reports are available.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional afﬁliations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative
Commons license, and indicate if changes were made. The images or other third party
material in this article are included in the article’s Creative Commons license, unless
indicated otherwise in a credit line to the material. If material is not included in the
article’s Creative Commons license and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this license, visit http://creativecommons.org/
© The Author(s) 2019
ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-11685-w
10 NATURE COMMUNICATIONS | (2019) 10:3736 | https://doi.org/10.1038/s41467-019-11685-w | www.nature.com/naturecommunications
Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
We collect and use personal data to provide access to the Springer Nature journal content. We may also use these personal data internally within
ResearchGate and Springer Nature and as agreed share it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not
otherwise disclose your personal data outside the ResearchGate or the Springer Nature group of companies unless we have your permission as
While Users may use the Springer Nature journal content for small scale, personal non-commercial use, it is important to note that Users may
use such content for the purpose of providing other users with access on a regular or large scale basis or as a means to circumvent access
use such content where to do so would be considered a criminal or statutory offence in any jurisdiction, or gives rise to civil liability, or is
falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association unless explicitly agreed to by Springer Nature in
use bots or other automated methods to access the content or redirect messages
override any security feature or exclusionary protocol; or
share the content in order to create substitute for Springer Nature products or services or a systematic database of Springer Nature journal
In line with the restriction against commercial use, Springer Nature does not permit the creation of a product or service that creates revenue,
royalties, rent or income from our content or its inclusion as part of a paid for service or for other commercial gain. Springer Nature journal
content cannot be used for inter-library loans and librarians may not upload Springer Nature journal content on a large scale into their, or any
other, institutional repository.
content on this website and may remove it or features or functionality at our sole discretion, at any time with or without notice. Springer Nature
may revoke this licence to you at any time and remove access to any copies of the Springer Nature journal content which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or guarantees to Users, either express or implied
with respect to the Springer nature journal content and all parties disclaim and waive any implied warranties or warranties imposed by law,
including merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published by Springer Nature that may be licensed
from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a regular basis or in any other manner not
expressly permitted by these Terms, please contact Springer Nature at