Conference Paper

Supporting the analysis of urban data through NoSQL technologies

Authors:

Abstract

In the last few years, the capability to both generate and collect data of public interest within urban areas has increased at an unprecedented rate, to such an extent that these data rapidly scale towards big urban data. The abundance of information collected through ad-hoc sensor networks in the smart city context provides a remarkable opportunity to tackle interesting urban challenges and to add intelligence to the urban environment. However, each data source and data type potentially uses different spatial and temporal references, so the complexity of dealing with such heterogeneous data has increased significantly. This paper proposes a distributed business intelligence engine, named BI2CITY, able to efficiently manage the process of integrating and analysing a large volume of heterogeneous data generated by various sources in the smart city context. BI2CITY exploits a Big Data approach to support (i) data storage, (ii) spatio-temporal data aggregation, and (iii) different targeted data analyses, such as correlating urban data and forecasting the expected values of variables of interest (e.g., air pollution). Spatio-temporal data aggregation and analyses are performed on the fly using MapReduce-based algorithms. Experimental results on real data collected in a major Italian city demonstrate the effectiveness of the proposed distributed system in performing interesting and efficient analyses.
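The spatio-temporal aggregation step can be pictured as a map/shuffle/reduce pipeline. The following is a minimal single-process Python sketch of that pattern, not BI2CITY's actual implementation: the record layout (latitude, longitude, ISO timestamp, value), the grid cell size `cell_size`, the hourly time buckets, and the use of the mean as the aggregate are all illustrative assumptions. A real deployment would run equivalent map and reduce functions on a distributed framework such as Hadoop or Spark.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical record layout: (latitude, longitude, ISO timestamp, value).

def map_phase(record, cell_size=0.01):
    """Emit ((cell_row, cell_col, hour_bucket), value) for one raw measurement."""
    lat, lon, ts, value = record
    cell = (int(lat // cell_size), int(lon // cell_size))
    hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
    return (cell[0], cell[1], hour.isoformat()), value

def shuffle(pairs):
    """Group emitted values by key, as the framework's shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate one spatio-temporal cell; here, the mean of its values."""
    return key, sum(values) / len(values)

def aggregate(records, cell_size=0.01):
    """Run map, shuffle and reduce in sequence over all records."""
    pairs = (map_phase(r, cell_size) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

Because each reducer call only sees one key, other aggregates (sum, maximum, count) can be swapped into `reduce_phase` without touching the mapper.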

... Table 2 presents a set of key influential criteria in smart cities, which have been extracted from the literature.
Table 2 Key criteria for smart city evaluation
Code | Criterion | References
C1 | Smart Transportation | [21], [28], [41], [61], [62], [63]
C2 | Integrated Urban Systems | [32], [61], [64], [65]
C3 | Citizen Engagement | [26], [41], [56], [66], [67]
C4 | Urban Innovation and Competitiveness | [2], [22], [41], [68]
C5 | Environmental Sustainability | [41], [61], [69], [70], [71]
C6 | Digitalization of Infrastructure | [37], [38], [43], [72], [73]
C7 | Quality of Life for Citizens | [7], [17], [69], [74]
C8 | Access to Urban Services | [31], [33], [61], [67]
C9 | Urban Monitoring Systems | [4], [27], [62], [75]
C10 | Data Aggregation from Various Sources | [62], [64], [74], [76], [77]
...
...
Code | Criterion | References
C11 | Urban Condition Forecasting | [4], [7], [24], [51], [74]
C12 | Resource Optimization | [17], [37], [38], [78]
C13 | Investment Transparency | [36], [42], [72], [79]
C14 | Renewable Resources | [27], [40], [41], [42]
C15 | Operational Costs | [10], [11], [28], [80]
...
Article
With the expansion of smart cities, business intelligence (BI) has emerged as a crucial tool for optimizing resources, increasing efficiency, and improving citizens' quality of life. BI enables companies to make better strategic decisions by analyzing vast amounts of urban data, helping them remain competitive in the dynamic smart city environment. This study uses content analysis and the Fermatean Fuzzy TOPSIS (FF-TOPSIS) method to rank BI-based strategies in the smart city context. Relevant criteria were first identified through content analysis; five strategies were then developed and ranked against these criteria. The results revealed that "Development of IoT-enabled smart networks (S2)" ranked highest, as it plays a significant role in optimizing resource management and enhancing urban service performance, thereby contributing greatly to the advancement of smart cities. "Process automation and the deployment of robotic systems (S5)" ranked second, as it enhances efficiency and reduces human errors. "Cloud platform integration for seamless access to data and services (S3)" ranked third, reflecting the considerable importance of seamless access to data and services. "Artificial intelligence deployment for predictive analytics and process optimization (S4)" ranked fourth, while "Big data analytics for smart decision-making (S1)", despite its importance, ranked fifth. Urban managers should prioritize the development of IoT networks to fully leverage their potential for resource management and efficiency gains. Following this, attention to process automation and AI integration can significantly enhance citizens' quality of life and reduce urban costs.
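For readers unfamiliar with TOPSIS, the sketch below shows the classic crisp version in Python. It is a simplification: the study uses the Fermatean fuzzy variant (FF-TOPSIS), in which ratings are Fermatean fuzzy numbers rather than plain scores, and that extension is not reproduced here. The function name `topsis` and its arguments are illustrative. Each alternative is scored by its relative closeness to an ideal solution (the best value on every criterion) versus an anti-ideal one.

```python
import math

def topsis(matrix, weights, benefit):
    """Score alternatives with classic (crisp) TOPSIS.

    matrix[i][j]: score of alternative i on criterion j
    weights[j]:   importance of criterion j
    benefit[j]:   True if larger is better, False for a cost criterion
    Returns the closeness coefficient of each alternative (higher is better).
    """
    n_alt, n_crit = len(matrix), len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(n_alt)))
             for j in range(n_crit)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n_crit)]
         for i in range(n_alt)]
    cols = list(zip(*v))
    # Ideal: best value per criterion; anti-ideal: worst value per criterion.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    closeness = []
    for row in v:
        d_pos = math.dist(row, ideal)  # Euclidean distance to the ideal
        d_neg = math.dist(row, anti)   # Euclidean distance to the anti-ideal
        closeness.append(d_neg / (d_pos + d_neg))
    return closeness
```

Sorting the alternatives by decreasing closeness yields the ranking. If all alternatives are identical the coefficient is undefined (division by zero), so a production version would need to guard against that case.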
... It contains implementations of data mining algorithms for cluster analysis and classification, as well as support for recommendation systems. All current implementations are based on Hadoop MapReduce and have been exploited to support different data warehousing applications [6,7]. MLlib [8], instead, is the machine learning library developed on Spark, and it is rapidly growing in both development and adoption (e.g., network traffic analysis [9], social networks [10]). ...
Article
The pervasive and increasing deployment of smart meters allows a huge amount of fine-grained energy data to be collected in different urban scenarios. The analysis of such data is challenging and opens up a variety of new and interesting research issues across the energy and computer science research areas. The key role of computer scientists is to provide energy researchers and practitioners with cutting-edge, scalable analytics engines that effectively support their daily research activities, hence fostering and leveraging data-driven approaches. This paper presents SPEC, a scalable and distributed engine to predict building-specific power consumption. SPEC addresses the full analytics stack and exploits a data stream approach over sliding time windows to train a prediction model tailored to each building. The model predicts the upcoming power consumption at a time instant in the near future. SPEC integrates different machine learning approaches, specifically ridge regression, artificial neural networks, and random forest regression, to predict fine-grained values of power consumption, and a classification model, the random forest classifier, to forecast a coarse consumption level. SPEC exploits state-of-the-art distributed computing frameworks to address the big data challenges in harvesting energy data: the current implementation runs on Apache Spark, the most widespread high-performance data-processing platform, and can natively scale to huge datasets. As a case study, SPEC has been tested on real data from a heating distribution network and on power consumption data collected in a major Italian city. Experimental results demonstrate the effectiveness of SPEC in forecasting both fine-grained values and coarse levels of building power consumption.
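As an illustration of training over sliding time windows, the following Python sketch fits a ridge regression on lagged readings taken from only the most recent `window` observations and forecasts the next value. It covers just one of SPEC's model families (ridge regression), runs in a single process rather than on Apache Spark, and the parameter names `window`, `lags`, and `lam` are hypothetical choices, not SPEC's configuration. For brevity the intercept is penalised together with the other coefficients, and the normal equations are solved with a small Gaussian elimination routine instead of a numerical library.

```python
def solve(a, b):
    """Solve a small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam I) w = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) + (lam if p == q else 0.0)
            for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * t for row, t in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)

def predict_next(series, window, lags, lam=1e-3):
    """Train only on the latest `window` readings, then forecast the next one."""
    recent = list(series[-window:])
    # Each row: intercept term followed by the `lags` previous readings.
    X = [[1.0] + recent[t - lags:t] for t in range(lags, len(recent))]
    y = recent[lags:]
    w = ridge_fit(X, y, lam)
    features = [1.0] + recent[-lags:]
    return sum(wi * xi for wi, xi in zip(w, features))
```

Sliding the window forward as new readings arrive retrains the model on recent behaviour only, which is what allows a per-building model to track changes over time.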
... Several datasets can also be available and used within the context of an SC: historical data from surveys and interviews; statistical data on local demographics and activities; processed datasets from service providers (e.g., city utility and telecommunications providers, energy suppliers) and information systems (e.g., ITS, GIS); and official reports (e.g., from local and national authorities, the Organisation for Economic Co-operation and Development (OECD), and European organizations) [119][120][121][122][123]. Some corresponding examples come from Llacuna and Ibnez [124], who analyzed questionnaire data for urban planning processes; Li et al. [125], who examined the fiber-optic network in the city of Hankou with GIS data and tools; Calegari et al. [126], who used several local and regional data sources in the city of Milan, Italy, to recognize emerging affinities; and Balasubramani et al. [108], who used datasets from the city of Chicago to help city administrators in decision making. Several other scholars present cases where data from heterogeneous sources were combined for interdisciplinary studies and for developing smart applications [127][128][129][130][131][132][133][134][135][136][137][138][139][140]. ...
Article
Smart cities (SCs) are becoming highly sophisticated ecosystems in which innovative solutions and smart services are being deployed. In these ecosystems, SCs act as data production and sharing engines, setting new challenges for building effective SC architectures and novel services. The aim of this article is to “connect the pieces” between the Data Science and SC domains through a systematic literature review that identifies the core topics, services, and methods applied in SC data monitoring. The survey focuses on data harvesting and data mining processes over repeated SC data cycles. A survey protocol is followed to reach both quantitatively and semantically important entities. The review results generate useful taxonomies for data scientists in the SC context, which offer clear guidelines for future work. In particular, a taxonomy is proposed for each of the main SC data entities, namely, the “D Taxonomy” for data production, the “M Taxonomy” for data analytics methods, and the “S Taxonomy” for smart services. Each of these taxonomies clearly places entities in a classification that benefits multiple stakeholders and multiple domains of urban smartness. Indicative scenarios are outlined, and the conclusions appear promising for systematizing the field.
Conference Paper
Energy efficiency and energy consumption awareness are a growing priority for many countries. Among the large variety of methods proposed by energy scientists and professionals to evaluate building energy consumption, a widely adopted approach is the energy signature. Since energy data easily scale towards very large datasets, characterizing energy efficiency through the energy signature computed from these huge data collections becomes challenging. This paper presents a distributed system, named ESA, for the collection, storage, and analysis of large amounts of energy-related data, which keeps users continuously informed about their energy consumption and building performance. ESA exploits a Big Data approach to perform a scalable and distributed computation of the building energy signature, which is used to forecast the expected power consumption for given contextual conditions in a specific time period. ESA characterizes monitored buildings through direct indicators designed to (i) evaluate the efficient use of the heating system by comparing the latest observations with past energy demand under the same conditions, and (ii) rank the overall building performance with respect to nearby, similarly characterized buildings. Experimental results on real energy consumption data demonstrate the effectiveness and efficiency of the proposed distributed system in providing actionable knowledge at the fingertips of the actors interacting with ESA.
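A common formulation of the energy signature is a linear regression of heating power against outdoor temperature. The Python sketch below assumes that formulation (the abstract does not state ESA's exact model) and shows how a fitted signature yields an expected consumption for given conditions, plus a simple observed-to-expected ratio in the spirit of indicator (i). The function names and the ratio indicator are illustrative, not ESA's definitions.

```python
from dataclasses import dataclass

@dataclass
class EnergySignature:
    intercept: float  # expected power at 0 degrees C outdoor temperature
    slope: float      # change in power per degree C (negative for heating)

def fit_energy_signature(temps, powers):
    """Ordinary least-squares fit of power = intercept + slope * temperature."""
    n = len(temps)
    mt = sum(temps) / n
    mp = sum(powers) / n
    sxy = sum((t - mt) * (p - mp) for t, p in zip(temps, powers))
    sxx = sum((t - mt) ** 2 for t in temps)
    slope = sxy / sxx
    return EnergySignature(intercept=mp - slope * mt, slope=slope)

def expected_power(sig, temp):
    """Consumption the signature predicts at a given outdoor temperature."""
    return sig.intercept + sig.slope * temp

def consumption_ratio(sig, temp, observed):
    """Above 1: the building used more than its own history predicts."""
    return observed / expected_power(sig, temp)
```

For example, a building whose history lies on power = 100 - 4 * temperature is expected to draw 80 at 5 degrees C, so an observed 88 gives a ratio of 1.1, flagging 10% over-consumption relative to past behaviour under the same conditions.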
Conference Paper
Energy efficiency achieved by reducing wasteful energy consumption is a growing policy priority for many countries. Innovative systems should be designed to continuously monitor a smart city environment and provide all stakeholders with the tools to improve energy efficiency. This paper presents the EDEN platform, designed to collect and analyze the thermal energy consumption of residential and public building heating systems. EDEN is being deployed in a major Italian city and collects energy consumption measurements through an extensive smart metering grid involving thousands of buildings. EDEN also collects and analyzes indoor climate conditions and user feedback, such as thermal comfort perception, by means of an ad-hoc social network. Collected data are further enriched with temporal and spatial information at different abstraction levels and with meteorological data available as an open data set. Several technical Key Performance Indicators (KPIs) have been defined to inform users about their building's thermal energy consumption, while user-friendly KPIs present energy savings or over-consumption in an informative fashion.
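Enriching readings with temporal information at several abstraction levels can be as simple as deriving coarser labels from each timestamp. The Python sketch below is a hypothetical example of this idea, not EDEN's schema: the chosen levels (hour, weekday, weekend flag, month, season) and the use of Northern Hemisphere meteorological seasons are assumptions.

```python
from datetime import datetime

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")
# Meteorological seasons for the Northern Hemisphere (assumption).
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}

def temporal_context(ts: datetime) -> dict:
    """Attach coarser temporal levels to the timestamp of a raw meter reading."""
    return {
        "hour": ts.hour,
        "day_of_week": DAY_NAMES[ts.weekday()],  # weekday(): Monday == 0
        "is_weekend": ts.weekday() >= 5,
        "month": ts.month,
        "season": SEASONS[ts.month],
    }
```

Spatial enrichment could follow the same pattern, mapping each metered building to progressively coarser areal units.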
Conference Paper
Inferring the type of activities in neighborhoods of urban centers may be helpful in a number of contexts, including urban planning, content delivery, and activity recommendations for mobile web users, and may even lead to a deeper understanding of the geographical evolution of social life in the city. During the past few years, the analysis of mobile phone usage patterns, or of social media with longitudinal attributes, has aided the automatic characterization of the dynamics of the urban environment. In this work, we combine a dataset sourced from a telecommunication provider in Spain with a database of millions of geotagged venues from Foursquare, and we formulate the problem of urban activity inference in a supervised learning framework. In particular, we exploit user communication patterns observed at the base station level to predict the activity of Foursquare users who check in at nearby venues. First, we mine a set of machine learning features that encode the input telecommunication signal of a tower. Subsequently, we evaluate a diverse set of supervised learning algorithms using labels extracted from Foursquare place categories, and we consider two application scenarios. Initially, we assess how hard it is to predict the specific urban activity of an area, showing that Nightlife and Entertainment spots are the easiest to infer, whereas College and Shopping areas feature the lowest accuracy rates. Then, given a candidate set of activity types in a geographic area, we aim to identify the most prominent one. We demonstrate how the difficulty of the problem increases with the number of classes incorporated in the prediction task, yet the classifiers perform considerably better than a random guess even as the set of candidate classes grows.
Conference Paper
In recent years, the European Union (EU) has developed a shared European vision of sustainable urban development. In this direction, holistic solutions and advanced energy management services are needed for any local authority that aims to implement sustainable energy action plans. In this context, this paper presents an advanced Information and Communication Technologies (ICT) platform for real-time monitoring and infrastructure efficiency at the city level, namely a Web Portal for city authorities. The Web Portal will enable the city administration to collect data for energy management, real-time monitoring, and billing purposes from the city's infrastructure, sensors, meters, and other energy sources, and to react to critical incidents or system failures in an urban environment. All collected data will be handled by the "green" tools of the ICT platform, which focus on the city's municipal buildings, pillars/poles, and electric vehicle stations.
Conference Paper
Environmental monitoring plays an important role in identifying abnormalities in the characteristics of the environment. Such abnormalities are related to negative effects that, in turn, heavily affect human lives. A number of sensors can be placed in a specific area and take responsibility for monitoring the environment's characteristics with respect to specific phenomena. Sensors report their measurements back to a central system that is capable of situational reasoning. Through decision making, the system then responds to any event related to the observed phenomena. In this paper, we propose a mechanism that builds on top of the sensor measurements and derives the appropriate decisions for the immediate identification of events. The proposed system adopts data fusion and prediction (time series regression) statistical learning methods to efficiently aggregate sensor measurements. We also adopt Fuzzy Logic to handle the uncertainty in decision making on the derived alerts. We perform a set of simulations over real data and report on the advantages and disadvantages of the proposed system.
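A minimal Python sketch of two ingredients named above, under stated assumptions: inverse-variance weighting stands in for the paper's data fusion method (which the abstract does not specify), and a linear ramp is used as a hypothetical fuzzy membership function for the "alert" set. The thresholds `low` and `high` are illustrative, and the time series regression component is not shown.

```python
def fuse(readings, variances):
    """Inverse-variance weighted mean of co-located sensor readings.

    Less noisy sensors (smaller variance) get proportionally more weight.
    """
    weights = [1.0 / v for v in variances]
    return sum(w * r for w, r in zip(weights, readings)) / sum(weights)

def alert_membership(value, low, high):
    """Degree (0..1) to which a fused value belongs to the fuzzy set 'alert'.

    Linear ramp: 0 at or below `low`, 1 at or above `high`.
    """
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)
```

Unlike a hard threshold, the membership degree lets the decision layer distinguish a borderline reading from a clearly anomalous one before raising an alert.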
Article
Cities are complex environments in which digital technologies are increasingly pervasive; this digitization of urban space has led to a rich ecosystem of data producers and data consumers. Moreover, heterogeneous sources differ in terms of data complexity, spatio-temporal resolution, and curation/maintenance costs. Do these diverse urban sources reflect the same picture of the city? Do distinct perspectives share some commonalities? In this paper, we present our empirical data analytics experiments on a set of urban sources related to the city of Milano; our investigation aims to discover “affinities” between datasets by means of different quantitative and qualitative correlation analyses. We also explore the influence of spatial resolution and data complexity on the strength of the dependence between heterogeneous urban sources, to pave the way for meaningful information fusion.
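To make the quantitative correlation analyses concrete, here is a Python sketch of two standard measures that could be applied once two sources are aggregated onto the same spatial units: Pearson's coefficient for linear association and Spearman's rank coefficient for monotonic association. The abstract does not name the exact measures used, so treat these as illustrative choices.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Recomputing these coefficients after aggregating the same sources at coarser spatial resolutions is one way to probe how dependence strength varies with resolution.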
Conference Paper
Big Data is conceived as a powerful tool to exploit the full potential of the Internet of Things and Smart Cities. A new dimension of understanding of human behaviour is expected to be reached through all the data gathered in the emerging smart environment. This work analyses data from the European project SmartSantander, correlating traffic behaviour with temperature in the city of Santander. The results show that the two flows evolve with a similar behaviour; specifically, they present a fine-grained correlation of over 57%. Finally, it is shown that the traffic distribution, aggregated by temperature bins, follows a Poisson distribution model, thereby allowing complex behaviours to be interpolated and predicted from simple measures such as temperature.
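The temperature-bin aggregation and Poisson fit described above can be sketched in Python as follows. The input layout of (temperature, vehicle count) pairs and the 2 degree bin width are hypothetical; the only modelling fact relied on is that the maximum-likelihood estimate of a Poisson rate is the sample mean of the counts.

```python
import math
from collections import defaultdict

def bin_counts_by_temperature(observations, bin_width=2.0):
    """Group (temperature, vehicle_count) pairs by the lower edge of their bin."""
    bins = defaultdict(list)
    for temp, count in observations:
        bins[math.floor(temp / bin_width) * bin_width].append(count)
    return dict(bins)

def poisson_fit(counts):
    """Maximum-likelihood estimate of the Poisson rate: the sample mean."""
    return sum(counts) / len(counts)

def poisson_pmf(k, lam):
    """Probability of observing exactly k events under Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)
```

Fitting one rate per bin gives a curve of expected traffic against temperature, which can then be interpolated between bins.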
Article
Big data techniques are conceived as a powerful tool to exploit the full potential of the Internet of Things and smart cities. A new dimension of understanding of human behaviour is expected to be reached through all the data gathered in the emerging smart environment. The described potential, so-called Human Dynamics, aims to describe human behaviours and activities in real time. This work presents our experiences with big data analytics in smart cities, in terms of sensor data management, data fusion, and knowledge discovery from the data. The data used come from the European project SmartSantander, where traffic behaviour has been correlated with temperature in the city of Santander. The evolution of both flows presents a similar behaviour; in detail, a fine-grained correlation is discovered. On the one hand, the traffic distribution, aggregated by temperature bins, follows a Poisson distribution model, which allows complex behaviours to be interpolated and predicted from simple measures such as temperature. On the other hand, the traffic density distribution has also been analysed in isolation, without the temperature-based aggregation. This distribution exhibits bursty behaviour, which is closer to human dynamics models. Therefore, this work shows that smart city data can be modelled either as Poisson processes or as Human Dynamics (burst models). Finally, a reference data analytics process, data sets, and models are offered for the open-source data analytics platform Konstanz Information Miner (KNIME). Copyright © 2014 John Wiley & Sons, Ltd.
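One standard way to quantify the bursty behaviour contrasted above with Poisson behaviour is the burstiness coefficient of Goh and Barabási, computed from inter-event times. The Python sketch below is a swapped-in illustration; the abstract does not say which burstiness measure, if any, the authors used.

```python
import statistics

def burstiness(inter_event_times):
    """Goh-Barabasi burstiness B = (sigma - mu) / (sigma + mu).

    B is -1 for perfectly regular events, about 0 for a Poisson process
    (exponential gaps, where sigma equals mu), and approaches 1 for
    highly bursty activity.
    """
    mu = statistics.mean(inter_event_times)
    sigma = statistics.pstdev(inter_event_times)
    return (sigma - mu) / (sigma + mu)
```

Gaps such as [1, 1, 1, 1, 100], long silences broken by clusters of events, yield a positive B, whereas evenly spaced events yield a negative one.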
Article
This paper illustrates the impacts of spatial data aggregation on the analysis of urban development. Spatial econometric methods are used to control for spatial autocorrelation in the data, and existing weighting methods are used to overcome aggregation dependencies that are due to differences in the sizes of areal units. The analyses show that shape dependencies can be partially removed by the weighting methods used, and that even regularly latticed areal units need such weighting in practice. Aggregating to coarser resolutions does not affect the order of magnitude of coefficients estimated for variables aggregated by averaging, provided the aggregation process maintains sufficient variance within variables. We argue that small areal units approximating the true characteristics of the studied process are to be preferred in urban development analyses, because such micro-data allow highly local factors to be explored alongside higher-scale linkages. We demonstrate that spatial autocorrelation and scale dependencies interact and that spatial econometric methods can help explain variance in analyses of small-grained land-use data.
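Spatial autocorrelation, which the spatial econometric methods above control for, is commonly measured with global Moran's I. The Python sketch below computes it from a neighbour weight matrix; it illustrates the quantity being controlled for, not the spatial econometric models the paper estimates.

```python
def morans_i(values, weights):
    """Global Moran's I for n areal units.

    weights[i][j] > 0 when units i and j are neighbours; weights[i][i] == 0.
    I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z the deviations
    from the mean and W the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return n * num / (w_sum * den)
```

For a hypothetical row of four cells in which each cell neighbours the next, alternating values [1, -1, 1, -1] give I = -1 (perfect dispersion), while clustered values [1, 1, -1, -1] give I = 1/3 (positive autocorrelation).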
Article
In today's world of cloud and grid computing, integrating data from heterogeneous databases is inevitable. Virtual Database Technology (VDB) is one of the effective solutions for integrating data from heterogeneous sources, but this becomes complex when the databases are very large. MapReduce is a framework specifically designed for processing huge datasets on distributed sources, and Apache Hadoop is an implementation of MapReduce. Hadoop has so far been applied successfully to file-based datasets. This paper proposes to use the parallel and distributed processing capability of Hadoop MapReduce to handle heterogeneous query execution on large datasets, so that a Virtual Database Engine built on top of it yields effective, high-performance distributed data integration.
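A common way to combine records from two sources on a shared key in MapReduce is a reduce-side join: mappers tag each row with its source and emit it under the join key, and reducers pair up rows that share a key. The single-process Python sketch below illustrates that general pattern only; it is not the query execution strategy of the proposed Virtual Database Engine, and the field names in the example data are hypothetical.

```python
from collections import defaultdict

def map_source(tag, rows, key_field):
    """Mapper: emit (join key, (source tag, row)) for every row of one source."""
    for row in rows:
        yield row[key_field], (tag, row)

def reduce_join(key, tagged_rows):
    """Reducer: emit the merge of every matching row pair from sources A and B."""
    left = [r for t, r in tagged_rows if t == "A"]
    right = [r for t, r in tagged_rows if t == "B"]
    for a in left:
        for b in right:
            yield {**a, **b}

def reduce_side_join(rows_a, rows_b, key_field):
    """Simulate map, shuffle (group by key) and reduce in one process."""
    groups = defaultdict(list)
    mapped = list(map_source("A", rows_a, key_field))
    mapped += list(map_source("B", rows_b, key_field))
    for key, value in mapped:
        groups[key].append(value)
    return [out for key, vals in groups.items() for out in reduce_join(key, vals)]
```

Keys present in only one source produce no output, which gives inner-join semantics; an outer join would instead emit unmatched rows with the missing fields left empty.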
Article
As the volume of data increases sharply and the relationships among different data sources become intricate, how to integrate massive data sources and how to find latent information in the integrated data have become urgent problems. At present, industry tends to adopt a distributed computing model to integrate massive data. Visualization is a critical step in data analysis and data mining for obtaining valuable, in-depth information. We design a tool called MRData for heterogeneous data integration, which has two features: 1) parallel data processing based on Hadoop, a distributed platform; and 2) visual analysis. Finally, experiments verify the efficiency of MRData.
Article
Consider a population in which sexual selection and natural selection may or may not be taking place. Assume only that the deviations from the mean in the case of any organ of any generation follow exactly or closely the normal law of frequency, then the following expressions may be shown to give the law of inheritance of the population.
Urban Computing: Concepts, Methodologies, and Applications. ACM Trans. Intell. Syst. Technol.
  • Yu Zheng
  • Licia Capra
  • Ouri Wolfson
  • Hai Yang
Air quality in Piemonte
  • Sistema Piemonte
Weather Underground web service
  • Weather Underground