Thesis

Analisi di dati geospaziali per applicazioni di Urban Informatics: il caso dei Google Place nella città di Milano


Abstract

The digital city. The development and spread of technology have digitalised everyday life. Every action we take, such as using public transport, meeting an acquaintance or playing sport, is shaped by digital data and produces new data in turn. In other words, our daily life merges with technology, creating a digital representation of ourselves and of our interactions. [35] We now speak of a Digital City: a digital representation of the physical city, of its spaces, its users and its dynamics, in the form of flows of digital information that are ever more heterogeneous and ever larger in volume. Today more than ever, the social, economic and material reality of the city depends inevitably on flows of information. Making sense of this mass of heterogeneous data, which could be used well beyond the individual, often commercial, purposes for which it was produced, is one of the challenges of the future. Interpreting the city as a physical, economic and social reality through its digital representation means introducing new methods of analysis. The aim of this work is to explore new forms of data acquisition and to introduce new technologies and analysis methodologies in support of the production of knowledge about the city.

Geospatial data analysis. The thesis describes a case study on the analysis of geospatial data for Urban Informatics applications in the territory of Milan, carried out through the acquisition of Google Places and the subsequent characterisation of the territory using spatial clustering techniques and other analysis methods. The thesis unfolds in six chapters divided into three parts: a first part on the state of the art, a second describing the project, and a final part discussing the results, conclusions and future developments.
The first chapter covers the state of the art of Urban Informatics, of new methods for analysing urban dynamics and of new forms of spatial information. The second chapter covers in detail the state of the art of clustering methods, and more specifically of the spatial and hierarchical clustering on which the approach developed in this work is based. Chapters three and four describe in detail the two main phases of the geospatial analysis approach. Chapter three, after an introduction to the characteristics of the data offered by Google Places, describes data acquisition through an adaptive collection method. Chapter four first describes the hierarchical-iterative clustering algorithm based on DBSCAN and then presents a preliminary analysis of the data. Finally, chapters five and six present, respectively, the analysis of the classifications obtained from the clustering process and an assessment of the meaning of those results. Specifically, chapter five carries out an exploratory analysis of the data obtained, combining methods for analysing and visualising vector and geospatial data. Chapter six contains the conclusions and the possible future developments that emerged during this work.
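The clustering step at the core of the approach can be illustrated with a minimal, self-contained DBSCAN over geographic coordinates. This is a didactic sketch, not the thesis's hierarchical-iterative algorithm: the sample coordinates around Milan, the 200 m radius and the min_pts value are all made-up illustrative choices.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) pairs."""
    R = 6371000.0
    la1, lo1, la2, lo2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((la2 - la1) / 2) ** 2
         + math.cos(la1) * math.cos(la2) * math.sin((lo2 - lo1) / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN: label -1 marks noise, 0..k are cluster ids."""
    labels = [None] * len(points)
    cid = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neigh = [j for j in range(len(points)) if haversine_m(points[i], points[j]) <= eps]
        if len(neigh) < min_pts:
            labels[i] = -1                      # provisionally noise
            continue
        cid += 1
        labels[i] = cid
        queue = [j for j in neigh if j != i]
        while queue:                            # expand the cluster
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid                 # noise absorbed as border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = [k for k in range(len(points)) if haversine_m(points[j], points[k]) <= eps]
            if len(jn) >= min_pts:              # j is a core point: keep expanding
                queue.extend(k for k in jn if labels[k] is None)
    return labels

# Two tight groups of fictitious Places (near Duomo and near Navigli) plus one outlier.
pts = [(45.4642, 9.1900), (45.4645, 9.1905), (45.4640, 9.1895),
       (45.4520, 9.1790), (45.4523, 9.1795), (45.4518, 9.1788),
       (45.5000, 9.3000)]
labels = dbscan(pts, eps=200, min_pts=3)
```

Here the two dense groups end up in separate clusters and the isolated point is labelled noise; the thesis's algorithm then re-applies this kind of clustering hierarchically rather than in a single pass.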
... This paper is derived from the master thesis "Geospatial data analysis for Urban informatics applications: the case of the Google Place of the City of Milan" [1]. ...
... For more details, see the Master Thesis: "Geospatial data analysis for Urban informatics applications: the case of the Google Place of the City of Milan" [1]. ...
Preprint
This work develops the question "How can we automatically create a good clustering of a spatial dataset with highly varying local densities?", opened by previous work by Berzi. To answer it, this work describes a recursive clustering process based on a technique for finding "vague solutions", whose output is a hierarchical clustering of the initial dataset. In particular, the approach is developed and tested on the DBSCAN algorithm with a large dataset gathered from Google Places in the Metropolitan Area of Milan. The core of the algorithm is its capacity to generate a hierarchical clustering by recursively selecting the best solutions according to our goals, previously defined by sets of rules. The algorithm described here, developed in my Master Thesis, resolves two problems:
- When a clustering algorithm can produce several different but equally valid clusterings and we do not know exactly what a good solution should look like, we are led to ask: "Which is better?" This obviously depends on our goals, so the question becomes: "What are our goals?"
- The second problem is condensed in the sentence "Not all clusters can be found in a one-shot clustering process; more often we must reapply the process to some parts of the dataset", raising a second question that this paper answers: "How can we cluster data with ad-hoc processing for each different part of the input dataset?"
These questions are resolved by the approaches named in this work: Space of Solutions, Vague Solution, Vague-Solution Finding Method and finally Recursive Clustering. All of these approaches were drafted and tested in my Master Thesis, "Geospatial data analysis for Urban informatics applications: the case of the Google Place of the City of Milan".
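The recursive scheme described above — cluster, test each result against a rule set, and re-cluster the rejected parts with tighter parameters — can be caricatured on 1-D data. This sketch is an illustration only: it uses simple gap-based grouping in place of DBSCAN, and the acceptance rule (a cluster's spread must not exceed eps), the shrink factor and all names are assumptions, not the paper's definitions.

```python
def group_1d(xs, eps):
    """Stand-in density grouping: split sorted 1-D data where a gap exceeds eps."""
    xs = sorted(xs)
    clusters, cur = [], [xs[0]]
    for a, b in zip(xs, xs[1:]):
        if b - a > eps:
            clusters.append(cur)
            cur = []
        cur.append(b)
    clusters.append(cur)
    return clusters

def recursive_cluster(xs, eps, min_size=2, shrink=0.5, depth=0, max_depth=5):
    """Build a hierarchy: re-cluster, with a smaller eps, every group that the
    rule set rejects (here: a group whose spread is still larger than eps)."""
    node = {"eps": eps, "points": xs, "children": []}
    if depth >= max_depth or len(xs) < min_size:
        return node
    for c in group_1d(xs, eps):
        if len(c) >= min_size and max(c) - min(c) > eps:   # rejected: too coarse
            node["children"].append(
                recursive_cluster(c, eps * shrink, min_size, shrink, depth + 1, max_depth))
        else:                                              # accepted as a leaf
            node["children"].append({"eps": eps, "points": c, "children": []})
    return node

# Densities at two scales: one coarse cluster that splits again at a finer eps.
data = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1, 10.0]
tree = recursive_cluster(data, eps=1.0)
```

The output is a tree: the coarse group [0.0 … 1.1] is rejected at eps = 1.0 and split into two leaves at eps = 0.5, while the isolated point 10.0 stays a leaf.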
... In the Master Thesis [13] these POIs were used as input to the Recursive Data Clustering Algorithm [14], specifically developed to work with geolocated points. One question that remains open in the Master Thesis is: ...
Preprint
This agent was developed around the question "How to crawl all Google Places of an urban area?". To solve this, I developed a data crawler for Google Places as an adaptive agent. It was used to crawl the data for the Master Thesis "Geospatial data analysis for Urban informatics applications: the case of the Google Place of the City of Milan". In this context the urban area is a real environment while the Google Places API is a digital representation of it; both have limitations and rules governing access. Here the environment is the Google Places API, and the agent's goal is to capture all Places of an area with minimal input. The agent, in complete autonomy and with minimal input, captures all Places from a centre position out to a maximum diameter, both specified by the user. It does so through a spiral movement inspired by the Roomba spiral pattern. The cells over which the agent moves are hexagons, because they are the best approximation of a circle and have the property that all neighbouring hexagons lie at the same distance. The agent works in three steps: planning the track of crawling cells; crawling over the cells; and checking the results and, if necessary, re-planning the track with finer cells. The minimal user input consists of the centre of the crawl, the default cell size and the number of spirals. The algorithm decides when and where to use smaller cells and, if problems arise, where to re-plan cells. The core of the algorithm is thus the adaptation of its planned track, and of the previously planned tessellation granularity, to details of the environment.
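The planning step can be sketched as follows. The spiral over hexagonal cells is a standard axial-coordinate construction; the metre-to-degree conversion, the cell radius and every name below are illustrative assumptions, not the agent's actual implementation.

```python
import math

# Axial-coordinate steps to the six neighbours of a pointy-top hexagon.
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_spiral(rings):
    """Axial coordinates of cells in spiral order: the centre, then ring 1, 2, ..."""
    cells = [(0, 0)]
    for k in range(1, rings + 1):
        q, r = HEX_DIRS[4][0] * k, HEX_DIRS[4][1] * k    # first cell of ring k
        for side in range(6):
            for _ in range(k):                           # k cells per side
                cells.append((q, r))
                q, r = q + HEX_DIRS[side][0], r + HEX_DIRS[side][1]
    return cells

def cell_centres(lat0, lon0, cell_radius_m, rings):
    """(lat, lon) centre of each crawl cell, spiralling out from (lat0, lon0)."""
    centres = []
    for q, r in hex_spiral(rings):
        x = cell_radius_m * math.sqrt(3) * (q + r / 2)   # axial -> planar metres
        y = cell_radius_m * 1.5 * r
        dlat = y / 111_320.0                             # ~metres per degree of latitude
        dlon = x / (111_320.0 * math.cos(math.radians(lat0)))
        centres.append((lat0 + dlat, lon0 + dlon))
    return centres

# Plan a two-ring crawl of 500 m cells around a centre in Milan.
centres = cell_centres(45.4642, 9.1900, cell_radius_m=500, rings=2)
```

A crawler would then issue one radius-bounded Places query per centre, and the re-planning step would recurse with a smaller cell_radius_m wherever a cell returns too many results.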
... This work is extracted from the state of the art of the master thesis "Analisi dei dati Geospaziali per applicazioni di Urban informatics: il caso dei Google Place della Città di Milano" [24]. Its purpose is to explore the field of Urban Informatics through the search for, and study of, academic contributions that can offer an overall picture of this field of research. ...
Preprint
This work is extracted from the state of the art of the master thesis "Analisi dei dati Geospaziali per applicazioni di Urban informatics: il caso dei Google Place della Città di Milano". Its purpose is to explore the field of Urban Informatics through the search for, and study of, academic contributions that can offer an overall picture of this field of research. The work is organised in three sections: a first section devoted to the systematic collection of definitions of the Urban Informatics research field and of other definitions related to the interaction between urban fabric, people and technology; a second section devoted to the analysis of the digital data that can support Urban Informatics; and a final section devoted to the analysis of ways of subdividing urban territory, with particular attention to the Italian case and in particular to the city of Milan.
Article
In this paper, we propose a computational approach to Jane Jacobs' concepts of diversity and vitality, analyzing new forms of spatial data to obtain quantitative measurements of urban qualities frequently employed to evaluate places. We use smart card data collected from public transport to calculate a diversity value for each research unit. Diversity is composed of three dynamic attributes: intensity, variability, and consistency, each measuring different temporal variations of mobility flows. We then apply a regression model to establish the relationship between diversity and vitality, using Twitter data as a proxy for human activity in urban space. The final results (also validated using data sourced from OpenStreetMap) reveal the most vibrant areas in London.
Article
Urbanization represents a huge opportunity for computer applications that enable cities to be managed more efficiently while, at the same time, improving the quality of life of their citizens. One potential application of this kind of system is a bottom-up evaluation of the level of walkability of the city (namely, the level of usefulness, comfort, safety and attractiveness of an urban area for walking). This is based on the use of data from social media for the computation of structured indicators describing the actual usage of areas by pedestrians. This paper presents an experimental analysis of data about the city of Milano (Italy) acquired from Flickr and Foursquare. The over 500 thousand points, representing the photos and the POIs collected from the above-mentioned social media, were clustered through an iterative approach based on the DBSCAN algorithm, in order to obtain homogeneous areas defined by the actual activity of inhabitants and tourists rather than by a top-down administrative procedure, and to supply useful indications on the level of walkability of the city of Milan.
Chapter
Big Data is the term being used to describe a wide spectrum of observational or "naturally-occurring" data generated through transactional, operational, planning and social activities that are not specifically designed for research. Due to the structure and access conditions associated with such data, their use for research and analysis becomes significantly complicated. New sources of Big Data are rapidly emerging as a result of technological, institutional, social, and business innovations. The objective of this background paper is to describe emerging sources of Big Data, their use in urban research, and the challenges that arise with their use. To a certain extent, Big Data in the urban context has become narrowly associated with sensor (e.g., Internet of Things) or socially generated (e.g., social media or citizen science) data. However, there are many other sources of observational data that are meaningful to different groups of urban researchers and user communities. Examples include privately held transactions data, confidential administrative micro-data, data from arts and humanities collections, and hybrid data consisting of synthetic or linked data. The emerging area of Urban Informatics focuses on the exploration and understanding of urban systems by leveraging novel sources of data. The major potential of Urban Informatics research and applications is in four areas: (1) improved strategies for dynamic urban resource management, (2) theoretical insights and knowledge discovery of urban patterns and processes, (3) strategies for urban engagement and civic participation, and (4) innovations in urban management, and planning and policy analysis.
Urban Informatics utilizes Big Data in innovative ways by retrofitting or repurposing existing urban models and simulations that are underpinned by a wide range of theoretical traditions, as well as through data-driven modeling approaches that are largely theory agnostic, although these divergent research approaches are starting to converge in some ways. The paper surveys the kinds of urban problems being considered by going from a data-poor environment to a data-rich world and the ways in which such enquiries have the potential to enhance our understanding, not only of urban systems and processes overall, but also contextual peculiarities and local experiences. The paper concludes by commenting on challenges that are likely to arise in varying degrees when using Big Data for Urban Informatics: technological, methodological, theoretical/epistemological, and the emerging political economy of Big Data.
Conference Paper
Clustering is an interdisciplinary subject of statistical data analysis. In this study, among the various types of clustering algorithms, the algorithms derived from Density-Based Spatial Clustering of Applications with Noise (DBSCAN) are investigated. Although DBSCAN is the best-known density-based algorithm, it has some bottlenecks, so enhanced versions of DBSCAN have been developed to provide solutions and to ameliorate the algorithm. In this study, we provide a compact survey of DBSCAN-based algorithms addressing the mentioned challenges.
Article
Purpose The purpose of this paper is to propose a systematic review of the contributions present in the literature about walkability, aimed at defining a set of criteria and methodologies for the assessment of the level of pedestrian friendliness of cities characterised by mass tourism. Design/methodology/approach The paper is based on a theoretical review of walkability and on the study of the mass tourism phenomenon in Venice in relation to the ongoing de-urbanisation process. The analysis of open data sets provided by the Public Institutions of Venice and the execution of on-site observations allowed a qualitative assessment of the level of walkability of the historical centre of Venice. Findings The results of the proposed study highlight that the level of walkability in Venice is profoundly affected by the lack of basic services, the presence of massive tourism flows and the scarcity of road signage. Practical implications All the elements highlighted in this work could lead to several design solutions and policies to manage the tourism phenomenon in Venice in a more effective and sustainable manner. Social implications The assessment and enhancement of the level of walkability of urban areas represent a useful tool to manage tourist flows and to reduce the conflicts between inhabitants and visitors in tourism cities. Originality/value The current work represents a valuable contribution towards the systematisation of the theoretical and methodological framework for a tourism-based walkability assessment.
Article
Neighbourhoods have been described as "the building blocks of public services society". Their subjective nature, however, and the resulting difficulties in collecting data, mean that in many countries there are no officially defined neighbourhoods, either in terms of names or boundaries. This has implications not only for policy but also for business and social decisions as a whole. In the absence of neighbourhood boundaries, many studies resort to using standard administrative units as proxies. Such administrative geographies, however, often fit poorly with those perceived by residents. Our approach detects these important social boundaries by automatically mining the Web en masse for passively declared neighbourhood data within postal addresses. Focusing on the United Kingdom (UK), this research demonstrates the feasibility of automated extraction of urban neighbourhood names and their subsequent mapping as vague entities. Importantly, and unlike previous work, our process does not require any neighbourhood names to be established a priori.
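The core idea of the paper above — neighbourhood names passively declared inside postal addresses — can be caricatured in a few lines: with UK-style comma-separated addresses, the component just before the town name is a candidate neighbourhood. The addresses, the fixed town and the position-based heuristic here are illustrative assumptions, not the paper's actual method.

```python
from collections import Counter

# Hypothetical UK-style addresses scraped from the Web (street, [neighbourhood,]
# town, postcode); the data below are made up for illustration.
addresses = [
    "12 High Street, Camden Town, London, NW1 8QP",
    "3 Oval Road, Camden Town, London, NW1 7EA",
    "45 Parkway, Camden Town, London, NW1 7PN",
    "9 Portobello Road, Notting Hill, London, W11 2DY",
    "221B Baker Street, London, NW1 6XE",
]

def candidate_neighbourhoods(addresses, town="London"):
    """Count passively declared neighbourhood names: the comma-separated
    component just before the town, when the address contains one."""
    counts = Counter()
    for addr in addresses:
        parts = [p.strip() for p in addr.split(",")]
        if town in parts:
            i = parts.index(town)
            if i >= 2:            # a street AND a neighbourhood precede the town
                counts[parts[i - 1]] += 1
    return counts

names = candidate_neighbourhoods(addresses)
```

Frequency counts over many such addresses are what then lets the vague extent of each name be mapped; an address with no component between the street and the town simply contributes nothing.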
Book
This is the third edition of the premier professional reference on the subject of data mining, expanding and updating the previous market-leading edition, which was the first of its kind and remains the most popular. It combines sound theory with truly practical applications to prepare students for real-world challenges in data mining. Like the first and second editions, Data Mining: Concepts and Techniques, 3rd Edition equips professionals with a sound understanding of data mining principles and teaches proven methods for knowledge discovery in large corporate databases. The earlier editions also established the book as the market leader for courses in data mining, data analytics, and knowledge discovery. Revisions incorporate input from instructors, changes in the field, and new and important topics such as data warehouse and data cube technology, mining stream data, mining social networks, and mining spatial, multimedia and other complex data. The book begins with a conceptual introduction followed by comprehensive, state-of-the-art coverage of concepts and techniques. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. Wherever possible, the authors raise and answer questions of utility, feasibility, optimization, and scalability. The book offers a comprehensive, practical look at the concepts and techniques needed to get the most out of real business data; updates that incorporate input from readers, changes in the field, and more material on statistics and machine learning; scores of algorithms and implementation examples, all in easily understood pseudo-code and suitable for use in real-world, large-scale data mining projects; and complete classroom support for instructors, as well as bonus content available at the companion website.
Article
Place names are often used to describe and to enquire about geographical information. It is common for users to employ vernacular names that have vague spatial extent and which do not correspond to the official and administrative place name terminology recorded within typical gazetteers. There is a need therefore to enrich gazetteers with knowledge of such vague places and hence improve the quality of place name‐based information retrieval. Here we describe a method for modelling vague places using knowledge harvested from Web pages. It is found that vague place names are frequently accompanied in text by the names of more precise co‐located places that lie within the extent of the target vague place. Density surface modelling of the frequency of co‐occurrence of such names provides an effective method of representing the inherent uncertainty of the extent of the vague place while also enabling approximate crisp boundaries to be derived from contours if required. The method is evaluated using both precise and vague places. The use of the resulting approximate boundaries is demonstrated using an experimental geographical search engine.
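The density-surface method described above admits a compact sketch: treat each co-occurring precise place name as a point, sum Gaussian kernels over a grid, and derive an approximate crisp boundary by thresholding at a fraction of the peak density. The toy coordinates, bandwidth and threshold below are illustrative assumptions, not the paper's parameters.

```python
import math

def density_surface(points, grid, bandwidth):
    """Gaussian kernel density of co-occurring place-name points at each grid node."""
    def kde(x, y):
        return sum(math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * bandwidth ** 2))
                   for px, py in points)
    return {(x, y): kde(x, y) for x, y in grid}

def crisp_extent(surface, frac=0.5):
    """Approximate crisp boundary: the grid nodes above a fraction of the peak density."""
    peak = max(surface.values())
    return {cell for cell, d in surface.items() if d >= frac * peak}

# Three mentions tightly co-located around the vague place, plus one far-away stray.
points = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (3.0, 3.0)]
grid = [(x, y) for x in range(-1, 5) for y in range(-1, 5)]
surface = density_surface(points, grid, bandwidth=0.5)
extent = crisp_extent(surface, frac=0.5)
```

The stray mention contributes a low, isolated bump that falls below the 50%-of-peak contour, so the derived extent covers only the dense core — mirroring how the density surface represents uncertainty while still allowing a crisp boundary to be read off from a contour.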