
The Role of Graph Databases in Geomatics

Abstract

Greater Moncton is the fastest-growing census metropolitan area in eastern Canada. We are working with CODIAC Transit to promote efficient transportation services and reduce private car dependency. CODIAC Transpo currently operates 30 regular routes Monday to Saturday, some of which provide additional evening and Sunday services. A graph database gives us the capability to handle the big spatial data streamed from the buses at very high velocity, volume, and variety. Graph data management and analytics approaches have proven powerful in capturing observational and topological information from transit networks in a database. Managing transit data as a network, using a graph, unlocks the knowledge embedded both within the observed entities and in their connectivity.
Hung Cao, Ikechukwu Maduako, Emerson Cavalheri, Ryan Brideau, Monica Wachowicz
{hcao3, imaduako, e.cavalheri, Ryan.Brideau, monicaw}@unb.ca
People in Motion Lab, University of New Brunswick
Introduction
What is a Graph Database?
It is a database that stores data in a graph.
A graph is a data structure capable of representing any kind of data for storage and access.
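As a rough illustration of the idea, the sketch below models a tiny property graph in plain Python, where nodes and relationships both carry key-value properties. The class name, the `Stop` label, the `NEXT` relationship type, and the property names are hypothetical; they are not taken from the presentation or from the Neo4j API.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Toy in-memory property graph (illustrative only, not Neo4j)."""
    nodes: dict = field(default_factory=dict)   # node_id -> (label, properties)
    edges: list = field(default_factory=list)   # (src, rel_type, dst, properties)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, src, rel_type, dst, **props):
        # Both endpoints must exist before they can be connected.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, rel_type, dst, props))

    def neighbours(self, node_id, rel_type=None):
        """Return ids reachable over outgoing edges, optionally by type."""
        return [dst for src, rel, dst, _ in self.edges
                if src == node_id and (rel_type is None or rel == rel_type)]


g = PropertyGraph()
g.add_node("s1", "Stop", name="Stop 1")            # hypothetical stops
g.add_node("s2", "Stop", name="Stop 2")
g.add_edge("s1", "NEXT", "s2", travel_time_s=90)   # directed, with a property
print(g.neighbours("s1"))
```

A graph database adds persistence, indexing, and a query language on top of this kind of structure.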
Graph Database in Geomatics
Neo4j Spatial Stack
Cell Network Analysis
Neo4j OSM in uDig
OpenStreetMap Structure in Neo4j
Collect data
Step 1: Filter data into groups of data files
Step 2: Compute the Move and the Stop status of each data row
Step 3: Annotate each data row
Step 4: Compute street segments and annotate each street segment
Step 5: Index trips and compute Arrival Time and Departure Time
Step 6:
From GPS coordinates to nodes in a graph
Data was provided by Codiac Transit Moncton
There are 17 fields in each row of the data, e.g.: TripID, RouteID, Longitude, Latitude, Time Stamp.
A row is recorded every 5 seconds.
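The Move/Stop computation in Step 2 could be sketched as follows. This is a minimal illustration, not the authors' method: it assumes rows are time-ordered within one trip, uses the 5-second sampling interval stated above, and applies a hypothetical speed threshold (`STOP_SPEED_MPS`) that does not come from the presentation. The sample coordinates are made up.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
SAMPLE_INTERVAL_S = 5.0   # from the slides: one row every 5 seconds
STOP_SPEED_MPS = 0.5      # assumed threshold, not from the source


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def label_move_stop(rows):
    """Label each row 'Stop' or 'Move' from its speed since the previous row.

    `rows` is a time-ordered list of dicts with 'Longitude' and 'Latitude'
    keys for a single trip. The first row has no predecessor and is
    labelled 'Stop' by convention (an assumption of this sketch).
    """
    labels = []
    prev = None
    for row in rows:
        if prev is None:
            labels.append("Stop")
        else:
            dist = haversine_m(prev["Longitude"], prev["Latitude"],
                               row["Longitude"], row["Latitude"])
            speed = dist / SAMPLE_INTERVAL_S
            labels.append("Stop" if speed < STOP_SPEED_MPS else "Move")
        prev = row
    return labels


rows = [
    {"Longitude": -64.7782, "Latitude": 46.0878},
    {"Longitude": -64.7782, "Latitude": 46.0878},  # same position: stopped
    {"Longitude": -64.7772, "Latitude": 46.0878},  # tens of metres east: moving
]
print(label_move_stop(rows))
```

In practice the threshold would need tuning against GPS noise, since a stationary bus rarely reports exactly the same coordinates twice.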
Outcomes
Time-Varying Graph Data Model
System Configuration & Dataset
Graph Database System: Neo4j 3.01
Development Language: Cypher & Python
Machine: 3 GHz CPU, 32 GB memory & 3 TB disk
Database composition size:
2 weeks of data / 30 bus routes
Approximately 1 million nodes
4.5 million directed weighted edges
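To make "directed weighted edges" concrete, here is a minimal sketch of how such edges might be derived from indexed trips. It assumes each trip has already been reduced to a time-ordered list of `(stop_id, arrival_s, departure_s)` tuples, and it takes the mean travel time between consecutive stops as the edge weight. Both the input layout and the weight definition are assumptions; the presentation does not state how the weights were defined.

```python
from collections import defaultdict


def build_edges(trips):
    """Aggregate trips into directed edges weighted by mean travel time.

    `trips` maps a trip id to a time-ordered list of
    (stop_id, arrival_s, departure_s) tuples. Returns a dict
    {(from_stop, to_stop): mean_travel_time_s}.
    """
    totals = defaultdict(lambda: [0.0, 0])   # edge -> [sum_seconds, count]
    for visits in trips.values():
        # Pair each visit with the next one along the trip.
        for (a, _, dep_a), (b, arr_b, _) in zip(visits, visits[1:]):
            acc = totals[(a, b)]
            acc[0] += arr_b - dep_a          # time spent between the stops
            acc[1] += 1
    return {edge: total / count for edge, (total, count) in totals.items()}


# Made-up trips with times in seconds since an arbitrary origin.
trips = {
    "t1": [("A", 0, 30), ("B", 150, 180), ("C", 300, 320)],
    "t2": [("A", 600, 620), ("B", 760, 790)],
}
edges = build_edges(trips)
print(edges)
```

Because the edge key is an ordered pair, `("A", "B")` and `("B", "A")` stay distinct, which is what makes the graph directed.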
Trip Connectivity
Longest and shortest path at peak hours
Degree Centrality
Degree Centrality by Bus Lines
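For reference, degree centrality on a directed graph can be computed as below. This is a generic sketch of the standard measure (in-degree plus out-degree, divided by n - 1), not the query used in the presentation; the node names are made up, and self-loops are not considered.

```python
def degree_centrality(edges):
    """Return {node: (in_degree + out_degree) / (n - 1)} for a directed graph.

    `edges` is an iterable of (source, target) pairs; duplicate pairs
    are counted once.
    """
    edges = set(edges)
    nodes = {n for edge in edges for n in edge}
    degree = dict.fromkeys(nodes, 0)
    for src, dst in edges:
        degree[src] += 1   # outgoing edge
        degree[dst] += 1   # incoming edge
    scale = len(nodes) - 1
    return {n: (d / scale if scale > 0 else 0.0) for n, d in degree.items()}


stop_edges = [("A", "B"), ("B", "C"), ("C", "B"), ("B", "D")]
centrality = degree_centrality(stop_edges)
print(sorted(centrality.items()))
```

Here `B` scores highest because it touches four of the four edges, which is the kind of hub a transit network analysis would look for.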
Data Visualization
Conclusions
Intuitive for data representation
Reliable with ACID transactions
Fast processing using a custom disk-based, native storage engine
Scalable up to several billions of nodes/relationships/properties
No standard graph query language
People In Motion Lab
www.people-in-motion-lab.org