
Mohamed Mokbel - University of Minnesota
306 publications
60,094 reads
12,351 citations
Publications (306)
Though data cleaning systems have achieved great success and widespread adoption in both academia and industry, they fall short when cleaning spatial data. The main reason is that state-of-the-art data cleaning systems mainly rely on functional dependency rules where there is sufficient co-occurrence of value pairs to learn that a certain value of an...
Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traff...
The availability of trajectory data, combined with various real-life practical applications, has sparked the interest of the research community in designing a plethora of algorithms for various trajectory analysis techniques. However, there is an apparent lack of full-fledged systems that provide the infrastructure support for trajectory analysis techn...
Numerous important applications rely on detailed trajectory data. Yet, unfortunately, trajectory datasets are typically sparse, with large spatial and temporal gaps between consecutive points, which is a major hurdle for their accuracy. This paper presents Kamel, a scalable trajectory imputation system that inserts additional realistic trajectory point...
GPS-enabled devices, including vehicles, smartphones, wearable and tracking devices, as well as various check-in and social network data are continuously producing tremendous amounts of trajectory data, which are used consistently in many applications such as urban planning and map inference. Existing techniques for trajectory data imputation rely...
The ability to collect large amounts of trajectory data through GPS-enabled devices has enabled a myriad of very important applications that are widely used on a daily basis. These include urban computing, transportation, and map APIs for routing and navigation. Unfortunately, a major hindrance for all these applications is the accuracy of collected...
Map services are ubiquitous in widely used applications including navigation systems, ride sharing, and item and food delivery. Though there are plenty of efforts to support such services through designing more efficient algorithms, we believe that efficiency is no longer a bottleneck to these services. Instead, it is the accuracy of the underlying r...
This demo presents QARTA, an open-source full-fledged system for highly accurate and scalable map services. QARTA employs machine learning techniques to: (a) construct its own highly accurate map in terms of both map topology and edge weights, and (b) calibrate its query answers based on contextual information, including transportation modality, un...
The systems community in both academia and industry has had tremendous success in building widely used general-purpose systems for various types of data and applications. Examples include database systems, big data systems, data streaming systems, and machine learning systems. The vast majority of these systems are ill-equipped in terms of supporting s...
As the wide spread of the pandemic results in the lockdown of vital facilities, digital contact tracing is deemed key to reopening. However, current efforts in digital contact tracing, running as mobile apps on users' smartphones, fall short of being effective and present two major weaknesses related to accessibility and apparent privacy concern augmentat...
Due to its significant economic and environmental impact, sharing the ride among a number of drivers (i.e., car pooling) has recently gained significant interest from industry and academia. Hence, a number of ride sharing services have appeared along with various algorithms on how to match a rider request to a driver who can provide the ride sharin...
As the wide spread of the pandemic results in the lockdown of vital facilities, digital contact tracing is deemed key to reopening. However, current efforts in digital contact tracing, running as mobile apps on users' smartphones, fall short of being effective. This paper lays out the vision and guidelines for the next era of digital contact tracing, wher...
Travel time estimation is an important component in modern transportation applications. The state-of-the-art techniques for travel time estimation use GPS traces to learn the weights of a road network, often modeled as a directed graph, then apply Dijkstra-like algorithms to find shortest paths. Travel time is then computed as the sum of edge weigh...
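The baseline this abstract describes (a directed road graph with learned edge weights, a Dijkstra-like search, and travel time taken as the sum of edge weights on the chosen path) can be illustrated with a minimal Python sketch. It is not the paper's method: the function name `shortest_travel_time`, the `road_network` dictionary, and the edge weights (assumed to be seconds) are all hypothetical and chosen only for illustration.

```python
import heapq


def shortest_travel_time(graph, source, target):
    """Return (total_time, path) for the fastest route, or (inf, []) if unreachable.

    `graph` maps each node to a dict of {neighbor: edge_weight}; the weights
    stand in for per-edge travel times that would be learned from GPS traces.
    """
    best = {source: 0.0}      # best known travel time to each node
    parent = {}               # predecessor on the best known path
    heap = [(0.0, source)]    # priority queue ordered by travel time so far
    while heap:
        time_so_far, node = heapq.heappop(heap)
        if node == target:
            break             # the first pop of the target is final
        if time_so_far > best.get(node, float("inf")):
            continue          # stale queue entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = time_so_far + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in best:
        return float("inf"), []
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    path.reverse()
    return best[target], path


# Hypothetical road network; weights are invented travel times in seconds.
road_network = {
    "A": {"B": 60.0, "C": 150.0},
    "B": {"C": 45.0, "D": 200.0},
    "C": {"D": 90.0},
    "D": {},
}
total_time, route = shortest_travel_time(road_network, "A", "D")
print(total_time, route)
```

In this toy network the reported travel time equals the sum of the weights along the returned route, which is exactly the estimate the abstract attributes to existing techniques.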
This paper presents GeoTrend+, a system approach to support scalable local trend discovery on recent microblogs, e.g., tweets, comments, online reviews, and check-ins, that come in real time. GeoTrend+ discovers top-k trending keywords in arbitrary spatial regions from recent microblogs that continuously arrive with high rates and a significant por...
Microblogs data is the microlength user-generated data that is posted on the web, e.g., tweets, online reviews, comments on news and social media. It has gained considerable attention in recent years due to its widespread popularity, rich content, and value in several societal applications. Nowadays, microblogs applications span a wide spectrum of...
The proliferation in amounts of generated data has propelled the rise of scalable machine learning solutions to efficiently analyze and extract useful insights from such data. Meanwhile, spatial data has become ubiquitous, e.g., GPS data, with rapidly increasing sizes in recent years. The applications of big spatial data span a wide spectrum of int...
Autologistic regression is one of the most popular statistical tools to predict spatial phenomena in several applications, including epidemic diseases detection, species occurrence prediction, earth observation, and business management. In general, autologistic regression divides the space into a two-dimensional grid, where the prediction is perfor...
In this paper, we present two novel decaying operators for Telco Big Data (TBD), coined TBD-DP and CTBD-DP that are founded on the notion of Data Postdiction. Unlike data prediction, which aims to make a statement about the future value of some tuple, our formulated data postdiction term, aims to make a statement about the past value of some tuple,...
Event detection applications have gained significant attention with the rise of user-generated spatio-temporal data over the past decade. However, building event detection applications still incurs high cost and effort due to the lack of support in existing data management systems. This paper envisions a holistic system approach to support an effici...
Hadoop, employing the MapReduce programming paradigm, has been widely accepted as the standard framework for analyzing big data in distributed environments. Unfortunately, this rich framework has not been exploited for processing large-scale computational geometry operations. This paper introduces CG_Hadoop, a suite of scalable and efficient MapRed...
The current explosion in spatial data raises the need for efficient spatial analysis tools to extract useful information from such data. However, existing tools are neither generic nor scalable when dealing with big spatial data. This demo presents Flash, a framework for generic and scalable spatial data analysis, with a special focus on spatial pr...
With the increase in amount of remote sensing data, there have been efforts to efficiently process it to help ecologists and geographers answer queries. However, they often need to process this data in combination with vector data, for example, city boundaries. Existing efforts require one dataset to be converted to the other representation, which...
Predicting the presence or absence of spatial phenomena has been of great interest to scientists pursuing research in several applications including epidemic disease detection, species occurrence prediction, and earth observation. In this operation, a geographical space is divided by a two-dimensional grid, where the prediction (i.e., either 0 or 1)...
Geotagged data (e.g. images or news items) have empowered various important applications, e.g., search engines and news agencies. However, the lack of available geotagged data significantly reduces the impact of such applications. Meanwhile, existing geotagging approaches rely on the existence of prior knowledge, e.g., accurate training dataset for...
Understanding link travel times (LTT) has received significant attention in transportation and spatial computing literature but they often remain behind closed doors, primarily because the data used for capturing them is considered confidential. Consequently, free and open maps such as OpenStreetMap (OSM) or TIGER, while being remarkably accurate i...
This paper presents ST-Hadoop, the first full-fledged open-source MapReduce framework with native support for spatio-temporal data. ST-Hadoop is a comprehensive extension to Hadoop and SpatialHadoop that injects spatio-temporal data awareness inside each of their layers, mainly the language, indexing, and operations layers. In the language layer, ST...
Arable land quality (ALQ) data are a foundational resource for national food security. With the rapid development of spatial information technologies, the annual acquisition and update of ALQ data covering the country have become more accurate and faster. ALQ data are mainly vector-based spatial big data in the ESRI (Environmental Systems Research...
This demo presents Sya, the first full-fledged spatial probabilistic knowledge base construction system. Sya is a comprehensive extension to the DeepDive system that enables exploiting the spatial relationships between extracted relations during the knowledge base construction process, and hence results in a better knowledge base output. Sya runs e...
In this paper, we present a novel decaying operator for Telco Big Data (TBD), coined TBD-DP (Data Postdiction). Unlike data prediction, which aims to make a statement about the future value of some tuple, our formulated data postdiction term, aims to make a statement about the past value of some tuple, which doesn't exist anymore as it had to be de...
This paper provides the first attempt at a full-fledged query optimizer for MapReduce-based spatial join algorithms. The optimizer develops its own taxonomy that covers almost all possible ways of doing a spatial join for any two input datasets. The optimizer comes in two flavors: cost-based and rule-based. Given two input datasets, the cost-base...
A telecommunication company (telco) is traditionally only perceived as the entity that provides telecommunication services, such as telephony and data communication access to users. However, the IP backbone infrastructure of such entities spanning densely urban spaces and widely rural areas, provides nowadays a unique opportunity to collect immense...
This demo presents ST-Hadoop, the first full-fledged open-source MapReduce framework with native support for spatio-temporal data. ST-Hadoop injects spatio-temporal awareness in the Hadoop base code, which results in achieving order(s) of magnitude better performance than Hadoop and SpatialHadoop when dealing with spatio-temporal data and queries...
This paper demonstrates Stella, an efficient crowdsourcing-based geotagging framework for any type of object. In this demonstration, we showcase the effectiveness of Stella in geotagging images via two different scenarios: (1) we provide a graphical interface to show a geotagging process that has been carried out using Amazon Mechanic...
In this tutorial, we present the recent work in the database community for handling Big Spatial Data. This topic became very hot due to the recent explosion in the amount of spatial data generated by smart phones, satellites and medical devices, among others. This tutorial goes beyond the use of existing systems as-is (e.g., Hadoop, Spark or Impala...
This paper provides the vision of a unified spatial crowdsourcing platform that is designed to efficiently tackle different types of spatial tasks which have been gaining a lot of popularity in recent years. Several examples of spatial tasks are ride-sharing services, delivery services, translation tasks, and crowd-sensing tasks. While existing cro...
This paper presents Sphinx, a full-fledged open-source system for big spatial data that overcomes the limitations of existing systems by adopting a standard SQL interface and by providing a highly efficient core built inside the core of the Apache Impala system. Sphinx is composed of four main layers, namely, query parser, indexer, query planner, a...
This demo presents Scout, a full-fledged interactive data visualization system with native support for spatio-temporal data. Scout utilizes the computing power of GPUs to achieve real-time query performance. The key idea behind Scout is a GPU-aware multi-version spatio-temporal index. The indexing and query processing modules of Scout are designed to c...
Spatial data partitioning (SDP) plays a powerful role in distributed storage and parallel computing for spatial data. However, the skewed distribution of spatial data and the varying volume of spatial vector objects make it a significant challenge to ensure both optimal performance of spatial operations and data balance in the cluster. To tackle t...
Predictive spatio-temporal queries are crucial in many applications. Traffic management is an example application, where predictive spatial queries are issued to anticipate jammed areas in advance. Also, location-aware advertising is another example application that targets customers expected to be in the vicinity of a shopping mall in the near fut...
This paper discusses the next generation of digital maps, by positing that maps in future will intelligently self-update themselves based on distinctive events extracted dynamically from social media streams or other crowd-sourced data. To realize this concept, the challenges include developing a scalable and efficient system to deal with a variety...
This paper presents GeoTrend, a system for scalable support of spatial trend discovery on recent microblogs, e.g., tweets and online reviews, that come in real time. GeoTrend is distinguished from existing techniques in three aspects: (1) It discovers trends in arbitrary spatial regions, e.g., city blocks. (2) It supports trending measures that eff...
Recently, many ride sharing systems have been commercially introduced (e.g., Uber, Flinc, and Lyft), forming a multi-billion-dollar industry. The main idea is to match people requesting a certain ride to other people who are acting as drivers in their own spare time. The matching algorithm run by these services is very simple and ignores a wide sec...
In early 2000, we had the vision of ubiquitous location services, where each object is aware of its location, and continuously sends its location to a designated database server. This flood of location data opened the door for a myriad of location-based services that were considered visionary at that time, yet today they are a reality and have beco...
Twitter is one of the fastest-growing online communities in recent years. In this poster, we study the language usage and diversity in Twitter local communities. We identify local communities in Twitter on a country-level. For each community, we examine: (1) the language diversity, (2) the language dominance and how it differs from local to global vi...
Microblogs data, e.g., tweets, reviews, news comments, and social media comments, has gained considerable attention in recent years due to its popularity and rich contents. Nowadays, microblogs applications span a wide spectrum of interests, including analyzing events and users activities and critical applications like discovering health issues and...
The recent explosion in the amount of spatial data calls for specialized systems to handle big spatial data. In this survey, we summarize the state-of-the-art work in the area of big spatial data. We categorize the existing work in this area according to six different angles, namely, approach, architecture, language, indexing, querying, and visuali...
This two-volume set, LNCS 10041 and LNCS 10042, constitutes the proceedings of the 17th International Conference on Web Information Systems Engineering, WISE 2016, held in Shanghai, China, in November 2016. The 39 full papers and 31 short papers presented in these proceedings were carefully reviewed and selected from 233 submissions. The papers cover...
There has been a recent marked increase in the amount of spatial data collected by smart phones, space telescopes, and medical devices, among others. The increased volume has brought into focus the need for specialized systems to handle big spatial data. The Era of Big Spatial Data: A Survey summarizes the state-of-the-art in this area. It classifi...
Online Social Networks (OSNs) play a significant role in the daily life of hundreds of millions of people. However, many user profiles in OSNs contain deceptive information. Existing studies have shown that lying in OSNs is quite widespread, often for protecting a user's privacy. In this paper, we propose a novel approach for detecting deceptive pr...
Concept Geo-tagging is the process of assigning a textual identifier that describes a real-world entity to a physical geographic location. A concept can either be a spatial concept where it possesses a spatial presence or be a non-spatial concept where it has no explicit spatial presence. Geo-tagging locations with non-spatial concepts that have no...
This paper presents Sphinx, a full-fledged distributed system which uses a standard SQL interface to process big spatial data. Sphinx adds spatial data types, indexes and query processing, inside the code-base of Cloudera Impala for efficient processing of spatial data. In particular, Sphinx is composed of four main components, namely, query parser...
Modern vehicles are increasingly being equipped with rich instrumentation that enables them to collect location aware data on a wide variety of travel related phenomena such as the real-world performance of engines and powertrain, driver preferences, context of the vehicle with respect to others nearby, and--indirectly--traffic on the transportatio...
SpatialHadoop is an extended MapReduce framework that supports global indexing that spatially partitions the data across machines, providing orders of magnitude speedup compared to traditional Hadoop. In this paper, we describe seven alternative partitioning techniques and experimentally study their effect on the quality of the generated index and th...
This demonstration presents HadoopViz, an extensible MapReduce-based system for visualizing Big Spatial Data. HadoopViz has two main unique features that distinguish it from other techniques. (1) It provides an extensible interface that allows users to visualize various types of data by defining five abstract functions, without delving into the det...