Mohamed Mokbel
  • University of Minnesota

About

306
Publications
60,094
Reads
12,351
Citations
Current institution
University of Minnesota

Publications (306)
Article
Though data cleaning systems have achieved great success and widespread adoption in both academia and industry, they fall short when trying to clean spatial data. The main reason is that state-of-the-art data cleaning systems mainly rely on functional dependency rules where there is sufficient co-occurrence of value pairs to learn that a certain value of an...
Article
Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traff...
Article
The availability of trajectory data, combined with various real-life practical applications, has sparked the interest of the research community in designing a plethora of algorithms for various trajectory analysis techniques. However, there is an apparent lack of full-fledged systems that provide the infrastructure support for trajectory analysis techn...
Article
Numerous important applications rely on detailed trajectory data. Yet, unfortunately, trajectory datasets are typically sparse, with large spatial and temporal gaps between every two consecutive points, which is a major hurdle for their accuracy. This paper presents Kamel, a scalable trajectory imputation system that inserts additional realistic trajectory point...
Conference Paper
GPS-enabled devices, including vehicles, smartphones, wearable and tracking devices, as well as various check-in and social network data are continuously producing tremendous amounts of trajectory data, which are used consistently in many applications such as urban planning and map inference. Existing techniques for trajectory data imputation rely...
Preprint
Full-text available
Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traff...
Conference Paper
The ability to collect large amounts of trajectory data through GPS-enabled devices has enabled a myriad of very important applications that are widely used on a daily basis. These include urban computing, transportation, and map APIs for routing and navigation. Unfortunately, a major hindrance for all these applications is the accuracy of collected...
Article
Map services are ubiquitous in widely used applications including navigation systems, ride sharing, and item/food delivery. Though there are plenty of efforts to support such services through designing more efficient algorithms, we believe that efficiency is no longer a bottleneck to these services. Instead, it is the accuracy of the underlying r...
Article
This demo presents QARTA, an open-source full-fledged system for highly accurate and scalable map services. QARTA employs machine learning techniques to: (a) construct its own highly accurate map in terms of both map topology and edge weights, and (b) calibrate its query answers based on contextual information, including transportation modality, un...
Article
Full-text available
The systems community in both academia and industry has had tremendous success in building widely used general purpose systems for various types of data and applications. Examples include database systems, big data systems, data streaming systems, and machine learning systems. The vast majority of these systems are ill-equipped in terms of supporting s...
Article
As the widespread pandemic results in locking down vital facilities, digital contact tracing is deemed key to re-opening. However, current efforts in digital contact tracing, running as mobile apps on users' smartphones, fall short of being effective and present two major weaknesses related to accessibility and apparent privacy concern augmentat...
Article
Full-text available
Due to its significant economic and environmental impact, sharing the ride among a number of drivers (i.e., car pooling) has recently gained significant interest from industry and academia. Hence, a number of ride sharing services have appeared along with various algorithms on how to match a rider request to a driver who can provide the ride sharin...
Preprint
Full-text available
As the widespread pandemic results in locking down vital facilities, digital contact tracing is deemed key to re-opening. However, current efforts in digital contact tracing, running as mobile apps on users' smartphones, fall short of being effective. This paper lays out the vision and guidelines for the next era of digital contact tracing, wher...
Preprint
Full-text available
Travel time estimation is an important component in modern transportation applications. The state-of-the-art techniques for travel time estimation use GPS traces to learn the weights of a road network, often modeled as a directed graph, then apply Dijkstra-like algorithms to find shortest paths. Travel time is then computed as the sum of edge weigh...
Article
Full-text available
This paper presents GeoTrend+, a system approach to support scalable local trend discovery on recent microblogs, e.g., tweets, comments, online reviews, and check-ins, that arrive in real time. GeoTrend+ discovers top-k trending keywords in arbitrary spatial regions from recent microblogs that continuously arrive with high rates and a significant por...
Article
Full-text available
Microblogs data is the microlength user-generated data that is posted on the web, e.g., tweets, online reviews, comments on news and social media. It has gained considerable attention in recent years due to its widespread popularity, rich content, and value in several societal applications. Nowadays, microblogs applications span a wide spectrum of...
Conference Paper
The proliferation of generated data has propelled the rise of scalable machine learning solutions to efficiently analyze and extract useful insights from such data. Meanwhile, spatial data, e.g., GPS data, has become ubiquitous, with rapidly growing sizes in recent years. The applications of big spatial data span a wide spectrum of int...
Article
Autologistic regression is one of the most popular statistical tools to predict spatial phenomena in several applications, including epidemic disease detection, species occurrence prediction, earth observation, and business management. In general, autologistic regression divides the space into a two-dimensional grid, where the prediction is perfor...
Article
Full-text available
In this paper, we present two novel decaying operators for Telco Big Data (TBD), coined TBD-DP and CTBD-DP, that are founded on the notion of Data Postdiction. Unlike data prediction, which aims to make a statement about the future value of some tuple, our formulated data postdiction term aims to make a statement about the past value of some tuple,...
Conference Paper
Event detection applications have gained significant attention with the rise of user-generated spatio-temporal data over the past decade. However, building event detection applications still incurs high cost and effort due to the lack of support in existing data management systems. This paper envisions a holistic system approach to support an effici...
Article
Full-text available
Hadoop, employing the MapReduce programming paradigm, has been widely accepted as the standard framework for analyzing big data in distributed environments. Unfortunately, this rich framework has not been exploited for processing large-scale computational geometry operations. This paper introduces CG_Hadoop, a suite of scalable and efficient MapRed...
Article
The current explosion in spatial data raises the need for efficient spatial analysis tools to extract useful information from such data. However, existing tools are neither generic nor scalable when dealing with big spatial data. This demo presents Flash, a framework for generic and scalable spatial data analysis, with a special focus on spatial pr...
Article
With the increase in amount of remote sensing data, there have been efforts to efficiently process it to help ecologists and geographers answer queries. However, they often need to process this data in combination with vector data, for example, city boundaries. Existing efforts require one dataset to be converted to the other representation, which...
Article
The proliferation of generated data has propelled the rise of scalable machine learning solutions to efficiently analyze and extract useful insights from such data. Meanwhile, spatial data, e.g., GPS data, has become ubiquitous, with rapidly growing sizes in recent years. The applications of big spatial data span a wide spectrum of int...
Conference Paper
The proliferation of generated data has propelled the rise of scalable machine learning solutions to efficiently analyze and extract useful insights from such data. Meanwhile, spatial data, e.g., GPS data, has become ubiquitous, with rapidly growing sizes in recent years. The applications of big spatial data span a wide spectrum of int...
Conference Paper
Predicting the presence or absence of spatial phenomena has been of great interest to scientists pursuing research in several applications including epidemic disease detection, species occurrence prediction, and earth observation. In this operation, a geographical space is divided by a two-dimensional grid, where the prediction (i.e., either 0 or 1)...
Conference Paper
Geotagged data (e.g. images or news items) have empowered various important applications, e.g., search engines and news agencies. However, the lack of available geotagged data significantly reduces the impact of such applications. Meanwhile, existing geotagging approaches rely on the existence of prior knowledge, e.g., accurate training dataset for...
Conference Paper
Understanding link travel times (LTT) has received significant attention in transportation and spatial computing literature but they often remain behind closed doors, primarily because the data used for capturing them is considered confidential. Consequently, free and open maps such as OpenStreetMap (OSM) or TIGER, while being remarkably accurate i...
Article
Full-text available
This paper presents ST-Hadoop, the first full-fledged open-source MapReduce framework with native support for spatio-temporal data. ST-Hadoop is a comprehensive extension to Hadoop and SpatialHadoop that injects spatio-temporal data awareness inside each of their layers, mainly, language, indexing, and operations layers. In the language layer, ST...
Article
Full-text available
Arable land quality (ALQ) data are a foundational resource for national food security. With the rapid development of spatial information technologies, the annual acquisition and update of ALQ data covering the country have become more accurate and faster. ALQ data are mainly vector-based spatial big data in the ESRI (Environmental Systems Research...
Conference Paper
This demo presents Sya, the first full-fledged spatial probabilistic knowledge base construction system. Sya is a comprehensive extension to the DeepDive system that enables exploiting the spatial relationships between extracted relations during the knowledge base construction process, and hence results in a better knowledge base output. Sya runs e...
Conference Paper
In this paper, we present a novel decaying operator for Telco Big Data (TBD), coined TBD-DP (Data Postdiction). Unlike data prediction, which aims to make a statement about the future value of some tuple, our formulated data postdiction term aims to make a statement about the past value of some tuple, which doesn't exist anymore as it had to be de...
Conference Paper
This paper provides the first attempt at a full-fledged query optimizer for MapReduce-based spatial join algorithms. The optimizer develops its own taxonomy that covers almost all possible ways of doing a spatial join for any two input datasets. The optimizer comes in two flavors: cost-based and rule-based. Given two input data sets, the cost-base...
Conference Paper
Full-text available
A telecommunication company (telco) is traditionally only perceived as the entity that provides telecommunication services, such as telephony and data communication access to users. However, the IP backbone infrastructure of such entities, spanning dense urban spaces and wide rural areas, nowadays provides a unique opportunity to collect immense...
Article
Full-text available
This demo presents ST-Hadoop, the first full-fledged open-source MapReduce framework with native support for spatio-temporal data. ST-Hadoop injects spatio-temporal awareness in the Hadoop base code, which results in achieving order(s) of magnitude better performance than Hadoop and SpatialHadoop when dealing with spatio-temporal data and queries...
Article
This paper demonstrates Stella, an efficient crowdsourcing-based geotagging framework for any type of object. In this demonstration, we showcase the effectiveness of Stella in geotagging images via two different scenarios: (1) we provide a graphical interface to show a geotagging process that has been carried out using Amazon Mechanic...
Article
In this tutorial, we present the recent work in the database community for handling Big Spatial Data. This topic has become very active due to the recent explosion in the amount of spatial data generated by smart phones, satellites and medical devices, among others. This tutorial goes beyond the use of existing systems as-is (e.g., Hadoop, Spark or Impala...
Conference Paper
This paper provides the vision of a unified spatial crowdsourcing platform that is designed to efficiently tackle different types of spatial tasks which have been gaining a lot of popularity in recent years. Several examples of spatial tasks are ride-sharing services, delivery services, translation tasks, and crowd-sensing tasks. While existing cro...
Conference Paper
This paper presents Sphinx, a full-fledged open-source system for big spatial data which overcomes the limitations of existing systems by adopting a standard SQL interface, and by providing a highly efficient core built inside the Apache Impala system. Sphinx is composed of four main layers, namely, query parser, indexer, query planner, a...
Conference Paper
Full-text available
This paper presents ST-Hadoop, the first full-fledged open-source MapReduce framework with native support for spatio-temporal data. ST-Hadoop is a comprehensive extension to Hadoop and SpatialHadoop that injects spatio-temporal data awareness inside each of their layers, mainly, language, indexing, and operations layers. In the language layer, ST...
Conference Paper
This demo presents Scout, a full-fledged interactive data visualization system with native support for spatio-temporal data. Scout utilizes the computing power of GPUs to achieve real-time query performance. The key idea behind Scout is a GPU-aware multi-version spatio-temporal index. The indexing and query processing modules of Scout are designed to c...
Article
Spatial data partitioning (SDP) plays a powerful role in distributed storage and parallel computing for spatial data. However, the skewed distribution of spatial data and the varying volume of spatial vector objects make it a significant challenge to ensure both optimal performance of spatial operations and data balance in the cluster. To tackle t...
Article
Full-text available
Predictive spatio-temporal queries are crucial in many applications. Traffic management is an example application, where predictive spatial queries are issued to anticipate jammed areas in advance. Location-aware advertising is another example application that targets customers expected to be in the vicinity of a shopping mall in the near fut...
Conference Paper
Full-text available
This paper discusses the next generation of digital maps, positing that maps in the future will intelligently update themselves based on distinctive events extracted dynamically from social media streams or other crowd-sourced data. To realize this concept, the challenges include developing a scalable and efficient system to deal with a variety...
Conference Paper
Full-text available
This paper presents GeoTrend, a system for scalable support of spatial trend discovery on recent microblogs, e.g., tweets and online reviews, that arrive in real time. GeoTrend is distinguished from existing techniques in three aspects: (1) It discovers trends in arbitrary spatial regions, e.g., city blocks. (2) It supports trending measures that eff...
Conference Paper
Full-text available
Recently, many ride sharing systems have been commercially introduced (e.g., Uber, Flinc, and Lyft), forming a multi-billion-dollar industry. The main idea is to match people requesting a certain ride to other people who are acting as drivers in their own spare time. The matching algorithm run by these services is very simple and ignores a wide sec...
Article
In early 2000, we had the vision of ubiquitous location services, where each object is aware of its location, and continuously sends its location to a designated database server. This flood of location data opened the door for a myriad of location-based services that were considered visionary at that time, yet today they are a reality and have beco...
Conference Paper
Full-text available
Twitter is one of the fastest-growing online communities in recent years. In this poster, we study the language usage and diversity in Twitter local communities. We identify local communities in Twitter at the country level. For each community, we examine: (1) the language diversity, (2) the language dominance and how it differs from local to global vi...
Conference Paper
Full-text available
Microblogs data, e.g., tweets, reviews, news comments, and social media comments, has gained considerable attention in recent years due to its popularity and rich content. Nowadays, microblogs applications span a wide spectrum of interests, including analyzing events and user activities and critical applications like discovering health issues and...
Article
Full-text available
The recent explosion in the amount of spatial data calls for specialized systems to handle big spatial data. In this survey, we summarize the state-of-the-art work in the area of big spatial data. We categorize the existing work in this area according to six different angles, namely, approach, architecture, language, indexing, querying, and visuali...
Book
This two-volume set, LNCS 10041 and LNCS 10042, constitutes the proceedings of the 17th International Conference on Web Information Systems Engineering, WISE 2016, held in Shanghai, China, in November 2016. The 39 full papers and 31 short papers presented in these proceedings were carefully reviewed and selected from 233 submissions. The papers cover...
Book
There has been a recent marked increase in the amount of spatial data collected by smart phones, space telescopes, and medical devices, among others. The increased volume has brought into focus the need for specialized systems to handle big spatial data. The Era of Big Spatial Data: A Survey summarizes the state-of-the-art in this area. It classifi...
Article
Full-text available
Online Social Networks (OSNs) play a significant role in the daily life of hundreds of millions of people. However, many user profiles in OSNs contain deceptive information. Existing studies have shown that lying in OSNs is quite widespread, often for protecting a user's privacy. In this paper, we propose a novel approach for detecting deceptive pr...
Conference Paper
Full-text available
Concept Geo-tagging is the process of assigning a textual identifier that describes a real-world entity to a physical geographic location. A concept can either be a spatial concept where it possesses a spatial presence or be a non-spatial concept where it has no explicit spatial presence. Geo-tagging locations with non-spatial concepts that have no...
Conference Paper
Full-text available
This paper presents Sphinx, a full-fledged distributed system which uses a standard SQL interface to process big spatial data. Sphinx adds spatial data types, indexes and query processing, inside the code-base of Cloudera Impala for efficient processing of spatial data. In particular, Sphinx is composed of four main components, namely, query parser...
Conference Paper
Full-text available
Modern vehicles are increasingly being equipped with rich instrumentation that enables them to collect location aware data on a wide variety of travel related phenomena such as the real-world performance of engines and powertrain, driver preferences, context of the vehicle with respect to others nearby, and--indirectly--traffic on the transportatio...
Article
Full-text available
SpatialHadoop is an extended MapReduce framework that supports global indexing that spatially partitions the data across machines, providing orders of magnitude speedup compared to traditional Hadoop. In this paper, we describe seven alternative partitioning techniques and experimentally study their effect on the quality of the generated index and th...
Article
This demonstration presents HadoopViz, an extensible MapReduce-based system for visualizing Big Spatial Data. HadoopViz has two main unique features that distinguish it from other techniques. (1) It provides an extensible interface that allows users to visualize various types of data by defining five abstract functions, without delving into the det...
