
Christos Doulkeridis
University of Piraeus · Department of Digital Systems
Computer Engineer, NTUA
About
105 Publications
15,564 Reads
2,485 Citations
Publications (105)
As the number of moving objects increases, the challenges of achieving operational goals w.r.t. mobility in many domains that are critical to the economy and safety grow dramatically. In domains such as air traffic management, this dictates a shift of the operations paradigm from location-based, as it is today, to trajectory-based, where trajectori...
We present a system for online composite event recognition over streaming positions of commercial vehicles. Our system employs a data enrichment module, augmenting the mobility data with external information, such as weather data and proximity to points of interest. In addition, the composite event recognition module, based on a highly optimised lo...
We present a big data framework for the prediction of streaming trajectory data, enriched from other data sources and exploiting mined patterns of trajectories, allowing accurate long-term predictions with low latency. To meet this goal, we follow a multi-step methodology. First, we efficiently compress surveillance data in an online fashion, by co...
In this paper, we propose NoDA, an abstraction layer consisting of spatio-temporal data access operators, which is used to access NoSQL storage engines in a unified way. NoDA relieves big data developers of the burden of learning the query language of each NoSQL store, and offers a unified view of the underlying NoSQL store. Our approach is ins...
In this paper, we present the design and implementation of a link discovery (LD) framework targeting spatial and spatio-temporal data. Existing works are either very specific (focusing on limited spatial LD tasks) or, although they are generic LD frameworks, support neither spatial nor spatio-temporal relations. Motivated by such limitations, w...
The ever-increasing size of data emanating from mobile devices and sensors dictates the use of distributed systems for storing and querying these data. Typically, such data sources provide some spatio-temporal information, alongside other useful data. The RDF data model can be used to interlink and exchange data originating from heterogeneous sour...
Trajectory clustering is an important operation of knowledge discovery from mobility data. Especially nowadays, the need for performing advanced analytic operations over massively produced data, such as mobility traces, in efficient and scalable ways is imperative. However, discovering clusters of complete trajectories can overlook significant patt...
An ever-increasing number of real-life applications produce spatiotemporal data that record the position of moving objects (persons, cars, vessels, aircraft, etc.). In order to provide integrated views with other relevant data sources (e.g., weather, vessel databases, etc.), this data is represented in RDF and stored in knowledge bases with the fo...
Joining trajectory datasets is a significant operation in mobility data analytics and the cornerstone of various methods that aim to extract knowledge out of them. In the era of Big Data, the production of mobility data has become massive and, consequently, performing such an operation in a centralized way is not feasible. In this paper, we address...
An ever-increasing number of applications in critical domains, such as maritime and aviation, generate, collect, manage and process spatio-temporal data related to the mobility of entities. This wealth of data can be exploited for various purposes, towards improving the safety of operations, reducing economic costs, and increasing dependability:...
Recent state-of-the-art approaches and technologies for generating RDF graphs from non-RDF data use languages designed for specifying transformations or mappings to data of various formats. This paper presents a new approach for the generation of ontology-annotated RDF graphs, linking data from multiple heterogeneous streaming and archival...
In this paper, we study the problem of spatial link discovery (LD), focusing primarily on topological and proximity relations between spatial objects. The problem is timely due to the large number of sources that generate spatial data, including streaming sources (e.g., surveillance of moving objects) but also archival sources (such as static areas...
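The proximity relations mentioned above can be illustrated with a filter-and-refine sketch. It assumes point geometries, Euclidean distance, and a uniform grid whose cell side equals the distance threshold for blocking; the function `proximity_links` and its input format are hypothetical, and this is not the framework from the paper.

```python
import math
from collections import defaultdict


def proximity_links(sources, targets, d):
    """Return sorted (source_id, target_id) pairs whose points are within distance d.

    sources, targets: dicts mapping an id to an (x, y) tuple (hypothetical format).
    Blocking uses a uniform grid with cell side d: two points within distance d
    always fall in the same cell or in adjacent cells.
    """
    grid = defaultdict(list)
    for tid, (x, y) in targets.items():
        grid[(math.floor(x / d), math.floor(y / d))].append(tid)

    links = []
    for sid, (x, y) in sources.items():
        cx, cy = math.floor(x / d), math.floor(y / d)
        # Only the 3x3 neighbourhood of the source's cell can hold matches.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for tid in grid.get((cx + dx, cy + dy), ()):
                    tx, ty = targets[tid]
                    if math.hypot(x - tx, y - ty) <= d:  # refinement step
                        links.append((sid, tid))
    return sorted(links)
```

Only candidate pairs that share a cell or sit in neighbouring cells reach the exact distance test, which avoids comparing every source with every target; topological relations such as containment would need a different refinement predicate.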
This article presents important challenges and progress toward the management of data regarding the maritime domain for supporting analysis tasks. The article introduces our objectives for big data analysis tasks, thus motivating our efforts toward advanced data-management solutions for mobility data in the maritime domain. The article introduces d...
Motivated by real-life emerging needs in critical domains, this paper proposes a coherent and generic ontology for the representation of semantic trajectories, in association with related events and contextual information. The main contribution of the proposed ontology is the representation of semantic trajectories at different levels of spatio-tem...
Location-based social network users typically publish information about their location and activity (in the form of keywords) along time, thus providing the mobility data management research community with complex and voluminous data. In this work, we handle this kind of data as sequences in the Spatio-Temporal-Keyword (STK) domain. This modeling i...
Motivated by real-life emerging needs in critical domains, this paper proposes a coherent and generic ontology for the representation of semantic trajectories, in association with related events and contextual information, to support analytics. The main contribution of the proposed ontology is twofold: (a) The representation of semantic trajectories...
Cluster analysis over Moving Object Databases (MODs) is a challenging research topic that has attracted the attention of the mobility data mining community. In this paper, we study the temporal-constrained sub-trajectory cluster analysis problem, where the aim is to discover clusters of sub-trajectories given an ad-hoc, user-specified temporal cons...
In this paper, we present a system for scalable and real-time sentiment analysis of Twitter data. The proposed system relies on feature extraction from tweets, using both morphological features and semantic information. For the sentiment analysis task, we adopt a supervised learning approach, where we train various classifiers based on the extracte...
Top-k join is an essential tool for data analysis, since it enables selective retrieval of the k best combined results that come from multiple different input datasets. In the context of Big Data, processing top-k joins over huge datasets requires a scalable platform, such as the widely popular MapReduce framework. However, such a solution does not...
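As a baseline for the top-k join described above, the following sketch performs a plain hash join and keeps the k best combined results with a heap. It assumes an equi-join on a key and a sum of two scores where higher is better; `top_k_join` is a hypothetical name, and this is neither a MapReduce nor a rank-join algorithm.

```python
import heapq
from collections import defaultdict


def top_k_join(left, right, k):
    """k best joined pairs ranked by the sum of their scores (higher is better).

    left, right: lists of (join_key, item_id, score).
    Returns a list of (combined_score, left_id, right_id) tuples.
    """
    by_key = defaultdict(list)
    for key, rid, rscore in right:
        by_key[key].append((rid, rscore))

    joined = (
        (lscore + rscore, lid, rid)
        for key, lid, lscore in left
        for rid, rscore in by_key.get(key, ())
    )
    # nlargest keeps at most k candidates while scanning the whole join output
    return heapq.nlargest(k, joined)
```

Because it scans the entire join output, this baseline does exactly the work that rank-aware methods try to avoid when only k results are needed.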
In a wide variety of daily activities, the need to select a group of k experts from a larger pool of n candidates (k < n) based on some criteria often arises. Indicative examples, among many others, include the selection of program committee members for a research conference, staffing an organization’s board with competent members, forming a s...
Given a relation that contains main products and a set of relations corresponding to accessory products that can be combined with a main product, the Exploratory Top-k Join query retrieves the k best combinations of main and accessory products based on user preferences. As a result, the user is presented with a set of k combinations of distinct mai...
User preferences play a significant role in market analysis. In the database literature there has been extensive work on query primitives, such as the well-known top-k query, which can be used to rank products based on the preferences customers have expressed. Still, the fundamental operation that evaluates the similarity between products...
In this paper, we present the overall architecture of RoadRunner, a Hadoop-based framework that enhances the efficiency of rank-aware query processing by introducing various optimizations to Hadoop, without changing its internal operation. RoadRunner focuses on a specific class of queries that involve ranking, such as top-k queries and top-k joins,...
In modern applications, spatial objects are often annotated with textual descriptions, and users are offered the opportunity to formulate spatio-textual queries. The result set of such a query consists of spatio-textual objects ranked according to their distance from a desired location and to their textual relevance to the query. In this context, a...
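One way to combine distance and textual relevance into a single ranking is a weighted sum. The sketch below assumes a linear combination with weight `alpha`, proximity normalised by a hypothetical `max_dist`, and Jaccard similarity of keyword sets as textual relevance; these choices and the name `spatio_textual_top_k` are illustrative, not taken from the paper.

```python
import heapq
import math


def spatio_textual_top_k(objects, q_loc, q_terms, k, alpha=0.5, max_dist=1.0):
    """Rank objects by a weighted mix of spatial proximity and textual relevance.

    objects: list of (obj_id, (x, y), set_of_terms).
    alpha, max_dist and the Jaccard relevance are illustrative assumptions.
    """
    def score(obj):
        _, (x, y), terms = obj
        dist = math.hypot(x - q_loc[0], y - q_loc[1])
        proximity = max(0.0, 1.0 - dist / max_dist)  # 1 at the query point
        union = terms | q_terms
        relevance = len(terms & q_terms) / len(union) if union else 0.0
        return alpha * proximity + (1 - alpha) * relevance

    best = heapq.nlargest(k, objects, key=score)
    return [obj_id for obj_id, _, _ in best]
```

Raising `alpha` favours nearby objects; lowering it favours objects whose keywords match the query more closely.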
In this paper, given a product database and a set of customer preferences, we address the problem of discovering a bounded set of r diverse products that attract the interests of different customers. This problem finds numerous applications in electronic marketplaces, e.g., for selecting the products that are placed in the home page of an online sh...
This paper studies the problem of computing the skyline of a vast-sized spatial dataset in SpatialHadoop, an extension of Hadoop that supports spatial operations efficiently. The problem is particularly interesting due to the advent of Big Spatial Data generated by modern applications running on mobile devices, and also because of the importance o...
In this paper, we address the problem of discovering a ranked set of k distinct main objects combined with additional (accessory) objects that best fit the given preferences. This problem is challenging because it considers object combinations of variable size, where objects are combined only if the combination produces a higher score, and thus bec...
The trend towards in-memory analytics and CPUs with an increasing number of cores calls for new algorithms that can efficiently utilize the available resources. This need is particularly evident in the case of CPU-intensive query operators. One example of such a query with applicability in data analytics is the skyline query. In this paper, we pres...
Enterprises today acquire vast volumes of data from different sources and leverage this information by means of data analysis to support effective decision-making and provide new functionality and services. The key requirement of data analytics is scalability, simply due to the immense volume of data that need to be extracted, processed, and analyz...
In applications such as market analysis, it is of great interest to product manufacturers to have their products ranked as highly as possible for a significant number of customers. However, customer preferences change over time, and product manufacturers are interested in monitoring the evolution of the popularity of their products, in order to dis...
Top-k queries return to the user only the k best objects based on the individual user preferences and comprise an essential tool for rank-aware query processing. Assuming a stored data set of user preferences, reverse top-k queries have been introduced for retrieving the users that deem a given database object as one of their top-k results. Reverse...
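The definition of a reverse top-k query can be checked by brute force: evaluate every stored preference vector and keep the users for which fewer than k products beat the query product. The sketch assumes linear scoring functions where lower scores are preferred; `score` and `reverse_top_k` are hypothetical helpers, not the paper's algorithm.

```python
def score(weights, point):
    """Linear scoring function; lower scores are better (assumed convention)."""
    return sum(w * x for w, x in zip(weights, point))


def reverse_top_k(products, users, q, k):
    """Return ids of users that have product q in their top-k result.

    products: list of attribute tuples; users: dict user_id -> weight vector.
    Brute force: q is in a user's top-k if fewer than k products score
    strictly better than q (ties are resolved in q's favour).
    """
    result = []
    for uid, w in users.items():
        q_score = score(w, q)
        better = sum(1 for p in products if score(w, p) < q_score)
        if better < k:
            result.append(uid)
    return result
```

For example, with products (1, 9), (5, 5), (9, 1), weight vectors (9, 1), (5, 5), (1, 9) and k = 1, only the user with weights (5, 5) has (5, 5) in their top-1, because all three products tie at score 50 for that user. The cost is O(|users| · |products|) per query.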
The MapReduce framework for parallel processing of massive data sets has attracted considerable attention recently, mainly due to its salient features that include scalability, simplicity, and fault-tolerance. However, despite its merits, MapReduce follows a brute-force approach, which often results in performing redundant work. This is particularl...
Skyline queries help users make intelligent decisions over complex data. The main shortcoming of skyline queries is that the cardinality of the result set is not known a priori. To overcome this limitation, the representative skyline query has been proposed, which retrieves a fixed set of k skyline points that best describe all skyline points. Even...
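The abstract leaves open how "best describe" is measured. One common formulation, assumed here, picks k skyline points that minimise the maximum distance from any skyline point to its nearest representative; the greedy farthest-first traversal below is a standard 2-approximation for that objective. `representative_skyline` is a hypothetical name, and the skyline is taken as already computed.

```python
import math


def representative_skyline(skyline, k):
    """Choose k representatives from a precomputed skyline (farthest-first).

    Assumed objective: minimise the largest distance from any skyline point to
    its nearest representative. Greedy farthest-first traversal is a classic
    2-approximation for this k-center style objective.
    """
    if k <= 0:
        return []
    if k >= len(skyline):
        return list(skyline)
    reps = [skyline[0]]  # arbitrary seed
    # gap[i]: distance from skyline[i] to its nearest chosen representative
    gap = [math.dist(p, reps[0]) for p in skyline]
    while len(reps) < k:
        far = max(range(len(skyline)), key=gap.__getitem__)
        reps.append(skyline[far])
        gap = [min(g, math.dist(p, skyline[far])) for g, p in zip(gap, skyline)]
    return reps
```

Other definitions of representativeness (for example, maximising the number of points dominated by the chosen set) would lead to a different selection rule.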
Recently, a trend has been observed towards supporting rank-aware query operators, such as top-k, that enable users to retrieve only a limited set of the most interesting data objects. As data nowadays is commonly stored distributed over multiple servers, a challenging problem is to support rank-aware queries in distributed environments. In this...
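A simple way to answer a top-k query over data spread across servers is to let each server return its local top-k and have a coordinator merge them. The sketch assumes horizontal partitioning and a linear scoring function where lower is better; `local_top_k` and `distributed_top_k` are hypothetical names, not the approach of the paper.

```python
import heapq


def local_top_k(partition, weights, k):
    """One server's top-k as (score, point) pairs; linear score, lower is better."""
    scored = ((sum(w * x for w, x in zip(weights, p)), p) for p in partition)
    return heapq.nsmallest(k, scored)


def distributed_top_k(partitions, weights, k):
    """Coordinator: merge the servers' local top-k lists into the global top-k.

    With horizontally partitioned data, each global top-k object also belongs to
    the local top-k of the server storing it, so k results per server suffice.
    """
    candidates = [c for part in partitions for c in local_top_k(part, weights, k)]
    return [p for _, p in heapq.nsmallest(k, candidates)]
```

The scheme is correct but ships k objects from every server regardless of how many of them end up in the final answer, so it serves only as a baseline.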
In this paper, we study efficient processing of rank joins in highly distributed systems, where servers store fragments of relations in an autonomous manner. Existing rank-join algorithms exhibit poor performance in this setting due to excessive communication costs or high latency. We propose a novel distributed rank-join framework that employs dat...
Recently there has been an increased interest in database management systems to incorporate and support more flexible query operators, such as top-k, that produce results of specified cardinality, thus avoiding huge and overwhelming result sets. Top-k queries retrieve the objects that best match the user requirements by employing user-specified sco...
Emerging applications over distributed, loosely coupled datasets require advanced query processing primitives that go beyond exact match queries. Such applications often need to handle multidimensional data, whether these dimensions are related to specific attributes of the data objects or are the result of advanced feature extraction algorithms. Q...
Skyline queries help users make intelligent decisions over complex data, when different and often conflicting criteria are considered. Such queries return a set of data points that are not dominated by any other point on all dimensions. Skyline queries have been studied in centralized systems and more recently in distributed environments, such as w...
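The dominance test behind skyline queries, and the way local results combine in a distributed setting, can be sketched as follows. It assumes that smaller values are preferred on every dimension, uses the block-nested-loop strategy for the local computation, and relies on the fact that a point on the global skyline is also on the skyline of its own partition; the function names are hypothetical.

```python
def dominates(p, q):
    """p dominates q: no worse on every dimension and strictly better on one.

    Smaller values are assumed to be better on all dimensions.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Block-nested-loop skyline: keep the points no other point dominates."""
    window = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a point already kept
        # p survives; evict window points that p dominates
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def distributed_skyline(partitions):
    """Each server computes a local skyline; the coordinator merges them.

    A globally non-dominated point is non-dominated within its own partition,
    so the skyline of the union of local skylines equals the global skyline.
    """
    local = [p for part in partitions for p in skyline(part)]
    return skyline(local)
```

For hotels described by (price, distance), the points (3, 8), (5, 5), (8, 3), (9, 2) survive while (4, 9) and (6, 6) are dominated, however the six points are split across servers.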
In this chapter, a basic overview is given of P2P systems, architectures, and search strategies in P2P systems. More specific concepts that are outlined include the differences between structured and unstructured P2P systems, categories of P2P systems based on the centralization degree, basic search mechanisms for unstructured P2P systems, as well as de...
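Flooding with a time-to-live (TTL) is the simplest search mechanism for unstructured P2P networks. The sketch below, assuming an adjacency-list overlay and per-query duplicate suppression, returns the peers holding the requested item together with the number of messages sent; `flood_search` and its inputs are hypothetical.

```python
from collections import deque


def flood_search(graph, data, start, target, ttl):
    """Flood a query from `start` for at most `ttl` hops.

    graph: dict peer -> list of neighbour peers (overlay adjacency lists).
    data: dict peer -> set of items stored locally.
    Returns (sorted peers holding `target`, number of messages sent).
    """
    visited = {start}
    queue = deque([(start, ttl)])
    hits, messages = [], 0
    while queue:
        peer, remaining = queue.popleft()
        if target in data.get(peer, ()):
            hits.append(peer)
        if remaining == 0:
            continue  # TTL exhausted: do not forward further
        for nb in graph.get(peer, ()):
            messages += 1  # each forwarded copy costs one message
            if nb not in visited:  # copies reaching visited peers are dropped
                visited.add(nb)
                queue.append((nb, remaining - 1))
    return sorted(hits), messages
```

Raising the TTL widens the search horizon but quickly increases the message count, including duplicate copies that peers receive and discard.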
Query processing in P2P networks poses inherent challenges and demands non-traditional techniques due to the distribution of content and the lack of global knowledge. Query routing over the P2P network is the key mechanism for efficient query processing. In the case of multidimensional data, designing multidimensional query routing is a non-trivial...
Peer-to-peer systems constitute a promising solution for deploying novel applications, such as distributed image retrieval. Efficient search over widely distributed multimedia content requires techniques for distributed retrieval based on generic metric distance functions. In this paper, we propose a framework for distributed metric-based similarit...
Applications that require a high degree of distribution and loosely-coupled connectivity are ubiquitous in various domains, including scientific databases, bioinformatics, and multimedia retrieval. In all these applications, data is typically voluminous and multidimensional, and support for advanced query operators is required for effective queryin...
Recently, there has been an increased interest in incorporating into database management systems rank-aware query operators, such as top-k queries, that allow users to retrieve only the most interesting data objects. In this paper, we propose a cache-based approach for efficiently supporting top-k queries in distributed database management systems. I...
Nowadays, most applications return to the user a limited set of ranked results based on the individual user's preferences, which are commonly expressed through top-k queries. From the perspective of a manufacturer, it is imperative that her products appear in the highest ranked positions for many different user preferences, otherwise the product is...
Scalable search and retrieval over numerous web document collections distributed across different sites can be achieved by adopting a peer-to-peer (P2P) communication model. Terms and their document frequencies are the main components of text information retrieval and as such need to be computed, aggregated, and distributed throughout the system. T...
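When collections are spread across peers, global document frequencies can be obtained by summing the local ones. The sketch assumes disjoint peer collections, whitespace tokenisation, and the common idf = log(N / df) weighting; `local_df` and `global_idf` are hypothetical helpers, not the aggregation protocol proposed in the paper.

```python
import math
from collections import Counter


def local_df(documents):
    """Document frequency per term within one peer's collection."""
    df = Counter()
    for doc in documents:
        df.update(set(doc.lower().split()))  # a term counts once per document
    return df


def global_idf(peer_collections):
    """Combine per-peer statistics into global idf values.

    Assumes disjoint collections, so global df and N are sums of local values.
    In a real system each peer would ship only its (df, document count) summary.
    """
    total_df, n_docs = Counter(), 0
    for docs in peer_collections:
        total_df.update(local_df(docs))
        n_docs += len(docs)
    return {term: math.log(n_docs / df) for term, df in total_df.items()}
```

If collections overlap, simple summation overcounts shared documents, which is one reason distributed frequency estimation is harder than this sketch suggests.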
Scaling up data mining algorithms for data of both high dimensionality and cardinality has been lately recognized as one of the most challenging problems in data mining research. The reason is that typical data mining tasks, such as clustering, cannot produce high quality results when applied on high-dimensional and/or large (in terms of cardinalit...
In this paper, we study the generation of efficient execution plans for skyline query processing in large-scale distributed environments. In such a setting, each server stores autonomously a fraction of the data, thus all servers need to process the skyline query. An execution plan defines the order in which the individual skyline queries are process...
This paper addresses the problem of efficiently computing the skyline set of a relational join. Existing techniques either require access to all tuples of the input relations or demand specialized multi-dimensional access methods to generate the skyline join result. To avoid these inefficiencies, we introduce the novel SFSJ algorithm that fuses the...
Data generation increases at highly dynamic rates, making its storage, processing, and update costs at one central location excessive. The P2P paradigm emerges as a powerful model for organizing and searching large data repositories distributed over independent sources. Advanced query operators, such as skyline queries, are necessary in order to he...
In typical mobile applications, mobile users seek points of interest in their vicinity (e.g., nearby restaurants) that best match their preferences. We assume a set of points of interest described by a combination of static and dynamic attributes, and a set of mobile users m_i, each associated with a weighting vector w_i, which expresses m_i's prefere...
Top-k spatial preference queries return a ranked set of the k best data objects based on the scores of feature objects in their spatial neighborhood. Despite the wide range of location-based applications that rely on spatial preference queries, existing algorithms incur non-negligible processing cost resulting in high response time. The reason is t...
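A brute-force version of a top-k spatial preference query helps pin down the definition. The sketch assumes a range-based neighbourhood of radius r, feature qualities in [0, 1], and an object score that sums, over feature sets, the best quality found in range; these choices and the name `spatial_preference_top_k` are illustrative assumptions.

```python
import heapq
import math


def spatial_preference_top_k(objects, features, r, k):
    """Brute-force top-k spatial preference query (range-based score).

    objects: dict obj_id -> (x, y)
    features: dict feature_set_name -> list of ((x, y), quality)
    Score of an object: for each feature set, the best quality of any feature
    within distance r (0 if none), summed over all feature sets.
    """
    def score(loc):
        total = 0.0
        for feats in features.values():
            in_range = [q for pos, q in feats if math.dist(loc, pos) <= r]
            total += max(in_range, default=0.0)
        return total

    return heapq.nlargest(k, objects, key=lambda oid: score(objects[oid]))
```

Every object probes every feature, so the cost grows with the product of the sizes of the object and feature datasets.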
Semantic Overlay Networks (SONs) have been recently proposed as a way to organize content in peer-to-peer (P2P) networks. The main objective is to discover peers with similar content and then form thematically focused peer groups. Efficient content retrieval can be performed by having queries selectively forwarded only to relevant groups of peers t...
Top-k queries are widely applied for retrieving a ranked set of the k most interesting objects based on the individual user preferences. As an example, in online marketplaces, customers (users) typically seek a ranked set of products (objects) that satisfy their needs. Reversing top-k queries leads to a query type that instead returns the set of cu...
Similarity search in metric spaces has several important applications both in centralized and distributed environments. In centralized applications, such as similarity-based image retrieval, usually a server indexes its data with a state-of-the-art centralized metric indexing technique, such as the M-Tree. In this paper, we propose a framework for...
The advent of the World Wide Web has made an enormous amount of information available to everyone and the widespread use of digital equipment enables end-users (peers) to produce their own digital content. This vast amount of information requires scalable data management systems. Peer-to-peer (P2P) systems have so far been well established in s...
Rank-aware query processing has become essential for many applications that return to the user only the top-k objects based on the individual user's preferences. Top-k queries have been mainly studied from the perspective of the user, focusing primarily on efficient query processing. In this work, for the first time, we study top-k queries from the...
Recently, the problem of efficiently supporting advanced query operators, such as nearest neighbor or range queries, over multidimensional data in widely distributed environments has attracted much attention. In unstructured peer-to-peer (P2P) networks, peers store data in an autonomous manner, thus multidimensional routing indices (MRI) are requir...
Similarity search in P2P systems has attracted a lot of attention recently and several important applications, like distributed image search, can profit from the proposed distributed algorithms. In this paper, we address the challenging problem of efficient processing of range queries in metric spaces, where data is horizontally distributed across...
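Range queries in metric spaces can avoid many distance computations through the triangle inequality. The sketch precomputes each object's distance to a single pivot and skips any object whose pivot distance differs from the query's by more than r; it assumes `math.dist` as the default metric, and `build_pivot_table` and `range_query` are hypothetical names rather than the paper's distributed method.

```python
import math


def build_pivot_table(objects, pivot, dist=math.dist):
    """Precompute d(o, pivot) for every object once, ahead of any query."""
    return {oid: dist(o, pivot) for oid, o in objects.items()}


def range_query(objects, table, pivot, q, r, dist=math.dist):
    """Return (sorted ids of objects within distance r of q, number of dist calls).

    Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o). If the left side
    exceeds r, object o cannot lie in the range and d(q, o) is never computed.
    """
    d_qp = dist(q, pivot)
    calls = 1
    result = []
    for oid, o in objects.items():
        if abs(d_qp - table[oid]) > r:
            continue  # pruned by the lower bound
        calls += 1
        if dist(q, o) <= r:
            result.append(oid)
    return sorted(result), calls
```

With objects at (0, 0), (1, 0), (5, 0), (9, 0), pivot (0, 0), query (1.5, 0) and r = 1, only one of the four objects needs an exact distance evaluation. Any function satisfying the metric axioms can be passed as `dist`.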
Peer-to-peer (P2P) systems have been recently proposed for providing search and information retrieval facilities over distributed data sources, including web data. Terms and their document frequencies are the main building blocks of retrieval and as such need to be computed, aggregated, and distributed throughout the system. This is a tedious task,...
Contemporary data-intensive applications generate large datasets of very high dimensionality. Data management in high-dimensional spaces presents problems, such as the degradation of query processing performance, a phenomenon also known as the curse of dimensionality. Dimensionality reduction (DR) tackles this problem by efficiently embedding...
Traditional routing indices in peer-to-peer (P2P) networks are mainly designed for document retrieval applications and maintain aggregated one-dimensional values representing the number of documents that can be obtained in a certain direction in the network. In this paper, we introduce the concept of multidimensional routing indices (MRIs), which...
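The traditional, one-dimensional routing index described above can be sketched as per-neighbour document counts. The format below (neighbour to topic to count), the goodness measure that sums counts over the query topics, and the names `best_neighbour` and `aggregate_for` are assumptions for illustration; the multidimensional variant from the paper is not shown.

```python
from collections import Counter


def best_neighbour(routing_index, topics):
    """Neighbour through which the most documents on `topics` are reachable.

    routing_index: dict neighbour -> dict topic -> number of documents reachable
    in that direction. Goodness here is simply the sum of the matching counts.
    """
    def goodness(nb):
        return sum(routing_index[nb].get(t, 0) for t in topics)

    return max(routing_index, key=goodness)


def aggregate_for(neighbour, local_counts, routing_index):
    """Counts a peer advertises to `neighbour`: its own documents plus everything
    reachable through its other neighbours (the neighbour's own entry is excluded)."""
    total = Counter(local_counts)
    for nb, counts in routing_index.items():
        if nb != neighbour:
            total.update(counts)
    return dict(total)
```

Excluding the receiving neighbour's own entry keeps a peer from advertising documents back in the direction they came from.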
Traditional web service discovery is strongly related to the use of service directories. Especially in the case of mobile web services, where both service requestors and providers are mobile, the dynamics impose the need for directory-based discovery. Context plays a prominent role with mobility, as a filtering mechanism that enhances service discov...
Due to applications and systems such as sensor networks, data streams, and peer-to-peer (P2P) networks, data generation and storage become increasingly distributed. Therefore a challenging problem is to support best-match query processing in highly distributed environments. In this paper, we present a novel framework for top-k query processing in l...
XML is emerging as the de-facto standard for semistructured contents and metadata. Searching this content in mobile environments is challenging, since centralized approaches are not appropriate in a very dynamic environment with limited resources available for keeping a centralized index up-to-date. A more appropriate solution is to organize th...
Lately the advances in centralized database management systems show a trend towards supporting rank-aware query operators, like top-k, that enable users to retrieve only the most interesting data objects. A challenging problem is to support rank-aware queries in highly distributed environments. In this paper, we present a novel approach, called SPE...
Recently, skyline queries have attracted much attention in the database research community. Space partitioning techniques, such as recursive division of the data space, have been used for skyline query processing in centralized, parallel and distributed settings. Unfortunately, such grid-based partitioning is not suitable in the case of a paral...
This paper addresses the challenging problem of similarity search over widely distributed ultra-high dimensional data. One such application is the retrieval of the top-k most similar documents in a widely distributed document collection, as in the case of digital libraries. Peer-to-peer (P2P) systems emerge as a promising solution to deal with conten...
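Top-k document similarity over very high-dimensional term spaces usually relies on sparse vectors that store only non-zero dimensions. The centralised sketch below assumes raw term-frequency weights and cosine similarity; `to_vector`, `cosine`, and `top_k_similar` are hypothetical names, and the P2P distribution of the paper is not modelled.

```python
import heapq
import math
from collections import Counter


def to_vector(text):
    """Sparse term-frequency vector: only non-zero dimensions are stored."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse vectors."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the vector with fewer non-zero entries
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def top_k_similar(docs, query, k):
    """Ids of the k documents in `docs` (id -> text) most similar to `query`."""
    q = to_vector(query)
    return heapq.nlargest(k, docs, key=lambda d: cosine(to_vector(docs[d]), q))
```

Because only shared terms contribute to the dot product, the cost depends on the number of non-zero entries rather than on the full dimensionality of the vocabulary.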
The World Wide Web provides an enormous amount of images easily accessible to everybody. The main challenge is to provide efficient search mechanisms for image content that are truly scalable and can support full coverage of web contents. In this paper, we present an architecture that adopts the peer-to-peer (P2P) paradigm for indexing, searching a...