Christos Doulkeridis

University of Piraeus · Department of Digital Systems

Computer Engineer, NTUA


105 Publications
15,564 Reads
2,485 Citations
Citations since 2017: 24 Research Items, 1,231 Citations
[Chart: citations per year, 2017–2023; vertical axis 0–250]

Publications (105)
Article
Full-text available
As the number of moving objects increases, the challenges for achieving operational goals w.r.t. the mobility in many domains that are critical to economy and safety emerge dramatically. In domains such as air traffic management, this dictates a shift of operations’ paradigm from location based, as it is today, to trajectory based, where trajectori...
Article
We present a system for online composite event recognition over streaming positions of commercial vehicles. Our system employs a data enrichment module, augmenting the mobility data with external information, such as weather data and proximity to points of interest. In addition, the composite event recognition module, based on a highly optimised lo...
Conference Paper
Full-text available
We present a big data framework for the prediction of streaming trajectory data, enriched from other data sources and exploiting mined patterns of trajectories, allowing accurate long-term predictions with low latency. To meet this goal, we follow a multi-step methodology. First, we efficiently compress surveillance data in an online fashion, by co...
Conference Paper
In this paper, we propose NoDA, an abstraction layer consisting of spatio-temporal data access operators, which is used to access NoSQL storage engines in a unified way. NoDA alleviates the burden from big data developers of learning the query language of each NoSQL store, and offers a unified view of the underlying NoSQL store. Our approach is ins...
Preprint
We present a system for online composite event recognition over streaming positions of commercial vehicles. Our system employs a data enrichment module, augmenting the mobility data with external information, such as weather data and proximity to points of interest. In addition, the composite event recognition module, based on a highly optimised lo...
Conference Paper
In this paper, we present the design and implementation of a link discovery (LD) framework targeting spatial and spatio-temporal data. Existing works are either very specific (focusing on limited spatial LD tasks) or, despite being generic LD frameworks, support neither spatial nor spatio-temporal relations. Motivated by such limitations, w...
Article
Full-text available
The ever-increasing size of data emanating from mobile devices and sensors dictates the use of distributed systems for storing and querying these data. Typically, such data sources provide some spatio-temporal information, alongside other useful data. The RDF data model can be used to interlink and exchange data originating from heterogeneous sour...
Preprint
Trajectory clustering is an important operation of knowledge discovery from mobility data. Especially nowadays, the need for performing advanced analytic operations over massively produced data, such as mobility traces, in efficient and scalable ways is imperative. However, discovering clusters of complete trajectories can overlook significant patt...
Conference Paper
An ever-increasing number of real-life applications produce spatiotemporal data that record the position of moving objects (persons, cars, vessels, aircraft, etc.). In order to provide integrated views with other relevant data sources (e.g., weather, vessel databases, etc.), this data is represented in RDF and stored in knowledge bases with the fo...
Preprint
Joining trajectory datasets is a significant operation in mobility data analytics and the cornerstone of various methods that aim to extract knowledge out of them. In the era of Big Data, the production of mobility data has become massive and, consequently, performing such an operation in a centralized way is not feasible. In this paper, we address...
Article
An ever-increasing number of applications in critical domains, such as maritime and aviation, generate, collect, manage and process spatio-temporal data related to the mobility of entities. This wealth of data can be exploited for various purposes, towards improving the safety of operations, reducing economical costs, and increasing dependability:...
Conference Paper
Recent state-of-the-art approaches and technologies for generating RDF graphs from non-RDF data use languages designed for specifying transformations or mappings to data in various formats. This paper presents a new approach for the generation of ontology-annotated RDF graphs, linking data from multiple heterogeneous streaming and archival...
Article
In this paper, we study the problem of spatial link discovery (LD), focusing primarily on topological and proximity relations between spatial objects. The problem is timely due to the large number of sources that generate spatial data, including streaming sources (e.g., surveillance of moving objects) but also archival sources (such as static areas...
Chapter
This article presents important challenges and progress toward the management of data regarding the maritime domain for supporting analysis tasks. The article introduces our objectives for big data–analysis tasks, thus motivating our efforts toward advanced data-management solutions for mobility data in the maritime domain. The article introduces d...
Conference Paper
Motivated by real-life emerging needs in critical domains, this paper proposes a coherent and generic ontology for the representation of semantic trajectories, in association with related events and contextual information. The main contribution of the proposed ontology is the representation of semantic trajectories at different levels of spatio-tem...
Conference Paper
Location-based social network users typically publish information about their location and activity (in the form of keywords) along time, thus providing the mobility data management research community with complex and voluminous data. In this work, we handle this kind of data as sequences in the Spatio-Temporal-Keyword (STK) domain. This modeling i...
Conference Paper
Motivated by real-life emerging needs in critical domains, this paper proposes a coherent and generic ontology for the representation of semantic trajectories, in association to related events and contextual information, to support analytics. The main contribution of the proposed ontology is twofold: (a) The representation of semantic trajectories...
Article
Full-text available
Cluster analysis over Moving Object Databases (MODs) is a challenging research topic that has attracted the attention of the mobility data mining community. In this paper, we study the temporal-constrained sub-trajectory cluster analysis problem, where the aim is to discover clusters of sub-trajectories given an ad-hoc, user-specified temporal cons...
Conference Paper
Full-text available
In this paper, we present a system for scalable and real-time sentiment analysis of Twitter data. The proposed system relies on feature extraction from tweets, using both morphological features and semantic information. For the sentiment analysis task, we adopt a supervised learning approach, where we train various classifiers based on the extracte...
Conference Paper
Top-k join is an essential tool for data analysis, since it enables selective retrieval of the k best combined results that come from multiple different input datasets. In the context of Big Data, processing top-k joins over huge datasets requires a scalable platform, such as the widely popular MapReduce framework. However, such a solution does not...
Conference Paper
In a wide variety of daily activities, the need of selecting a group of k experts from a larger pool of n candidates (\(k<n\)) based on some criteria often arises. Indicative examples, among many others, include the selection of program committee members for a research conference, staffing an organization’s board with competent members, forming a s...
Article
Given a relation that contains main products and a set of relations corresponding to accessory products that can be combined with a main product, the Exploratory Top-k Join query retrieves the k best combinations of main and accessory products based on user preferences. As a result, the user is presented with a set of k combinations of distinct mai...
Article
User preferences play a significant role in market analysis. In the database literature there has been extensive work on query primitives, such as the well known top-k query that can be used for the ranking of products based on the preferences customers have expressed. Still, the fundamental operation that evaluates the similarity between products...
Conference Paper
In this paper, we present the overall architecture of RoadRunner, a Hadoop-based framework that enhances the efficiency of rank-aware query processing by introducing various optimizations to Hadoop, without changing its internal operation. RoadRunner focuses on a specific class of queries that involve ranking, such as top-k queries and top-k joins,...
Conference Paper
In modern applications, spatial objects are often annotated with textual descriptions, and users are offered the opportunity to formulate spatio-textual queries. The result set of such a query consists of spatio-textual objects ranked according to their distance from a desired location and to their textual relevance to the query. In this context, a...
Conference Paper
Full-text available
In this paper, given a product database and a set of customer preferences, we address the problem of discovering a bounded set of r diverse products that attract the interests of different customers. This problem finds numerous applications in electronic marketplaces, e.g., for selecting the products that are placed in the home page of an online sh...
Article
This paper studies the problem of computing the skyline of a vast-sized spatial dataset in SpatialHadoop, an extension of Hadoop that supports spatial operations efficiently. The problem is particularly interesting due to the advent of Big Spatial Data that are generated by modern applications run on mobile devices, and also because of the importance o...
Article
In this paper, we address the problem of discovering a ranked set of k distinct main objects combined with additional (accessory) objects that best fit the given preferences. This problem is challenging because it considers object combinations of variable size, where objects are combined only if the combination produces a higher score, and thus bec...
Conference Paper
The trend towards in-memory analytics and CPUs with an increasing number of cores calls for new algorithms that can efficiently utilize the available resources. This need is particularly evident in the case of CPU-intensive query operators. One example of such a query with applicability in data analytics is the skyline query. In this paper, we pres...
Article
Enterprises today acquire vast volumes of data from different sources and leverage this information by means of data analysis to support effective decision-making and provide new functionality and services. The key requirement of data analytics is scalability, simply due to the immense volume of data that need to be extracted, processed, and analyz...
Conference Paper
In applications such as market analysis, it is of great interest to product manufacturers to have their products ranked as highly as possible for a significant number of customers. However, customer preferences change over time, and product manufacturers are interested in monitoring the evolution of the popularity of their products, in order to dis...
Conference Paper
Top-k queries return to the user only the k best objects based on the individual user preferences and comprise an essential tool for rank-aware query processing. Assuming a stored data set of user preferences, reverse top-k queries have been introduced for retrieving the users that deem a given database object as one of their top-k results. Reverse...
Conference Paper
The MapReduce framework for parallel processing of massive data sets has attracted considerable attention recently, mainly due to its salient features that include scalability, simplicity, and fault-tolerance. However, despite its merits, MapReduce follows a brute-force approach, which often results in performing redundant work. This is particularl...
Conference Paper
Skyline queries help users make intelligent decisions over complex data. The main shortcoming of skyline queries is that the cardinality of the result set is not known a-priori. To overcome this limitation, the representative skyline query has been proposed, which retrieves a fixed set of k skyline points that best describe all skyline points. Even...
Article
Recently, a trend has been observed towards supporting rank-aware query operators, such as top-k, that enable users to retrieve only a limited set of the most interesting data objects. As data nowadays is commonly stored distributed over multiple servers, a challenging problem is to support rank-aware queries in distributed environments. In this...
Article
Full-text available
In this paper, we study efficient processing of rank joins in highly distributed systems, where servers store fragments of relations in an autonomous manner. Existing rank-join algorithms exhibit poor performance in this setting due to excessive communication costs or high latency. We propose a novel distributed rank-join framework that employs dat...
Chapter
Recently there has been an increased interest in database management systems to incorporate and support more flexible query operators, such as top-k, that produce results of specified cardinality, thus avoiding huge and overwhelming result sets. Top-k queries retrieve the objects that best match the user requirements by employing user-specified sco...
Chapter
Emerging applications over distributed, loosely coupled datasets require advanced query processing primitives that go beyond exact match queries. Such applications often need to handle multidimensional data, whether these dimensions are related to specific attributes of the data objects or are the result of advanced feature extraction algorithms. Q...
Chapter
Skyline queries help users make intelligent decisions over complex data, when different and often conflicting criteria are considered. Such queries return a set of data points that are not dominated by any other point on all dimensions. Skyline queries have been studied in centralized systems and more recently in distributed environments, such as w...
Chapter
In this chapter, a basic overview is given of P2P systems, architectures, and search strategies in P2P systems. More specific concepts that are outlined include the differences of structured and unstructured P2P systems, categories of P2P systems based on the centralization degree, basic search mechanisms for unstructured P2P systems, as well as de...
Chapter
Query processing in P2P networks poses inherent challenges and demands non-traditional techniques due to the distribution of content and the lack of global knowledge. Query routing over the P2P network is the key mechanism for efficient query processing. In the case of multidimensional data, designing multidimensional query routing is a non-trivial...
Article
Full-text available
Peer-to-peer systems constitute a promising solution for deploying novel applications, such as distributed image retrieval. Efficient search over widely distributed multimedia content requires techniques for distributed retrieval based on generic metric distance functions. In this paper, we propose a framework for distributed metric-based similarit...
Book
Applications that require a high degree of distribution and loosely-coupled connectivity are ubiquitous in various domains, including scientific databases, bioinformatics, and multimedia retrieval. In all these applications, data is typically voluminous and multidimensional, and support for advanced query operators is required for effective queryin...
Conference Paper
Full-text available
Recently, there has been an increased interest in incorporating in database management systems rank-aware query operators, such as top-k queries, that allow users to retrieve only the most interesting data objects. In this paper, we propose a cache-based approach for efficiently supporting top-k queries in distributed database management systems. I...
Article
Full-text available
Nowadays, most applications return to the user a limited set of ranked results based on the individual user's preferences, which are commonly expressed through top-k queries. From the perspective of a manufacturer, it is imperative that her products appear in the highest ranked positions for many different user preferences, otherwise the product is...
Article
Scalable search and retrieval over numerous web document collections distributed across different sites can be achieved by adopting a peer-to-peer (P2P) communication model. Terms and their document frequencies are the main components of text information retrieval and as such need to be computed, aggregated, and distributed throughout the system. T...
Article
Full-text available
Scaling up data mining algorithms for data of both high dimensionality and cardinality has been lately recognized as one of the most challenging problems in data mining research. The reason is that typical data mining tasks, such as clustering, cannot produce high quality results when applied on high-dimensional and/or large (in terms of cardinalit...
Conference Paper
Full-text available
In this paper, we study the generation of efficient execution plans for skyline query processing in large-scale distributed environments. In such a setting, each server stores autonomously a fraction of the data, thus all servers need to process the skyline query. An execution plan defines the order in which the individual skyline queries are process...
Conference Paper
Full-text available
This paper addresses the problem of efficiently computing the skyline set of a relational join. Existing techniques either require access to all tuples of the input relations or demand specialized multi-dimensional access methods to generate the skyline join result. To avoid these inefficiencies, we introduce the novel SFSJ algorithm that fuses the...
Article
Full-text available
Data generation increases at highly dynamic rates, making its storage, processing, and update costs at one central location excessive. The P2P paradigm emerges as a powerful model for organizing and searching large data repositories distributed over independent sources. Advanced query operators, such as skyline queries, are necessary in order to he...
Article
In typical mobile applications, mobile users seek points of interest in their vicinity (e.g., nearby restaurants) that best match their preferences. We assume a set of points of interest described by a combination of static and dynamic attributes, and a set of mobile users \(m_i\), each associated with a weighting vector \(w_i\), which expresses \(m_i\)'s prefere...
Article
Full-text available
Top-k spatial preference queries return a ranked set of the k best data objects based on the scores of feature objects in their spatial neighborhood. Despite the wide range of location-based applications that rely on spatial preference queries, existing algorithms incur non-negligible processing cost resulting in high response time. The reason is t...
Chapter
Full-text available
Semantic Overlay Networks (SONs) have been recently proposed as a way to organize content in peer-to-peer (P2P) networks. The main objective is to discover peers with similar content and then form thematically focused peer groups. Efficient content retrieval can be performed by having queries selectively forwarded only to relevant groups of peers t...
Article
Full-text available
Top-k queries are widely applied for retrieving a ranked set of the k most interesting objects based on the individual user preferences. As an example, in online marketplaces, customers (users) typically seek a ranked set of products (objects) that satisfy their needs. Reversing top-k queries leads to a query type that instead returns the set of cu...
Conference Paper
Full-text available
Similarity search in metric spaces has several important applications both in centralized and distributed environments. In centralized applications, such as similarity-based image retrieval, usually a server indexes its data with a state-of-the-art centralized metric indexing technique, such as the M-Tree. In this paper, we propose a framework for...
Article
Full-text available
The advent of the World Wide Web has made an enormous amount of information available to everyone and the widespread use of digital equipment enables end-users (peers) to produce their own digital content. This vast amount of information requires scalable data management systems. Peer-to-peer (P2P) systems have so far been well established in s...
Conference Paper
Rank-aware query processing has become essential for many applications that return to the user only the top-k objects based on the individual user's preferences. Top-k queries have been mainly studied from the perspective of the user, focusing primarily on efficient query processing. In this work, for the first time, we study top-k queries from the...
Conference Paper
Full-text available
Recently, the problem of efficiently supporting advanced query operators, such as nearest neighbor or range queries, over multidimensional data in widely distributed environments has attracted much attention. In unstructured peer-to-peer (P2P) networks, peers store data in an autonomous manner, thus multidimensional routing indices (MRI) are requir...
Article
Full-text available
Similarity search in P2P systems has attracted a lot of attention recently and several important applications, like distributed image search, can profit from the proposed distributed algorithms. In this paper, we address the challenging problem of efficient processing of range queries in metric spaces, where data is horizontally distributed across...
Conference Paper
Full-text available
Peer-to-peer (P2P) systems have been recently proposed for providing search and information retrieval facilities over distributed data sources, including web data. Terms and their document frequencies are the main building blocks of retrieval and as such need to be computed, aggregated, and distributed throughout the system. This is a tedious task,...
Conference Paper
Full-text available
Contemporary data-intensive applications generate large datasets of very high dimensionality. Data management in high-dimensional spaces presents problems, such as the degradation of query processing performance, a phenomenon also known as the curse of dimensionality. Dimensionality reduction (DR) tackles this problem, by efficiently embedding...
Conference Paper
Full-text available
Traditional routing indices in peer-to-peer (P2P) networks are mainly designed for document retrieval applications and maintain aggregated one-dimensional values representing the number of documents that can be obtained in a certain direction in the network. In this paper, we introduce the concept of multidimensional routing indices (MRIs), which...
Article
Traditional web service discovery is strongly related to the use of service directories. Especially in the case of mobile web services, where both service requestors and providers are mobile, the dynamics impose the need for directory-based discovery. Context plays an eminent role with mobility, as a filtering mechanism that enhances service discov...
Conference Paper
Full-text available
Due to applications and systems such as sensor networks, data streams, and peer-to-peer (P2P) networks, data generation and storage become increasingly distributed. Therefore a challenging problem is to support best-match query processing in highly distributed environments. In this paper, we present a novel framework for top-k query processing in l...
Conference Paper
Full-text available
XML is emerging as the de-facto standard for semistructured contents and metadata. Searching this content in mobile environments is challenging, since centralized approaches are not appropriate in a very dynamic environment with limited resources available for keeping a centralized index up-to-date. A more appropriate solution is to organize th...
Conference Paper
Full-text available
Lately the advances in centralized database management systems show a trend towards supporting rank-aware query operators, like top-k, that enable users to retrieve only the most interesting data objects. A challenging problem is to support rank-aware queries in highly distributed environments. In this paper, we present a novel approach, called SPE...
Conference Paper
Full-text available
Recently, skyline queries have attracted much attention in the database research community. Space partitioning techniques, such as recursive division of the data space, have been used for skyline query processing in centralized, parallel and distributed settings. Unfortunately, such grid-based partitioning is not suitable in the case of a paral...
Conference Paper
Full-text available
This paper addresses the challenging problem of similarity search over widely distributed ultra-high dimensional data. Such an application is retrieval of the top-k most similar documents in a widely distributed document collection, as in the case of digital libraries. Peer-to-peer (P2P) systems emerge as a promising solution to delve with conten...
Conference Paper
Full-text available
The World Wide Web provides an enormous amount of images easily accessible to everybody. The main challenge is to provide efficient search mechanisms for image content that are truly scalable and can support full coverage of web contents. In this paper, we present an architecture that adopts the peer-to-peer (P2P) paradigm for indexing, searching a...