Dimitris Sacharidis

  • TU Wien

About

95 Publications
6,447 Reads
1,310 Citations
Current institution
TU Wien

Publications (95)
Article
Full-text available
For many use-cases, it is often important to explain the prediction of a black-box model by identifying the most influential training data samples. Existing approaches lack customization for user intent and often provide a homogeneous set of explanation samples, failing to reveal the model's reasoning from different angles. In this paper, we propos...
Preprint
Full-text available
Counterfactual explanations have emerged as an important tool to understand, debug, and audit complex machine learning models. To offer global counterfactual explainability, state-of-the-art methods construct summaries of local explanations, offering a trade-off among conciseness, counterfactual effectiveness, and counterfactual cost or burden impo...
Article
Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traff...
Chapter
This chapter discusses the ethical complications and challenges arising from the use of AI systems in our everyday lives. It outlines recent and upcoming regulations and policies regarding the use of AI systems, and dives into the topics of explainability and fairness. We argue that trustworthiness has at its heart explainability, and thus we prese...
Article
The amount of publicly available geo-referenced data has seen a dramatic explosion over the past few years. Human activities generate data and traces that are now often transparently annotated with location and contextual information. At the same time, it has become easier than ever to collect and combine rich and diverse information about location...
Preprint
In this work, we present Fairness Aware Counterfactuals for Subgroups (FACTS), a framework for auditing subgroup fairness through counterfactual explanations. We start with revisiting (and generalizing) existing notions and introducing new, more refined notions of subgroup fairness. We aim to (a) formulate different aspects of the difficulty of ind...
Preprint
Full-text available
This paper studies algorithmic fairness when the protected attribute is location. To handle protected attributes that are continuous, such as age or income, the standard approach is to discretize the domain into predefined groups, and compare algorithmic outcomes across groups. However, applying this idea to location raises concerns of gerrymanderi...
Article
Set similarity join is an important problem with many applications in data discovery, cleaning and integration. To increase robustness, fuzzy set similarity join calculates the similarity of two sets based on maximum weighted bipartite matching instead of set overlap. This allows pairs of elements, represented as sets or strings, to also match appr...
Article
The amount of publicly available geo-referenced data has seen a dramatic increase over the last years. Many user activities generate data that are annotated with location and contextual information. It has also become easier to collect and combine rich and diverse location information. In the context of geoadvertising, the use of geosocial data for...
Article
Full-text available
The Internet of Things (IoT) enables smart objects to connect and share information, thus unlocking the potential for end users to receive more and better information and services. In the Social IoT (SIoT), objects adopt a social behavior, where they establish social connections to other objects and can operate autonomously in order to accomplish a...
Chapter
The number of published scientific papers has been constantly increasing in the past decades. As several papers can have low impact or questionable quality, identifying the most valuable papers is an important task. In particular, a key problem is being able to distinguish among papers based on their short-term impact, i.e., identify papers that ar...
Article
Full-text available
The amount of publicly available geo-referenced data has seen a dramatic increase over the last years. Many user activities generate data that are annotated with location and contextual information. Moreover, it has become easier to collect and combine rich and diverse location information. In the context of geoadvertising, the use of geosocial dat...
Conference Paper
Full-text available
Research on database and information technologies has been rapidly evolving over the last couple of years. This evolution was led by three major forces: Big Data, AI and Connected World that open the door to innovative research directions and challenges, yet exploiting four main areas: (i) computational and storage resource modeling and organizati...
Preprint
Full-text available
The constantly increasing rate at which scientific papers are published makes it difficult for researchers to identify papers that currently impact the research field of their interest. Hence, approaches to effectively identify papers of high impact have attracted great attention in the past. In this work, we present a method that seeks to rank pap...
Article
Full-text available
The amount of publicly available geo-referenced data has seen a dramatic explosion over the past few years. Many user activities generate data that are annotated with location and contextual information. Furthermore, it has become easier to collect and combine rich and diverse location information. In the context of geoadvertising, the use of geoso...
Conference Paper
Location-based recommender systems learn from historical movement traces of users in order to make recommendations for places to visit, events to attend, itineraries to follow. As with other systems assisting humans in their decisions, there is an increasing need to scrutinize the implications of algorithmically made location recommendations. The c...
Article
As the rate at which scientific work is published continues to increase, so does the need to discern high-impact publications. In recent years, there have been several approaches that seek to rank publications based on their expected citation-based impact. Despite this level of attention, this research area has not been systematically studied. Past...
Article
Social recommender systems exploit two sources of information for making recommendations, the historical rating behavior of users, and the social connections among them. The basic assumption is that if two users are friends, they are likely to share similar preferences. Many recommendation approaches are based on such correlations between the ratin...
Conference Paper
Personalization is typically based on preferences extracted from the interactions of users with the system. A recent trend is to also account for the social influence among users, which may play a non-negligible role in shaping one's individual preferences. The underlying assumptions are that friends tend to develop similar taste, i.e., homophily,...
Conference Paper
In many settings, it is required that items are recommended to a group of users instead of a single user. Most often, when the decision criteria and preferences of the group as a whole are not known, the gold standard is to aggregate individual member preferences or recommendations. Such techniques typically presuppose some process under which grou...
Conference Paper
In many settings it is required that items are recommended to a group of users instead of a single user. Conventionally, the objective is to maximize the overall satisfaction among group members. Recently, however, attention has shifted to ensuring that recommendations are fair in that they should minimize the feeling of dissatisfaction among membe...
Article
Driven by technological advances in hardware (positioning systems, environmental sensors), software (standards, tools, network services), and aided by various open movements (open, linked, government data) and the ever-growing mentality of sharing for the greater good (crowdsourcing, crowdfunding, collaborative and volunteered geographic informatio...
Conference Paper
Given a set of geospatial objects, the Best Region Search problem finds the optimal placement of a fixed-size rectangle so that the value of a user-defined utility function over the enclosed objects is maximized. The existing algorithm for this problem computes only the top result. However, this is often quite restrictive in practice and falls shor...
Article
Service selection is a challenging task, and a lot of effort has been devoted to tools that assist the user in choosing the service whose non-functional parameters better match her/his preferences. In many practical situations, the responsibility to decide which is the appropriate service is shared among multiple parties. A standard approach to thi...
Conference Paper
Thousands of posts are generated constantly by millions of users in social media, with an increasing portion of this content being geotagged. Keeping track of the whole stream of this spatio-textual content can easily become overwhelming for the user. In this paper, we address the problem of selecting a small, representative and diversified subset...
Article
Full-text available
Driven by technological advances in hardware (positioning systems, environmental sensors), software (standards, tools, network services), and aided by various open movements (open, linked, government data) and the ever-growing mentality of sharing for the greater good (crowdsourcing, crowdfunding, collaborative and volunteered geographic informatio...
Conference Paper
Digital location traces can help build insights about how citizens experience their cities, but also offer personalized products and experiences to them. Even as data abound, though, building an accurate picture about citizen whereabouts is not always straightforward, due to noisy or incomplete data. In this paper, we address the following problem:...
Conference Paper
In this paper, we address the problem of continuously maintaining a concise, diversified summary of the contents of a sliding window over a stream of geotagged posts. Selecting posts to include in the summary takes into account both the criteria of coverage and diversity, and the summary is updated dynamically when the window slides. Our proposed s...
Article
Full-text available
Trajectory data capture the traveling history of moving objects such as people or vehicles. With the proliferation of GPS and tracking technologies, huge volumes of trajectories are rapidly generated and collected. Under this, applications such as route recommendation and traveling behavior mining call for efficient trajectory retrieval. In this pa...
Conference Paper
Large amounts of user-generated content are posted daily on the Web, including textual, spatial and temporal information. Exploiting this content to detect, analyze and monitor events and topics that have a potentially large span in space and time requires efficient retrieval and ranking based on criteria including all three dimensions. In this pap...
Article
Full-text available
Considering a group of users, each specifying individual preferences over categorical attributes, the problem of determining a set of objects that are objectively preferable by all users is challenging on two levels. First, we need to determine the preferable objects based on the categorical preferences for each user, and second we need to reconcil...
Conference Paper
The summed-area table (SAT), also known as integral image, is a data structure extensively used in computer graphics and vision for fast image filtering. The parallelization of its construction has been thoroughly investigated and many algorithms have been proposed for GPUs. Generally speaking, state-of-the-art methods cannot efficiently solve this...
Conference Paper
We propose and study a novel type of keyword search for locations. Sets of locations are selected and ranked based on their co-occurrence in user trails in addition to satisfying a set of query keywords. We formally define the problem, outline our approach, and present experimental results.
Conference Paper
Diversification has recently attracted a lot of attention, as a means to retrieve objects that are both relevant to a query and sufficiently dissimilar to each other. Since it is a computationally expensive problem, greedy techniques that iteratively identify the most promising objects are typically used. We focus on the sub-task within one iterati...
Conference Paper
Given a set of attractors and repellers, the cohesion query returns the point in the database that is as close to the attractors and as far from the repellers as possible. Cohesion queries find applications in various settings, such as facility location problems and location-based services. For example, when attractors represent favorable places, e.g., to...
Conference Paper
Full-text available
Trajectory data capture the traveling history of moving objects such as people or vehicles. With the proliferation of GPS and tracking technology, huge volumes of trajectories are rapidly generated and collected. Under this, applications such as route recommendation and traveling behavior mining call for efficient trajectory retrieval. In this pape...
Conference Paper
Full-text available
The ubiquity of mobile location aware devices and the proliferation of social networks have given rise to Location-Aware Social Networks (LASN), where users form social connections and make geo-referenced posts. The goal of this paper is to identify users that can influence a large number of important other users, within a given spatial region. Ret...
Conference Paper
Analyzing tracking data of various types of moving objects is an interesting research problem with numerous real-world applications. Several works have focused on continuously monitoring the nearest neighbors of a moving object, while others have proposed similarity measures for finding similar trajectories in databases containing historical tracki...
Data
The ubiquity of mobile location aware devices and the proliferation of social networks have given rise to Location-Aware Social Networks (LASN), where users form social connections and make geo-referenced posts. The goal of this paper is to identify users that can influence a large number of important other users, within a given spatial region. Ret...
Conference Paper
Full-text available
Skyline queries return the set of non-dominated tuples, where a tuple is dominated if there exists another with better values on all attributes. In the past few years the problem has been studied extensively, and a great number of external memory algorithms have been proposed. We thoroughly study the most important scan-based methods, which perform...
Conference Paper
Full-text available
MicroRNAs (miRNAs) are small RNA molecules that inhibit the expression of particular genes, a function that makes them useful towards the treatment of many diseases. Computational methods that predict which genes are targeted by particular miRNA molecules are known as target prediction methods. In this paper, we present a MapReduce-based system, te...
Conference Paper
Full-text available
Given a set of objects and a set of user preferences, both defined over a set of categorical attributes, the Multiple Categorical Preferences (MCP) problem is to determine the objects that are considered preferable by all users. In a naive interpretation of MCP, matching degrees between objects and users are aggregated into a single score which ran...
Conference Paper
Full-text available
The problem of providing meaningful routing directions over road networks is of great importance. In many real-life cases, the fastest route may not be the ideal choice for providing directions in written, spoken text, or for an unfamiliar neighborhood, or in cases of emergency. Rather, it is often more preferable to offer "simple" directions that...
Article
This work introduces decentralized query processing techniques based on MIDAS, a novel distributed multidimensional index. In particular, MIDAS implements a distributed k-d tree, where leaves correspond to peers, and internal nodes dictate message routing. MIDAS requires that peers maintain little network information, and features mechanisms that s...
Article
Recent advances in computational biology have raised sequence matching requirements that result in new types of sequence database problems. In this work, we introduce an important class of such problems, the approximate regional sequence matching (ARSM) problem. Given a data and a pattern sequence, an ARSM result is an approximate occurrence of a r...
Conference Paper
Full-text available
Determining the appropriate service for a user request is a two-step process. Initially, the available services whose description agrees with that of the request service are discovered. Then, the service selection process assists users in choosing the service that better matches their intention. In many practical situations, the responsibility to...
Article
Full-text available
The recent advances in the infrastructure of Geographic Information Systems (GIS), and the proliferation of GPS technology, have resulted in the abundance of geodata in the form of sequences of points of interest (POIs), waypoints etc. We refer to these sequences as route collections. In this work, we consider path queries on frequently updated rou...
Conference Paper
Full-text available
The Linked Data Paradigm is one of the most promising technologies for publishing, sharing, and connecting data on the Web, and offers a new way for data integration and interoperability. However, the proliferation of distributed, inter-connected sources of information and services on the Web poses significant new challenges for managing consistent...
Conference Paper
Full-text available
This work presents a pure multidimensional, indexing infrastructure for large-scale decentralized networks that operate in extremely dynamic environments where peers join, leave and fail arbitrarily. We propose a new peer-to-peer variant implementing a virtual distributed k-d tree, and develop efficient algorithms for multidimensional point and ran...
Conference Paper
Full-text available
In the dynamic Pickup and Delivery Problem with Transfers (dPDPT), a set of transportation requests that arrive at arbitrary times must be assigned to a fleet of vehicles. We use two cost metrics that capture both the company's and the customer's viewpoints regarding the quality of an assignment. In most related problems, the rule of thumb is to a...
Conference Paper
Full-text available
This work presents MIDAS-RDF, a distributed P2P RDF/S repository that is built on top of a distributed multi-dimensional index structure. MIDAS-RDF features fast retrieval of RDF triples satisfying various pattern queries by translating them into multi-dimensional range queries, which can be processed by the underlying index in hops logarithmic to...
Article
As the web is increasingly used not only to find answers to specific information needs but also to carry out various tasks, enhancing the capabilities of current web search engines with effective and efficient techniques for web service retrieval and selection becomes an important issue. Existing service matchmakers typically determine the relevanc...
Conference Paper
Full-text available
The skyline query returns the most interesting tuples according to a set of explicitly defined preferences among attribute values. This work relaxes this requirement, and allows users to pose meaningful skyline queries without stating their choices. To compensate for missing knowledge, we first determine a set of uncertain preferences based on user...
Article
Full-text available
The concept of k-anonymity has received considerable attention due to the need of several organizations to release microdata without revealing the identity of individuals. Although all previous k-anonymity techniques assume the existence of a public database (PD) that can be used to breach privacy, none utilizes PD during the anonymization process....
Conference Paper
Full-text available
Several applications in areas such as biochemistry and GIS involve storing and querying large volumes of sequential data stored as path collections. There are a number of interesting queries that can be posed on such data. This work focuses on reachability queries: given a path collection and two nodes v_s, v_t, determine whether a path from v_s to...
Article
Full-text available
As we move from a Web of data to a Web of services, enhancing the capabilities of the current Web search engines with effective and efficient techniques for Web services retrieval and selection becomes an important issue. Traditionally, the relevance of a Web service advertisement to a service request is determined by computing an overall score th...
Article
Full-text available
The wavelet decomposition is a proven tool for constructing concise synopses of large data sets that can be used to obtain fast approximate answers. Existing research studies focus on selecting an optimal set of wavelet coefficients to store so as to minimize some error metric, without however seeking to reduce the size of the wavelet coefficients...
Conference Paper
Full-text available
The vast majority of work on skyline queries considers totally ordered domains, whereas in many applications some attributes are partially ordered, as for instance, domains of set values, hierarchies, intervals and preferences. The only work addressing this issue has limited progressiveness and pruning ability, and it is only applicable to static s...
Article
Full-text available
In the outsourced database model, a data owner publishes her database through a third-party server; i.e., the server hosts the data and answers user queries on behalf of the owner. Since the server may not be trusted, or may be compromised, users need a means to verify that answers received are both authen...
Conference Paper
Full-text available
Efficient and scalable discovery mechanisms are critical for enabling service-oriented architectures on the Semantic Web. The majority of currently existing approaches focuses on centralized architectures, and deals with efficiency typically by pre-computing and storing the results of the semantic matcher for all possible query concepts. Such appro...
Conference Paper
Semantic Web service descriptions are typically multi-parameter constructs. Discovering semantically relevant services, given a desirable service description, is typically addressed by performing a pairwise logic-based match between the requested and offered parameters. However, little or no attention is given to combining these partial results...
Conference Paper
Full-text available
Given a query tuple q, the dynamic skyline query retrieves the tuples that are not dynamically dominated by any other in the data set with respect to q. A tuple dynamically dominates another, w.r.t. q, if it has closer to q’s values in all attributes, and has strictly closer to q’s value in at least one. The dynamic skyline query can be treated as...
Conference Paper
Full-text available
We consider an environment of numerous moving objects, equipped with location-sensing devices and capable of communicating with a central coordinator. In this setting, we investigate the problem of maintaining hot motion paths, i.e., routes frequently followed by multiple objects over the recent past. Motion paths approximate portions of objects' m...
Chapter
Data analysis systems require range-aggregate query answering of large multidimensional datasets. We provide the necessary framework to build a retrieval system capable of providing fast answers with progressively increasing accuracy in support of range-aggregate queries. In addition, with error forecasting, we provide estimations on the accuracy o...
Conference Paper
Summarization is an important task in data mining. A major challenge over the past years has been the efficient construction of fixed-space synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum-error metric. Histograms and several hierarchical techniques have been proposed for this problem. However, th...
Conference Paper
Full-text available
In the k-medoid problem, given a dataset P, we are asked to choose k points in P as the medoids. The optimal medoid set minimizes the average Euclidean distance between the points in P and their closest medoid. Finding the optimal k medoids is NP hard, and existing algorithms aim at approximate answers, i.e., they compute medoids that achieve a sma...
Conference Paper
In this paper we perform an extensive theoretical and experimental study on common synopsis construction algorithms, with emphasis on wavelet-based techniques, that take query workload statistics into consideration. Our goal is to compare "expensive" quadratic time algorithms with "cheap" near-linear time algorithms, particularly when the lat...
Conference Paper
The wavelet decomposition is a proven tool for constructing concise synopses of massive data sets and rapidly changing data streams, which can be used to obtain fast approximate answers with accuracy guarantees. In this work we present a generic formulation for the problem of constructing optimal wavelet synopses under space constraints for vario...
Conference Paper
Recent years have seen growing interest in effective algorithms for summarizing and querying massive, high-speed data streams. Randomized sketch synopses provide accurate approximations for general-purpose summaries of the streaming data distribution (e.g., wavelets). The focus of existing work has typically been on minimizing space requirements of...
Chapter
Following the constant technological advancements that provide more processing power and storage capacity, scientific applications have emerged as a new field of interest for the database community. Such applications, termed Online Science Applications (OSA), require continuous interaction with datasets of multidimensional nature, mainly for perfor...
Conference Paper
The Discrete Wavelet Transform is a proven tool for a wide range of database applications. However, despite broad acceptance, some of its properties have not been fully explored and thus not exploited, particularly for two common forms of multidimensional decomposition. We introduce two novel operations for wavelet transformed data, termed SHIFT an...
Article
Data analysis systems require range-aggregate query answering of large multidimensional datasets. We provide the necessary framework to build a retrieval system capable of providing fast answers with progressively increasing accuracy in support of range-aggregate queries. In addition, with error forecasting, we provide estimations on the accuracy o...
Article
Full-text available
Wavelets have been extensively used for approximate, progressive or even exact evaluation of queries. However, the complete wavelet transform is not always the optimal form to store the data. We exploit the properties of the full tree of the wavelet decomposition, in order to find a representation for the dataset that minimizes the retrieval cost a...
