About
293
Publications
85,044
Reads
18,193
Citations
Publications (293)
The radical advances in mobile computing, the IoT technological evolution along with cyberphysical components (e.g., sensors, actuators, control centers) have led to the development of smart city applications that generate raw or pre-processed data, enabling workflows involving the city to better sense the urban environment and support citizens' ev...
Counterfactual explanations have emerged as an important tool to understand, debug, and audit complex machine learning models. To offer global counterfactual explainability, state-of-the-art methods construct summaries of local explanations, offering a trade-off among conciseness, counterfactual effectiveness, and counterfactual cost or burden impo...
News articles generated by online media are a major source of information. In this work, we present News Monitor, a framework that automatically collects news articles from a wide variety of online news portals and performs various analysis tasks. The framework initially identifies fresh news (first stories) and clusters articles about the same inc...
The Infant Mortality Rate (IMR) is defined as the number of infants for every thousand infants that do not survive until their first birthday. IMR is an important metric not only because it provides information about infant births in an area, but it also measures the general societal health status. In the United States of America, the IMR is higher...
News articles generated by online media are a major source of information. In this work, we present News Monitor, a framework that automatically collects news articles from a variety of web pages and performs various analysis tasks. The framework initially identifies fresh news and clusters articles about the same incidents. For every story, it ext...
News portals, such as Yahoo News or Google News, collect large amounts of news articles from a variety of sources on a daily basis. Only a small portion of these documents can be selected and displayed on the homepage. Thus, there is a strong preference for major, recent events. In this work, we propose a scalable First Story Detection (FSD) pipeli...
Travel time estimation is a critical task, useful to many urban applications at the individual citizen and the stakeholder level. This paper presents a novel hybrid algorithm for travel time estimation that leverages historical and sparse real-time trajectory data. Given a path and a departure time we estimate the travel time taking into account th...
The Infant Mortality Rate (IMR) is the number of infants per 1000 that do not survive until their first birthday. It is an important metric providing information about infant health but it also measures the society's general health status. Despite the high level of prosperity in the U.S.A., the country's IMR is higher than that of many other develo...
Social event planning has received a great deal of attention in recent years where various entities, such as event planners and marketing companies, organizations, venues, or users in Event-based Social Networks, organize numerous social events (e.g., festivals, conferences, promotion parties). Recent studies show that "attendance" is the most com...
Applications targeting smart cities tackle common challenges, however solutions are seldom portable from one city to another due to the heterogeneity of smart city ecosystems. A major obstacle involves the differences in the levels of available information. In this work, we present REMI, which is a mining framework that handles varying degrees of i...
In this demonstration we present Dione, a novel framework for automatic profiling and tuning of big data applications. Our system allows a non-expert user to submit Spark or Flink applications to their cluster, and Dione automatically determines the impact of different configuration parameters on the application's execution time and monetary co...
We present low-rank methods for event detection. We assume that normal observations come from a low-rank subspace, prior to being corrupted by uniformly distributed noise. Correspondingly, we aim at recovering a representation of the subspace, and perform event detection by running a point-to-subspace distance query in $\ell^\infty$, for each incomi...
A major challenge for social event organizers (e.g., event planning companies, venues) is attracting the maximum number of participants, since it has great impact on the success of the event, and, consequently, the expected gains (e.g., revenue, artist/brand publicity). In this paper, we introduce the Social Event Scheduling (SES) problem, which sc...
A major challenge for social event organizers (e.g., event planning and marketing companies, venues) is attracting the maximum number of participants, since it has great impact on the success of the event, and, consequently, the expected gains (e.g., revenue, artist/brand publicity). In this paper, we introduce the Social Event Scheduling (SES) pro...
The proliferation of smart technologies has produced significant changes in the way people interact in a city. Smart traffic monitoring systems allow citizens and city operators to acquire a real-time view of the city traffic state. Furthermore, alternative means of transport, such as bike sharing systems, have enjoyed tremendous success in many ma...
Social networks have become the de facto online resource for people to share, comment on and be informed about events pertinent to their interests and livelihood, ranging from road traffic or an illness to concerts and earthquakes, to economics and politics. This has been the driving force behind research endeavors that analyse such data. In this p...
This paper examines the connectivity among political networks on Twitter. We explore dynamics inside and between the far right and the far left, as well as the relation between the structure of the network and sentiment. The 2015 Greek political context offers a unique opportunity to investigate political communication in times of political intensi...
Online Social Networks (OSNs) constitute one of the most important communication channels and are widely utilized as news sources. Information spreads widely and rapidly in OSNs through the word-of-mouth effect. However, it is not uncommon for misinformation to propagate in the network. Misinformation dissemination may lead to undesirable effects,...
Applications targeting Smart Cities tackle common challenges, however solutions are seldom portable from one city to another due to the heterogeneity of city ecosystems. A major obstacle involves the differences in the levels of available information. In this demonstration we present REMI, a reusable elements framework to handle varying degrees of...
In any competitive business, success is based on the ability to make an item more appealing to customers than the competition. A number of questions arise in the context of this task: how do we formalize and quantify the competitiveness between two items? Who are the main competitors of a given item? What are the features of an item that most affec...
Recommending nearby Points of Interest (POI) has received growing interest in mobile location-based networks today, where users share content embedded with location information. In this work, we propose a novel caching framework to support personalised proactive caching for mobile location-based social networks. We propose "LOCAI", which uses a pro...
The flourishing of Web-based Online Social Networks (OSNs) has led to numerous applications that exploit social relationships to boost the influence of content in the network. However, existing approaches focus on the social ties and ignore how the topic of a post and its structure relate to its popularity. Our work assists in filling this gap. The co...
With the massive prevalence of smartphones, mobile social sensing systems, in which humans acting as social sensors respond to geo-located crowdsourcing tasks, have become extremely popular. Such systems can provide significant benefits particularly during crisis management and emergency situations. However, not only querying users can be extremely costl...
In this demo we present INSIGHT, a system that provides traffic event detection in Dublin by exploiting Big Data and Crowdsourcing techniques. Our system is able to process and analyze input from multiple heterogeneous urban data sources.
Urban data management is already an essential element of modern cities. The authorities can build on the variety of automatically generated information and develop intelligent services that improve citizens' daily lives, save environmental resources or aid in coping with emergencies. From a data mining perspective, urban data introduce a lot of chall...
Modern cities generate a flood of rich and varied data. New information sources like public transport and wearable devices provide opportunities for novel applications that will improve citizens' quality of life by reducing transportation time, enhancing city planning, and improving air quality, to name a few applications. From a data science perspe...
Event detection is a research area that has attracted attention in recent years due to the widespread availability of social media data. The problem of event detection has been examined in multiple social media sources like Twitter, Flickr, YouTube and Facebook. The task comprises many challenges including the processing of large volumes of data...
Applying real-time, cost-effective Complex Event Processing (CEP) in the cloud has been an important goal in recent years. Distributed Stream Processing Systems (DSPS) have been widely adopted by major computing companies such as Facebook and Twitter for performing scalable event processing in streaming data. However, dynamically balancing the load...
In recent years crowdsourcing systems have been shown to provide important benefits to smart cities, where ubiquitous citizens, acting as mobile human sensors, assist in responding to signals and providing real-time information about city events, to improve the quality of life for businesses and citizens. In this paper we present REquEST, our approach to...
Top-k dominating queries combine the natural idea of selecting the k best items with a comprehensive "goodness" criterion based on dominance. A point p1 dominates p2 if p1 is as good as p2 in all attributes and is strictly better in at least one. Existing works address the problem in settings where data objects are multidimensional points. H...
Micro-blogging services such as Twitter have gained enormous popularity over the last few years leading to massive volumes of user generated content. A portion of this content is shared via geo-aware mobile devices, such as smartphones. Pieces of information shared on such a device can be tagged with the user's location, conditional on the user's s...
Supporting real-time, cost-effective execution of Complex Event Processing applications in the cloud has been an important goal for many scientists in recent years. Distributed Stream Processing Systems (DSPS) have been widely adopted by major computing companies as a powerful approach for large-scale Complex Event Processing (CEP). However, determi...
We present a subsequence matching framework that allows for gaps in both query and target sequences, employs variable matching tolerance efficiently tuned for each query and target sequence, and constrains the maximum matching range. Using this framework, a dynamic programming method is proposed, called SMBGT, that, given a short query sequence Q a...
Users in social networks utilize hashtags for a variety of reasons. In many cases, hashtags serve retrieval purposes by labeling the content they accompany. More often than not, hashtags are used to promote content, ideas, or conversations producing viral memes. This paper addresses a specific case of hashtag classification: meme-filtering. We argu...
Twitter is one of the most prominent social media platforms nowadays. A primary reason that has brought the medium into the spotlight of academic attention is its real-time nature, with people constantly uploading information regarding their surroundings. This trait, coupled with the service's data access policy for researchers and developers, has al...
We present a Software Keyboard for smart touchscreen devices that learns its owner's unique dictionary in order to produce personalized typing predictions. The learning process is accelerated by analysing the user's past typed communication. Moreover, personal temporal user behaviour is captured and exploited in the prediction engine. Computational and...
Detecting traffic events using the sensor network infrastructure is an important service in urban environments that enables the authorities to handle traffic incidents. However, irregular measurements in such settings can derive either from faulty sensors or from unpredictable events. In this paper, we propose an efficient solution to resolve in re...
Researchers nowadays have at their disposal valuable data from social networking applications, of which Twitter and Facebook are the most prominent examples. To retrieve this content, the Twitter service provides two distinct Application Programming Interfaces (APIs): a probe-based and a streaming one, each of which imposes different limitations on...
We give an overview of an intelligent urban traffic management system. Complex events related to congestions are detected from heterogeneous sources involving fixed sensors mounted on intersections and mobile sensors mounted on public transport vehicles. To deal with data veracity, sensor disagreements are resolved by crowdsourcing. To deal with da...
A large number of mainstream applications, like temporal search, event detection, and trend identification, assume knowledge of the timestamp of every document in a given textual collection. In many cases, however, the required timestamps are either unavailable or ambiguous. A characteristic instance of this problem emerges in the context of larg...
Wireless sensor networks enable cost-effective data collection for tasks such as precision agriculture and environment monitoring. However, the resource-constrained nature of sensor nodes, which often have both limited computational capabilities and battery lifetimes, means that applications that use them must make judicious use of these resources....
Browsing the web is one of the most common activities that users engage in nowadays, and downloading web resources of interest, such as images, documents, music, etc., is part of this process. However, users would rather temporarily save that resource to a default path that they have easy access to (e.g. their "Desktop") than select the actual dire...
Many recent sensor devices are being equipped with flash memories due to their unique advantages: non-volatile storage, small size, shock-resistance, fast read access and power efficiency. The ability to store large amounts of data in sensor devices calls for efficient indexing structures to locate required information. The challe...
Skyline queries have emerged as an expressive and informative tool, with minimal user input and thus, they have gained widespread attention. However, previous research works tackle the problem from an efficiency standpoint, i.e., returning the skyline as fast as possible, leaving it to the user to manually inspect the entire skyline result. Clearly...
Intelligent transport management involves the use of voluminous amounts of uncertain sensor data to identify and effectively manage issues of congestion and quality of service. In particular, urban traffic has been in the eye of the storm for many years now and gathers increasing interest as cities become bigger, crowded, and “smart”. In this work...
Smartphones are nowadays equipped with a number of sensors, such as WiFi, GPS, accelerometers, etc. This capability allows smartphone users to easily engage in crowdsourced computing services, which contribute to the solution of complex problems in a distributed manner. In this work, we leverage such a computing paradigm to solve efficiently the fo...
In this paper, we focus on the operator placement problem in Wireless Sensor Networks (WSN). This problem is very relevant for in-network query processing over WSN, where query routing trees are decomposed into three sub-components that must be processed at query time, namely operator tree, operator placement assignment scheme and rou...
Microblogging platforms are at the core of what is known as the Live Web: the most dynamic, and fast changing portion of the web, where content is generated constantly by the users, in snippets of information. Therefore, the Live Web (or Now Web) is a good source of information for event detection, because it reflects what is happening in the physi...
Location is prevalent in most applications nowadays, and is considered a first class citizen in social networks. Locational information is of great significance since it can be used to map information from the online back to the physical world, to contextualize information, or to provide localized recommendations through Location-Based Services (LB...
The last edition of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) was held in Athens, Greece, during September 5-9, 2011. The paper 'Comparing Apples and Oranges: Measuring Differences between Exploratory Data Mining Results' by Tatti and Vreeken provides a means to highligh...
In any competitive business, success is based on the ability to make an item more appealing to customers than the competition. A number of questions arise in the context of this task: how do we formalize and quantify the competitiveness relationship between two items? Who are the true competitors of a given item? What are the features of an item th...
We present "Hum-a-song", a system built for music retrieval, and particularly for the Query-By-Humming (QBH) application. According to QBH, the user is able to hum a part of a song that she recalls and would like to learn what this song is, or find other songs similar to it in a large music repository. We present a simple yet efficient approach tha...
The recent years have seen a proliferation of community sensing or participatory sensing paradigms, where individuals rely on the use of smart and powerful mobile devices to collect, store and analyze data from everyday life. Due to this massive collection of the data, a key challenge to all such developments, is to provide a simple but efficient w...
Link analysis ranking methods are widely used for summarizing the connectivity structure of large networks. We explore a weighted version of two common link analysis ranking algorithms, PageRank and HITS, and study their applicability to assistive environment data. Based on these methods, we propose a novel approach for identifying representative o...
Thousands of documents are made available to the users via the web on a daily basis. One of the most extensively studied problems in the context of such document streams is burst identification. Given a term t, a burst is generally exhibited when an unusually high frequency is observed for t. While spatial and temporal burstiness have been studied...
Most of today’s smart-phones are geared towards a single user experience, whether it is reading a book, watching a movie, playing a game or listening to music. However, there has been a shift towards providing a more complex and social experience: applications are being developed and deployed to help users connect and share information with each ot...
Molecular similarity is an important tool in protein and drug design for analyzing the quantitative relationships between physicochemical properties of two molecules. We present a family of similarity measures which exploits the ability of wavelet transformation to analyze the spectral components of physicochemical properties and suggests a sensiti...
We propose an embedding-based framework for subsequence matching in time-series databases that improves the efficiency of processing subsequence matching queries under the Dynamic Time Warping (DTW) distance measure. This framework partially reduces subsequence matching to vector matching, using an embedding that maps each query sequence to a vecto...
We propose a novel subsequence matching framework that allows for gaps in both the query and target sequences, variable matching tolerance levels efficiently tuned for each query and target sequence, and also constrains the maximum match length. Using this framework, a space and time efficient dynamic programming method is developed: given a short...
The popularity of portable electronics such as smartphones, PDAs and mobile devices and their increasing processing capabilities have enabled the development of several real-time mobile applications that require low-latency, high-throughput response and scalability. Supporting real-time applications in mobile settings is especially challenging due t...
In this paper we present a powerful distributed framework for finding similar trajectories in a smart phone network, without disclosing the traces of participating users. Our framework, coined Smart Trace, exploits opportunistic and participatory sensing in order to quickly answer queries o