Spyros Kotoulas

IBM · IBM Research Ireland

PhD

About

128
Publications
17,878
Reads
1,952
Citations
Additional affiliations
December 2009 - October 2011
Vrije Universiteit Amsterdam
Position
  • PostDoc Position
October 2011 - December 2015
IBM
Position
  • Researcher


Publications (128)
Conference Paper
Full-text available
This paper describes an application of Bayesian Networks to model persons with multimorbidity using measurements of vital signs and lifestyle assessments. The model was developed as part of a project on the use of wearable and home sensors and tablet applications to help persons with multimorbidity and their carers manage their conditions in daily...
Article
This paper describes an application of Bayesian Networks to model persons with multimorbidity using measurements of vital signs and lifestyle assessments. The model was developed as part of a project on the use of wearable and home sensors and tablet applications to help persons with multimorbidity and their carers manage their conditions in daily...
Article
Managing multimorbidity entails processing distributed, dynamic and heterogeneous data using diverse analytics tools. We present KITE, a Cloud-based infrastructure allowing the aggregation and processing of health data using a dynamic set of analytical components. We showcase KITE in the context of the ProACT project, aiming at advancing home-based...
Article
There is a growing interest in identifying, weighing and accounting for the impact of health determinants that lie outside of the traditional healthcare system, yet there is a remarkable paucity of data and sources to sustain these efforts. Decision support systems would greatly benefit from leveraging models which are able to extend and use such c...
Conference Paper
Full-text available
Health and social care professionals are under increasing pressure to assimilate the ever-growing volume of data from case notes and electronic medical records. In this paper, we propose and evaluate with domain experts a cognitive system for patient-centric care that leverages and combines natural language processing, semantics, and learning from...
Conference Paper
Full-text available
We propose a cognitive system for patient-centric care that leverages and combines natural language processing, semantics, and learning from users over time to support care professionals working with large volumes of unstructured patient notes. The proposed methods highlight entities embedded in the unstructured data to provide a holistic semantic...
Conference Paper
Optimal heuristic search has been successful in many domains, including journey planning, route planning and puzzle solving. Existing work typically assumes that the cost of each action can easily be obtained. However, in many problems, the exact edge cost is expensive to compute. Existing search algorithms face a significant performance bottleneck...
Conference Paper
Full-text available
Conversational message thread identification regards a wide spectrum of applications, ranging from social network marketing to virus propagation, digital forensics, etc. Many different approaches have been proposed in literature for the identification of conversational threads focusing on features that are strongly dependent on the dataset. In this...
Article
High-performance data processing systems typically utilize numerous servers with large amounts of memory. An essential operation in such environments is the parallel join, the performance of which is critical for data intensive operations. In many real-world workloads, data skew is omnipresent. Techniques that do not cater for the possibility of dat...
Article
Full-text available
We present an approach to automatically classify clinical text at a sentence level. We are using deep convolutional neural networks to represent complex features. We train the network on a dataset providing a broad categorization of health information. Through a detailed evaluation, we demonstrate that our method outperforms several approaches wide...
Preprint
Full-text available
We present an approach to automatically classify clinical text at a sentence level. We are using deep convolutional neural networks to represent complex features. We train the network on a dataset providing a broad categorization of health information. Through a detailed evaluation, we demonstrate that our method outperforms several approaches wide...
Article
Large-scale analytics is a key application area for data processing and parallel computing research. One of the most common (and challenging) operations in this domain is the join. Though inner join approaches have been extensively evaluated in parallel and distributed systems, there is little published work providing analysis of outer joins, espec...
Article
We propose a cognitive system for patient-centric care that leverages and combines natural language processing, semantics, and learning from users over time to support care professionals working with large volumes of patient notes. The proposed methods highlight the entities embedded in the unstructured data to provide a holistic semantic view of a...
Conference Paper
Full-text available
We present a domain-agnostic system for Question Answering over multiple semi-structured and possibly linked datasets without the need of a training corpus. The system is motivated by an industry use-case where Enterprise Data needs to be combined with a large body of Open Data to fulfill information needs not satisfied by prescribed application da...
Conference Paper
Full-text available
Providing appropriate support for the most vulnerable individuals carries enormous societal significance and economic burden. Yet, finding the right balance between costs, estimated effectiveness and the experience of the care recipient is a daunting task that requires considering a vast amount of information. We present a system that helps care team...
Conference Paper
Big Data analytics largely rely on being able to execute large joins efficiently. Though inner join approaches have been extensively evaluated in parallel and distributed systems, there is little published work providing analysis of outer joins, especially on the extremely popular MapReduce platform. In this paper, we studied several current algori...
Conference Paper
Full-text available
Efficiently detecting conversation threads from a pool of messages, such as social network chats, emails, comments to posts, news etc., is relevant for various applications, including Web Marketing, Information Retrieval and Digital Forensics. Existing approaches focus on text similarity using keywords as features that are strongly dependent on the...
Article
Nowadays, most users carry high computing power mobile devices where speech recognition is certainly one of the main technologies available in every modern smartphone, although battery draining and application performance (resource shortage) have a big impact on the experienced quality. Shifting applications and services to the cloud may help to im...
Article
Providing appropriate support for the most vulnerable individuals carries enormous societal significance and economic burden. Yet, finding the right balance between costs, estimated effectiveness and the experience of the care recipient is a daunting task that requires considering a vast amount of information. We present a system that helps care team...
Article
Distributed RDF data management systems become increasingly important with the growth of the Semantic Web. Regardless, current methods meet performance bottlenecks either on data loading or querying when processing large amounts of data. In this work, we propose efficient methods for processing RDF using dynamic data re-partitioning to enable rapid...
Article
Full-text available
The Semantic Web comprises enormous volumes of semi-structured data elements. For interoperability, these elements are represented by long strings. Such representations are not efficient for the purposes of applications that perform computations over large volumes of such information. A common approach to alleviate this problem is through the use o...
Conference Paper
Full-text available
DALI is a practical system that exploits Linked Data to provide federated entity search and spatial exploration across hundreds of information sources containing Open and Enterprise data pertaining to cities, which are stored in tabular files or in their original enterprise systems. Our system is able to lift data into a meaningful linked structur...
Article
Full-text available
Outer joins are ubiquitous in many workloads and Big Data systems. The question of how to best execute outer joins in large parallel systems is particularly challenging, as real world datasets are characterized by data skew leading to performance issues. Although skew handling techniques have been extensively studied for inner joins, there is littl...
Article
Cloud businesses need comprehensive visibility on hardware and software components, their utilization and their configuration. In addition, they need to integrate such information with their asset management systems and publicly available information such as hardware specifications. In this paper, we present an approach for cloud management and mon...
Conference Paper
Knowledge on the Web relies heavily on multi-relational representations, such as RDF and Schema.org. Automatically extracting knowledge from documents and linking existing databases are common approaches to construct multi-relational data. Complementary to such approaches, there is still a strong demand for manually encoding human expert knowledge....
Conference Paper
Distributed RDF data management systems become increasingly important with the growth of the Semantic Web. Several such systems have been proposed; however, their indexing methods meet performance bottlenecks either on data loading or querying when processing large amounts of data. In this work, we propose a high throughput index to enab...
Conference Paper
Full-text available
High-performance analytical data processing systems often run on servers with large amounts of memory. A common data structure used in such environments is the hash table. This paper focuses on investigating efficient parallel hash algorithms for processing large-scale data. Currently, hash tables on distributed architectures are accessed one key a...
Conference Paper
We organize and present the 5th version of the International Workshop on Web-scale Knowledge Representation, Retrieval and Reasoning (Web-KR 2014) as a continuous effort to discuss and provide possible theories and techniques to deal with the barriers for knowledge processing at Web scale. This workshop was held in conjunction with the 2014 ACM Int...
Conference Paper
Full-text available
Academia and industry are investigating novel approaches for processing vast amounts of data coming from enterprises, the Web, social media and sensor readings in an area that has come to be known as Big Data. Logic programming has traditionally focused on complex knowledge structures/programs. The question arises whether and how it can be applied...
Conference Paper
Full-text available
The performance of joins in parallel database management systems is critical for data intensive operations such as querying. Since data skew is common in many applications, poorly engineered join operations result in load imbalance and performance bottlenecks. State-of-the-art methods designed to handle this problem offer significant improvements o...
Conference Paper
Full-text available
We introduce the design of a fully parallel framework for quickly analyzing large-scale RDF data over distributed architectures. We present three core operations of this framework: dictionary encoding, parallel joins and indexing processing. Preliminary experimental results on a commodity cluster show that we can load large RDF data very fast whil...
Chapter
Full-text available
The Semantic Web is finally leaving the lab. In this article, we examine some practical, industry-oriented Semantic Web systems and discuss the costs and benefits of this disruptive technology. We focus on applications for cities and citizens and present a set of key challenges and solutions made possible using semantics at scale. When applicable,...
Conference Paper
Full-text available
We propose an efficient method for fast processing large RDF data over distributed memory. Our approach adopts a two-tier index architecture on each computation node: (1) a light-weight primary index, to keep loading times low, and (2) a dynamic, multi-level secondary index, calculated as a by-product of query execution, to decrease or remove inter...
Article
Full-text available
The success of a society is often judged by its ability to support the most vulnerable. Supporting the most vulnerable individuals is extremely challenging from an information needs perspective, since it requires data from numerous domains and systems, including Social Care, Healthcare, Public Safety and Juridical systems. Information sharing on th...
Article
Full-text available
More and more urban data is published every day, and consequently, consumers want to take advantage of this body of knowledge. Unfortunately, metadata and schema information around this content is sparse. To effectively fulfill user information needs, systems must be able to capture user intent and context in order to evolve beyond current search a...
Article
Full-text available
Patient-Centric Care requires comprehensive visibility into the strengths and vulnerabilities of individuals and populations. The systems involved in Patient-Centric Care are numerous and heterogeneous, span medical, behavioral and social domains and must be coordinated across government and NGO stakeholders in Health Care, Social Care and more. We...
Conference Paper
Many systems rely on distributed caches with thousands of nodes to improve response times and off-load underlying systems. Large-scale caching presents challenges in terms of resource utilization, load balancing, robustness and flexibility of deployment. In this paper, we propose a novel distributed caching method based on dynamic IP address assign...
Conference Paper
Full-text available
Outer joins are ubiquitous in many workloads but are sensitive to load-balancing problems. Current approaches mitigate such problems caused by data skew by using (partial) replication. However, contemporary replication-based approaches (1) introduce overhead, since they usually result in redundant data movement, (2) are sensitive to parameter tunin...
Conference Paper
Full-text available
Outer joins are ubiquitous in databases and big data systems. The question of how best to execute outer joins in large parallel systems is particularly challenging as real world datasets are characterized by data skew leading to performance issues. Although skew handling techniques have been extensively studied for inner joins, there is little publ...
Article
Full-text available
The Semantic Web comprises enormous volumes of semi-structured data elements. For interoperability, these elements are represented by long strings. Such representations are not efficient for the purposes of Semantic Web applications that perform computations over large volumes of information. A typical method for alleviating the impact of this prob...
Conference Paper
Full-text available
We present an approach to access and consolidate complex information spanning multiple specialist domains and make it available to non-experts. We are using a combination of business rules and contextual exploration to reduce interface complexity and improve consumability. We present a use case and a prototype on top of a real-world enterprise solu...
Article
Full-text available
We present SPUD, a semantic environment for cataloguing, exploring, integrating, understanding, processing and transforming urban information. A series of challenges are identified: namely, the heterogeneity of the domain and the impracticality of a common model, the volume of information and the number of data sets, the requirement for a low entry...
Conference Paper
Full-text available
Comprehensive Care requires comprehensive visibility on the strengths and vulnerabilities of individuals and populations. The systems involved in Care are numerous and heterogeneous, span very broad domains, such as Social Care, Healthcare and Public Safety, and draw on specialist knowledge from many disciplines. We present a system, based on Linke...
Conference Paper
A growing number of applications require continuous processing of high-throughput data streams, e.g., financial analysis, network traffic monitoring, or Big Data analytics in smart cities. Stream processing applications typically have explicit quality-of-service requirements; yet, due to the high time-variability of stream characteristics, it is in...
Article
Full-text available
Research on smart cities has emerged as an interdisciplinary field covering IT infrastructures, crowdsourcing, and utility services optimization, among others. This special issue focuses on deployed technologies for smart cities based on Internet technologies.
Conference Paper
Full-text available
The performance of parallel distributed data management systems becomes increasingly important with the rise of Big Data. Parallel joins have been widely studied both in the parallel processing and the database communities. Nevertheless, most of the algorithms so far developed do not consider the data skew, which naturally exists in various applica...
Article
As a continuous effort for organizing discussions and providing possible theories and techniques to deal with the barriers for knowledge processing at Web scale, the 2013 International Workshop on Web-scale Knowledge Representation, Retrieval and Reasoning (Web-KR 2013) was held in conjunction with the 2013 ACM International Conference on Informati...
Conference Paper
Full-text available
Several sources of information, from people, systems, things, are already available in most modern cities. Processing these continuous flows of information and capturing insight poses unique technical challenges that span from response time constraints to data heterogeneity, in terms of format and throughput. To tackle these problems, we focus on a...
Conference Paper
Full-text available
Governments and enterprises are interested in the return-on-investment for exposing their data. This brings forth the problem of making data consumable, with minimal effort. Beyond search techniques, there is a need for effective methods to identify heterogeneous datasets that are closely related, as part of data integration or exploration tasks. T...
Article
The large amount of Semantic Web data and its fast growth pose a significant computational challenge in performing efficient and scalable reasoning. On a large scale, the resources of single machines are no longer sufficient and we are required to distribute the process to improve performance. In this article, we propose a distributed technique to p...
Article
While recent progress has been achieved in understanding the structure and dynamics of social tagging systems, we know little about the underlying user motivations for tagging, and how they influence resulting folksonomies and tags. This paper addresses ...
Conference Paper
Full-text available
In this paper, we present QuerioCity, a platform to catalog, index and query highly heterogeneous information coming from complex systems, such as cities. A series of challenges are identified: namely, the heterogeneity of the domain and the lack of a common model, the volume of information and the number of data sets, the requirement for a low ent...
Conference Paper
Full-text available
We describe a system that incrementally translates SPARQL queries to Pig Latin and executes them on a Hadoop cluster. This system is designed to work efficiently on complex queries with many self-joins over huge datasets, avoiding job failures even in the case of joins with unexpected high-value skew. To be robust against cost estimation errors, ou...
Conference Paper
Full-text available
As the Semantic Web becomes mainstream, the performance of triple stores becomes increasingly important. Up until now, there have been various benchmarks and experiments that have attempted to evaluate the response time and query throughput of individual stores to show the weaknesses and strengths of triple store implementation. However, these eval...
Conference Paper
The rapid and perpetual growth of knowledge on the Web has given rise to many grand challenges (such as scalability, inconsistency, uncertainty, distribution and dynamics) for traditional knowledge processing methods and systems. Knowledge representation, retrieval and reasoning methods need to evolve and adapt to the Web to face these challenges a...
Article
We present iSeM (intelligent Service Matchmaker), a precise hybrid and adaptive matchmaker for semantic Web services, which exploits functional service descriptions in terms of logical signature annotations as well as specifications of preconditions ...
Conference Paper
The Semantic Web is considered a data integration system for different content and applications, in which every item has a specified meaning that machines can understand and process without the intervention of a human. Triple stores are the backbone of this "web of data", allowing storage and retrieval of semi-structured data as linked data usually...
Article
Full-text available
We are witnessing an explosion of available data from the Web, government authorities, scientific databases, sensors and more. Such datasets could benefit from the introduction of rule sets encoding commonly accepted rules or facts, application- or domain-specific rules, commonsense knowledge etc. This raises the question of whether, how, and to wha...
Article
We are recently experiencing an unprecedented explosion of available data coming from the Web, sensor readings, scientific databases, government authorities and more. Such datasets could benefit from the introduction of rule sets encoding commonly accepted rules or facts, application- or domain-specific rules, commonsense knowledge etc. This raise...
Article
In a Smarter City, available resources are harnessed safely, sustainably and efficiently to achieve positive, measurable economic and societal outcomes. Data and information from people, systems and things is the single most scalable resource available to city stakeholders but difficult to publish, organize, discover and consume, especially in a re...
Article
In this paper, we are presenting a scalable method for nonmonotonic rule-based reasoning over Semantic Web Data, using MapReduce. Our work is motivated by the recent unparalleled explosion of available data coming from the Web, sensor readings, databases, ontologies and more. Such datasets could benefit from the introduction of rule sets encoding c...
Conference Paper
As the availability of large scale RDF data sets has grown, there has been a corresponding growth in researchers' and practitioners' interest in analyzing and investigating these data sets. However, given their size and messiness, there is significant overhead in setting up the infrastructure to store and query them. In this paper, we present Tripl...
Conference Paper
Full-text available
The goal of this tutorial is to introduce, motivate and detail techniques for integrating heterogeneous structured data from across the Web. Inspired by the growth in Linked Data publishing, our tutorial aims at educating Web researchers and practitioners about this new publishing paradigm. The tutorial will show how Linked Data enables uniform acc...
Article
Full-text available
The Semantic Web [1] extends the World Wide Web by providing well-defined semantics to information and services. Through these semantics machines can “understand” the Web, making it possible to query and reason over Web information, treating the Web as if it were a giant semi-structured database.
Conference Paper
Full-text available
Semantic Web data exhibits very skewed frequency distributions among terms. Efficient large-scale distributed reasoning methods should maintain load-balance in the face of such highly skewed distribution of input data. We show that term-based partitioning, used by most distributed reasoning approaches, has limited scalability due to load-balanci...
Conference Paper
Full-text available
In previous work we have shown that the MapReduce framework for distributed computation can be deployed for highly scalable inference over RDF graphs under the RDF Schema semantics. Unfortunately, several key optimizations that enabled the scalable RDFS inference do not generalize to the richer OWL semantics. In this paper we analyze these problems...
Article
Many Semantic Web problems are difficult to solve through common divide-and-conquer strategies, since they are hard to partition. We present Marvin, a parallel and distributed platform for processing large amounts of RDF data, on a network of loosely-coupled peers. We present our divide-conquer-swap strategy and show that this model converges...
Conference Paper
Full-text available
We address the problem of scalable distributed reasoning, proposing a technique for materialising the closure of an RDF graph based on MapReduce. We have implemented our approach on top of Hadoop and deployed it on a compute cluster of up to 64 commodity machines. We show that a naive implementation on top of MapReduce is straightforward but perfor...
Conference Paper
Full-text available
Traditional reasoning tools for the Semantic Web cannot cope with Web scale data. One major direction to improve performance is parallelization. This article surveys existing studies, basic ideas and mechanisms for parallel reasoning, and introduces three major parallel applications on the Semantic Web: LarKC, MaRVIN, and Reasoning-Hadoop. Further...
Conference Paper
Full-text available