Panos Vassiliadis

University of Ioannina | UOI · Department of Computer Science and Engineering

PhD Computer Science, Nat. Techn. Univ. Athens
Schema evolution (see the homonymous project); Data analysis & storytelling (OLAP III Project); Data & S/W Engineering

About

227
Publications
95,886
Reads
5,976
Citations
Citations since 2017
43 Research Items
1612 Citations
[Chart: citations per year, 2017 to 2023; vertical axis 0 to 300]
Introduction
I have worked in the area of data warehousing (metadata management, OLAP, and quite emphatically, ETL) since the late '90s. Following a common thread in my work, I am currently investigating how the rigorous modeling of data, software and their interdependence can be exploited for the design, visualization and evolution management of data-intensive software ecosystems. Find out more at my homepage: http://www.cs.uoi.gr/~pvassil/
Education
January 1996 - June 2000
National Technical University of Athens
Field of study
  • Computer Science
September 1990 - June 1995
National Technical University of Athens
Field of study
  • Electrical and Computer Eng.

Publications (227)
Preprint
In this paper, we discuss methods to assess the interestingness of a query in an environment of data cubes. We assume a hierarchical multidimensional database, storing data cubes and level hierarchies. We provide a systematic taxonomy of the dimensions of interestingness, and specifically, relevance, surprise, novelty, and peculiarity. We propose s...
Chapter
Data narration is the activity of crafting narratives supported by facts extracted from data exploration and analysis, using interactive visualizations. While data narration has recently attracted much attention, the process of crafting data narratives is loosely documented and has not yet been formally described. In this article, we propose a comp...
Article
In this paper, we present the findings of a large study of the evolution of the schema of 195 Free Open Source Software projects. We identify families of evolutionary behaviors, or taxa, in FOSS projects. A large percentage of the projects demonstrate very few, if any, actions of schema evolution. Two other taxa involve the evolution via focused ac...
Article
Full-text available
In in-situ data management scenarios, large data files that do not fit in main memory must be efficiently handled using commodity hardware, without the overhead of a preprocessing phase or the loading of data into a database. In this work, we study the challenges posed by the visual analysis tasks in in-situ scenarios in the presence of memory co...
Preprint
Full-text available
In this paper, we provide a comprehensive rigorous modeling for multidimensional spaces with hierarchically structured dimensions in several layers of abstractions and data cubes that live in such spaces. We model cube queries and their semantics and define typical OLAP operators like Selections, Roll-Up, Drill-Down, etc. The model serves as the ba...
Article
Full-text available
Assessment is the process of comparing the actual to the expected behavior of a business phenomenon and judging the outcome of the comparison. The assess querying operator has been recently proposed to support assessment based on the results of a query on a data cube. This operator requires (i) the specification of an OLAP query to determine a targ...
Article
Despite the particular standards, technologies, and trends (W3C, RESTful, microservices, etc.) that a team decides to follow for the development of a service-oriented system, most likely the team members will have to use one or more services that solve general-purpose problems like cloud computing, networking and content delivery, storage and datab...
Article
Modern data analysis applications require the ability to provide on-demand integration of data sources while offering a flexible and user-friendly query interface. Traditional techniques for answering queries using views, focused on a rather static setting, fail to address such requirements. To overcome these issues, we propose a fully-fledged dat...
Conference Paper
Full-text available
In-situ processing has received a great deal of attention in recent years. In in-situ scenarios, big raw data files that do not fit in main memory must be efficiently handled on-the-fly using commodity hardware, without the overhead of a preprocessing phase or the loading of data into a database system. This paper presents RawVis, an open source...
Article
Full-text available
Characteristic sets (CS) organize RDF triples based on the set of properties associated with their subject nodes. This concept was recently used in indexing techniques, as it can capture the implicit schema of RDF data. While most CS-based approaches yield significant improvements in space and query performance, they fail to perform well when answe...
Conference Paper
Full-text available
In-situ processing has received a great deal of attention in recent years. In in-situ scenarios, big raw data files that do not fit in main memory must be efficiently handled using commodity hardware, without the overhead of a preprocessing phase or the loading of data into a database. In this work, we present an adaptive indexing scheme that ena...
Article
[See also http://www.cs.uoi.gr/~pvassil/projects/ploigia/info.html] Data exploration and visual analytics systems are of great importance in Open Science scenarios, where less tech-savvy researchers wish to access and visually explore big raw data files (e.g., json, csv) generated by scientific experiments using commodity hardware and without being...
Conference Paper
Data narration is the activity of producing stories supported by facts extracted from data analysis, possibly using interactive visualizations. In spite of the increasing interest in data narration in several communities (e.g. journalism, business, e-government), there is no consensual definition of data narrative, let alone a conceptual or logic...
Conference Paper
Data narration has received increasing interest in several communities while lacking models and tools for handling, building and structuring data narratives. We present a simple prototype for supporting data narrative, based on a conceptual model defined in [4]. It guides a data narrator from scratch: fetch and explore data, abstract important mess...
Conference Paper
[ see more at http://www.cs.uoi.gr/~pvassil/projects/schemaBiographies/info.html and http://www.cs.uoi.gr/~pvassil/publications/2020_ER_FK-Evo/index.html ] In this paper, we study the evolution of tables in a schema with respect to the structure of the foreign keys to which tables are related. We organize a hierarchy of topological complexity for...
Conference Paper
The essence of refactoring is to improve source code quality in a principled, behavior-preserving, one-step-at-a-time process. To this end, the developer has to figure out the refactoring steps, while working on a specific source code fragment. To facilitate this task, the documentation that explains each primitive refactoring typically provide...
Conference Paper
In this paper, we investigate the usage of naming conventions in SQL programming. To this end, we define a reference style, consisting of naming conventions that have been proposed in the literature. Then, we perform an empirical study that involves the database schemas of 21 open source projects. In our study, we evaluate the adherence of the name...
Conference Paper
Full-text available
This paper introduces the Traveling Analyst Problem (TAP), an original strongly NP-hard problem where an automated algorithm assists an analyst to explore a dataset, by suggesting the most interesting and coherent set of queries that are estimated to be completed under a time constraint. We motivate the problem, study its complexity, propose a simp...
Chapter
The essence of refactoring is to improve source code quality in a principled, behavior-preserving, one-step-at-a-time process. To this end, the developer has to figure out the refactoring steps, while working on a specific source code fragment. To facilitate this task, the documentation that explains each primitive refactoring typically provide...
Chapter
In this paper, we investigate the usage of naming conventions in SQL programming. To this end, we define a reference style, consisting of naming conventions that have been proposed in the literature. Then, we perform an empirical study that involves the database schemas of 21 open source projects. In our study, we evaluate the adherence of the name...
Article
Full-text available
In this paper, we study the evolution of foreign keys in the context of schema evolution for relational databases. Specifically, we study the schema histories of six free, open-source databases that contain foreign keys. Our findings verify previous results that schemata grow in the long run in terms of tables. To our surprise, we discovered that...
Chapter
In this paper, we discuss the problem of organizing the different ways of computing the interestingness of a particular cell derived from a cube in the context of a hierarchical, multidimensional space. We start from an in-depth study of the interestingness aspects in the study of human behavior and include in our survey the approaches taken by com...
Conference Paper
Developing a tool that provides support for different refactorings, through a set of refactoring detectors which identify opportunities for source code improvements, is not easy. Our experience in developing such a tool for refactoring object-oriented software revealed the Three-Step Refactoring Detector pattern. The main idea behind the pattern is...
Article
[ http://www.cs.uoi.gr/~pvassil/projects/olap_III/index.html ] This paper structures a novel vision for OLAP by fundamentally redefining several of the pillars on which OLAP has been based for the last 20 years. We redefine OLAP queries, in order to move to higher degrees of abstraction from roll-ups and drill-downs, and we propose a set of novel...
Conference Paper
Full-text available
[For more details, please refer to http://www.cs.uoi.gr/~pvassil/projects/olap_III/index.html ] This vision paper introduces several ideas around the optimization of Interactive Data Analysis (IDA) tasks. With an eye on traditional query optimization (QO) in Relational DataBase Management Systems (RDBMS), we suggest that IDA tasks should be specified...
Preprint
[This paper can be found at https://arxiv.org/abs/1812.07854 which is the live copy of the paper] This paper structures a novel vision for OLAP by fundamentally redefining several of the pillars on which OLAP has been based for the last 20 years. We redefine OLAP queries, in order to move to higher degrees of abstraction from roll-ups and drill-...
Chapter
Data exploration and visual analytics systems are of great importance in Open Science scenarios, where less tech-savvy researchers wish to access and visually explore big raw data files (e.g., json, csv) generated by scientific experiments using commodity hardware and without being overwhelmed in the tedious processes of data loading, indexing and...
Conference Paper
A basic prerequisite for any daily development task is to understand the source code that we are working with. To this end, the source code should be clean. Usually, it is up to us, the developers, to keep the source code clean. However, often there are parts of the code that are automatically generated. A typical such case are Graphical User Inter...
Conference Paper
Full-text available
Data exploration and visual analytics systems are of great importance in Open Science scenarios, where less tech-savvy researchers wish to access and visually explore big raw data files (e.g., json, csv) generated by scientific experiments using commodity hardware and without being overwhelmed in the tedious processes of data loading, indexing and...
Conference Paper
Full-text available
[Presented in DOLAP 2018, http://ceur-ws.org/Vol-2062/paper07.pdf and, also, http://www.cs.uoi.gr/~pvassil/publications/2018_DOLAP/ ] In this vision paper we structure a vision for the Business Intelligence of the near future in terms of a model with novel concepts and operators. We envision systems where the end-user requests information at a ve...
Article
Big Data architectures allow heterogeneous data from multiple sources to be flexibly stored and processed in their original format. The structure of those data, commonly supplied by means of REST APIs, is continuously evolving. Thus, data analysts need to adapt their analytical processes after each API release. This gets more challenging when performing...
Article
The increasing availability of diverse multidimensional data on the web has led to the creation and adoption of common vocabularies and practices that facilitate sharing, aggregating and reusing data from remote origins. One prominent example in the Web of Data is the RDF Data Cube vocabulary, which has recently attracted great attention from the i...
Article
Full-text available
In this paper, we study the factors that relate to the survival of a table in the context of schema evolution in open-source software. We study the history of the schema of eight open-source software projects that include relational databases and extract patterns related to the survival or death of their tables. Our study shows that the probability...
Conference Paper
In this paper, we focus on the study of the evolution of foreign keys in the broader context of schema evolution for relational databases. Specifically, we study the schema histories of six free, open-source databases that contained foreign keys. Our findings concerning the growth of tables verify previous results that schemata grow in the long r...
Conference Paper
Evolving dependency magnets, i.e., software modules upon which a large number of other modules depend, is always a hard task. As Robert C. Martin has nicely summarized it (see http://www.oodesign.com/design-principles.html), fundamental problems of bad design that hinder evolution include immobility, i.e., difficulty in reuse, rigidity, i.e., the t...
Conference Paper
Full-text available
[Please refer to http://www.cs.uoi.gr/~pmanousi/publications/2017_CAiSE/index.html ] Correctly identifying the embedded queries within the source code of an information system is a significant aid to developers and administrators, as it can facilitate the visualization of a map of the information system, the identification of areas affected by schema...
Conference Paper
[Please visit http://www.cs.uoi.gr/~pvassil/publications/2017_CAiSE_Electrolysis/index.html and http://www.cs.uoi.gr/~pvassil/projects/schemaBiographies/index.html ] How can we plan development over an evolving schema? In this paper, we study the history of the schema of eight open source software projects that include relational databases and extra...
Conference Paper
Full-text available
[Presented in DOLAP 2017, Online at http://ceur-ws.org/Vol-1810/DOLAP_paper_09.pdf ] Big Data architectures allow heterogeneous data from multiple sources to be flexibly stored and processed in their original format. The structure of those data, commonly supplied by means of REST APIs, is continuously evolving. Thus, data analysts need to adapt their a...
Article
[Please refer to http://www.cs.uoi.gr/~pvassil/projects/schemaBiographies/index.html for details on the paper] A study of the update profile of tables indicates that they are mostly rigid (without any updates to their schema at all) or quiet (with few updates), especially in databases that are more mature and heavily updated. Deletions are signific...
Conference Paper
Web services are black box dependency magnets. Hence, studying how they evolve is both important and challenging. In this paper, we focus on one of the most successful stories of the service-oriented paradigm in industry, i.e., the Amazon services. We perform a principled empirical study that detects evolution patterns and regularities, based on L...
Conference Paper
Like all software systems, databases are subject to evolution as time passes. The impact of this evolution is tremendous as every change to the schema of a database affects the syntactic correctness and the semantic validity of all the surrounding applications and de facto necessitates their maintenance in order to remove errors from their source c...
Conference Paper
Full-text available
Multidimensional data are published in the web of data under common directives, such as the Resource Description Framework (RDF). The increasing volume and diversity of these data pose the challenge of finding relations between them in the most efficient and accurate way, by taking advantage of their overlapping schemes. In this paper we define two...
Article
Full-text available
Data-intensive ecosystems are conglomerations of data repositories surrounded by applications that depend on them for their operation. In this paper, we address the problem of performing what-if analysis for the evolution of the database part of a data-intensive ecosystem, in order to identify what other parts of an ecosystem are affected by a pote...
Article
In this paper we demonstrate that it is possible to enrich query answering with a short data movie that gives insights to the original results of an OLAP query. Our method, implemented in an actual system, CineCubes, includes the following steps. The user submits a query over an underlying star schema. Taking this query as input, the system comes u...
Conference Paper
Full-text available
In this paper, we study the version history of eight databases that are part of larger open source projects, and report on our observations on how evolution-related properties, like the possibility of deletion, or the amount of updates that a table undergoes, are related to observable table properties like the number of attributes or the time of bi...
Conference Paper
Full-text available
A class that provides a fat interface violates the interface segregation principle, which states that the clients of the class should not be coupled with methods that they do not need. Coping with this problem involves extracting interfaces that satisfy the needs of the clients. In this paper, we envision an interface extraction method that serves...
Conference Paper
Full-text available
The essence of refactoring is to improve software quality via the systematic combination of primitive refactorings. Yet, there are way too many refactorings. Choosing which refactorings to use, how to combine them and how to integrate them in more complex evolution tasks is really hard. Our vision is to provide the developer with a "trip advisor" f...
Article
Like all software systems, databases are subject to evolution as time passes. The impact of this evolution can be vast as a change to the schema of a database can affect the syntactic correctness and the semantic validity of all the surrounding applications. In this paper, we have performed a thorough, large-scale study on the evolution of database...
Conference Paper
Full-text available
Data-intensive ecosystems are conglomerations of one or more databases along with software applications that are built on top of them. This paper proposes a set of methods for providing visual maps of data-intensive ecosystems. We model the ecosystem as a graph, with modules (tables and queries embedded in the applications) as nodes and data provis...
Conference Paper
Full-text available
Lehman's laws of software evolution is a well-established set of observations (matured during the last forty years) on how the typical software systems evolve. However, the applicability of these laws on databases has not been studied so far. To this end, we have performed a thorough, large-scale study on the evolution of databases that are part of...
Conference Paper
Full-text available
State-of-the-art service search engines allow users to pick the services they need, based on the quality properties offered by these services. To this end, the users should interact with the search engines based on the quality models that are imposed by the engines. This is a significant restriction towards making the service-oriented par...
Article
Full-text available
Software cohesion concerns the degree to which the elements of a module belong together. Cohesive software is easier to understand, test and maintain. Improving cohesion is the target of several refactoring methods that have been proposed until now. These methods are tailored to operate by taking the source code into consideration. In the context o...
Conference Paper
Full-text available
Data-intensive ecosystems are conglomerations of data repositories surrounded by applications that depend on them for their operation. To support the graceful evolution of the ecosystem's components we annotate them with policies for their response to evolutionary events. In this paper, we provide a method for the adaptation of ecosystems based o...
Conference Paper
Full-text available
In this paper we investigate how we can exploit the existence of a star schema in order to answer user OLAP queries with CineCube movies. Our method, implemented in an actual system, includes the following steps. The user submits a query over an underlying star schema. Taking this query as input, the system comes up with a set of queries complement...
Article
Full-text available
This document describes the final implementation and the evaluation of the CHOReOS middleware. Evaluation is achieved both via the use of the middleware on CHOReOS use-cases and via synthetic experiments and simulation. The conclusion was that the implementation of the CHOReOS middleware has achieved a good level of maturity for an open source proj...
Article
Extract-transform-load (ETL) workflows model the population of enterprise data warehouses with information gathered from a large variety of heterogeneous data sources. ETL workflows are complex design structures that run under strict performance requirements and their optimization is crucial for satisfying business objectives. In this paper, we dea...
Article
Full-text available
Self-service business intelligence is about enabling non-expert users to make well-informed decisions by enriching the decision process with situational data, i.e., data that have a narrow focus on a specific business problem and, typically, a short lifespan for a small group of users. Often, these data are not owned and controlled by the decisio...
Chapter
Full-text available
The CAiSE 98 paper “Architecture and Quality in Data Warehouses” and its expanded journal version [17] were the first to add a Zachman-like [35] explicit conceptual enterprise modeling perspective to the architecture of data warehouses. Until then, data warehouses were just seen as collections of – typically multidimensional and historized – materia...
Article
Full-text available
This is Part b of Deliverable D1.4, which specifies the final CHOReOS architectural style, that is, the types of components, connectors, and configurations that are composed within the Future Internet of services, as enabled by the CHOReOS technologies developed in WP2 to WP4 and integrated in the WP5 IDRE. The definition of the CHOReOS architectur...
Article
Full-text available
The Extract-Transform-Load (ETL) flows are essential for the success of a data warehouse and the business intelligence and decision support mechanisms that are attached to it. During both the ETL design phase and the entire ETL lifecycle, the ETL architect needs to design and improve an ETL design in a way that satisfies both performance and correc...
Conference Paper
Full-text available
The publishing of data with privacy guarantees is a task typically performed by a data curator who is expected to provide guarantees for the data he publishes in quantitative fashion, via a privacy criterion (e.g., k-anonymity, l-diversity). The anonymization of data is typically performed off-line. In this paper, we provide algorithmic tools that...
Article
Full-text available
The widespread adoption of mobile ad hoc networking calls for a careful design of network functions in order to meet the application requirements and economize on the limited resources. In this paper we address the problem of distributing query messages among peers in mobile ad hoc networks. We assume that peers are organized in classes. Each peer possesses...
Book
This book constitutes the refereed proceedings of workshops, held at the 31st International Conference on Conceptual Modeling, ER 2012, in Florence, Italy in October 2012. The 32 revised papers presented together with 6 demonstrations were carefully reviewed and selected from 84 submissions. The papers are organized in sections on the workshops CMS...
Chapter
In this paper, the authors investigate the concept of designing user-centric transaction protocols toward achieving dependable coordination in AmI environments. As a proof-of-concept, this paper presents a protocol that takes into account the schedules of roaming users, which move from one AmI environment to another, avoiding abnormal termination o...
Chapter
The appropriate deployment of web service operations at the service provider site plays a critical role in the efficient provision of services to clients. In this paper, the authors assume that a service provider has several servers over which web service operations can be deployed. Given a workflow of web services and the topology of the servers,...
Article
Full-text available
Service-oriented computing is now acknowledged as a central paradigm for Internet computing, supported by tremendous research and technology development over the last 10 years. However, the evolution of the Internet, and in particular, the latest Future Internet vision, challenges the paradigm. Indeed, service-oriented computing has to face the ult...