Data & Cloud Research Group

About the lab

Innovating in Data Management & Analytics

With an extensive scientific background, the Data & Cloud Research Group delivers innovations through cutting-edge data management approaches across the data path and through advanced machine and deep learning techniques.

Enhancing Infrastructures Management

Specializing in cloud, edge / fog, IoT and 5G environments, the Data & Cloud Research Group focuses on novel AIOps techniques that facilitate service management across the complete service lifecycle, within and across heterogeneous infrastructures.

Featured research (30)

Edge computing has become a prominent solution for mobile applications and data management, due to its ability to considerably reduce data transmission costs and to analyze data with fewer computing resources, since the analysis occurs at lower data volumes and without relocating data to centralized infrastructures. One major challenge is the optimal placement of data for data-intensive applications and, in general, for applications that transmit large amounts of data. In this paper, we propose a novel user mobility-based data placement strategy that considers a trade-off between latency and data migration, which has not been investigated before. We classify users into three mobility classes, namely static, local, or mobile, using a causal-aware Deep Learning network. This information is then exploited to optimize data placement through specific data placement and retrieval algorithms for each mobility class. We evaluate the performance of the proposed solution using simulations and show that it reduces the average data access cost by 60% for static or local users and by 10% for mobile users, while the average path length is reduced by 50% for static and local users and by 12% for mobile users.
The tremendous growth and usage of social media in modern societies have led to the production of an enormous real-time volume of social texts and posts, including tweets, by users. These collections of social data are potentially useful, but the extent of meaningful data in them is still of high research and business interest. A key element in several application domains, such as policy making, is public opinion analysis. The latter has recently been realized through sentiment analysis and Natural Language Processing (NLP), which identify and extract subjective information from raw texts. An additional challenge is to exploit and correlate the sentiment expressed towards different entities within the same text, or even the same sentence, in order to analyze the different sentiments expressed about specific products, services, and topics while considering all the information in a text in a holistic way. Hence, this paper evaluates the utilization of an Entity-Level Sentiment Analysis (ELSA) approach on Twitter data. The approach seeks to enhance the knowledge derived from tweets, with the ultimate objective of enhancing the policy making procedures of modern organizations and businesses.
Big Data is a phenomenon that affects today’s world, with new data being generated every second. Today’s enterprises face major challenges from increasingly diverse data, as well as from indexing, searching, and analyzing such enormous amounts of data. In this context, several frameworks and libraries for processing and analyzing Big Data exist. Among those frameworks, Hadoop MapReduce, Mahout, Spark, and MLlib appear to be the most popular, although it is unclear which of them is best suited to, and performs best in, various data processing and analysis scenarios. This paper proposes EverAnalyzer, a self-adjustable Big Data management platform built to fill this gap by exploiting all of these frameworks. The platform is able to collect data in both a streaming and a batch manner, utilizing the metadata obtained from the processing and analytical tasks that its users apply to the collected data. Based on this metadata, the platform recommends the optimum framework for the data processing or analytical activities that the users aim to execute. To verify the platform’s efficiency, numerous experiments were carried out using 30 diverse datasets related to various diseases. The results revealed that EverAnalyzer correctly suggested the optimum framework in 80% of the cases, indicating that the platform made the best selections in the majority of the experiments.
Traditional survival analysis estimates the instantaneous failure rate of an event and predicts survival probability distributions. However, a set of censored data may contain several sub-populations with different risk profiles or survival distributions, which regular survival analysis approaches do not take into consideration. Consequently, there is a need to discover such sub-populations with unambiguous risk profiles and survival distributions. In this work, we propose a modified version of the K-Medoids algorithm that can be used to efficiently cluster censored data and identify diverse groups with distinct lifetime distributions.
Serverless computing has emerged as a revolutionary model that enables the deployment of applications and services by raising the level of abstraction from the underlying resources. Its main functionality is embodied in the notion of Function-as-a-Service (FaaS) as the core means to realize efficient serverless offerings. Following the shift from traditional architectures to microservices, which attain flexibility, productivity, portability, and performance in industrial-scale IT projects, the serverless model introduces even more fine-grained services, named “nanoservices”, which facilitate the required scalability by abstracting the deployment and management of the infrastructure resources. In the application space, advances in big data analysis contribute towards extracting actionable knowledge in various application domains. In this context, approaches for big data analysis aim at exploiting the added value of serverless architectures. To this end, we present an extendable and generalized approach for facilitating the provision of Machine Learning Functions-as-a-Service (MLFaaS). The proposed approach goes beyond classical atomic and isolated services by facilitating composite services, i.e., workflows/pipelines of ML tasks, thus enabling the realization of the complete data path functions required by data scientists. We demonstrate the operation of the proposed approach by modeling a real-world analytics scenario as an ML workflow pipeline and evaluate its performance. Furthermore, we address the challenge of utilizing a function-oriented service template recommendation system by expanding the serverless functional boundaries towards a holistic Quality-of-Service (QoS)-aware service function selection approach based on Artificial Intelligence techniques. These techniques propose the optimal number of functions to be implemented in a pipeline by treating response time as the primary factor in the application’s performance.

Lab head

Dimosthenis Kyriazis
  • Department of Digital Systems
About Dimosthenis Kyriazis
  • Dimosthenis Kyriazis currently works at the Department of Digital Systems, University of Piraeus. Dimosthenis does research in Distributed Computing, Parallel Computing and Software Engineering.

Members (16)

Argyro Mavrogiorgou
  • University of Piraeus
Athanasios Kiourtis
  • University of Piraeus
George Manias
  • University of Piraeus
Chrysostomos Symvoulidis
  • University of Piraeus
Konstantinos Mavrogiorgos
  • University of Piraeus
Spyridon Kleftakis
  • University of Piraeus
Yannis Poulakis
  • University of Piraeus
Andreas Karabetian
  • University of Piraeus