Sarunas Girdzijauskas
KTH Royal Institute of Technology | KTH · Department of Software and Computer systems
PhD in Computer Science
About
132 Publications
16,569 Reads
1,608 Citations
Introduction
I'm a professor at the School of Electrical Engineering and Computer Science (EECS) of the Royal Institute of Technology (KTH), Sweden, as well as a senior researcher at RISE SICS, Sweden.
My research focuses on Information Network Analytics, Decentralized Machine Learning, Data Mining, distributed systems and P2P overlays, and more recently on Blockchain technologies.
Publications (132)
In smart mobility, large networks of geographically distributed sensors produce vast amounts of high-frequency spatio-temporal data that must be processed in real time to avoid major disruptions. Traditional centralized approaches are increasingly unsuitable for this task, as they struggle to scale with expanding sensor networks, and reliability iss...
Wireless ray-tracing (RT) is emerging as a key tool for three-dimensional (3D) wireless channel modeling, driven by advances in graphical rendering. Current approaches struggle to accurately model beyond 5G (B5G) network signaling, which often operates at higher frequencies and is more susceptible to environmental conditions and changes. Existing o...
Decentralized Learning (DL) enables privacy-preserving collaboration among organizations or users to enhance the performance of local deep learning models. However, model aggregation becomes challenging when client data is heterogeneous, and identifying compatible collaborators without direct data exchange remains a pressing issue. In this paper, w...
Federated learning (FL) is a distributed learning paradigm that facilitates a basic level of data privacy, as the clients do not have to share their raw data. However, because the clients send local model updates, the attack surface of FL increases, with possible attackers sharing poisoning updates with the aggregation server. In this work, we focus on the Sybi...
This is an extended version of our work in [16]. In this paper, we introduce two novel algorithms to collaboratively train Naive Bayes models across multiple private data sources: Federated Naive Bayes and Gossip Naive Bayes. Instead of directly providing access to their data, the data owners compute local updates that are then aggregated to build...
Self-supervised graph representation learning (SSGRL) is a representation learning paradigm used to reduce or avoid manual labeling. An essential part of SSGRL is graph data augmentation. Existing methods usually rely on heuristics commonly identified through trial and error and are effective only within some application domains. Also, it is not cl...
As the research community focuses on improving the reliability of deep learning, identifying out-of-distribution (OOD) data has become crucial. Detecting OOD inputs during test/prediction allows the model to account for discriminative features unknown to the model. This capability increases the model’s reliability since this model provides a class...
Cellular coverage quality estimation has been a critical task for self-organized networks. In real-world scenarios, deep-learning-powered coverage quality estimation methods cannot scale up to large areas because little ground truth can be provided during network design and optimization. In addition, they fall short in producing expressive embeddings to...
Federated Learning (FL) is a promising framework for distributed learning when data is private and sensitive. However, the state-of-the-art solutions in this framework are not optimal when data is heterogeneous and non-IID. We propose a practical and robust approach to personalization in FL that adjusts to heterogeneous and non-IID data by balancin...
We study the problem of training personalized deep learning models in a decentralized peer-to-peer setting, focusing on the setting where data distributions differ between the clients and where different clients have different local learning tasks. We study both covariate and label shift, and our contribution is an algorithm which for each client f...
Known vulnerabilities in software are solved through security patches; thus, applying such patches as soon as they are released is crucial to protect against cyber-attacks. The spread of open-source software has made it possible to inspect patches to understand whether they are security-related or not. In this paper, we propose some solutions based on state...
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral and psychological patterns due to challe...
Accurate routing network status estimation is a key component in Software Defined Networking. However, existing deep-learning-based methods for modeling network routing are not able to extrapolate towards unseen feature distributions. Nor are they able to handle scaled and drifted network attributes in test sets that include open-world inputs. To d...
Federated Learning (FL) is a promising framework for distributed learning when data is private and sensitive. However, the state-of-the-art solutions in this framework are not optimal when data is heterogeneous and non-Independent and Identically Distributed (non-IID). We propose a practical and robust approach to personalization in FL that adjusts...
Self-supervised Learning (SSL) aims at learning representations of objects without relying on manual labeling. Recently, a number of SSL methods for graph representation learning have achieved performance comparable to SOTA semi-supervised Graph Neural Networks (GNNs). One of the key challenges is data-augmentation, for which existing methods rely...
Graph representation learning on dynamic graphs has become an important task in several real-world applications, such as recommender systems, email spam detection, and so on. To efficiently capture the evolution of a graph, representation learning approaches employ deep neural networks, with a large number of parameters to train. Due to the large mod...
In this study, we present a meta-learning model to adapt the predictions of the network's capacity between viewers who participate in a live video streaming event. We propose the MELANIE model, where an event is formulated as a Markov Decision Process, performing meta-learning on reinforcement learning tasks. By considering a new event as a task, w...
IoT devices have been growing exponentially in the last few years. This growth makes them an attractive target for attackers due to their low computational power and limited security features. Attackers use IoT botnets as an instrument to perform DDoS attacks which caused major disruptions of Internet services in the last decade. While many works h...
Wearable sensors are widely used in activity recognition (AR) tasks with broad applicability in health and well-being, sports, geriatric care, etc. Deep learning (DL) has been at the forefront of progress in activity classification with wearable sensors. However, most state-of-the-art DL models used for AR are trained to discriminate different acti...
Self-supervised Learning (SSL) aims at learning representations of objects without relying on manual labeling. Recently, a number of SSL methods for graph representation learning have achieved performance comparable to SOTA semi-supervised GNNs. A Siamese network, which relies on data augmentation, is the popular architecture used in these methods....
Predicting the future trajectories of pedestrians is a challenging problem that has a range of applications, from crowd surveillance to autonomous driving. In the literature, methods to approach pedestrian trajectory prediction have evolved, transitioning from physics-based models to data-driven models based on recurrent neural networks. In this work, w...
In this paper we present a deep graph reinforcement learning model to predict and improve the user experience during a live video streaming event, orchestrated by an agent/tracker. We first formulate the user experience prediction problem as a classification task, accounting for the fact that most of the viewers at the beginning of an event have po...
In recommender systems (RSs), predicting the next item that a user interacts with is critical for user retention. While the last decade has seen an explosion of RSs aimed at identifying relevant items that match user preferences, there is still a range of aspects that could be considered to further improve their performance. For example, often RSs...
Real-world data is mostly unlabeled, or only a few instances are labeled. Manually labeling data is a very expensive and daunting task. This calls for unsupervised learning techniques that are powerful enough to achieve results comparable to semi-supervised/supervised techniques. Contrastive self-supervised learning has emerged as a powerful direction...
Large scale contextual representation models have significantly advanced NLP in recent years, understanding the semantics of text to a degree never seen before. However, they need to process large amounts of data to achieve high-quality results. Joining and accessing all these data from multiple sources can be extremely challenging due to privacy a...
Sequences of event intervals occur in several application domains, while their inherent complexity hinders scalable solutions to tasks such as clustering and classification. In this paper, we propose a novel spectral embedding representation of event interval sequences that relies on bipartite graphs. More concretely, each event interval sequence i...
In this study, we present a dynamic graph representation learning model on weighted graphs to accurately predict the network capacity of connections between viewers in a live video streaming event. We propose EGAD, a neural network architecture to capture the graph evolution by introducing a self-attention mechanism on the weights between consecuti...
Progress in proteomics has enabled biologists to accurately measure the amount of protein in a tumor. This work is based on a breast cancer data set, the result of the proteomics analysis of a cohort of tumors carried out at Karolinska Institutet. While evidence suggests that an anomaly in the protein content is related to the cancerous nature of tumor...
Graph representation learning (GRL) is a powerful technique for learning low-dimensional vector representations of high-dimensional and often sparse graphs. Most studies explore the structure and metadata associated with the graph using random walks and employ unsupervised or semi-supervised learning schemes. Learning in these methods is context-...
The data shared over the Internet tends to originate from ubiquitous and autonomous sources such as mobile phones, fitness trackers, and IoT devices. Centralized and federated machine learning solutions represent the predominant way of providing smart services for users. However, moving data to a central location for analysis causes not only many pri...
Network representation learning (NRL) is a powerful technique for learning low-dimensional vector representations of high-dimensional and sparse graphs. Most studies explore the structure and metadata associated with the graph using random walks and employ unsupervised or semi-supervised learning schemes. Learning in these methods is context-free...
Community networks are a growing network cooperation effort by citizens to build and maintain Internet infrastructure in regions where it is not available. In addition, to bring cloud services to community networks (CNs), microclouds were started as an edge cloud computing model where members cooperate using resources. Therefore, enhancing routing for...
Twitter Geo-tags that indicate the exact location of messages have many applications from localized opinion mining during elections to efficient traffic management in critical situations. However, less than 6% of Tweets are Geo-tagged, which limits the implementation of those applications. There are two groups of solutions: content and network-base...
Distributed pub/sub must make principal design choices with regards to overlay topologies and routing protocols. It is challenging to tackle both aspects together, and most existing work merely considers one. We argue the necessity to address both problems simultaneously since only the right combination of the two can deliver an efficient internet-...
Blockchains are attracting the attention of many technical, financial, and industrial parties, as a promising infrastructure for achieving secure peer-to-peer (P2P) transactional systems. At the heart of blockchains is proof-of-work (PoW), a trustless leader election mechanism based on demonstration of computational power. PoW provides blockchain s...
Given a large graph, the densest-subgraph problem asks to find a subgraph with maximum average degree. When considering the top-k version of this problem, a naïve solution is to iteratively find the densest subgraph and remove it in each iteration. However, such a solution is impractical due to high processing cost. The problem is further complicate...
Topic Detection (TD) refers to automatic techniques for locating topically related material in web documents [1]. Nowadays, massive amounts of documents are generated by users of Online Social Networks (OSNs), in the form of very short texts, tweets and snippets of news. While Topic Detection in its traditional form is applied to a few documents contain...
Primitive partitioning strategies for streaming applications operate efficiently under two very strict assumptions: the resources are homogeneous and the messages are drawn from a uniform key distribution. These assumptions often do not hold in real-world use cases. Dealing with heterogeneity and non-uniform workload requires inferring the re...
In the past years, researchers developed approaches to detect spam in Online Social Networks (OSNs) such as URL blacklisting, spam traps and even crowdsourcing for manual classification. Although previous work has shown the effectiveness of using statistical learning to detect spam, existing work employs supervised schemes that require labeled trai...
Given a large graph, the densest-subgraph problem asks to find a subgraph with maximum average degree. When considering the top-k version of this problem, a naive solution is to iteratively find the densest subgraph and remove it in each iteration. However, such a solution is impractical due to high processing cost. The problem is further complicat...
Online social networks (OSNs) have successfully changed the way people interact. Online interactions among people span geographical boundaries and interweave with different human life activities. However, current OSN identification schemes lack guarantees on quantifying the trustworthiness of online identities of users joining them. Therefore, dri...
DOSNs are distributed systems providing social networking services that have become extremely popular in recent years. In DOSNs, the aim is to give users control over their data and to keep data local to enhance privacy. Therefore, identifying behavioral groups of users that share the same behavioral patterns in decentralized OSNs is challenging....
Online Social Networks exploit a lightweight process to identify their users so as to facilitate their fast adoption. However, such convenience comes at the price of making legitimate users subject to different threats created by fake accounts. Therefore, there is a crucial need to empower users with tools helping them in assigning a level of trust...