Marcin Paprzycki
Instytut Badań Systemowych Polskiej Akademii Nauk | IBSPAN · Intelligent Systems
D.Sc.
538
Publications
195,512
Reads
4,694
Citations
Additional affiliations
January 2013 - March 2013
August 2001 - August 2005
August 1990 - August 1997
Publications (538)
Modern systems often employ decentralised and distributed approaches. This can be attributed, among others, to the increasing complexity of system processes, which go beyond the capabilities of singular components. Additionally, with the growth in demand for system automation and high-level coordination, solutions belonging to the decentralised Art...
The landscape of computing technologies is changing rapidly, straining existing software engineering practices and tools. The growing need to produce and maintain increasingly complex multi-architecture applications makes it crucial to effectively accelerate and automate software engineering processes. At the same time, artificial intelligence (AI)...
Handling heterogeneity and unpredictability are two core problems in pervasive computing. The challenge is to seamlessly integrate devices with varying computational resources in a dynamic environment to form a cohesive system that can fulfill the needs of all participants. Existing work on systems that adapt to changing requirements typically focu...
This comprehensive survey serves as an indispensable resource for researchers embarking on the journey of fake news detection. By highlighting the pivotal role of dataset quality and diversity, it underscores the significance of these elements in the effectiveness and robustness of detection models. The survey meticulously outlines the key features...
Over the years, RDF streaming has been explored in research and practice from many angles, resulting in a wide range of RDF stream definitions. This variety presents a major challenge in discussing and integrating streaming systems due to a lack of a common language. This work attempts to address this critical research gap by systematizing RDF stre...
Recently, multiple applications of machine learning have been introduced. They include various possibilities arising when image analysis methods are applied to, broadly understood, video streams. In this context, a novel tool, developed for academic educators to enhance the teaching process by automating, summarizing, and offering prompt feedback o...
Recent years have been characterized by increasing interest in graph computations. This trend can be related to the large number of potential application areas. Moreover, increasing computational capabilities of modern computers allowed turning theory of graph algorithms into explorations of best methods for their actual realization. These factors,...
Currently, deploying machine learning workloads in the Cloud-Edge-IoT continuum is challenging due to the wide variety of available hardware platforms, stringent performance requirements, and the heterogeneity of the workloads themselves. To alleviate this, a novel, flexible approach for machine learning inference is introduced, which is suitable...
In this contribution, a novel optimization approach, derived from the behavioral patterns exhibited by Duroc pig herds, is proposed. In the developed metaheuristic, termed Artificial Duroc Pigs Optimization (ADPO), Ordered Fuzzy Numbers (OFN) have been applied to articulate and elucidate the behavioral dynamics of the pig herd. A series of experime...
With the rising popularity of artificial intelligence-based solutions, it is becoming important not only to deploy machine learning models/pipelines with a good accuracy, but also to be able to control and manage their documentation and information related to monitoring, performance tracking, etc. Moreover, crucial aspects of data that is to be use...
Reddit is the largest topically structured social network. Existing literature, reporting results of Reddit-related research, considers different phenomena, from social and political studies to recommender systems. The most common techniques used in these works include natural language processing, e.g., named entity recognition, as well as graph n...
Cloud infrastructures operate in highly dynamic environments, and today, energy-focused optimization has become crucial. Moreover, the concept of extended cloud infrastructure, which, among others, uses green energy, has started to gain traction. This introduces a new level of dynamicity to the ecosystem, as “processing components” may “disappear” and “com...
Research into fake news detection has a long history, although it gained significant attention following the 2016 US election. During this time, the widespread use of social media and the resulting increase in interpersonal communication led to the extensive spread of ambiguous and potentially misleading news. Traditional approaches, relying solely...
This work concerns automation of the training process, using modern information technologies, including virtual reality (VR). The starting point is an observation that automotive and aerospace industries require effective methods of preparation of engineering personnel. In this context, the technological process of preparing operations of a CNC num...
Fall accidents in industrial and construction environments require an immediate reaction, to provide first aid. Shortening the time between the fall and the relevant personnel being notified can significantly improve the safety and health of workers. Therefore, in this work, an IoT system for real-time fall detection is proposed, using the ASSIST-I...
As the largest open social medium on the Internet, Reddit is widely studied in the scientific literature. Due to its structured form and division into topical subfora (subreddits), conducted research often concerns connections and interactions between users and/or whole, subreddit-structure-based communities. Overall, the relations between communit...
Continuous, real-time monitoring of occupational health and safety in high-risk workplaces such as construction sites can substantially improve the safety of workers. However, introducing such systems in practice is associated with a number of challenges, such as scaling up the solution while keeping its cost low. In this context, this work investi...
Recently, cloud computing has emerged as a key way of delivering computing resources. Hence, research has focused on optimizing the use of cloud resources. The following contribution presents an agent-based Extended Green Cloud Simulator, motivated by the Green Edge Processing project of the cloud company, CloudFerro. The simulator serves as a digital tw...
Pan-sharpening is a procedure to fuse the spatial detail of high-resolution multispectral images (HR-MSI) and low-resolution hyperspectral images (LR-HSI) to produce HR-MSI. Due to increase in high-resolution satellites, methods based on pan-sharpening are increasingly utilized all over the world. However, the majority of techniques consider pan-sh...
Nowadays, natural language processing (NLP) is one of the most popular areas of, broadly understood, artificial intelligence. Therefore, every day, new research contributions are posted, for instance, to the arXiv repository. Hence, it is rather difficult to capture the current "state of the field" and thus, to enter it. This brought the idea of ap...
Modern programming languages are very complex, diverse, and non-uniform in their structure, code composition, and syntax. Therefore, it is a difficult task for computer science students to retrieve relevant code snippets from large code repositories, according to their programming course requirements. To solve this problem, an AI-based approach is...
The concept of extended cloud requires efficient network infrastructure to support ecosystems reaching from the edge to the cloud(s). Standard network load balancing delivers static solutions that are insufficient for the extended clouds, where network loads change often. To address this issue, a genetic algorithm based load optimizer is proposed a...
Agent-based computing remains an active field of research with the goal of building (semi-)autonomous software for dynamic ecosystems. Today, this task should be realized using dedicated, specialized frameworks. Over almost 40 years, multiple agent platforms have been developed. While many of them have been “abandoned”, others remain active, and ne...
Multicollinearity occurs when the independent variables are highly correlated with one another. This correlation is a problem because the independent variables should be independent. The higher the degree of correlation, the more problems arise when fitting the model and interpreting the results. In this paper, we have...
RDF data streaming has been explored by the Semantic Web community from many angles, resulting in multiple task formulations and streaming methods. However, for many existing formulations of the problem, reliably benchmarking streaming solutions has been challenging due to the lack of well-described and appropriately diverse benchmark datasets. Exi...
Mathematical models are used to study and predict the behavior of a variety of complex systems - engineering, physical, economic, social, environmental. Sensitivity studies are nowadays applied to some of the most complicated mathematical models from various intensively developing areas of applications. Sensitivity analysis is a modern promising te...
The concept of extended cloud requires efficient network infrastructure to support ecosystems reaching from the edge to the cloud(s). Standard approaches to network load balancing deliver static solutions that are insufficient for the extended clouds, where network loads change often. To address this issue, a genetic algorithm based load optimizer...
The availability of large amounts of annotated data is one of the pillars of deep learning success. Although numerous big datasets have been made available for research, this is often not the case in real-life applications (e.g. companies are not able to share data due to GDPR or concerns related to intellectual property rights protection). Federated le...
Recently, it has been stipulated that training larger and larger models, using ever-increasing datasets, is not sustainable in the long run. Hence, the idea of Frugal Artificial Intelligence has been put forward. While there are many ways to make AI frugal, this contribution focuses on two of them, namely neural network pruning and binarization. Exper...
While the development of very large models is at the core of today’s artificial intelligence, the cost of model training is often raised as a concern. In this context, active learning is pointed to as a method to maximize model quality, while minimizing the amount of resources needed to train it. The aim of this contribution is to systematically compare per...
Nowadays, cataracts are one of the prevalent eye conditions that may lead to vision loss. Precise and prompt recognition of the cataract is the best method to prevent/treat it in early stages. Artificial intelligence-based cataract detection systems have been considered in multiple studies. There, different deep learning algorithms have been used t...
In practical realizations of Federated Learning ecosystems, the parties that cooperate during the training process, and later use the trained/global model, may be competing institutions. This can result in incentives for malicious behavior, which can infringe on the safety and data privacy of other participants. Additionally, even in cas...
With the recent advancements in technology, there has been a tremendous growth in the usage of images captured using satellites in various applications, like defense, academics, resource exploration, land-use mapping, and so on. Certain mission-critical applications need images of higher visual quality, but the images captured by the sensors normal...
While actual deployments of fifth generation (5G) networks are in their initial stages and the actual need for 5G in our daily lives remains an open question, their potential to deliver high speed, low latency, and dependable communication services remains promising. Nevertheless, sixth generation (6G) networks have been proposed as a way to enhanc...
Next Generation Internet of Things (NGIoT) addresses the deployment of complex, novel IoT ecosystems. These ecosystems are related to different technologies and initiatives, such as 5G/6G, AI, cybersecurity, and data science. The interaction with these disciplines requires addressing complex challenges related to the implementation of flexible so...
Detecting Personal Protective Equipment in images and video streams is a relevant problem in ensuring the safety of construction workers. In this contribution, an architecture enabling live image recognition of such equipment is proposed. The solution is deployable in two settings -- edge-cloud and edge-only. The system was tested on an active cons...
For many years, it was claimed that semantics should provide the foundation of knowledge management in the enterprise. Today, it is easy to see that this vision did not materialize. The aim of this work is to critically analyse the state of the art in the use of semantic technologies in the enterprise and to attempt to diagnose the key problem(s). Keyword...
There are many areas where conventional supervised machine learning does not work well, for instance, in cases with a large, or systematically increasing, number of countably infinite classes. Zero-shot learning has been proposed to address this. In generalized settings, the zero-shot learning problem represents real-world applications where test i...
The vast body of scientific publications presents an increasing challenge of finding those that are relevant to a given research question, and making informed decisions on their basis. This becomes extremely difficult without the use of automated tools. Here, one possible area for improvement is automatic classification of publication abstracts acc...
Abundance of vastly heterogeneous, high-volume/high-velocity data producers/consumers, predominantly caused by proliferation of IoT-based solutions, results in an urgent need for efficient semantic interoperability solutions. Hence, the need to solve the problems of domain understanding, domain formal representation, and expression of mappings betw...
Researchers studying group behavior, social dynamics, or epidemiology lack an easy-to-use tool to run large-scale simulations. This contribution introduces a domain specific language (Agents Assembly; AASM) and a toolset for creating and running scalable simulations, using containerized environment. The proposed language supports describing abstrac...
Reusing ontologies in practice is still very challenging, especially when multiple ontologies are (jointly) involved. Moreover, despite recent advances, the realization of systematic ontology quality assurance remains a difficult problem. In this work, the quality of thirty biomedical ontologies, and the Computer Science Ontology are investigated,...
The aim of this contribution is to analyse practical aspects of the use of REST APIs and gRPC to realize communication tasks in applications in microservice-based ecosystems. On the basis of performed experiments, classes of communication tasks, for which given technology performs data transfer more efficiently, have been established. This, in turn...
In this work, the development of virtual reality software for “industrial applications” is considered. It is argued that, in this context, the vast experience from the development of computer games cannot be used directly. Especially, the specific nature of solutions dedicated to industrial applications requires taking into account their specificit...
New requirements, posed by the Next Generation IoT, demand design of novel reference architectures, providing foundation for implementation of Internet of Things (IoT) ecosystems. Building on cloud-native concepts (e.g. microservices, virtualisation, and containerization), a flexible architecture that answers requirements present in recent IoT depl...
The Management and Orchestration framework (MANO) is the main element of the Network Function Virtualization paradigm. It is in charge of managing the life cycle of virtualized functions, from instantiation to manageability, live configuration, and termination. This kind of framework was originally designed to orchestrate network functions over vir...
Currently, in mid-2022, one of the important trends in machine learning is to move away from monster-size models, which need petabytes of data to train and, during training, use gigawatts of energy. This movement (called Frugal AI) is also driven by the rapid growth of IoT deployments. There, intelligence at the edge involves resource-constrained models. Th...
Federated learning (FL) was proposed to facilitate the training of models in a distributed environment. It supports the protection of (local) data privacy and uses local resources for model training. Until now, the majority of research has been devoted to "core issues", such as adaptation of machine learning algorithms to FL, data privacy protectio...
With the ongoing, gradual shift of large-scale distributed systems towards the edge-cloud continuum, the need arises for software solutions that are universal, scalable, practical, and grounded in well-established technologies. Simultaneously, semantic technologies, especially in the streaming context, are becoming increasingly important for enabli...
One of the important problems in federated learning is how to deal with unbalanced data. This contribution introduces a novel technique designed to deal with label skewed non-IID data, using adversarial inputs, created by the I-FGSM method. Adversarial inputs guide the training process and allow the Weighted Federated Averaging to give more importa...
Zero-shot learning is applied, for instance, when properly labeled training data is not available. A number of zero-shot algorithms have been proposed. However, since none of them seems to be an “overall winner”, development of a meta-classifier(s) combining “best aspects” of individual classifiers can be attempted. In this context, state-of-the-ar...
Reusing ontologies in practice is still very challenging, especially when multiple ontologies are involved. Moreover, despite recent advances, systematic ontology quality assurance remains a difficult problem. In this work, the quality of thirty biomedical ontologies, and the Computer Science Ontology, are investigated from the perspective of a pra...