Paweł Ksieniewicz
Wrocław University of Science and Technology | WUT · Department of Systems and Computer Networks
Ph.D., D.Sc
I am currently developing a pattern recognition lab at the Department of Systems and Computer Networks of WUST.
About
78
Publications
9,763
Reads
776
Citations
Introduction
I am an Associate Professor at the Department of Systems and Computer Networks, Wrocław University of Science and Technology, Poland, where I'm heading the Pattern Recognition laboratory. My research is focused on ensemble classification systems, data stream processing, imbalanced data analysis and a wide topic of difficult data classification.
Additional affiliations
Education
March 2012 - June 2013
October 2008 - February 2012
Publications (78)
Practical applications of artificial intelligence increasingly often have to deal with the streaming properties of real data, which, considering the time factor, are subject to phenomena such as periodicity and more or less chaotic degeneration, resulting directly in concept drifts. Modern concept drift detectors almost always assume immedia...
Simple neural network classification tasks are based on performing extraction as transformations of the set simultaneously with optimization of weights on individual layers. In this paper, the Representation 7 architecture is proposed, the primary assumption of which is to divide the inductive procedure into separate blocks – transformation and dec...
With the processing of data streams, come inevitable challenges, such as changes in the prior (class drift) and posterior (concept drift) probability distribution over the processing time. Both these phenomena have a negative impact on the quality of the classification. Heavily imbalanced problems, which are often typical for real-world application...
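The change in prior probability mentioned in the abstract above can be made concrete with a small sketch. It is an illustration only, not the method from the paper: the function name `chunk_priors` and the toy label stream are hypothetical, and the stream is assumed to be processed in fixed-size chunks.

```python
from collections import Counter


def chunk_priors(labels, chunk_size):
    """Estimate class priors separately for each consecutive chunk of labels."""
    priors = []
    for start in range(0, len(labels), chunk_size):
        chunk = labels[start:start + chunk_size]
        counts = Counter(chunk)
        # Relative class frequency within the chunk serves as the prior estimate.
        priors.append({cls: n / len(chunk) for cls, n in counts.items()})
    return priors


# Hypothetical label stream: class 1 is rare in the first chunk, balanced in the second.
stream_labels = [0] * 9 + [1] * 1 + [0] * 5 + [1] * 5
priors = chunk_priors(stream_labels, chunk_size=10)
```

Comparing the per-chunk estimates shows the minority class prior moving from 0.1 to 0.5, which is the kind of shift the abstract refers to as class drift.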
In recent years, deep neural networks have been employed increasingly often, which correlates with them receiving growing user trust. However, such systems cannot identify samples from unknown classes and often induce an incorrect decision with high confidence. This is aimed to be solved by open set recognition methods. The presented work looks at...
The following work addresses the problem of frameworks for data stream processing that can be used to evaluate the solutions in an environment that resembles real-world applications. The definition of structured frameworks stems from a need to reliably evaluate the data stream classification methods, considering the constraints of delayed and limit...
Concept drift in data stream processing remains an intriguing challenge and a popular research topic. Methods that actively process data streams usually employ drift detectors, whose performance is often based on monitoring the variability of different stream properties. This publication provides an overview and analysis of metafeatures vari...
Tabular data is considered the last unconquered castle of deep learning, yet the task of data stream classification is stated to be an equally important and demanding research area. Due to the temporal constraints, it is assumed that deep learning methods are not the optimal solution for application in this field. However, excluding the entire -- a...
In the classification tasks, from raw data acquisition to the curation of a dataset suitable for use in evaluating machine learning models, a series of steps - often associated with high costs - are necessary. In the case of Natural Language Processing, initial cleaning and conversion can be performed automatically, but obtaining labels still requi...
In recent years Deep Neural Network-based systems are not only increasing in popularity but also receive growing user trust. However, due to the closed-world assumption of such systems, they cannot recognize samples from unknown classes and often induce an incorrect label with high confidence. Presented work looks at the evaluation of methods for O...
The research on methods trying to explain solutions based on Neural Networks (NN) is a vivid target of Machine Learning works in recent years. Such a need arose due to the black-box nature of neural models and their tendency to provide a high certainty for incorrect decisions regarding events outside the area covered by the training set. In this wo...
The development of technologies related to the TinyML concept observed in recent years forces us to consider the trade-off between inference time and recognition quality. Moreover, employing data stream processing methods to analyze large volumes of data in real-time becomes necessary. In this work, considering Computer Vision data streams with a...
Classifier ensembles have shown the ability to classify drifted data streams. The following paper proposes an ensemble consisting of a single hidden layer feedforward neural network and an Extreme Learning Machine. For this purpose, a new incremental version of the Extreme Learning Machine is also proposed. Motivations behind such an approach have...
Bulk construction of pattern classifiers, whether for optimizing input data configurations or method hyperparameters, is a computationally highly complex task. The main problem is the prediction quality evaluation function based on estimation using the selected experimental protocol. In the case of iterative optimization algorithms, such an evaluat...
Words don’t come easy, which fosters the use of generative artificial intelligence models in ongoing popularity of widely available applications such as ChatGPT. The result is an even greater flood of online content that takes time to process. It is where Natural Language Processing tools for classification come in handy. Distinguishing fake news,...
We often come across the seemingly obvious remark that the modern world is full of data. From the perspective of a regular Internet user, we perceive this as an abundance of content that we unintentionally consume every day, including links and amusing images that we receive from friends and content providers via webpages, social media, and other s...
The article presents the torchosr package - a Python package compatible with PyTorch library - offering tools and methods dedicated to Open Set Recognition in Deep Neural Networks. The package offers two state-of-the-art methods in the field, a set of functions for handling base sets and generation of derived sets for the Open Set Recognition task...
The problem’s complexity assessment is an essential element of many topics in the supervised learning domain. It plays a significant role in meta-learning – becoming the basis for determining meta-attributes or multi-criteria optimization – allowing the evaluation of the training set resampling without needing to rebuild the recognition model. The...
The classification of data stream susceptible to the concept drift phenomenon has been a field of intensive research for many years. One of the dominant strategies of the proposed solutions is the application of classifier ensembles with the member classifiers validated on their actual prediction quality. This paper is a proposal of a new ensemble...
The classification problem's complexity assessment is an essential element of many topics in the supervised learning domain. It plays a significant role in meta-learning -- becoming the basis for determining meta-attributes or multi-criteria optimization -- allowing the evaluation of the training set resampling without needing to rebuild the recogn...
Among the difficulties being considered in data stream processing, a particularly interesting one is the phenomenon of concept drift. Methods of concept drift detection are frequently used to eliminate the negative impact on the quality of classification in the environment of evolving concepts. This article proposes Statistical Drift Detection Ense...
The abundance of information in digital media, which in today's world is the main source of knowledge about current events for the masses, makes it possible to spread disinformation on a larger scale than ever before. Consequently, there is a need to develop novel fake news detection approaches capable of adapting to changing factual contexts and g...
stream-learn is a Python package compatible with scikit-learn and developed for the drifting and imbalanced data stream analysis. Its main component is a stream generator, which allows producing a synthetic data stream that may incorporate each of the three main concept drift types (i.e., sudden, gradual and incremental drift) in their recurring or...
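As a rough illustration of what a synthetic stream with a sudden concept drift looks like, the sketch below uses only the Python standard library. It does not use or reproduce the stream-learn API: the names `sudden_drift_stream` and `accuracy`, the one-dimensional two-class problem, and all parameter values are assumptions made for this example.

```python
import random


def sudden_drift_stream(n_chunks, chunk_size, drift_chunk, seed=0):
    """Yield (X, y) chunks of a 1-D two-class problem whose concept flips at drift_chunk."""
    rng = random.Random(seed)
    for chunk_id in range(n_chunks):
        flipped = chunk_id >= drift_chunk
        X, y = [], []
        for _ in range(chunk_size):
            label = rng.randint(0, 1)
            # Before the drift class 1 is centred at +2 and class 0 at -2;
            # after the drift the two class centres are swapped.
            centre = 2.0 if (label == 1) != flipped else -2.0
            X.append(rng.gauss(centre, 1.0))
            y.append(label)
        yield X, y


def accuracy(X, y):
    """Accuracy of a fixed rule (predict 1 when x > 0) that matches the first concept."""
    hits = sum(int(x > 0) == label for x, label in zip(X, y))
    return hits / len(y)


# Six chunks of 200 samples each, with the sudden drift placed at chunk 3.
scores = [accuracy(X, y) for X, y in sudden_drift_stream(6, 200, drift_chunk=3)]
```

The per-chunk scores stay high for the first three chunks and collapse afterwards, because the fixed rule is never updated. Gradual and incremental drifts would need a smoother transition between the two concepts, which this sketch leaves out.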
The latest trends in computer networks bring new challenges and complex optimization problems, one of which is link dimensioning in Spectrally-Spatially Flexible Optical Networks. The time-consuming calculations related to determining the objective function representing the amount of accepted traffic require heuristics to search for good quality so...
Among the recently published works in the field of data stream analysis -- both in the context of classification task and concept drift detection -- the deficit of real-world data streams is a recurring problem. This article proposes a method for generating data streams with given parameters based on real-world static data. The method uses one-dime...
One of the significant problems of streaming data classification is the occurrence of concept drift, consisting of the change of probabilistic characteristics of the classification task. This phenomenon destabilizes the performance of the classification model and seriously degrades its quality. An appropriate strategy counteracting this phenomenon...
In this paper, we focus on the efficient dynamic routing in Spectrally-Spatially Flexible Optical Networks (SS-FON) realized using Single-Mode Fiber Bundles (SMFBs). We study two scenarios: an unprotected network (NP) and a network protected by dedicated path protection (DPP) against a single link failure. For these configurations, we propose a dedicat...
Ensembles of classifiers deserve attention because their stability and accuracy are usually superior compared to the single classifier. One of the aspects regarding the construction of multiple classifier systems is the fusion of each base model output. The state-of-the-art fusion of base classifiers approaches uses class labels, a rank array, or a...
Despite the fact that real-life data streams may often be characterized by the dynamic changes in the prior class probabilities, there is a scarcity of articles trying to clearly describe and classify this problem as well as suggest new methods dedicated to resolving this issue. The following paper aims to fill this gap by proposing a novel data st...
The following work aims to propose a new method of constructing an ensemble of classifiers diversified by the appropriate selection of the problem subspace. The experiments were performed on a numerical dataset in which three groups are present: healthy controls, glaucoma suspects, and glaucoma patients. Overall, it consists of medical records from...
This study aimed to assess the utility of optic nerve head (ONH) en-face images, captured with scanning laser ophthalmoscopy (SLO) during standard optical coherence tomography (OCT) imaging of the posterior segment, and demonstrate the potential of a deep learning (DL) ensemble method that operates in a low data regime to differentiate glaucoma patie...
Many researchers working on classification problems evaluate the quality of developed algorithms based on computer experiments. The conclusions drawn from them are usually supported by the statistical analysis and chosen experimental protocol. Statistical tests are widely used to confirm whether considered methods significantly outperform reference...
Fake news has now grown into a big problem for societies and also a major challenge for people fighting disinformation. This phenomenon plagues democratic elections and the reputations of individual persons or organizations, and has negatively impacted citizens (e.g., during the COVID-19 pandemic in the US or Brazil). Hence, developing effective tools to...
In the diversity of contemporary decision-making tasks, where the data is no longer static and changes over time, data stream processing has become an important issue in the field of pattern recognition. In addition, most real problems are not balanced, representing their classes in various disproportions. The following paper proposes the Prior I...
A significant problem when building classifiers based on data stream is information about the correct label. Most algorithms assume access to this information without any restrictions. Unfortunately, this is not possible in practice because the objects can come very quickly and labeling all of them is impossible, or we have to pay for providing the...
Using fake news as a political or economic tool is not new, but the scale of their use is currently alarming, especially on social media. The authors of misinformation try to influence the users' decisions, both in the economic and political sphere. The facts of using disinformation during elections are well known. Currently, two fake news detectio...
In the era of a large number of tools and applications that constantly produce massive amounts of data, their processing and proper classification is becoming both increasingly hard and important. This task is hindered by changing the distribution of data over time, called the concept drift, and the emergence of a problem of disproportion between c...
Learning from imbalanced datasets is a challenging task for standard classification algorithms. In general, there are two main approaches to solve the problem of imbalanced data: algorithm-level and data-level solutions. This paper deals with the second approach. In particular, this paper shows a new proposition for calculating the weighted score f...
The problem of fake news has become one of the most challenging issues having an impact on societies. Nowadays, false information may spread quickly through social media. In that regard, fake news needs to be detected as fast as possible to avoid negative influence on people who may rely on such information while making important decisions (e.g., p...
Many real classification problems are characterized by a strong disturbance in the prior probability, which for most classification algorithms leads to favoring majority classes. The action most often used to deal with this problem is oversampling of the minority class by the SMOTE algorithm. The following work proposes to employ a modification of...
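For readers unfamiliar with the oversampling step named above, a minimal sketch of the standard SMOTE interpolation follows. It shows the baseline algorithm only, not the modification proposed in the paper; the function name `smote_sample`, its parameters, and the toy minority set are hypothetical.

```python
import random


def smote_sample(minority, k=3, n_new=5, seed=0):
    """Create n_new synthetic minority points by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the remaining minority points by squared Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        # Pick one of the k nearest neighbours and a random position on the connecting segment.
        neighbour = rng.choice(others[:k])
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical two-dimensional minority class samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sample(minority)
```

Each synthetic point lies on a line segment between two existing minority samples, so the new points stay inside the region already occupied by the minority class.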
The following paper considers pattern recognition-aided optimization of a complex and relevant problem related to optical networks. For that problem, we propose a four-step dedicated optimization approach that makes use, among others, of a regression method. The main focus of that study is put on the construction of an efficient regression model and its...
stream-learn is a Python package compatible with scikit-learn and developed for the drifting and imbalanced data stream analysis. Its main component is a stream generator, which allows producing a synthetic data stream that may incorporate each of the three main concept drift types (i.e., sudden, gradual and incremental drift) in their recurring or...
We focus on optimization of dynamic spectrally-spatially flexible optical networks (SS-FONs), in which distance-adaptive, spectral super-channel (SCh) transmission is realized over weakly-coupled multi-core fibers (MCFs). In such networks, the inter-core crosstalk (XT) effect in MCFs impairs the quality of transmission (QoT) of optical signals, whi...
The following work focuses on the optimization of dynamic spectrally-spatially flexible optical networks aided by supervised learning methods. Such networks have distance-adaptive, spectral super-channel transmission realized over weakly-coupled multi-core fibers. The article proposes employing a pattern recognition approach with the goal to estimat...
The problem of the fake news publication is not new and it already has been reported in ancient ages, but it has started having a huge impact especially on social media users. Such false information should be detected as soon as possible to avoid its negative influence on the readers and in some cases on their decisions, e.g., during the election....
Imbalanced data classification is still a focus of intense research, due to its ever-growing presence in the real-life decision tasks. In this article, we focus on a classifier ensemble for imbalanced data classification. The ensemble is formed on the basis of the individual classifiers trained on supervise-selected feature subsets. There are sever...
In this work, we explored the capabilities of improving deep learning models' performance by reducing the dataset imbalance. For our experiments, the highly imbalanced MIT-BIH ECG dataset was used. Multiple approaches were considered. First, we introduced multiclass UMCE, an ensemble designed to deal with imbalanced datasets. Secondly, we studied the impact...
From one year to the next, ever larger amounts of data are being created in different fields of application. A great many of those sources require real-time processing and analysis, which leads to increased interest in the streaming data classification field of machine learning. It is not rare that many of those applications deal with somehow skew...
The following work tries to utilize a hybrid approach combining the Random Subspace method and SMOTE oversampling to solve the problem of imbalanced data classification. The paper contains a proposition of an ensemble diversified using the Random Subspace approach, trained with a set oversampled in the context of each reduced subset of features. The algorithm was ev...
The nature of the analysed data may make many practical data mining tasks difficult. This work focuses on two important research topics associated with data analysis, i.e., data stream classification as well as data analysis with imbalanced class distributions. We propose a novel classification method, employing a classifier s...
Due to the variety of modern real-life tasks, where the analyzed data is often not a static set, data stream mining has gained substantial focus from the machine learning community. The main property of such systems is the large amount of data arriving in a sequential manner, which creates an endless stream of objects. Taking into consideration the limited resourc...
The following paper presents the Exposer Ensemble (EE), a combined classifier based on the original model of quantized subspace class distribution. It presents a method of establishing and processing the Planar Exposer, the base representation of a discrete class distribution over a given subspace, and a proposition how to effectively fuse discriminatory po...
The difficulty of many classification tasks lies in the nature of the analyzed data, such as a disproportionate number of examples from different classes in a learning set. Ignoring this characteristic causes canonical classifiers to display strongly biased performance on imbalanced datasets. In this work a novel classifier ensemble forming technique for im...
The big data is usually described by so-called 5Vs (Volume, Velocity, Variety, Veracity, Value). The business success in the big data era strongly depends on the smart analytical software which can help to make efficient decisions (Value for enterprise). Therefore, the decision support software should take into consideration especially that we deal...
Contemporary classification systems have to make a decision not only on the basis of the static data, but on the data in motion as well. Objects being recognized may arrive continuously to a classifier in the form of data stream. Usually, we would like to start exploitation of the classifier as soon as possible, the models which can improve their m...
For contemporary enterprises, the possibility of making appropriate business decisions on the basis of the knowledge hidden in stored data is the critical success factor. Therefore, the decision support software should take into consideration that data usually comes continuously in the form of a so-called data stream, but most of the traditional data...
Remote sensing and hyperspectral data analysis are areas offering a wide range of valuable practical applications. However, they generate massive and complex data that is very difficult for a human being to analyze. Therefore, methods for efficient data representation and data mining are of high interest to these fields. In this paper we introduce...
This work reports the research on active learning approach applied to the data stream classification. The chosen characteristics of the proposed frameworks were evaluated on the basis of the wide range of computer experiments carried out on the three benchmark data streams. Obtained results confirmed the usability of proposed method to the data str...
Data obtained by hyperspectral imaging gives us enough information to recreate human vision, and also to extend it with new methods to extract features coded in light spectra. This work proposes a set of functions, based on an abstraction of natural photoreceptors. The proposed method was employed as the feature extraction for the classification...
The paper concentrates on the problem of state institutions limiting access to meteorological data relevant to scientific research, which are regarded only as a commercial product. The search for a new source of access was based on a case study of the Baltic Sea area. The aim of the project was to create a meteorological database, independe...
Nowadays, hyperspectral imaging is the focus of intense research, because its applications can be very useful in natural disaster monitoring and agricultural monitoring, to enumerate only a few. The main problem of systems using hyperspectral imaging is the cost of labelling, because it requires domain experts, who label the region or pr...
This work is focusing on the hyperspectral imaging classification, which is nowadays a focus of intense research. The hyperspectral imaging is widely used in agriculture, mineralogy, or food processing to enumerate only a few important domains. The main problem of such image classification is access to the ground truth, because it needs the experie...
Hyperspectral image analysis is a dynamically developing branch of computer vision due to the numerous practical applications and high complexity of data. There exists a need for introducing novel machine learning methods that can tackle high dimensionality and a large number of classes in these images. In this paper, we introduce a novel ensemble me...
Headache, medically known as cephalalgia, may have a wide range of symptoms and its types may be related and mixed. Its proper diagnosis is difficult and automatic diagnosis is usually rather imprecise; therefore, the problem is still the focus of intensive research. In the paper we propose a headache diagnosis method which makes the decision on the...
Hyperspectral image analysis is one of the current trends in computer vision and machine learning. Due to the high dimensionality, large number of classes, presence of noise and complex structure, this is not a trivial task. There exists a need for more precise and computationally efficient algorithms for hyperspectral image segmentation and...