Michal Wozniak

Wroclaw University of Science and Technology | WUT · Department of Systems and Computer Networks

Professor

About

Publications: 317
Reads: 55,836
Citations: 5,020
Introduction
Professor, Department of Systems and Computer Networks, Wroclaw University of Technology
Fields of interest:
  • machine learning, especially inductive learning, data and web mining, learning on distributed and streaming data
  • pattern recognition, especially combined and compound classifiers, concept drift, recognition with context
  • telemedicine and medical decision support
  • computer and network security, especially IDS, IPS, and anti-spam filter design
  • distributed algorithms
Additional affiliations
October 1996 - present
Wroclaw University of Science and Technology
Position
  • professor, leader of the Machine Learning research group, vice-chief of the Department of Systems and Computer Networks
Description
  • personal webpage: http://www.kssk.pwr.wroc.pl/wozniak/?lang=en
  • research group webpage: http://www.kssk.pwr.wroc.pl/machine-learning-team/1469/?lang=en
Education
October 1992 - October 1996
Wroclaw University of Science and Technology
Field of study
  • Computer Science
October 1987 - June 1992
Wroclaw University of Science and Technology
Field of study
  • Biomedical Engineering

Publications (317)
Preprint
Full-text available
The abundance of information in digital media, which in today's world is the main source of knowledge about current events for the masses, makes it possible to spread disinformation on a larger scale than ever before. Consequently, there is a need to develop novel fake news detection approaches capable of adapting to changing factual contexts and g...
Article
Full-text available
One of the most critical data analysis tasks is streaming data classification, where we may also observe the concept drift phenomenon, i.e., changes in the decision model’s probabilistic characteristics. From a practical point of view, we may face this type of task in banking, medicine, or cybersecurity, to name only a few. A vital characteristi...
Article
Full-text available
Modern analytical systems must process streaming data and correctly respond to data distribution changes. The phenomenon of changes in data distributions is called concept drift, and it may harm the quality of the used models. Additionally, the possibility of concept drift means that the algorithms used must be ready for the continuous...
Preprint
Full-text available
Increasing neural network depth is a well-known method for improving neural network performance. Modern deep architectures contain multiple mechanisms that allow hundreds or even thousands of layers to train. This work tries to answer whether extending neural network depth may be beneficial in a life-long learning setting. In particular, we propose...
Article
Fake news detection is a challenging and complex task. Yet, several approaches to deal with this problem have already been proposed. The majority of solutions employ the NLP-based approach, where various architectures of a deep artificial neural network are proposed. However, as the experiments show, different NLP-based solutions have great perform...
Article
One of the vital problems with the imbalanced data classifier training is the definition of an optimization criterion. Typically, since the exact cost of misclassification of the individual classes is unknown, combined metrics and loss functions that roughly balance the cost for each class are used. However, this approach can lead to a loss of info...
Preprint
Full-text available
One of the significant problems of streaming data classification is the occurrence of concept drift, consisting of the change of probabilistic characteristics of the classification task. This phenomenon destabilizes the performance of the classification model and seriously degrades its quality. An appropriate strategy counteracting this phenomenon...
Article
Full-text available
Real-world classification domains, such as medicine, health and safety, and finance, often exhibit imbalanced class priors and have asynchronous misclassification costs. In such cases, the classification model must achieve a high recall without significantly impacting precision. Resampling the training data is the standard approach to improving cla...
Article
Classifier ensemble pruning is a strategy through which a subensemble can be identified via optimizing a predefined performance criterion. Choosing the optimum or suboptimum subensemble decreases the initial ensemble size and increases its predictive performance. In this article, a set of heuristic metrics will be analyzed to guide the pruning proc...
Chapter
Generative Adversarial Networks (GANs) are among the most popular contemporary machine learning algorithms. Despite remarkable successes in their developments, existing GANs cannot offer the appropriate tools to monitor their performance in a continual learning scenario when data distribution changes. We propose a complete framework for monitoring...
Article
Classifier ensembles are characterized by the high quality of classification, thanks to their generalizing ability. Most existing ensemble algorithms use all learning samples to learn the base classifiers, which may negatively impact the ensemble’s diversity. Also, the existing ensemble pruning algorithms often return suboptimal solutions that are bi...
Article
Full-text available
Artificial intelligence (AI) has found a myriad of applications in many domains of technology, and more importantly, in improving people’s lives. Sadly, AI solutions have already been utilized for various violations and theft, even receiving the name AI for Crime (AIC). This poses a challenge: are cybersecurity experts thus justified to attack malic...
Preprint
Full-text available
Modern analytical systems must be ready to process streaming data and correctly respond to data distribution changes. The phenomenon of changes in data distributions is called concept drift, and it may harm the quality of the used models. Additionally, the possibility of concept drift means that the algorithms used must be ready for the...
Chapter
Real data streams often, in addition to the possibility of concept drift occurrence, can display a high imbalance ratio. Another important problem with real classification tasks, often overlooked in the literature, is the cost of obtaining labels. This work aims to connect three rarely combined research directions i.e., data stream classification,...
Article
Nowadays, societies, businesses and citizens are strongly dependent on information, and information has become one of the most crucial (societal and economic) values. People expect that both traditional and online media provide trustworthy and reliable news and content. The right to be informed is one of the fundamental requirements for making the r...
Chapter
The article presents models for detecting fake news and the results of analyses of the application of these models. Precision, recall, and F1-score were proposed as measures of model quality. Neural network architectures based on state-of-the-art Transformer-type solutions were applied to create the models. T...
Chapter
Many different decision problems require taking into account a compromise between the various goals we want to achieve. A specific group of features often decides the state of a given object. An example of such a task is feature selection, which allows increasing the decision’s quality while minimizing the cost of features or the total budget. Th...
Preprint
Full-text available
Real-world classification domains, such as medicine, health and safety, and finance, often exhibit imbalanced class priors and have asynchronous misclassification costs. In such cases, the classification model must achieve a high recall without significantly impacting precision. Resampling the training data is the standard approach to improving cla...
Article
Many researchers working on classification problems evaluate the quality of developed algorithms based on computer experiments. The conclusions drawn from them are usually supported by the statistical analysis and chosen experimental protocol. Statistical tests are widely used to confirm whether considered methods significantly outperform reference...
Article
Imbalanced data classification remains a vital problem. The key is to find methods that classify both the minority and majority classes correctly. The paper presents a classifier ensemble for classifying binary, non-stationary and imbalanced data streams where the Hellinger Distance is used to prune the ensemble. The paper includes an expe...
Preprint
Full-text available
Imbalanced data classification remains a vital problem. The key is to find methods that classify both the minority and majority classes correctly. The paper presents a classifier ensemble for classifying binary, non-stationary and imbalanced data streams where the Hellinger Distance is used to prune the ensemble. The paper includes an expe...
Chapter
Anomaly detection has been a challenging topic for decades and it still is open to new contributions nowadays. More specifically, the detection of anomalies (not only hardware ones but also those affecting the software) suffers from many problems when monitoring cyber-physical systems. One such usual problem is the much fewer data samples of anomal...
Book
This book constitutes the refereed proceedings of the Second Symposium on Machine Learning and Metaheuristics Algorithms, and Applications, SoMMA 2020, held in Chennai, India, in October 2020. Due to the COVID-19 pandemic the conference was held online. The 12 full papers and 7 short papers presented in this volume were thoroughly reviewed and se...
Preprint
Full-text available
Fake news has now grown into a big problem for societies and also a major challenge for people fighting disinformation. This phenomenon plagues democratic elections and the reputations of individual persons or organizations, and has negatively impacted citizens (e.g., during the COVID-19 pandemic in the US or Brazil). Hence, developing effective tools to...
Article
Fake news has now grown into a big problem for societies and also a major challenge for people fighting disinformation. This phenomenon plagues democratic elections and the reputations of individual persons or organizations, and has negatively impacted citizens (e.g., during the COVID-19 pandemic in the US or Brazil). Hence, developing effective tools to...
Chapter
Data difficulties such as imbalanced class distributions mean that methods capable of producing reliable predictive models remain a focus of intense research. This work attempts to employ the concept of Decision Templates for the mentioned classification task. Additionally, a modification to the original method is introduced, which uses many decision t...
Chapter
A significant problem when building classifiers based on data streams is access to information about the correct label. Most algorithms assume access to this information without any restrictions. Unfortunately, this is not possible in practice because the objects can come very quickly and labeling all of them is impossible, or we have to pay for providing the...
Chapter
Using fake news as a political or economic tool is not new, but the scale of its use is currently alarming, especially on social media. The authors of misinformation try to influence the users' decisions, both in the economic and political sphere. The use of disinformation during elections is well known. Currently, two fake news detectio...
Chapter
Streaming data analysis is currently a rapidly growing research direction. One of the serious problems hindering data stream classification is the fact that during the exploitation of the model, its probabilistic characteristics may change. This phenomenon is called concept drift. To date, multiple methods have been proposed to overcome the...
Article
This work aims to connect two rarely combined research directions, i.e., non-stationary data stream classification and data analysis with skewed class distributions. We propose a novel framework employing stratified bagging for training base classif...
Article
Automatic control of physiological variables is one of the most active areas in biomedical engineering. This paper centers on predicting the evolution of analgesic variables in patients undergoing surgery. The proposal is based on the use of hybrid intelligent modelling methods. The study considers the Analgesia Nociception Index (ANI) to a...
Article
Learning from imbalanced data is among the most popular topics in contemporary machine learning. However, the vast majority of attention in this field is given to binary problems, while their much more difficult multiclass counterparts are relatively unexplored. Handling data sets with multiple skewed classes poses various challenges and calls...
Article
Full-text available
One of the crucial problems of designing a classifier ensemble is the proper choice of the base classifier line-up. Basically, such an ensemble is formed on the basis of individual classifiers, which are trained in such a way to ensure their high diversity or they are chosen on the basis of pruning which reduces the number of predictive models in o...
Article
Full-text available
The imbalanced data classification is one of the most crucial tasks facing modern data analysis. Especially when combined with other difficulty factors, such as the presence of noise, overlapping class distributions, and small disjuncts, data imbalance can significantly impact the classification performance. Furthermore, some of the data difficulty...
Article
Multiple classifier systems (MCSs) constitute one of the most competitive paradigms for obtaining more accurate predictions in the field of machine learning. Systems of this type should be designed efficiently in all of their stages, from data preprocessing to multioutput decision fusion. In this article, we present a framework for utilizing the po...
Chapter
Imbalanced data analysis remains one of the critical challenges in machine learning. This work aims to adapt the concept of Dynamic Classifier Selection (dcs) to the pattern classification task with the skewed class distribution. Two methods, using the similarity (distance) to the reference instances and class imbalance ratio to select the most con...
Chapter
Multi-class imbalanced classification tasks are characterized by the skewed distribution of examples among the classes and, usually, strong overlapping between class regions in the feature space. Furthermore, frequently the goal of the final system is to obtain very high precision for each of the concepts. All of these factors contribute to the com...
Chapter
The problem of fake news has become one of the most challenging issues having an impact on societies. Nowadays, false information may spread quickly through social media. In that regard, fake news needs to be detected as fast as possible to avoid negative influence on people who may rely on such information while making important decisions (e.g., p...
Chapter
The classification of imbalanced data streams is gaining more and more interest. However, apart from the problem that one of the classes is not well represented, there are problems typical for data stream classification, such as limited resources, lack of access to the true labels and the possibility of occurrence of the concept drift. Possibility of...
Preprint
Full-text available
The imbalanced data classification is one of the most crucial tasks facing modern data analysis. Especially when combined with other difficulty factors, such as the presence of noise, overlapping class distributions, and small disjuncts, data imbalance can significantly impact the classification performance. Furthermore, some of the data difficulty...
Chapter
Imbalanced data streams have gained significant popularity among researchers in recent years. This area of research is not only still greatly underdeveloped, but there are also numerous inherent difficulties that need to be addressed when creating algorithms that could be utilized in such a dynamic environment and achieve satisfactory results whe...
Chapter
Learning from non-stationary imbalanced data streams is a serious challenge to the machine learning community. There is a significant number of works addressing the issue of classifying non-stationary data streams, but most of them do not take into consideration that real-life data streams may exhibit a high and changing class imbalance ratio,...
Chapter
Full-text available
Learning from imbalanced data is a vital challenge for pattern classification. We often face the imbalanced data in medical decision tasks where at least one of the classes is represented by only a very small minority of the available data. We propose a novel framework for training base classifiers and preparing the dynamic selection dataset (dsel)...
Article
Full-text available
Computational intelligence is a very active and fruitful research area of artificial intelligence with a broad spectrum of applications. Remote sensing data has been a salient field of application of computational intelligence algorithms, both for the exploitation of the data and for the research/development of new data analysis tools. In this editorial...
Chapter
Full-text available
Multiple classifier systems have proven superior to individual classifiers in solving classification tasks. One of the main issues in those solutions lies in data size, when the amount of data to be analyzed becomes huge. In this paper, the ability of an ensemble system to succeed using only portions of the available data is analyzed. For this, ex...
Chapter
The classification of data streams is a frequently considered problem. Incoming data tends to change its characteristics over time, and we usually also encounter difficulties in data distributions, such as an unequal number of learning examples from the considered classes. The combination of these two phenomena is an additi...
Chapter
Imbalanced data classification remains an important topic, and over the past decades plenty of works have been devoted to this field of study. More and more real-life imbalanced class problems have inspired researchers to come up with new solutions with better performance. Various techniques are employed such as data handling approaches,...
Book
This book constitutes the refereed proceedings of the First Symposium on Machine Learning and Metaheuristics Algorithms, and Applications, held in Trivandrum, India, in December 2019. The 17 full papers and 6 short papers presented in this volume were thoroughly reviewed and selected from 53 qualified submissions. The papers cover such topics as m...
Book
This book highlights recent research on computer recognition systems, one of the most promising directions in artificial intelligence. Offering the most comprehensive study on this field to date, it gathers 36 carefully selected articles contributed by experts on pattern recognition. Presenting recent research on methodology and applications, the...
Chapter
The problem of fake news publication is not new and has been reported since ancient times, but it has started having a huge impact, especially on social media users. Such false information should be detected as soon as possible to avoid its negative influence on the readers and in some cases on their decisions, e.g., during the election....
Chapter
Imbalanced data classification is still a focus of intense research, due to its ever-growing presence in the real-life decision tasks. In this article, we focus on a classifier ensemble for imbalanced data classification. The ensemble is formed on the basis of the individual classifiers trained on supervise-selected feature subsets. There are sever...
Chapter
The purpose of ensemble pruning is to reduce the number of predictive models in order to improve efficiency and predictive performance of the ensemble. In the clustering-based approach, we look for groups of similar models, and then we prune each of them separately in order to increase overall diversity of the ensemble. In this paper we propose...
Article
Full-text available
Instance reduction techniques are data preprocessing methods originally developed to enhance the nearest neighbor rule for standard classification. They reduce the training data by selecting or generating representative examples of a given problem. These algorithms have been designed and widely analyzed in multi-class problems providing very compet...
Chapter
Although imbalanced data analysis has gained significant attention in recent years, it remains an underdeveloped area of research posing many difficulties due to the difference in the number of objects in the examined classes, rendering traditional, accuracy-driven machine learning methods useless. With many modern real-life applications being...
Chapter
Learning from imbalanced data is still considered one of the most challenging areas of machine learning. Among the plethora of methods dedicated to alleviating the challenge of skewed distributions, the two most distinct ones are data-level sampling and cost-sensitive learning. The former modifies the training set by either removing majority instances o...
Chapter
Full-text available
The nature of the analysed data may make many practical data mining tasks difficult. This work focuses on two important research topics associated with data analysis, i.e., data stream classification and data analysis with imbalanced class distributions. We propose a novel classification method, employing a classifier s...
Article
Due to the variety of modern real-life tasks where the analyzed data is often not a static set, data stream mining has gained substantial attention from the machine learning community. The main property of such systems is the large amount of data arriving in a sequential manner, which creates an endless stream of objects. Taking into consideration the limited resourc...
Article
Currently, knowledge discovery in databases is an essential first step when identifying valid, novel and useful patterns for decision making. There are many real-world scenarios, such as bankruptcy prediction, option pricing or medical diagnosis, where the classification models to be learned need to fulfill restrictions of monotonicity (i.e. the ta...
Article
Full-text available
Imbalanced data classification remains a focus of intense research, mostly due to the prevalence of data imbalance in various real-life application domains. A disproportion among objects from different classes may significantly affect the performance of standard classification models. The first problem is the high imbalance ratios that pose a serio...
Chapter
Usually, during data stream classifier learning, we assume that labels of all incoming examples are available without any delay and are used to update the employed predictive model. Unfortunately, this assumption about access to all class labels is naive and requires a relatively high budget for labeling. Consequently, methods which can train d...
Chapter
The difficulty of many classification tasks lies in the nature of the analyzed data, such as a disproportionate number of examples from different classes in a learning set. Ignoring this characteristic causes canonical classifiers to display strongly biased performance on imbalanced datasets. In this work a novel classifier ensemble forming technique for im...
Preprint
Full-text available
Currently, knowledge discovery in databases is an essential step to identify valid, novel and useful patterns for decision making. There are many real-world scenarios, such as bankruptcy prediction, option pricing or medical diagnosis, where the classification models to be learned need to fulfil restrictions of monotonicity (i.e. the target class l...
Article
In this paper we deal with the problem of addressing multi-class problems with decomposition strategies. Based on the divide-and-conquer principle, a multi-class problem is divided into a number of easier to solve sub-problems. In order to do so, binary decomposition is considered to be the most popular approach. However, when using this strategy w...
Conference Paper
Full-text available
Learning from imbalanced data is a challenge that the machine learning community has faced over the last decades, due to its ever-growing presence in real-life problems. While there is a significant number of works addressing the issue of handling binary and skewed datasets, its multi-class counterpart has not received as much attention. This problem is m...
Chapter
Medical data mining problems are usually characterized by examples of some of the classes appearing more frequently. Such a learning difficulty is known as the imbalanced classification problem. This contribution analyzes the application of algorithms for tackling multi-class imbalanced classification in the field of vertebral column diseases classifi...
Chapter
Correct recognition of the possible changes in data streams, called concept drifts, plays a crucial role in constructing the appropriate model learning strategy. This paper focuses on the unsupervised learning model for non-stationary data streams, where two significant modifications of the ClustTree algorithm are presented. They allow the clusterin...
Article
Full-text available
Ozone is one of the pollutants with the most negative effects on human health and, in general, on the biosphere. Many data-acquisition networks collect data about ozone values in both urban and background areas. Usually, these data are incomplete or corrupt and the imputation of the missing values is a priority in order to obtain complete datasets, solvi...
Book
This Edited Volume gathers a selection of refereed and revised papers originally presented at the Third International Symposium on Signal Processing and Intelligent Recognition Systems (SIRS’17), held on September 13–16, 2017 in Manipal, India. The papers offer stimulating insights into biometrics, digital watermarking, recognition systems, image a...