Michał Koziarski

AGH University of Science and Technology in Kraków | AGH

38 Publications
21,053 Reads
551 Citations

Publications (38)
Article
One of the vital problems in training classifiers on imbalanced data is the definition of an optimization criterion. Typically, since the exact cost of misclassification of the individual classes is unknown, combined metrics and loss functions that roughly balance the cost for each class are used. However, this approach can lead to a loss of info...
Article
Full-text available
Real-world classification domains, such as medicine, health and safety, and finance, often exhibit imbalanced class priors and have asymmetric misclassification costs. In such cases, the classification model must achieve a high recall without significantly impacting precision. Resampling the training data is the standard approach to improving cla...
Article
Data imbalance remains one of the factors negatively affecting the performance of contemporary machine learning algorithms. One of the most common approaches to reducing the negative impact of data imbalance is preprocessing the original dataset with data-level strategies. In this paper we propose a unified framework for imbalanced data over- and u...
Preprint
Full-text available
Real-world classification domains, such as medicine, health and safety, and finance, often exhibit imbalanced class priors and have asymmetric misclassification costs. In such cases, the classification model must achieve a high recall without significantly impacting precision. Resampling the training data is the standard approach to improving cla...
Preprint
Cancer diseases constitute one of the most significant societal challenges. In this paper we introduce a novel histopathological dataset for prostate cancer detection. The proposed dataset, consisting of over 2.6 million tissue patches extracted from 430 fully annotated scans, 4675 scans with assigned binary diagnosis, and 46 scans with diagnosis g...
Preprint
Data imbalance remains one of the factors negatively affecting the performance of contemporary machine learning algorithms. One of the most common approaches to reducing the negative impact of data imbalance is preprocessing the original dataset with data-level strategies. In this paper we propose a unified framework for imbalanced data over- and u...
Article
Learning from imbalanced data is among the most popular topics in contemporary machine learning. However, the vast majority of attention in this field is given to binary problems, while their much more difficult multiclass counterparts are relatively unexplored. Handling data sets with multiple skewed classes poses various challenges and calls...
Article
Full-text available
Imbalanced data classification is one of the most crucial tasks facing modern data analysis. Especially when combined with other difficulty factors, such as the presence of noise, overlapping class distributions, and small disjuncts, data imbalance can significantly impact the classification performance. Furthermore, some of the data difficulty...
Article
Full-text available
Histograms of oriented gradients (HOG) are still one of the most frequently used low-level features for pattern recognition in images. Despite their great popularity and simple implementation, the performance of HOG features has almost always been measured on relatively high-quality data, which are far from real conditions. To fill this gap we experi...
Preprint
In this paper we propose two novel data-level algorithms for handling data imbalance in the classification task: first of all a Synthetic Minority Undersampling Technique (SMUTE), which leverages the concept of interpolation of nearby instances, previously introduced in the oversampling setting in SMOTE, and secondly a Combined Synthetic Oversampli...
Preprint
Data imbalance remains one of the open challenges in contemporary machine learning. It is especially prevalent in the case of medical data, such as histopathological images. Traditional data-level approaches for dealing with data imbalance are ill-suited for image data: oversampling methods such as SMOTE and its derivatives lead to creation of unre...
Preprint
Full-text available
Imbalanced data classification is one of the most crucial tasks facing modern data analysis. Especially when combined with other difficulty factors, such as the presence of noise, overlapping class distributions, and small disjuncts, data imbalance can significantly impact the classification performance. Furthermore, some of the data difficulty...
Article
Full-text available
Data imbalance remains one of the most widespread problems affecting contemporary machine learning. The negative effect data imbalance can have on the traditional learning algorithms is most severe in combination with other dataset difficulty factors, such as small disjuncts, presence of outliers and insufficient number of training observations. Af...
Chapter
While most existing image recognition benchmarks consist of relatively high-quality data, in practical applications images can be affected by various types of distortions. In this paper we experimentally evaluate the extent to which image distortions affect classification based on HOG feature descriptors. In an experimental study based on sever...
Conference Paper
Full-text available
In this work, we propose an algorithm, supported by active learning, for training deep neural networks to classify breast cancer in histopathological images affected by data imbalance. The output of the neural network on unlabeled samples is used to calculate weighted information entropy. It is utilized as an uncertainty score for automatic...
Conference Paper
Full-text available
In this work, we propose an algorithm, supported by active learning, for training deep neural networks to classify breast cancer in histopathological images affected by data imbalance. The output of the neural network on unlabeled samples is used to calculate weighted information entropy. It is utilized as an uncertainty score for automatic...
Chapter
In this work, we propose an algorithm, supported by active learning, for training deep neural networks to classify breast cancer in histopathological images affected by data imbalance. The output of the neural network on unlabeled samples is used to calculate weighted information entropy. It is utilized as an uncertainty score for automatic...
Preprint
Data imbalance remains one of the most widespread problems affecting contemporary machine learning. The negative effect data imbalance can have on the traditional learning algorithms is most severe in combination with other dataset difficulty factors, such as small disjuncts, presence of outliers and insufficient number of training observations. Sa...
Article
Full-text available
Imbalanced data classification remains a focus of intense research, mostly due to the prevalence of data imbalance in various real-life application domains. A disproportion among objects from different classes may significantly affect the performance of standard classification models. The first problem is the high imbalance ratios that pose a serio...
Chapter
Full-text available
In this paper we experimentally evaluated the impact of data imbalance on the performance of convolutional neural networks in the histopathological image recognition task. We conducted our analysis on the Breast Cancer Histopathological Database. We considered four phenomena associated with data imbalance: how does it affect classification performance...
Article
Full-text available
Due to the advances made in recent years, methods based on deep neural networks have been able to achieve a state-of-the-art performance in various computer vision problems. In some tasks, such as image recognition, neural-based approaches have even been able to surpass human performance. However, the benchmarks on which neural networks achieve the...
Article
Full-text available
Imbalanced data classification is one of the most widespread challenges in contemporary pattern recognition. Varying levels of imbalance may be observed in most real datasets, affecting the performance of classification algorithms. Particularly, high levels of imbalance cause serious difficulties, often requiring the use of specially designed method...
Article
Full-text available
In a world in which acceptance and social community membership are highly desired, the ability to predict social group evolution appears to be a fascinating, yet very complex, research task. Therefore, the problem has been decomposed, and a new, adaptable and generic method for group evolution prediction in social networks (call...
Article
Full-text available
Ensemble classification remains one of the most popular techniques in contemporary machine learning, being characterized by both high efficiency and stability. An ideal ensemble comprises mutually complementary individual classifiers which are characterized by the high diversity and accuracy. This may be achieved, e.g., by training individual class...
Data
Ensemble classification remains one of the most popular techniques in contemporary machine learning, being characterized by both high efficiency and stability. An ideal ensemble comprises mutually complementary individual classifiers which are characterized by the high diversity and accuracy. This may be achieved, e.g., by training individual class...
Article
Full-text available
Data classification in the presence of noise can lead to much worse results than expected for pure patterns. In this paper we investigate this problem in the case of deep convolutional neural networks in order to propose solutions that can mitigate the influence of noise. The main contributions presented in this paper are experimental examination of influe...
Conference Paper
Full-text available
The difficulty of many practical decision problems lies in the nature of the analyzed data. One of the most important characteristics of real data is imbalance among examples from different classes. Despite more than two decades of research, imbalanced data classification is still one of the vital challenges to be addressed. The traditional classificati...
Conference Paper
Full-text available
The presence of noise poses a common problem in image recognition tasks. In this paper we propose and analyse the architecture of a convolutional neural network capable of image denoising. We evaluate its performance with various types of artificial distortions present, under both known and unknown noise conditions. Finally, we measure how including denoising...
Conference Paper
Full-text available
Ensemble learning is considered one of the most well-established and efficient techniques in contemporary machine learning. The key to the satisfactory performance of such combined models lies in the supplied base learners and selected combination strategy. In this paper we will focus on the former issue. Having classifiers that are of...
Conference Paper
Classification of distorted patterns poses a real problem for the majority of classifiers. In this paper we analyse the robustness of a deep neural network in the classification of such patterns. Using a specific convolutional network architecture, the impact of different types of noise on classification accuracy is evaluated. For highly distorted patterns to improve...

Projects (2)
Project
The project will focus on overcoming the above-mentioned difficulties by using multi-criteria optimization methods that return a set of Pareto-optimal solutions. This enables the user to select a specific classification model, to rely on automatic selection methods, or to aggregate acceptable models using the combined classification paradigm. In this project, we hypothesize that it is possible to propose classifier learning algorithms using multi-criteria optimization, returning a set of Pareto-optimal models, with individual prediction quality at least as good as the quality of classifiers trained using aggregated criteria.
Project
This project covers the design of efficient machine learning methods for multi-class scenarios that suffer from an uneven distribution of training samples across classes. Supervised learning methods are typically designed to work with reasonably balanced data sets, but many real-world applications have to face imbalanced ones. A data set is said to be imbalanced when several classes are under-represented (minority classes) in comparison with others (majority classes). The problem of imbalanced data is found in a wide range of practical applications, such as banking (fraud detection), computer security (IDS/IPS or spam filtering) and medicine. The presented literature survey allows us to conclude that there is a need to develop novel methodologies for handling multi-class imbalanced problems and exploring the characteristics of examples within class structures. Learning from imbalanced data is among the contemporary challenges in machine learning, and multi-class imbalance stands out as the most difficult scenario. In binary imbalanced learning the relationships between classes are easy to define: one class is the majority one, while the other is the minority one. In this project we hypothesize that it is possible to design efficient multi-class methods for such compound imbalance problems that could process all of the classes at once. We plan to identify general rules for designing efficient methods for learning from multi-class imbalanced data, propose novel algorithms for this task, and develop dedicated software packages that could be used in this area of research. To evaluate the quality of the proposed methods we will rely mainly on experimental investigations. Currently the analytical approach to learning from imbalanced data is highly limited due to the number of simplifications that must be made.
All experiments will be done mainly in the KNIME environment (which allows software written in Java, R and Matlab to be developed and combined). The following tasks will be performed:
• Developing methods addressing the local difficulty of imbalanced data in multi-class classification
• Proposing new data preprocessing algorithms that decrease the imbalance ratio in multi-class classification
• Developing new classification methods for multi-class imbalanced data
• Proposing new classification methods for multi-class imbalanced data based on the classifier ensemble approach
• Implementing and experimentally evaluating the proposed methods for multi-class imbalanced data
The use of such methods in the analysis of multi-class imbalanced data is currently underrated, and the proposed research tasks aim to fill this gap. The algorithms developed during this project could therefore be used by companies involved in data analysis.
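As a rough illustration of the imbalance ratio mentioned in the task list, the following Python sketch computes one common definition (the size of the largest class divided by the size of the smallest) together with a per-class ratio. The function names and the toy labels are hypothetical and are not taken from the project's software; other definitions of the ratio exist.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Largest class size divided by smallest class size (assumed definition)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("at least two classes are required")
    return max(counts.values()) / min(counts.values())


def per_class_ratios(labels):
    """For each class, how many times smaller it is than the largest class."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {cls: largest / n for cls, n in counts.items()}


# Toy multi-class label vector: one majority class and two minority classes.
labels = ["a"] * 90 + ["b"] * 9 + ["c"] * 1
print(imbalance_ratio(labels))
print(per_class_ratios(labels))
```

In a multi-class setting the single global ratio hides which classes are under-represented, which is why the per-class view may be the more useful of the two.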