About the lab

Machine Learning Group

Research interests:
Machine learning and data mining
Information fusion and combined classifiers (multiple classifier systems, classifier ensembles)
Big data analytics
Data stream classification and concept drift (novelty) detection
One-class classification
Imbalanced data analysis
Methods of improving and stabilizing weak classifiers
Hybrid and compound classification
Active learning
Distributed and parallel computing systems for data mining
Applications to real-life problems

Featured projects (1)

The project will focus on the possibility of overcoming the above-mentioned difficulties by using multi-criteria optimization methods that return a set of Pareto-optimal solutions. This approach lets the user select a specific classification model, and the project will also propose methods for selecting a model automatically or for aggregating acceptable models using the combined classification paradigm. In this project, we form the hypothesis that it is possible to propose classifier learning algorithms using multi-criteria optimization that return a set of Pareto-optimal models, each with individual prediction quality at least as good as that of classifiers trained using aggregated criteria.
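
As a rough illustration of what "a set of Pareto-optimal models" means in the project description above, the sketch below filters candidate classifiers by non-dominance. It is not the project's method: the two criteria (precision and recall), the score values, and the function names `dominates` and `pareto_front` are assumptions made only for this example.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every criterion and
    strictly better on at least one (higher values are better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Tuple[float, ...]]) -> List[int]:
    """Return the indices of score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (precision, recall) pairs for four candidate classifiers.
candidates = [(0.90, 0.40), (0.70, 0.70), (0.60, 0.65), (0.50, 0.90)]
front = pareto_front(candidates)
print(front)
```

Here the third candidate is dropped because the second is better on both criteria; the remaining three represent different trade-offs from which a user, or an automatic selection rule, could choose.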

Featured research (13)

Contemporary man is addicted to digital media and tools supporting his daily activities, which causes a massive increase in incoming data, both in volume and frequency. Due to this trend, unsupervised machine learning methods for data stream clustering have become a popular research topic in recent years. At the same time, semi-supervised constrained clustering is rarely considered in data stream clustering. To address this gap, the authors propose adaptations of constrained k-means clustering algorithms for imbalanced data stream clustering. The proposed algorithms were evaluated in a series of experiments on synthetic and real data, which verified their ability to adapt to occurring concept drifts. Keywords: Data streams; Pair-wise constrained; Clustering; Imbalanced data
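
The abstract above does not say which constrained k-means variant is adapted. As a minimal sketch, assuming a COP-KMeans-style assignment step with pair-wise must-link and cannot-link constraints, a point is given to its nearest centroid that breaks no constraint. The helper names `violates` and `assign`, the 1-D data, and the constraint pairs are all hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def violates(point: int, cluster: int, assignment: Dict[int, int],
             must_link: List[Pair], cannot_link: List[Pair]) -> bool:
    """Check whether putting `point` into `cluster` breaks a pair-wise constraint
    with respect to the points that are already assigned."""
    for a, b in must_link:
        other = b if a == point else a if b == point else None
        # A must-link partner that is already assigned must share the cluster.
        if other is not None and other in assignment and assignment[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == point else a if b == point else None
        # A cannot-link partner must not be in the same cluster.
        if other is not None and assignment.get(other) == cluster:
            return True
    return False


def assign(points: List[float], centroids: List[float],
           must_link: List[Pair], cannot_link: List[Pair]) -> Dict[int, int]:
    """Assign each 1-D point to the nearest centroid that violates no constraint."""
    assignment: Dict[int, int] = {}
    for i, x in enumerate(points):
        ranked = sorted(range(len(centroids)), key=lambda c: abs(x - centroids[c]))
        for c in ranked:
            if not violates(i, c, assignment, must_link, cannot_link):
                assignment[i] = c
                break
        else:
            raise ValueError(f"no feasible cluster for point {i}")
    return assignment


# Hypothetical data: points 0 and 1 may not share a cluster, points 2 and 3 must.
labels = assign([0.0, 0.2, 0.9, 1.1], [0.0, 1.0],
                must_link=[(2, 3)], cannot_link=[(0, 1)])
print(labels)
```

Point 1 is closest to the first centroid but is pushed to the second one by the cannot-link constraint, which shows how a small amount of supervision can override pure distance.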
One of the most critical data analysis tasks is streaming data classification, where we may also observe the concept drift phenomenon, i.e., changes in the probabilistic characteristics of the decision model. From a practical point of view, we may face this type of task in banking, medicine, or cybersecurity, to name only a few. A vital characteristic of these problems is that the classes we are interested in (e.g., fraudulent transactions, threats, or serious diseases) are usually infrequent, which hinders the classification system design. The paper presents a novel algorithm, DSCB (Deterministic Sampling Classifier with weighted Bagging), which employs data preprocessing methods and a weighted bagging technique to classify non-stationary imbalanced data streams. It builds models based on an incoming data chunk, but it also takes previously arrived instances into account. The proposed approach has been evaluated in a wide range of computer experiments carried out on real and artificially generated data streams with various imbalance ratios, label noise levels, and concept drift types. The results confirmed that the weighted bagging ensemble coupled with data preprocessing could outperform state-of-the-art methods.
Modern analytical systems must process streaming data and correctly respond to data distribution changes. The phenomenon of changes in data distributions is called concept drift, and it may harm the quality of the models in use. Additionally, because concept drift may appear at any time, the algorithms must be ready to continuously adapt the model to the changing data distributions. This work focuses on non-stationary data stream classification, where a classifier ensemble is used. To keep the ensemble model up to date, new base classifiers are trained on the incoming data blocks and added to the ensemble while, at the same time, outdated models are removed from it. One of the problems with this type of model is ensuring a fast reaction to changes in data distributions. We propose the new Chunk Adaptive Restoration framework, which can be adapted to any block-based data stream classification algorithm. The proposed algorithm adjusts the data chunk size when concept drift is detected to minimize the impact of the change on the predictive performance of the model. The experimental research, backed up with statistical tests, has proven that Chunk Adaptive Restoration significantly reduces the model's restoration time.
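
The abstract above states only that the chunk size is adjusted when drift is detected; it does not give the rule. The sketch below shows one plausible reading, assuming the chunk shrinks on drift so that new models are trained sooner and then grows back to its base size. The function `next_chunk_size` and the parameters `base_size`, `drift_size`, and `growth` are hypothetical and are not taken from the paper.

```python
from typing import List


def next_chunk_size(current: int, drift_detected: bool,
                    base_size: int = 1000, drift_size: int = 100,
                    growth: float = 1.5) -> int:
    """Hypothetical chunk-size schedule: shrink on drift, then grow back."""
    if drift_detected:
        # Small chunks let the ensemble receive new base classifiers quickly.
        return drift_size
    # Without drift, move geometrically back toward the base size.
    return min(base_size, int(current * growth))


# Simulated drift-detector output for five consecutive chunks (assumed values).
drift_flags = [False, True, False, False, False]
size = 1000
sizes: List[int] = []
for flag in drift_flags:
    size = next_chunk_size(size, flag)
    sizes.append(size)
print(sizes)
```

The trade-off in such a schedule is that smaller chunks speed up adaptation after a drift but give each new base classifier fewer training instances, which is why the size is restored once the stream appears stable again.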
Artificial intelligence (AI) has found a myriad of applications in many domains of technology, and more importantly, in improving people's lives. Sadly, AI solutions have already been utilized for various violations and theft, even receiving the name AI or Crime (AIC). This poses a challenge: are cybersecurity experts thus justified in attacking malicious AI algorithms, methods and systems as well, to stop them? Would that be fair and ethical? Furthermore, AI and machine learning algorithms are prone to being fooled or misled by so-called adversarial attacks. However, adversarial attacks could be used by cybersecurity experts to stop criminals who use AI and to tamper with their systems. The paper argues that this kind of attack could be named an Ethical Adversarial Attack (EAA), and that, if used fairly and within regulations and legal frameworks, such attacks would prove to be a valuable aid in the fight against cybercrime.
Nowadays, societies, businesses and citizens are strongly dependent on information, and information has become one of the most crucial (societal and economic) values. People expect both traditional and online media to provide trustworthy and reliable news and content. The right to be informed is one of the fundamental requirements for making the right decisions on a small scale (e.g., during shopping) and on a large scale (e.g., during general or presidential elections). However, information is not always reliable because digital content may be manipulated, and its spreading could also be used for disinformation. This is especially true with the proliferation of online media, where news travels fast and is often based on User Generated Content (UGC), while there is often little time and few resources for the information to be carefully cross-checked. Moreover, disinformation and media manipulation can be part of hybrid warfare and malicious propaganda. Such false content should be detected as soon as possible to avoid its negative influence on readers and, in some cases, on political decisions. Part of these challenges and vivid problems can be addressed by innovative machine learning, artificial intelligence and soft computing methods. Therefore, the main aim of our special issue on Applying Machine Learning for Combating Fake News and Internet/Media Content Manipulation in the Applied Soft Computing journal was to gather a set of high-quality papers presenting new approaches and solutions for media and content manipulation and disinformation detection. We also encouraged papers concerning the problem of early detection of radicalization and hate speech based on fake information and/or manipulated content. We were very positively surprised by the strong feedback and the considerable number of submissions we received, from which we could finally select 14 for publication.
The papers in this issue can roughly be divided into three categories: survey papers, fake news detection in social media, and image manipulation detection. It is also worth mentioning that the work on this special issue was accompanied by the webinar “Machine Learning to combat Fake News and Media Manipulation”, which took place on April 20, 2021, organized jointly by the Applied Soft Computing journal, Patterns Journal, Elsevier editorial team and invited editors of this special issue.

Lab head

Michal Wozniak
  • Department of Systems and Computer Networks
About Michal Wozniak
  • Professor, Department of Systems and Computer Networks, Wroclaw University of Technology
  • Fields of interest:
    - machine learning, especially inductive learning, data and web mining, learning on distributed and streaming data
    - pattern recognition, especially combined and compound classifiers, concept drift, recognition with context
    - telemedicine and medical decision support
    - computer and network security, especially IDS, IPS, and anti-spam filter design
    - distributed algorithms

Members (14)

Paweł Ksieniewicz
  • Wroclaw University of Science and Technology
Mariusz Topolski
  • Wroclaw University of Science and Technology
Pawel Zyblewski
  • Wroclaw University of Science and Technology
Piotr Sobolewski
  • Wroclaw University of Science and Technology
Dariusz Jankowski
  • Wroclaw University of Science and Technology
Jakub Klikowski
  • Wroclaw University of Science and Technology
Jȩdrzej Kozal
  • Wroclaw University of Science and Technology
Filip Guzy
  • Wroclaw University of Science and Technology