Marcin Woźniak’s research while affiliated with Silesian University of Technology and other places


Publications (299)


Towards Designing a Vision Transformer-Based Deep Neural Network for Emotion and Gender Detection from Human Speech Signals
  • Chapter

December 2024 · 5 Reads

Parthib Dhal · Ujaan Datta · Marcin Woźniak · [...]
Voice is far more complex and dynamic than other human physical characteristics: it can produce many languages, with varying accents, in various emotional states. Our ears receive these signals, and our brain analyses them to determine how we feel and how the other person feels. A task that seems effortless to us is difficult for any computing device to replicate. Automatic gender, mood, and speaker identification systems have numerous uses in social media, robotics, and related fields. After encoding human speech signals into spectrogram images, our proposed deep learning model employs a Vision Transformer for both emotion and gender classification. This model has been implemented for the first time in the domain of biomedical signal processing. The proposed Vision Transformer model is tested on the widely used RAVDESS, TESS, and URDU emotion detection datasets. We detect human emotions on the RAVDESS dataset with 73.94% accuracy and gender on the same dataset with 98.76% accuracy. On the TESS and URDU emotion detection datasets, we classify emotions with 99.92% and 94.02% accuracy, respectively.
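The encoding step the abstract describes, turning a speech signal into a spectrogram image before the Vision Transformer sees it, can be sketched as follows. This is a minimal illustration, not the authors' pipeline; the frame length, hop size, and synthetic tone are assumptions.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Convert a 1-D speech signal into a log-magnitude spectrogram image.

    A sketch of the spectrogram-encoding step only; the ViT classifier
    itself is not shown. Parameter values are illustrative.
    """
    frames = [signal[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(signal) - frame_len + 1, hop)]
    stft = np.abs(np.fft.rfft(np.stack(frames), axis=1))
    return np.log1p(stft)  # shape: (num_frames, frame_len // 2 + 1)

# A 1-second synthetic tone at 16 kHz stands in for real speech.
sr = 16000
t = np.arange(sr) / sr
spec = log_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (124, 129)
```

The resulting 2-D array can then be rendered or resized as an image and fed to any image classifier.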




Figures: Block diagram of the CatBoost with MLP for breast cancer analysis · Heatmap showing the correlation among the features in the dataset · The architecture of the CatBoost with a multi-layer perceptron model · SHAP values of various features in the decision process · Dependency graphs associated with the most significant features · (+7 more)

XAI-driven CatBoost multi-layer perceptron neural network for analyzing breast cancer
  • Article
  • Full-text available

November 2024 · 38 Reads · 1 Citation
Early diagnosis of breast cancer is critically important for women’s health, as it strongly influences treatment outcomes. The present study outlines a novel approach for analyzing breast cancer data using a CatBoost classification model combined with a multi-layer perceptron neural network (CatBoost+MLP). Explainable artificial intelligence techniques are integrated with the proposed model to enhance the interpretability of predictions in breast cancer diagnosis, leveraging the strengths of the CatBoost classification technique in feature identification. The proposed CatBoost+MLP is evaluated using Shapley additive explanations values to analyze feature significance in decision-making. First, feature engineering is performed using the analysis of variance technique to identify the significant features. The MLP model alone and the CatBoost+MLP model are then analyzed using divergent performance metrics, and the results are compared with contemporary breast cancer identification techniques.
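The abstract's feature-engineering step, scoring features with one-way analysis of variance, can be sketched directly in NumPy. This is a generic ANOVA F-statistic, not the authors' exact pipeline; the synthetic data and the +3.0 class shift are assumptions for illustration.

```python
import numpy as np

def anova_f(X, y):
    """One-way ANOVA F-statistic per feature: between-class variance
    over within-class variance. Higher scores mark features that
    separate the classes better."""
    classes = np.unique(y)
    n, k = len(y), len(classes)
    grand = X.mean(axis=0)
    ss_between = sum(
        (y == c).sum() * (X[y == c].mean(axis=0) - grand) ** 2
        for c in classes)
    ss_within = sum(
        ((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
        for c in classes)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 3))
X[y == 1, 0] += 3.0          # make feature 0 separate the classes
scores = anova_f(X, y)
print(scores.argmax())        # feature 0 scores highest
```

Features with the largest F-scores would then be kept as input to the classifier.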

Automating cancer diagnosis using advanced deep learning techniques for multi-cancer image classification

October 2024 · 65 Reads · 3 Citations

Cancer detection poses a significant challenge for researchers and clinical experts because cancer is the leading cause of global mortality. Early detection is crucial, but traditional cancer detection methods often rely on invasive procedures and time-consuming analyses, creating a demand for more efficient and accurate solutions. This paper addresses these challenges through automated, AI-based cancer detection, focusing on deep learning models. Convolutional Neural Networks (CNNs), including DenseNet121, DenseNet201, Xception, InceptionV3, MobileNetV2, NASNetLarge, NASNetMobile, InceptionResNetV2, VGG19, and ResNet152V2, are evaluated on image datasets for seven types of cancer: brain, oral, breast, kidney, Acute Lymphocytic Leukemia, lung and colon, and cervical cancer. Images first undergo segmentation, followed by contour feature extraction in which parameters such as perimeter, area, and epsilon are computed. The models are rigorously evaluated, with DenseNet121 achieving the highest validation accuracy of 99.94%, a loss of 0.0017, and the lowest Root Mean Square Error (RMSE) values of 0.036056 for training and 0.045826 for validation. These results demonstrate the capability of AI-based techniques to improve cancer detection accuracy, with DenseNet121 emerging as the most effective model in this study.
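The contour parameters the abstract names (perimeter, area, epsilon) can be computed for a closed polygonal contour as below. This is a sketch under assumptions: the shoelace formula stands in for whatever implementation the paper used, and the 1% factor for epsilon is borrowed from common contour-approximation practice, not stated in the abstract.

```python
import numpy as np

def contour_features(points):
    """Perimeter, area (shoelace formula), and an approximation
    tolerance 'epsilon' for a closed polygonal contour given as an
    (N, 2) array of vertices."""
    rolled = np.roll(points, -1, axis=0)            # next vertex, wrapping
    perimeter = np.sum(np.linalg.norm(rolled - points, axis=1))
    area = 0.5 * abs(np.sum(points[:, 0] * rolled[:, 1]
                            - rolled[:, 0] * points[:, 1]))
    epsilon = 0.01 * perimeter                      # assumed 1% tolerance
    return perimeter, area, epsilon

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)
print(contour_features(square))  # (4.0, 1.0, 0.04)
```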


Figures: Platform Interface · Enhancement effect of skiing action video images · Intelligent proofreading results for action regulation · Video annotation example · Key point detection results on the skeleton of skiing actions
An Intelligent Proofreading for Remote Skiing Actions Based on Variable Shape Basis

September 2024 · 24 Reads

Mobile Networks and Applications

Current proofreading algorithms for action regulation mainly recover the 3D structure and action information of non-rigid objects from image sequences by factorization. Most of these algorithms assume an affine camera model. This assumption only holds if the size and depth of the object change very little relative to its distance from the camera, as is the case with a fixed shape basis. When the object is very close to the camera, the assumption causes large reconstruction errors. This paper solves the problem with an intelligent proofreading algorithm for remote skiing teaching actions based on a variable shape basis. First, an improved Retinex algorithm enhances the multi-frame video images of skiing actions to make the action details more prominent. The measurement matrix is then calculated after eliminating the translation vector via a coordinate transformation. Under a rank constraint, the measurement matrix is decomposed by the singular value decomposition algorithm, and the correct shape-basis structure of the 3D action features is obtained using the variable shape basis. Finally, a randomly initialized parameter is further optimized with the least-squares algorithm, and iterating until the objective function converges yields the deformation degree of the actions. Test results show that the algorithm improves the proofreading accuracy of action regulation in skiing teaching and correctly proofreads various uploaded sliding actions, making it applicable to remote skiing teaching and community learning.
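The factorization step described above, eliminating the translation vector and then decomposing the measurement matrix under a rank constraint via SVD, can be sketched on synthetic data. The rank of 3 per shape basis follows standard non-rigid structure-from-motion practice; the matrix sizes and data here are illustrative assumptions, not the paper's.

```python
import numpy as np

# Sketch: center the measurement matrix W (stacked 2D tracks) to
# remove per-row translation, then factor it with a rank-constrained SVD
# into a motion factor and a shape-basis factor.
rng = np.random.default_rng(1)
W = rng.normal(size=(6, 3)) @ rng.normal(size=(3, 40))  # rank-3 measurements
W += rng.normal(size=(6, 1))                            # per-row translation

W_centered = W - W.mean(axis=1, keepdims=True)          # eliminate translation
U, s, Vt = np.linalg.svd(W_centered, full_matrices=False)
r = 3                                                   # rank constraint
motion = U[:, :r] * s[:r]       # camera/motion factor
shape = Vt[:r]                  # shape-basis factor
err = np.linalg.norm(W_centered - motion @ shape)
print(err)                      # near zero for rank-3 data
```

The factorization is only determined up to an invertible 3x3 ambiguity; recovering the metric shape basis requires the additional constraints the paper optimizes for.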




Implementing vision transformer for classifying 2D biomedical images

May 2024 · 362 Reads · 7 Citations
In recent years, the growth spurt of medical imaging data has led to the development of various machine learning algorithms for healthcare applications. The MedMNISTv2 dataset, a comprehensive benchmark for 2D biomedical image classification, encompasses diverse medical imaging modalities such as fundus camera, breast ultrasound, colon pathology, and blood cell microscopy. Highly accurate classification on these datasets is crucial for identifying diseases and determining courses of treatment. This paper presents a comprehensive analysis of four subsets within the MedMNISTv2 dataset: BloodMNIST, BreastMNIST, PathMNIST, and RetinaMNIST. These subsets span diverse data modalities and sample sizes, and were selected to analyze the model's efficiency across that diversity. The study assesses the Vision Transformer model's ability to capture the intricate patterns and features crucial for medical image classification and thereby to surpass benchmark metrics substantially. The methodology includes pre-processing the input images, followed by training the ViT-base-patch16-224 model on the mentioned datasets. Model performance is assessed using key metrics and by comparing the achieved classification accuracies with the benchmark accuracies. With ViT, the new benchmarks achieved for BloodMNIST, BreastMNIST, PathMNIST, and RetinaMNIST are 97.90%, 90.38%, 94.62%, and 57%, respectively. The study highlights the promise of Vision Transformer models in medical image analysis, paving the way for their adoption and further exploration in healthcare applications to enhance diagnostic accuracy and assist medical professionals in clinical decision-making.
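The model name ViT-base-patch16-224 encodes its tokenization: a 224x224 input is cut into non-overlapping 16x16 patches, each flattened into a vector. The shape-level sketch below shows only that step; the learned linear projection, class token, and position embeddings of the actual model are omitted.

```python
import numpy as np

def to_patches(image, patch=16):
    """Split an H x W x C image into flattened non-overlapping patches,
    the tokenization a patch16 Vision Transformer applies to its input.
    H and W are assumed divisible by the patch size."""
    h, w, c = image.shape
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    patches = grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)
    return patches

img = np.zeros((224, 224, 3))
print(to_patches(img).shape)  # (196, 768): 14 x 14 patches, 16*16*3 values each
```

Each of the 196 patch vectors becomes one token in the transformer's input sequence.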


A Sentiment Analysis Method for Big Social Online Multimodal Comments Based on Pre-trained Models

May 2024 · 30 Reads · 1 Citation

Mobile Networks and Applications

In addition to large amounts of text, comment data on social media platforms contains many emoticons. This multimodal nature of online comment data increases the difficulty of sentiment analysis. A big-data sentiment analysis technology for social online multimodal (SOM) comments is proposed. The technology uses web scraping to obtain SOM comment big data, including text data and emoji data, from the internet, then segments the text data and applies part-of-speech tagging as preprocessing. Using an attention-based feature extraction method for the comment text and a correlation-based feature extraction method for the emoticons, the emotional features of SOM comment text and emoji data are obtained, respectively. Taking the two extracted emotional features as inputs and building on the ELMo pre-trained model, a GE-BiLSTM model for SOM comment sentiment analysis is established. This model combines the ELMo pre-trained model with the GloVe model to obtain the emotional factors of social multimodal big data; after recombining them, the GE-BiLSTM output layer produces the sentiment analysis of the SOM comment big data. Experiments show that the technology has strong extraction and segmentation capabilities for SOM comment text, effectively extracts the emotional features contained in text and emoji data, and obtains accurate sentiment analysis results for SOM comment big data.
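The recombination step, merging contextual (ELMo-style) and static (GloVe-style) representations before the BiLSTM, is commonly done by per-token concatenation. The sketch below shows only that fusion at the shape level; the dimensions (1024 and 300) are the standard ELMo and GloVe sizes, assumed here rather than taken from the paper, and the random vectors stand in for real embeddings.

```python
import numpy as np

# Fuse a contextual and a static embedding for each token of a
# 12-token comment by concatenation; the fused sequence would be the
# BiLSTM's input.
seq_len, d_elmo, d_glove = 12, 1024, 300
rng = np.random.default_rng(2)
elmo_feats = rng.normal(size=(seq_len, d_elmo))    # contextual embeddings
glove_feats = rng.normal(size=(seq_len, d_glove))  # static embeddings
fused = np.concatenate([elmo_feats, glove_feats], axis=1)
print(fused.shape)  # (12, 1324) -> one 1324-dim vector per token
```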


Citations (81)


... With the rapid development of deep learning, it has made some remarkable achievements in the research of computer vision, such as target detection [27][28][29] and HDR imaging [30] directions. While in facial expression recognition, traditional methods are based mainly on image processing, feature extraction, and machine learning techniques. ...

Reference:

LCANet: a model for analysis of students real-time sentiment by integrating attention mechanism and joint loss function
Uncertainty estimation in HDR imaging with Bayesian neural networks
  • Citing Article
  • December 2024

Pattern Recognition

... It was also explored that EfficientNetB0 [25] due to its unusual scaling approach, that is, it balances width, depth and resolution, will optimize both accuracy and computational efficiency. We also included MobileNetV2 and NASNetMobile [26], two models intended for mobile and constrained resources. Depth-wise separable convolutions on MobileNetV2 have produced lightweight yet effective feature extraction, and NASNetMobile is a neural architecture search model with competitive accuracy at reduced computational costs. ...

Automating cancer diagnosis using advanced deep learning techniques for multi-cancer image classification

... With the rapid development of deep learning, it has made some remarkable achievements in the research of computer vision, such as target detection [27][28][29] and HDR imaging [30] directions. While in facial expression recognition, traditional methods are based mainly on image processing, feature extraction, and machine learning techniques. ...

Dynamic center point learning for multiple object tracking under Severe occlusions
  • Citing Article
  • June 2024

Knowledge-Based Systems

... It is often used in conjunction with deep learning models. In this study, four types of machine learning methods were preferred (Ahmed et al., 2023;Halder et al., 2024). The aim was to increase the success rate by generalising the classification process. ...

Implementing vision transformer for classifying 2D biomedical images

... F. S. Konstantakopoulos et al. [25] conducted a detailed survey on food image recognition. To automatically detect and recognize chewable food items based on their eating sounds, Y. Kumar et al. [26] proposed an approach using signal processing and DL approaches. D. Xue et al. [27] developed a weighted EL approach to classify histopathology images. ...

Automated detection and recognition system for chewable food items using advanced deep learning models

... These parameters provide the in-depth co-ordinate characteristics as well as intensity distribution of the bounded regions. Apart from this, the parameters are also helpful in differentiating between the normal as well as fractured spine based on their geometric properties which facilitates in automatic detection and classification task in medical images [25][26][27]. ...

Enhancing parasitic organism detection in microscopy images through deep learning and fine-tuned optimizer

... Natural scene text detection has emerged as a pivotal task in computer vision, finding applications in diverse domains such as optical character recognition and visually impaired navigation [1]. With the continuous development of deep learning technology, natural scene text detection has achieved very good results in various tasks [14][15][16][17], and accordingly, it has ushered in a new development opportunity. In recent research, two predominant approaches have been prominent: segmentation-based and regression-based [3]. ...

KGSR: A kernel guided network for real-world blind super-resolution
  • Citing Article
  • March 2024

Pattern Recognition

... Similarly, Hong et al. [16] have achieved the prediction of LiBs RUL by employing an RUL prediction method based on a recursive least squares algorithm and particle filter. Sikora et al. [17] predicted the RUL of a LiBs by constructing a mathematical model of a coil insulation system and proposed an evolutionary optimization method utilizing the Red Fox optimization algorithm. In addition, they proposed a method for diagnosing the RUL of LiBs using a fuzzy voltage wave test of the insulation system. ...

Digital Twin Heuristic Positioning of Insulation in Multimodal Electric Systems
  • Citing Article
  • February 2024

IEEE Transactions on Consumer Electronics

... Studying these classification tasks is essential for addressing diverse challenges in agricultural production, such as animal and plant protection, soil moisture prediction, and crop yield prediction. With the ever-increasing availability of data and computing power, deep learning is expected to play a crucial role in advancing automation in agriculture (Woźniak & Ijaz, 2024). ...

Editorial: Recent advances in big data, machine, and deep learning for precision agriculture

... The combination of genetic mutations and environmental factors, such as exposure to radiation [6], certain chemicals, and some viruses [7], can damage the DNA of blood cells [8], leading to their malignant transformation [9]. Blood, a vital fluid coursing through the human body, serves as a lifeline, distributing basic nutrients and oxygen to tissues while eliminating waste products [10]. Consisting of four components, with plasma as the liquid component, it transports hormones, proteins, and waste products throughout the body [11]. ...

Editorial: Recent Advances in Deep Learning and Medical Imaging for Cancer Treatment