53 reads in the past 30 days
Advancing Handwritten Musical Notation Recognition Using Deep Learning: A Convolutional Neural Network-Based Approach with Improved Accuracy
April 2024
·
256 Reads
·
1 Citation
Published by World Scientific
Online ISSN: 1793-6381
·
Print ISSN: 0218-0014
42 reads in the past 30 days
Emotion Recognition from Facial Expression Using Hybrid CNN–LSTM Network
July 2023
·
850 Reads
·
10 Citations
39 reads in the past 30 days
Using Hybrid Transformer and Convolutional Neural Network for Malware Detection in Internet of Things
March 2025
·
61 Reads
32 reads in the past 30 days
Unveiling Parkinson’s: Handwriting Symptoms with Explainable and Interpretable CNN Model
January 2025
·
77 Reads
·
1 Citation
29 reads in the past 30 days
Evaluating Explainability in Transfer Learning Models for Pulmonary Nodules Classification: A Comparative Analysis of Generalizability and Interpretability
May 2025
·
64 Reads
The International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI) welcomes both theory-oriented articles and articles on innovative applications that report new developments, and is of interest to researchers in both academia and industry. The current scope of this journal includes:
• Pattern Recognition • Machine Learning • Deep Learning • Document Analysis • Image Processing • Signal Processing • Computer Vision • Biometrics • Biomedical Image Analysis • Artificial Intelligence
In addition to regular papers describing original research work, survey articles on timely and important research topics are highly welcome. Special issues with focused topics within the scope of this journal are also published.
May 2025
Srilatha Yelamati
·
Srikanth Thota
May 2025
Qiufeng Wang
·
Qianying Guo
·
Kui Zhang
·
Lin Liu
May 2025
·
1 Read
Underwater aquatic systems play a crucial role in maintaining ecological balance and supporting marine biodiversity. However, due to low visibility, color distortion, and scattering effects caused by light absorption, efficiently monitoring and detecting objects in such environments remains challenging. Deep learning-based image processing techniques have revolutionized underwater exploration by providing robust solutions for enhancing image quality, extracting meaningful features, and enabling precise classification. Integrating advanced image enhancement methods with deep learning architectures facilitates accurate detection and monitoring of aquatic species, objects, and anomalies. This study introduces a novel approach that combines the Multiscale Retinex (MSR) and Dark Channel Prior (DCP) approaches for underwater image enhancement in the form of the Dark Retinex Fusion (DRF) model. The DRF model is further integrated with a YOLO-based Transformer framework, leveraging attention mechanisms to enhance feature extraction and classification. The proposed DRF-YOLO-based Transformer framework effectively reduces haze, enhances contrast, and balances colors for an underwater environment. It incorporates advanced spatial precision features in the YOLO backbone and applies the attention module from the Transformer model, which captures long-range dependencies for better contextual understanding. The model was tested on underwater object datasets, achieving an accuracy of 98% and a loss of 0.2, outperforming traditional methods. Additionally, the framework demonstrated resilience to overfitting and local minima, maintaining consistent performance under varying conditions.
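As a rough illustration of one ingredient named in the abstract above, the sketch below computes the standard dark channel of an RGB image: the minimum over color channels, followed by a minimum over a local patch. It is not the DRF model, whose fusion with Multiscale Retinex is not described here. The function name `dark_channel`, the nested-list image layout, and the default patch size are assumptions made for this example.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B), assumed to lie in [0, 1]


def dark_channel(image: List[List[Pixel]], patch: int = 3) -> List[List[float]]:
    """Return the dark channel: min over channels, then min over a square patch.

    `patch` is the (odd) side length of the window; windows are clipped at
    the image border. Both choices are assumptions of this sketch.
    """
    height, width = len(image), len(image[0])
    radius = patch // 2
    # Step 1: per-pixel minimum across the three color channels.
    min_rgb = [[min(px) for px in row] for row in image]
    # Step 2: minimum of that map over the local window around each pixel.
    dark = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            rows = range(max(0, y - radius), min(height, y + radius + 1))
            cols = range(max(0, x - radius), min(width, x + radius + 1))
            dark[y][x] = min(min_rgb[j][i] for j in rows for i in cols)
    return dark


if __name__ == "__main__":
    img = [
        [(0.9, 0.8, 0.7), (0.2, 0.5, 0.6)],
        [(0.4, 0.3, 0.9), (1.0, 1.0, 1.0)],
    ]
    print(dark_channel(img, patch=1))
    print(dark_channel(img, patch=3))
```

In haze-removal pipelines built on this prior, large dark-channel values are usually read as a sign of haze or scattering; how the paper uses the quantity inside DRF is not stated in the abstract.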
May 2025
Guanghua Chen
·
Bin Fang
May 2025
M. Ramkumar
·
P. Karthigaikumar
·
M.S. Gowtham
·
Sathish Kumar Nagarajan
May 2025
Haocheng Luo
May 2025
Zhongda Cao
May 2025
Yangsong He
·
Yao Ge
·
Gengyu Zhan
·
Yanhong Gu
May 2025
Zhang Yunlei
·
Li Ke
·
Shen Tingli
·
Peng Pei
May 2025
Wensheng Tang
·
Zesan Liu
·
Tian Tian
·
Yanjie Wang
May 2025
Zishuo Gao
·
Ge Gao
·
Chen Liu
·
[...]
·
Yi Hu
May 2025
Guojin Pei
·
Zekun Wang
·
Xinxing Yang
·
[...]
·
Jian Chu
May 2025
Devi Palanisamy
·
S. Anila
May 2025
Qing Yan
·
Guoshi Wang
·
Dazhi Zhu
·
Jinxun Li
May 2025
·
1 Read
Multiple-symbol noncoherent detection (MSND) with the aid of neural networks (NNs) for low-density parity-check (LDPC) coded multiple phase shift keying (MPSK) signals is studied for unmanned aerial vehicle (UAV) communications. In the traditional MSND scheme, the number of candidate sequences grows exponentially with the length of the symbol observation period. Implementing the optimal bit log-likelihood ratio (LLR) for decoding is challenging, even when the observation period is only two symbols. In this paper, we first propose an improved scheme that reduces the number of candidate sequences by phase combination, in which the phase is uniformly quantized into L discrete values. We find that the performance requirements can be well met when the phase quantization order is only 4. We then use Back Propagation neural networks (BPN) to compute the bit LLR. To enhance the training efficiency of our NNs and achieve better performance, we also uniformly quantize the carrier phase offset (CPO) into discrete states. Decoding convergence is accelerated significantly compared with the improved traditional scheme, and complexity is reduced to a certain extent within an acceptable range of performance loss.
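The uniform phase quantization mentioned in the abstract above can be illustrated with a minimal sketch. It maps an arbitrary phase to the nearest of L equally spaced values on the circle. The function name `quantize_phase` and the placement of the grid at multiples of 2π/L (starting at 0) are assumptions of this example; the abstract only states that quantization is uniform and that L = 4 was found sufficient.

```python
import math
from typing import Tuple


def quantize_phase(theta: float, levels: int = 4) -> Tuple[int, float]:
    """Map a phase (radians) to the nearest of `levels` uniform grid points.

    Returns (index, quantized_phase), where quantized_phase = index * 2*pi/levels.
    The grid offset (a point at phase 0) is an assumption of this sketch.
    """
    step = 2 * math.pi / levels
    wrapped = theta % (2 * math.pi)          # bring the phase into [0, 2*pi)
    index = round(wrapped / step) % levels   # nearest grid point, wrapping 2*pi to 0
    return index, index * step


if __name__ == "__main__":
    for phase in (0.1, math.pi / 2 + 0.1, 2 * math.pi - 0.1):
        print(phase, quantize_phase(phase))
```

With L levels instead of a continuous phase, any search over phase hypotheses runs over a finite set, which is the general reason quantization helps limit the candidate set; the paper's specific phase-combination rule is not given in the abstract.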
May 2025
Priyan Malarvizhi Kumar
·
C Gokulnath
·
Jeeva Selvaraj
·
Balasubramanian Prabhu Kavin
May 2025
·
1 Read
The recognition of small-size targets presents a significant challenge in computer vision, as their reduced dimensions often lead to diminished detection accuracy. To address this issue, this paper proposes an enhanced small-size target recognition method based on an improved version of You Only Look Once Version 8 (YOLOv8). The proposed improvements include integrating the dynamic attention mechanism of BiFormer (Vision Transformer with Bi-level Routing Attention), which leverages dynamic, query-aware sparsity to enable more flexible and adaptive content perception. Additionally, the Weighted Intersection over Union (WIOU) loss function is introduced to address the imbalance in Bounding Box Regression (BBR) between samples, enhancing the overall accuracy of the model. Furthermore, a specialized detection head for small targets and a confidence-adaptive module are added at the end of the detection head, improving feature extraction and continuous tracking capabilities for small targets, especially under conditions of low visibility and target occlusion. Experimental results demonstrate that the improved model significantly enhances the detection of incomplete and small-sized targets, providing robust performance in scenarios with occlusion and reduced visibility. This study highlights the potential of the enhanced YOLOv8 model in real-world applications and offers new ideas for improving occluded and small target recognition.
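For readers unfamiliar with the quantity underlying IoU-based box regression losses, the following sketch computes plain Intersection over Union for two axis-aligned boxes. It does not implement the weighting scheme of the WIOU loss named in the abstract, which is not specified there. The `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions of this example.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 <= x2 and y1 <= y2


def iou(a: Box, b: Box) -> float:
    """Plain Intersection over Union of two axis-aligned boxes."""
    # Overlap rectangle; width or height collapses to 0 when boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union area 7
```

A basic IoU loss is `1 - iou(pred, target)`. For small targets a shift of a few pixels changes IoU sharply, which is one common motivation for reweighted variants.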
May 2025
·
5 Reads
Ensuring process quality in supply chains is a critical challenge due to the complexity and variability of production processes. Conventional methods often struggle to accurately detect and address quality issues in such dynamic environments. This paper proposes a novel anomaly detection and control method based on image feature modeling to enhance process monitoring and decision-making in supply chain operations. By leveraging advanced image processing techniques, key features of production processes are extracted and modeled, enabling accurate identification of deviations and anomalies. Experimental results demonstrate that the proposed method significantly improves detection accuracy and response time compared to traditional approaches. This study contributes to the development of intelligent quality control solutions, offering scalability and robustness for real-world supply chain applications.
May 2025
·
4 Reads
Medical image processing plays a crucial role in the early detection and classification of diseases, particularly breast cancer, which is a leading cause of death among women. Mammography is widely used for breast cancer diagnosis, but accurately interpreting mammograms remains challenging even for expert radiologists. To address this, we propose a novel framework for automatic breast cancer diagnosis that integrates mass segmentation, feature selection, and a hybrid deep learning-based classifier. Our approach introduces the enhanced black widow optimization (EBWO) algorithm for mass segmentation, effectively identifying the region of interest (ROI) in mammograms. The Improved UNet model is used for deep feature extraction, enhancing feature representations from the segmented masses. To tackle the issue of high-dimensional data, we employ the modified snow ablation optimization (MSAO) algorithm for optimal feature selection, ensuring the best features are chosen for classification. The optimal threshold graph neural network (OT-GNN) is then utilized for classifying breast cancer into three categories: normal, benign, and malignant. In comparison with existing methods, our framework demonstrates superior performance. We validate the proposed UNet+MSAO+OT-GNN method using augmented datasets such as DDSM and INbreast, achieving IOU scores of 99.912% and 99.909%, respectively. Our method outperforms existing classifiers, achieving accuracy scores of 99.523% and 99.859% on DDSM and INbreast datasets, respectively, showcasing significant improvements in both segmentation and classification accuracy. This highlights the effectiveness and novelty of our approach in comparison to traditional methods.
May 2025
·
2 Reads
The intensification and automation of the pig farming industry have created an urgent need for cost-effective and efficient identification of individual pigs. Pig identification is crucial for disease prevention and control, pork quality traceability, genetic breeding, and insurance services. To address the challenges faced by existing noncontact pig face recognition models in overcoming strong environmental interference in pigsties and the minimal differences among pig faces, this paper proposes a convolutional neural network based on multi-granularity semantic analysis (MGSNet). By integrating pixel-level, component-level, and object-level semantic features, the model significantly improves recognition performance in complex scenarios. Specifically, the model addresses challenges such as environmental interference and high similarity among individual pigs. Experimental results show that the algorithm achieves a high test accuracy of 92.50% on a dataset of 10 pigs collected from actual pig farms, with lightweight network parameters. Through deconvolution and gradient-weighted class activation mapping techniques, the feature extraction process of the model is visually interpretable, providing reliable technical support for farmers. The research findings can be directly applied to precision feeding, disease monitoring, breeding optimization, and other scenarios, promoting the comprehensive adoption of smart agriculture.
Consumer demand for tea is becoming increasingly diversified and personalized. Starting from a scientific analysis of tea taste, this paper addresses issues such as incomplete standards for characterizing taste in the tea market, inconsistent evaluation criteria for tea, and the emergence of a young tea-consuming group by conducting a quantitative evaluation of tea taste with the Nonlinear Auto-Regressive (NAR) model. Raw and ripe Pu-erh tea samples were used as training data, and the NAR model was employed for deep learning, error analysis, and comparison. The aim was to predict the taste resulting from different content ratios of chemical components in tea, further verifying the feasibility and scientific accuracy of NAR-based quantitative evaluation of tea taste. This study not only broadens the research field of NAR neural networks but also helps tea enterprises offer consumers more diversified tea tastes and provides an important reference for tea taste evaluation.
May 2025
·
3 Reads
Image–text relation (ITR) in social media plays a crucial role in mining the semantics of the posts. Vision and language pre-trained models (PTMs) or multimodal PTMs have been used to create multimodal embeddings. The conventional practice of fine-tuning pre-trained models with labeled data for specific image–text relation tasks often falls short due to misalignment between general pre-training objectives and task-specific requirements. In this research, we introduce a cutting-edge pre-trained framework tailored for aligning image–text relation semantics. Our novel framework leverages unlabeled data to enhance learning of image–text relation representations through deep multimodal clustering and multimodal contrastive learning tasks. Our method significantly narrows the disparity between generic Vision-Language Pre-trained Models (VL-PTMs) and image–text relation tasks, showcasing an impressive performance boost of up to 10.4 points in linear probe tests. By achieving state-of-the-art results on image–text relation datasets, our pre-training framework stands out for its effectiveness in capturing and aligning image–text semantics. The visualizations generated by class activation map (CAM) also demonstrate that our models provide more accurate image–text semantic correspondence. The code is available on the website: https://github.com/qingyuannk/ITR.
May 2025
·
1 Read
The traditional genetic algorithm (GA) known as the island model parallel genetic algorithm (IPGA) has limitations regarding speed and accuracy, especially when applied to large-scale datasets in parallel computing environments. This study addresses the optimization challenges associated with IPGA in the context of big data. Initially, an enhanced adaptive covariance matrix evolution strategy (CMA-ES) is implemented to replace the conventional Gaussian evolution strategy used in traditional IPGA. Additionally, a normalization function is integrated into the CMA-ES framework to constrain the range of random values during the iterative optimization process, thereby improving both the speed and accuracy of IPGA in managing big data computations. Furthermore, the MapReduce paradigm is utilized within a Hadoop cluster to optimize the computational process of IPGA, making it more suitable for distributed parallel operations on extensive datasets. Experimental findings indicate that the proposed methodologies significantly enhance the speed and accuracy of IPGA optimization, particularly in the context of large-scale and ultra-large-scale datasets, with a marked improvement in computational speed.
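To make the island model concrete, here is a toy, single-process sketch: several sub-populations evolve independently, and every few generations each island receives the best individual of its ring neighbor. It uses plain Gaussian mutation with truncation selection, not the CMA-ES variant, normalization function, or MapReduce/Hadoop setup described in the abstract. All names and parameter values (`island_ga`, `migrate_every`, the mutation scale 0.3, the search range) are hypothetical.

```python
import random
from typing import Callable, List


def island_ga(
    fitness: Callable[[float], float],  # objective to minimize
    n_islands: int = 4,
    pop_size: int = 10,
    generations: int = 30,
    migrate_every: int = 5,
    seed: int = 0,
) -> float:
    """Toy island-model GA on a 1-D problem; returns the best individual found."""
    rng = random.Random(seed)
    islands: List[List[float]] = [
        [rng.uniform(-5.0, 5.0) for _ in range(pop_size)] for _ in range(n_islands)
    ]
    for gen in range(1, generations + 1):
        # Each island evolves on its own: keep the better half (elitism),
        # refill with Gaussian-mutated copies of surviving individuals.
        for i, pop in enumerate(islands):
            elite = sorted(pop, key=fitness)[: pop_size // 2]
            children = [
                rng.choice(elite) + rng.gauss(0.0, 0.3)
                for _ in range(pop_size - len(elite))
            ]
            islands[i] = elite + children
        # Ring migration: the previous island's best replaces this island's worst.
        if gen % migrate_every == 0:
            bests = [min(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                worst = max(range(len(pop)), key=lambda j: fitness(pop[j]))
                pop[worst] = bests[(i - 1) % n_islands]
    return min((x for pop in islands for x in pop), key=fitness)


if __name__ == "__main__":
    print(island_ga(lambda x: (x - 2.0) ** 2))
```

Because islands interact only at migration points, the inner loop over islands is the part that a distributed implementation would hand to separate workers.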
May 2025
·
4 Reads
This paper addresses the challenge of Source-free Domain Adaptation (SFDA), where knowledge is transferred from a labeled source domain to an unlabeled target domain without requiring access to the source data during adaptation. Traditional Unsupervised Domain Adaptation (UDA) methods typically depend on source data availability during training, which raises concerns related to privacy, security, and scalability. Our proposed approach eliminates this dependency by leveraging only a pre-trained source model for adaptation to the target domain. We introduce a comprehensive framework that incorporates iterative centroid refinement for pseudo-labeling, enhanced self-supervised learning strategies, advanced regularization techniques, and dynamic loss weighting mechanisms. These innovations improve feature alignment and classification performance in the target domain. Extensive experiments conducted on diverse datasets, including digital and object benchmarks, demonstrate that our method consistently outperforms state-of-the-art techniques in both accuracy and robustness. Additionally, this study delves into the theoretical foundations of SFDA, providing insights into its efficacy and exploring its practical applications across various domains.
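The abstract above mentions iterative centroid refinement for pseudo-labeling without giving details. One common form of the idea, sketched here under stated assumptions, builds class centroids from target features weighted by the source model's soft predictions, relabels each sample by its nearest centroid, and then repeats with the hard labels. The function name `refine_pseudo_labels`, the squared Euclidean distance, and the number of rounds are assumptions; the paper's actual procedure may differ.

```python
from typing import List, Optional, Sequence


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def refine_pseudo_labels(
    features: Sequence[Sequence[float]],
    probs: Sequence[Sequence[float]],
    rounds: int = 2,
) -> List[int]:
    """Assign pseudo-labels via centroids, starting from soft model predictions."""
    n_classes, dim = len(probs[0]), len(features[0])
    weights = [list(p) for p in probs]  # first pass: soft class weights
    labels: List[int] = []
    for _ in range(rounds + 1):
        # Weighted mean of the features for each class (None if the class is empty).
        centroids: List[Optional[List[float]]] = []
        for k in range(n_classes):
            total = sum(w[k] for w in weights)
            if total == 0:
                centroids.append(None)
                continue
            centroids.append(
                [sum(w[k] * f[d] for w, f in zip(weights, features)) / total
                 for d in range(dim)]
            )
        live = [k for k in range(n_classes) if centroids[k] is not None]
        # Relabel every sample by its nearest non-empty centroid.
        labels = [min(live, key=lambda k: _dist2(f, centroids[k])) for f in features]
        # Later passes use hard one-hot weights from the current labels.
        weights = [[1.0 if k == lab else 0.0 for k in range(n_classes)] for lab in labels]
    return labels


if __name__ == "__main__":
    feats = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
    preds = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7]]
    print(refine_pseudo_labels(feats, preds))
```

In the usage example the second sample is predicted as class 1 by the model (probability 0.6) but sits next to the first sample in feature space, so the centroid step relabels it as class 0.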
May 2025
·
2 Reads
The Internet of Underwater Things (IoUT) is an emerging revolution in underwater monitoring and communication, offering real-time detection of objects and data collection in harsh aquatic conditions. Severe communication problems, including signal attenuation, high latency, limited bandwidth, and vulnerability to noise and interference, reduce underwater network performance and impede IoUT deployment. The proposed work introduces three new techniques to tackle these problems. The first is the Sequential Memory Fusion Network (SMFN), which uses historical and real-time data to predict environmental changes and factors such as signal degradation and noise levels. The second is the Adaptive Condition-Based Binary Phase-Shift Keying (ACB-BPSK) system, which uses real-time feedback to switch dynamically between high-noise and low-noise modulation modes, achieving signal robustness and optimized data rates. ACB-BPSK adapts the modulation technique to environmental conditions to enhance the reliability and efficiency of wireless communication. The third contribution is an Energy-Efficient Reinforcement Learning (EERL) framework for routing decisions, which learns from network feedback, packet loss, and congestion. It allows energy to be incorporated into the routing policy, prolonging the lifetime of battery-operated devices under dynamic network conditions. The advantages of the developed techniques include increased communication reliability, higher data transmission rates, and prolonged network lifetime.