
Multimedia Tools and Applications

Published by Springer Nature

Online ISSN: 1573-7721 · Print ISSN: 1380-7501

Disciplines: Ejournals; Multimedia; Multimedia systems


Top-read articles

514 reads in the past 30 days

[Article figures: algorithm for the bathing procedure; algorithm for the time taken by ‘n’ pilgrims to reach the Ghat and bathe; crowd views at the Mela site from CCTV and drone footage (Kali Sadak, SNG, and Kali Ramp on peak and normal days); framework of the approach; georeferenced map of the Kumbh Mela site; +15 more]

An agent based modeling approach to evaluate crowd movement strategies and density at bathing areas during Kumbh Mela-2019

July 2023 · 1,236 Reads

G. Ramesh · Rohan Chhabra

Aims and scope


The Multimedia Tools and Applications journal publishes original research on multimedia development, system support tools, and case studies of multimedia applications. Recognized as the first journal in the field of multimedia, it boasts the highest Google h5-index score in this domain. The journal covers a wide range of topics, including computer vision, machine learning, virtual reality, and digital games. With a 2023 impact factor of 3.0, it offers rapid publication and high visibility.

Recent articles


Optimized deep transfer learning techniques for spine fracture detection using CT scan images

February 2025 · 1 Read

G. Prabu Kanna · Jagadeesh Kumar · P. Parthasarathi · [...] · Yogesh Kumar

Spine fractures pose an important health concern that requires quick diagnosis to protect people from long-term complications. This paper proposes an automatic prediction system that uses deep transfer learning techniques for spine fractures and minimizes the time required for classification compared to traditional techniques. CT scan images of the cervical spine were collected in JPEG format and subsequently segmented through several phases, such as mask exploring, detecting, zooming, and cropping the affected area at 6 different angles, followed by the computation of contour features to analyze their geometric as well as intensity-based parameters. Ten advanced deep learning models were trained with the Adam optimizer; InceptionResNetV2 obtained the highest accuracy of 99.70%, while the lowest loss and root mean square error were obtained by ResNet50V2, at 0.0493 and 0.2220 respectively. For precision and recall, DenseNet169 computed the highest values of 99.99% and 98.495% respectively, while Xception did best in terms of F1 score with 98.32%. These models perform better because InceptionResNetV2’s architecture is well suited to capturing intricate spatial hierarchies, DenseNet169 benefits from dense connectivity for feature reuse, and ResNet50V2’s optimized residual blocks contribute to minimizing error rates. Limited diversity and class imbalance in the dataset also affect model performance. These limitations highlight the need for future work to expand the dataset and ensure a balanced distribution of classes to enhance the robustness and clinical relevance of the proposed AI-based system. Despite these challenges, the results underscore the transformative potential of deep learning in improving the clinical diagnosis of spine fractures.


Performance evaluation of the classifiers based on features from cotton leaf images

February 2025 · 1 Read

Vishwanath Burkpalli

Cotton disease classification is required because disease affects agricultural yield. Leaf disease was classified using machine learning and deep learning classifiers, and the performance of the machine learning classifiers was evaluated based on the extracted features. Disease classification follows preprocessing, segmentation, and feature extraction stages. Preprocessing was achieved by bilateral filtering and segmentation by the modified Chan-Vese method; color moment and texture features were then extracted. Finally, support vector machine, random forest, K-nearest neighbor, Naïve Bayes, and multilayer perceptron classifiers were used for disease classification. We evaluated the classifiers' performance based on the extracted features and conclude that performance improves when color and texture features are combined: the combined color moments, gray-level co-occurrence matrix, and local binary pattern features provide higher classification accuracy than stand-alone features. In the experiment, the multilayer perceptron achieved 98% accuracy, outperforming SVM, K-NN, Naïve Bayes, and random forest. Further, the multilayer perceptron was compared with other well-known deep learning models: AlexNet, VGG 16, and ResNet. In this comparison, the ResNet model achieved an accuracy of 98.92%, higher than the other models.
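The gray-level co-occurrence matrix (GLCM) texture features mentioned above can be sketched in a few lines of pure Python; the 4x4 toy patch, the single horizontal offset, and the choice of two Haralick features are illustrative assumptions (a production pipeline would use a library such as scikit-image):

```python
from collections import Counter

def glcm_features(img):
    """Build a co-occurrence histogram for the horizontal offset (0, 1)
    and derive two classic Haralick features: contrast and homogeneity."""
    pairs = Counter()
    for row in img:
        for a, b in zip(row, row[1:]):
            pairs[(a, b)] += 1
    total = sum(pairs.values())
    contrast = sum(n * (a - b) ** 2 for (a, b), n in pairs.items()) / total
    homogeneity = sum(n / (1 + abs(a - b)) for (a, b), n in pairs.items()) / total
    return contrast, homogeneity

# toy 4-level grayscale patch standing in for a segmented leaf region
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
contrast, homogeneity = glcm_features(patch)
```

In the setting described above, such texture values would be concatenated with color moments and LBP histograms to form the feature vector fed to the classifiers.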


Design of an end-to-end recommendation system for crowdsourced road monitoring applications based on machine learning

Rusan Ahsan · Ayan Kumar Panja · Moumita Roy · Chandreyee Chowdhury

In this work, the design of a robust route recommendation system based on crowdsourcing is proposed. The prevalent research challenge of crowdsourcing, namely biased or unreliable user opinions, is addressed through a multi-phased data validation framework. The client-server-based multi-tier architecture proposed in the work ensures scalable performance. The system maps data collected for whole routes onto their component road segments, so conditions on other routes that share those roads can also be predicted. The work emphasizes the use of a regression model deployed in the cloud for predicting the best route. The analysis has been carried out on both synthetic and collected datasets. The work can be easily extended to any crowdsourced recommendation system based on numeric user reviews. The system is implemented as a working prototype, and its operation is demonstrated on real data collected from users in the city of Kolkata.
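The route-to-segment decomposition described in the abstract can be sketched as follows; the segment identifiers, scores, and plain averaging are hypothetical illustrations (the system itself uses a cloud-deployed regression model over validated crowdsourced data):

```python
from collections import defaultdict

def segment_scores(route_reports):
    """Spread each whole-route report onto its component road segments."""
    totals, counts = defaultdict(float), defaultdict(int)
    for segments, score in route_reports:
        for seg in segments:
            totals[seg] += score
            counts[seg] += 1
    return {s: totals[s] / counts[s] for s in totals}

def predict_route(segments, scores, default=0.5):
    """Score an unseen route from the averages of its shared segments."""
    vals = [scores.get(s, default) for s in segments]
    return sum(vals) / len(vals)

reports = [
    (["A-B", "B-C"], 0.9),   # reported route 1: smooth
    (["B-C", "C-D"], 0.5),   # reported route 2: mixed
]
scores = segment_scores(reports)
# an unseen route that shares segments with the reported ones
pred = predict_route(["A-B", "C-D"], scores)
```

This is why conditions on routes nobody has rated directly can still be predicted: every rated route leaves evidence on the segments it shares with them.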


Adaptive pelican optimization with optimized mask RCNN for automatic lung cancer detection

R. Sudha · K.M. Uma Maheswari

Lung cancer is a prevalent and deadly disease worldwide, necessitating accurate and timely detection methods for effective treatment. Deep learning-based approaches have emerged as promising solutions for automated medical image analysis. This study proposes an enhanced Mask R-CNN framework tailored specifically for the automatic detection and severity analysis of lung cancer from CT images. The proposed approach consists of three stages: pre-processing, lung nodule detection and segmentation using the enhanced Mask R-CNN, and severity analysis. Our framework employs a deep convolutional neural network architecture trained on a comprehensive dataset of annotated lung images. By incorporating a region-based convolutional neural network (R-CNN) with a mask prediction branch, our model accurately localizes lung tumors while providing precise pixel-level segmentation masks. To enhance the performance of Mask R-CNN, the classifier's parameters are optimally selected using the adaptive pelican optimization (APO) algorithm. The proposed framework detects lung tumors and provides a comprehensive severity analysis, enabling clinicians to assess cancer stage and progression accurately. Evaluation on a benchmark dataset demonstrates superior detection accuracy and robustness compared to existing methods. Our enhanced Mask R-CNN approach shows promise as a valuable tool for early diagnosis and severity assessment of lung cancer, potentially improving patient outcomes and healthcare efficiency.


Optimal edge thinning compression and similarity based data hiding for 5G empowered architecture

R RoselinKiruba · Tamil Thendral M · Sahin Onur · [...] · A Keerthika

In today’s world, wireless technology is omnipresent, and millions of people use common applications such as iMessage, WhatsApp, and Zoom in their day-to-day lives. However, securely transmitting information can prove to be a challenge. Steganography is a method designed to protect data by concealing it within a cover medium, such as text, image, audio, or video. Factors considered in this process include the quality of the stego image and the hiding capacity, with an emphasis on maintaining a low distortion rate to evade detection by the human visual system. This paper proposes using the 5G-New Radio (NR) or 4G-Long Term Evolution (LTE) architecture for the data hiding process. To improve the compression rate and thereby enhance hiding capacity, the study introduces an optimal edge thinning algorithm implemented via the Aquila Optimizer (AO), and a similarity-based edge data hiding process. Initially, any noise present in the cover image is eliminated using a bilateral filter. Subsequently, edges are identified and examined for looping or discontinuous features. The AO optimizer then helps identify optimal edges and uses the quantization compression method to compress the image. Lastly, a method of identifying similar pixels in the bit planes is adopted, hiding data in these similar pixels without altering their values. Non-similar pixels (i.e., edge regions) hide one bit of data to maintain minimal distortion, while non-edge regions use Pixel Value Differencing (PVD) within different ranges to ensure the security of the image. The results indicate that the proposed method delivers good performance and data hiding capacity while also reducing processing time.
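For the non-edge regions, the abstract names Pixel Value Differencing. A minimal sketch of a classic Wu-Tsai-style PVD embed/extract on one pixel pair, with an assumed range table, might look like this (the paper's exact parameters are not given above):

```python
# assumed quantization ranges; wider ranges (busier areas) carry more bits
RANGES = [(0, 7), (8, 15), (16, 31), (32, 63), (64, 127), (128, 255)]

def pvd_capacity(p1, p2):
    """Bits a pixel pair can carry: log2 of the width of its range."""
    d = abs(p1 - p2)
    for lo, hi in RANGES:
        if lo <= d <= hi:
            return (hi - lo + 1).bit_length() - 1

def pvd_embed(p1, p2, bits):
    """Embed the integer `bits` by moving the pair difference to lo + bits."""
    d = abs(p1 - p2)
    for lo, hi in RANGES:
        if lo <= d <= hi:
            new_d = lo + bits
            break
    m = new_d - d
    # split the adjustment between the two pixels (Wu-Tsai style)
    if p1 >= p2:
        return p1 + (m + 1) // 2, p2 - m // 2
    return p1 - m // 2, p2 + (m + 1) // 2

def pvd_extract(p1, p2):
    """Recover the embedded integer from the stego pixel pair."""
    d = abs(p1 - p2)
    for lo, hi in RANGES:
        if lo <= d <= hi:
            return d - lo

stego = pvd_embed(100, 90, 5)   # hide the 3-bit value 5 in pair (100, 90)
```

The design point is that a larger inter-pixel difference (an edge-like area) tolerates a larger perturbation, so capacity adapts to local image content.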


ML computational inference techniques and indicator metrics for analyzing uncertainties in stock market data

Santosh Kumar Henge · Sanjeev Kumar Mandal · Amit Sharma · [...] · Bhupinder Singh

With the growing demand for accurate stock market prediction in inefficient, complex markets, conventional methods do not adequately capture the relationships needed for accurate forecasting. The dynamic and complex nature of the data sources calls for adaptive algorithms that can forecast accurately and support efficient decision-making. The system is framed in two stages: the first covers time-series-based predictions of different stock values using an LSTM model; the second integrates experimental scenarios based on trend, value, indicators, and their supporting data metrics, evaluated on various large-cap stock companies. The dataset is sourced online from the Yahoo Finance API with consecutive days' data and dependable and non-dependable influencing indicators, which supported trend-based evaluation. The proposed model was tested on distinct stock values; indicator-based prediction test cases achieved a 76.92% accuracy rate with both a Random Forest classifier and Logistic Regression. The value-based prediction test case achieved a performance rate of 97.5% using an SVR regressor and 97.2% using a linear regressor. Root mean square error was computed for the ten large-cap companies for performance evaluation.


Deep learning in monocular 3D human pose estimation: Systematic review of contemporary techniques and applications

Divya Udayan J · Jayakumar TV · Raghu Raman · [...]

3D human pose estimation is integral to applications including activity recognition, animation generation, and performance analysis. Deep learning techniques have greatly advanced the field, allowing for more precise and accurate results. Despite these advancements, however, there are still limitations on how these techniques can be used in real-life situations. Accurately calculating poses for multiple subjects or in challenging situations, such as outdoor environments, rapidly evolving scenarios, or circumstances where the subject is too small or too far away from the camera, remains a challenge. These issues are especially noticeable with monocular camera setups, which can make it more difficult to obtain precise results. This systematic review synthesizes recent trends in monocular 3D pose estimation, detailing tools, technologies, approaches, and strategies used in the past decade. Out of 432 initial publications, 103 peer-reviewed papers from scholarly databases were selected, offering insights into the field's progression and current trends. The study details the evolution of methodologies, with a focus on regression- and detection-based methods, and discusses prevalent datasets and performance metrics. The findings emphasize the need to address challenges like depth ambiguity and occlusions for real-world applicability. This comprehensive review aims to chart the trajectory and prospects of 3D human pose estimation, with implications for diverse applications.


Prediction of traffic time using XGBoost model with hyperparameter optimization

Deepika · Gitanjali Pandove

This study introduces a novel framework for enhancing traffic management systems through the integration of Machine learning and Deep Learning approaches. Leveraging both publicly available datasets and data generated through the SUMO simulator, this research presents the Hyper-Tuned Detrended XGBoost framework (HT-DXG) as a robust solution for accurate traffic flow and congestion prediction. The proposed framework incorporates advanced techniques such as detrending and hyperparameter optimization to improve predictive accuracy, focusing on critical metrics like wait time and travel time. The motivation behind this research stems from the need to create more responsive and resilient traffic management systems that can adapt to real-time conditions. Extensive experimentation using diverse traffic scenarios mapped on OpenStreetMap (OSM) highlights the superior performance of HT-DXG compared to baseline models. This work not only provides a comprehensive methodology for traffic prediction but also contributes to the field by generating a unique dataset tailored for real-time traffic analysis in urban environments. The findings offer valuable insights for urban planners and policymakers aiming to mitigate traffic-related challenges in rapidly growing cities.
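The internals of HT-DXG are not spelled out above, but the detrending step it names can be sketched: fit a least-squares linear trend to the travel-time series and hand the residuals to the regressor (the series values here are invented):

```python
def detrend(series):
    """Remove a least-squares linear trend; return (residuals, trend_fn)."""
    n = len(series)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, series))
    var = sum((x - x_mean) ** 2 for x in xs)
    slope = cov / var
    intercept = y_mean - slope * x_mean

    def trend(x):
        return intercept + slope * x

    residuals = [y - trend(x) for x, y in zip(xs, series)]
    return residuals, trend

# hypothetical wait times with an upward trend plus fluctuation
series = [10.0, 12.5, 14.0, 16.5, 18.0]
resid, trend = detrend(series)
```

At prediction time, the fitted trend is added back to the regressor's residual forecast; detrending this way keeps the learner focused on the fluctuations rather than the drift.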


Red blood cell segmentation and classification using hybrid image processing-squeezenet model

Navya K. T. · Nihar Prakash · Keerthana Prasad · Brij Mohan Kumar Singh

In medical diagnostics, blood testing is considered to be one of the most important clinical examination tests. Manual microscopic inspection of blood cells is time-consuming and subjective. Therefore, an automated blood cell classification system that will help a pathologist to identify the components of blood and diagnose the diseases pertaining to those cells, in a fast and efficient manner is useful. Due to multiple variable factors such as cell types, different stains and magnifications, and data complexities such as cell overlapping, inhomogeneous intensities, background clutters and image artifacts, development of a model for automated diagnosis of blood cells is an arduous task. This paper presents a robust and accurate method of segmenting and classifying the blood cells in Peripheral Blood Smear (PBS) images. The method involves a pre-processing step consisting of Decorrelation Stretching (DCS), followed by histogram matching for stain normalization and a Fuzzy C-means clustering algorithm for the segmentation of Red Blood Cells (RBCs). The segmented blood cells were then counted and classified as normal and abnormal along with the type of abnormalities using the SqueezeNet Deep Learning (DL) model which offered an average classification accuracy of 97.9%.
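The Fuzzy C-means step can be sketched for one-dimensional pixel intensities; the two-cluster setup and the toy intensity list are illustrative assumptions (the paper segments stain-normalized colour PBS images):

```python
def fuzzy_cmeans(xs, k=2, m=2.0, iters=50):
    """1-D fuzzy C-means: returns soft memberships u and centroids cs."""
    lo, hi = min(xs), max(xs)
    cs = [lo + (hi - lo) * i / (k - 1) for i in range(k)]  # spread seeds
    for _ in range(iters):
        u = []
        for x in xs:
            d = [abs(x - c) + 1e-12 for c in cs]  # avoid division by zero
            u.append([1.0 / sum((d[i] / d[j]) ** (2 / (m - 1))
                                for j in range(k))
                      for i in range(k)])
        # centroid update: membership-weighted mean with exponent m
        cs = [sum(u[n][i] ** m * xs[n] for n in range(len(xs)))
              / sum(u[n][i] ** m for n in range(len(xs)))
              for i in range(k)]
    return u, cs

# toy normalized intensities: dark RBC interiors vs. bright background
intensities = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]
u, cs = fuzzy_cmeans(intensities)
```

The soft memberships are what make FCM attractive for overlapping cells: a boundary pixel can belong partially to both clusters instead of being forced into one.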


Dynamic reference camera selection for free-viewpoint video multicast streaming

February 2025 · 7 Reads

Free-viewpoint video (FVV) is an advantageous technology that allows users to interactively change their viewpoint or perspective when viewing a video or a scene. It provides a three-dimensional, dynamic, and immersive viewing experience, allowing users to explore a video scene from different angles and viewpoints as if they were present within the scene. FVV requires transmitting multiple video streams with depth information for different perspectives simultaneously. This places a significant burden on network bandwidth and infrastructure, leading to congestion and potential service degradation. The paper investigates dynamic reference camera grouping approaches based on users’ specific viewpoint needs. The primary goal of these methods is to minimize the network bandwidth necessary for transmitting reference color and depth camera sequences, thus enabling the provision of interactive FVV services in networks with limited resources. To reduce the number of essential camera streams, we introduce dynamic reference camera selection methods that effectively identify shared reference camera pairs capable of serving multiple users for their individual viewpoint synthesis requirements. We found that among the investigated methods, DBSCAN delivered the most balanced performance, achieving a 10–20% decrease in network load depending on the network complexity.


Improved multi-class brain tumor mri classification with ds-net: a patch-based deep supervision approach

Brain tumors, with their varied features and potential for rapid growth, pose significant challenges in healthcare. The accurate classification of these tumors by MRI (magnetic resonance imaging) is important for several reasons: different tumor types have diverse features, and correct, early classification is important for treatment planning and improving patient survival rates. Traditionally, this classification relied on expert neurologists analyzing MRI scans, a process prone to subjectivity and human error. Accurately classifying multi-class brain tumors from MR images is challenging due to differing tumor characteristics. Recently, deep learning techniques have achieved precise results in disease classification, including brain tumors. The aim of this work is to design a patch-based deep learning model to accurately classify brain tumors. In this research, EfficientNet-based MBConv (mobile inverted bottleneck convolution) blocks with a deep supervision mechanism (DSM) are used to achieve high accuracy and reduce true-negative and false-positive rates. The DSM uses a dilated convolutional block with both adjacent and overlapping patches to extract both local and global features. Dilation effectively enlarges the receptive field of filters, enabling them to capture more contextual information without affecting the number of parameters or computational time. The proposed model is evaluated using multiple dataset splits, five-fold, and leave-2-out cross-validation on 3 distinct datasets, achieving 0.98 accuracy, 0.97 recall, and 0.97 precision; 0.99 accuracy, 0.97 recall, and 0.98 precision; and 0.99 accuracy, 0.96 recall, and 0.98 precision on the Figshare, Kaggle, and Sartaj datasets respectively.


Skin cancer detection using optimized mask R-CNN and two-fold-deep-learning-classifier framework

Skin cancer is a serious and potentially life-threatening condition caused by DNA damage in the skin cells, leading to genetic mutations and abnormal cell growth. These mutations can cause the cells to divide and grow uncontrollably, forming a tumor on the skin. To prevent skin cancer from spreading and potentially leading to serious complications, it is critical to identify and treat it as early as possible. An innovative two-fold deep learning-based skin cancer detection model is presented in this research work. The proposed model comprises five main stages: preprocessing, segmentation, feature extraction, feature selection, and skin cancer detection. Initially, min-max contrast stretching and median filtering are used to pre-process the collected raw image. From the pre-processed image, the Region of Interest (ROI) is identified via an optimized mask Region-based Convolutional Neural Network (R-CNN). Then, from the identified ROI areas, texture features such as the Illumination-invariant Binary Gabor Pattern (II-BGP), Local Binary Pattern (LBP), and Gray-Level Co-occurrence Matrix (GLCM); color features such as the color correlogram and histogram intersection; and shape features including moments, area, perimeter, eccentricity, and average bending energy are extracted. To choose the optimal features from the extracted ones, the Golden Eagle Mutated Leader Optimization (GEMLO), a conceptual amalgamation of the standard Mutated Leader Algorithm (MLA) and the Golden Eagle Optimizer (GEO), is used. Skin cancer detection is accomplished via a two-fold deep-learning classifier comprising Fully Convolutional Neural Networks (FCNs) and a Multi-Layer Perceptron (MLP); the final outcome is the combination of the outcomes acquired from the two. The proposed model is implemented in Python. The findings are assessed against existing models in terms of accuracy, sensitivity, precision, FPR, FNR, and other metrics; the proposed model achieves the highest detection accuracy, at 92%.


Partitioned neural network training via synthetic intermediate labels

February 2025 · 3 Reads

The proliferation of extensive neural network architectures, particularly deep learning models, presents a challenge in terms of resource-intensive training. GPU memory constraints have become a notable bottleneck in training such sizable models. Existing strategies, including data parallelism, model parallelism, pipeline parallelism, and fully sharded data parallelism, offer partial solutions. Model parallelism, in particular, enables the distribution of the entire model across multiple GPUs, yet the ensuing data communication between these partitions slows down training. Instead of using the entire model for training, this study advocates partitioning the model across GPUs and generating synthetic intermediate labels to train individual segments. These labels, produced through a random process, mitigate memory overhead and computational load. This approach results in a more efficient training process that minimizes data communication while maintaining model accuracy. The method is validated using 6-layer fully-connected networks on the extended MNIST, CIFAR10, and CIFAR100 datasets. It is shown that the computational improvement to reach 90% of the cross-yield accuracy can be as high as 66%. Additionally, the improvement in training bandwidth compared to standard model parallelism is quantitatively demonstrated through an example scenario. This work contributes to mitigating the resource-intensive nature of training large neural networks, paving the way for more efficient deep learning model development.
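The core idea, training disjoint model segments against synthetic intermediate labels instead of passing activations and gradients between them, can be sketched with a toy two-segment linear "network". The data, the linear segments, and the particular random draw are illustrative assumptions, far simpler than the paper's 6-layer networks:

```python
def train_linear(pairs, lr=0.05, epochs=2000):
    """Fit y = w*x + b by stochastic gradient descent on squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in pairs:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

# target task: learn y = 2x + 1, split across two "GPUs"
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2 * x + 1 for x in xs]

# synthetic intermediate labels: a randomly drawn linear code of the
# input (draw hard-coded here for reproducibility); they stand in for
# the activations that would normally flow between the two partitions
mids = [0.9 * x - 0.4 for x in xs]

# each segment now trains independently - no inter-segment traffic
w1, b1 = train_linear(list(zip(xs, mids)))   # segment 1: input -> label
w2, b2 = train_linear(list(zip(mids, ys)))   # segment 2: label -> output

# compose the two independently trained segments end to end
pred = [w2 * (w1 * x + b1) + b2 for x in xs]
```

Because each segment only ever sees its own inputs and fixed synthetic targets, the two training loops can run concurrently on separate devices with no activation or gradient exchange, which is the communication saving the abstract describes.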


Chatbot in education: trends, personalisation, and techniques

February 2025 · 7 Reads

Chatbots, as rapidly advancing AI technologies, have become increasingly integrated into human interactions, acting as valuable companions. This study examines the role of chatbots in education. We conducted a thorough literature review on published and stored articles in major databases, including the IEEE Xplore and the ACM Digital Library. Following PRISMA criteria, we analysed research published between 2018 and 2024. Our extensive search identified 720 relevant articles, which were then filtered according to established inclusion and exclusion criteria, resulting in 116 papers that met our quality assessment standards. The study aims to provide insights into three key areas: (1) the current state of chatbots in the education sector; (2) the personalisation of chatbots to enhance teaching and learning experiences; and (3) various techniques for chatbot development. The findings indicate that chatbots are becoming essential tools for students and can even augment the role of human educators. We also addressed the benefits and challenges associated with chatbot applications in education. This paper serves as a valuable resource for academics, developers, and researchers interested in chatbot technology in the educational context.



A comparative study of learning style model using machine learning for an adaptive E-learning

February 2025 · 14 Reads

Before starting the process of adaptive e-learning, it is essential to define the learner’s style which reflects all the characteristics that enable the learner to learn effectively. Traditional style identification methods provide a questionnaire to extract data. This can result in a loss of learning time, and the data obtained may not be relevant as many of the answers are arbitrary. Additionally, the use of a questionnaire in a formal learning context is not effective. For this reason, an approach is proposed that combines the K-means clustering algorithm with the Felder-Silverman Learning Style model (FSLSM). This combination is based on the results of a comparative study between k-means and four learning style models: Honey-Mumford, Kolb, VARK and Felder-Silverman. The selection of these models is based on their adaptability with distance learning environments and their ability to encompass multimedia aspects. Moreover, they are regarded as a reliable pedagogical basis for numerous research projects. The implementation of k-means and FSLSM was primarily based on analyzing learners’ preferred activities and grouping them into clusters, with each cluster representing a specific combination of the FSLSM model. The proposed approach represents an integral phase within an intelligent adaptive learning model. The objective of this study is to find a solution that can cover the different learner preferences and needs to provide effective adaptive e-learning. To measure the effectiveness of the proposed approach, its implementation was carried out within a learning platform dedicated to the primary level. To ensure the relevance of the identified styles, an experiment was conducted using two learning scenarios: the first was adapted to the identified style, while the second was arbitrary. The results showed a high success and engagement rate in favor of the learning scenario based on the proposed approach.
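The clustering stage can be sketched with plain k-means on activity-preference vectors; the two-dimensional vectors and the later mapping of clusters to FSLSM dimension combinations are illustrative assumptions:

```python
def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(cluster):
    return tuple(sum(p[i] for p in cluster) / len(cluster)
                 for i in range(len(cluster[0])))

def kmeans(points, k, iters=20):
    """Plain k-means; naive first-k seeding is enough for this sketch."""
    centroids = list(points[:k])
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = mean(members)
    # final assignment against the converged centroids
    labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
              for p in points]
    return labels, centroids

# hypothetical activity-preference vectors (visual score, active score)
# derived from platform logs for six learners
learners = [(0.9, 0.8), (0.1, 0.2), (0.85, 0.9),
            (0.15, 0.1), (0.8, 0.75), (0.2, 0.25)]
labels, centroids = kmeans(learners, k=2)
```

In the approach described above, each resulting cluster would then be interpreted as a specific combination of FSLSM dimensions and used to pick the matching learning scenario.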


Saliency and contrast mapping based dark image enhancement using multiple illuminance instance

This paper presents a saliency and contrast mapping-based enhancement technique for dark images with a web application. Dark images are unfavorable to computer vision systems and human observation because of their low brightness. Various enhancement algorithms have been suggested to solve this issue, but those methods still suffer from over-enhancement and under-enhancement problems. This paper proposes a multi-weighted fusion structure for unevenly illuminated image enhancement, based on the human visual system. The saliency and contrast-based image fusion algorithm is capable of providing good contrast and illumination. Specifically, multiple instances of the input image are first derived for illumination estimation. Then, saliency and contrast weight mappings are designed for fusion to obtain an adjusted illumination component. Gamma correction is then employed to obtain the final adjusted illumination component, which is combined with the reflectance of the image in a subsequent step to obtain the final enhanced output. Experimental results show that the proposed method produces results with good contrast and brightness when compared to various state-of-the-art enhancement techniques.
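The fuse-then-gamma-correct pipeline can be sketched per pixel; the 3-pixel strip, the weight maps, and the reflectance values are invented for illustration (in the method above, the weights come from the saliency and contrast maps):

```python
def fuse(instances, weights):
    """Per-pixel weighted average of several illumination instances."""
    fused = []
    for i in range(len(instances[0])):
        total_w = sum(w[i] for w in weights)
        fused.append(sum(inst[i] * w[i]
                         for inst, w in zip(instances, weights)) / total_w)
    return fused

def gamma_correct(illum, gamma=2.2):
    """Brighten the adjusted illumination map (values in [0, 1])."""
    return [v ** (1.0 / gamma) for v in illum]

# two hypothetical illumination instances for a 3-pixel strip,
# with weight maps standing in for saliency + contrast weights
inst_a, inst_b = [0.2, 0.4, 0.6], [0.6, 0.8, 0.5]
w_a, w_b = [1.0, 1.0, 3.0], [1.0, 1.0, 1.0]

adjusted = fuse([inst_a, inst_b], [w_a, w_b])
final_illum = gamma_correct(adjusted)
reflectance = [0.5, 0.9, 0.7]
enhanced = [r * l for r, l in zip(reflectance, final_illum)]
```

Since every illumination value lies in [0, 1], the gamma exponent 1/2.2 raises dark values more than bright ones, which is what lifts shadow regions without blowing out highlights.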


[Article figures: mixed-distance maximization algorithm; the deep feature learning framework with full-triple relation; search examples on the VIPeR and ETHZ datasets (each column is a ranking result, with the query on top and the matched image marked by a red bounding box); performance on the ETHZ dataset, SEQ.1-SEQ.3]
Deep learning with full-triple relation for person re-identification

Person re-identification is a challenging task: identifying the same person across disjoint camera views. Recently, many deep learning approaches, such as architectures based on mixed distance maximization, have been proposed, but person re-identification still suffers from the local optima problem resulting from training strategies, dataset limitations, and so on. We design a new objective function, constructed from local and global objective functions based on the full-triple relation, unlike previous versions that use only two edges of the triangle in a set of triplet units. First, we define a local objective function to achieve maximization of the intra-distance inside a triplet. Second, we define a global objective function to take account of distances between triplets. Finally, we propose a main objective function based on the combination of these two distances in order to take full advantage of the information in the triplets. This combination is also called a mixed distance based on the full-triple relation, which makes the distances between triplets increase and the distances between the matched pairs in each triplet decrease. Our deep learning framework is validated on several datasets, and quantitative comparisons to state-of-the-art methods demonstrate potentially more discriminative and more efficient performance for person re-identification.
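As a rough illustration only (the published full-triple objective is not reproduced here), a mixed objective combining per-triplet hinge terms with a global separation term between triplet centroids could look like the following, with invented 2-D embeddings:

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def centroid(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def local_term(anchor, pos, neg, margin=1.0):
    """Hinge on the matched-pair vs. mismatched-pair gap in one triplet."""
    return max(0.0, dist(anchor, pos) - dist(anchor, neg) + margin)

def mixed_objective(triplets, margin=1.0, alpha=0.5):
    """Sum of local triplet terms minus a global reward for separation
    between triplet centroids (illustrative stand-in, not the paper's
    exact formulation)."""
    local = sum(local_term(*t, margin) for t in triplets)
    cents = [centroid(t) for t in triplets]
    global_term = sum(dist(cents[i], cents[j])
                      for i in range(len(cents))
                      for j in range(i + 1, len(cents)))
    return local - alpha * global_term

triplets = [((0, 0), (0, 1), (3, 0)),   # (anchor, positive, negative)
            ((5, 5), (5, 6), (8, 5))]
obj = mixed_objective(triplets)
```

Minimizing such an objective pulls matched pairs together inside each triplet while pushing triplets apart from one another, matching the qualitative behaviour the abstract describes.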


Deep learning-based automatic food identification with numeric label

February 2025 · 2 Reads

With changing times and the rapid growth of fast food, obesity has become a common concern in modern society, and effective diet management is important for modern people. Taking fast-food restaurants as an example, this study proposes a numerical label counting method and builds a food recognition system based on deep-learning-based YOLO object detection technology, in which 16 food classes from well-known fast-food restaurants are trained on a total of 1836 multi-label photos. At a detector accuracy of IoU = 0.6, mAP reaches 93.29%. In addition, the numerical-label-based multi-count method proposed in this study can quickly search and count a large number of food objects with a time complexity of O(N log₂ N). In a 50-search experiment over 75 items, the numerical-label-based multi-count method was up to 2.152 times faster than the text-label counting method. Moreover, this study uses the Eigen-CAM method from xAI technology to further examine the correctness of the detection model's predictions through visualization, where Eigen-CAM is one of the class activation map (CAM) techniques for analyzing classification problems in computer vision. Finally, users can use the Android-based app built in this study to identify and count food, quickly query the calories and nutritional content of food, and record information related to food intake, providing automated diet management.
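The O(N log₂ N) multi-count idea can be illustrated with sort-plus-binary-search; the numeric labels and their food-class meanings are invented, and the paper's exact counting routine may differ:

```python
from bisect import bisect_left, bisect_right

def count_labels(detections, queries):
    """Count occurrences of each queried numeric label.
    Sorting costs O(N log N); each query is then O(log N)."""
    ordered = sorted(detections)
    return {q: bisect_right(ordered, q) - bisect_left(ordered, q)
            for q in queries}

# hypothetical numeric class labels emitted by the detector
# (e.g. 3 = burger, 7 = fries, 5 = cola)
dets = [3, 7, 3, 1, 7, 7, 5]
counts = count_labels(dets, [3, 7, 2])
```

This is where the speed-up over text-label counting comes from: integer comparisons in a sorted array replace repeated string matching over the whole detection list.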


Gamispotify: a gamified social music recommendation system based on users’ personal values

February 2025

·

3 Reads

In this article, we introduce Gamispotify. For the first time, in a social-network-based environment and benefiting from gamification and crowdsourcing, Gamispotify recommends music to users based on their personal values. The proposed method was compared against a previously published system in two experiments and surveys with 32 application participants. Experimental results indicate that user engagement increased significantly with the proposed method: time spent in the application increased by 299.48%, music play count by 150%, and received music recommendations by 244.41%. Statistical analysis with the Wilcoxon signed-rank test confirmed these improvements were significant (p < 0.05). Survey results support these findings, showing a notable increase in user satisfaction with the proposed music recommendation system compared with the previous work (p = 0.019) and indicating that the proposed method successfully motivated users to discover new songs (p = 0.043). We also found that social-networking features, recommendations based on personal values, and gamification positively impacted users' motivation to use the music streaming application.


Impact of preprocessing techniques on MRI-based brain tumor detection

Detecting, extracting, and classifying different types of brain tumor from MRI images is much needed in medical research. Image categorization is an important research area that attracts growing attention from the medical-imaging community. Although many classifiers in the literature report promising results, many suffer from computational complexity. To overcome this complexity and the requirement for large datasets, a simple three-step process is proposed, comprising pre-processing, detection, and classification. The proposed pre-processing stage consists of Dark Channel Prior, Extended Anisotropic Diffusion Filtering (EADF) followed by principal image generation, and edge enhancement. Auto-thresholding-based detection is used to delineate the tumor area, and ResNet-50 is utilized to classify the different tumor types. Thorough investigations are conducted on the BD35H, BMI-1, BTI, BTS, BD_BT, and BRATS-2018 datasets. With significantly reduced computational complexity, the proposed model exhibits satisfactory detection and classification results.
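The auto-thresholding detection step could be sketched with Otsu's method, a standard automatic threshold; note this is an illustrative stand-in, as the abstract does not specify which auto-thresholding scheme the authors use.

```python
import numpy as np

def otsu_threshold(image):
    """Illustrative auto-thresholding (Otsu's method) on a 2-D uint8
    image: pick the threshold maximizing between-class variance."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    total = hist.sum()
    mu_total = np.dot(np.arange(256), hist) / total
    best_t, best_var = 0, -1.0
    cum_w = cum_mu = 0.0
    for t in range(256):
        cum_w += hist[t]
        cum_mu += t * hist[t]
        w0 = cum_w / total
        if w0 == 0.0 or w0 == 1.0:
            continue  # one class empty, split is undefined
        mu0 = cum_mu / cum_w
        mu1 = (mu_total * total - cum_mu) / (total - cum_w)
        var_between = w0 * (1 - w0) * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def segment_candidate_region(image):
    """Binary mask of candidate (bright) pixels above the auto threshold."""
    return image > otsu_threshold(image)
```

In the pipeline above, this mask would be produced after pre-processing and before the ResNet-50 classification stage.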


DCARES: deep convolutional neural network with neural-based optimization for image-based product recommender system

The development of recommendation systems represents a critical challenge in machine learning, particularly in today’s digital landscape. These systems play a pivotal role in delivering personalized product and service suggestions tailored to user preferences, thereby enhancing user experience and driving sales revenue. While methodologies such as collaborative filtering, content-based filtering, and hybrid recommender systems have been widely explored, creating a robust and effective recommendation system remains a complex and demanding task. This study introduces the DCARES model, a novel content-based filtering approach that leverages the similarity between product features. The DCARES model integrates a deep convolutional neural network (CNN) with machine learning techniques to form the core of the proposed recommendation system. CNNs are employed to extract rich and intricate features from product images, which are then combined with machine learning models to generate personalized recommendations based on user preferences. To optimize performance, the model is fine-tuned using various neural-based optimization algorithms, with the most effective one selected through rigorous evaluation. Experimental results demonstrate that the proposed model delivers highly accurate and effective recommendations across all evaluated metrics (accuracy, RMSE, MAE, and MAPE), showcasing its potential to outperform traditional approaches. By incorporating CNNs into recommendation systems, this study highlights a promising direction in the field of machine learning. The proposed approach not only enhances the quality and precision of recommendations but also opens new avenues for future research and applications, particularly in visually driven domains such as e-commerce and fashion.
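The content-based step, ranking catalog items by the similarity of their image features, could look like the sketch below. The CNN feature extractor itself is assumed here (embeddings arrive pre-computed), and cosine similarity is an illustrative choice; the abstract does not state which similarity measure DCARES uses.

```python
import numpy as np

def recommend_similar(query_embedding, catalog_embeddings, top_k=3):
    """Sketch of content-based filtering over CNN image embeddings:
    rank catalog items by cosine similarity to the query item."""
    q = np.asarray(query_embedding, dtype=float)
    catalog = np.asarray(catalog_embeddings, dtype=float)
    q_unit = q / np.linalg.norm(q)
    c_units = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
    sims = c_units @ q_unit              # cosine similarity per item
    return np.argsort(-sims)[:top_k].tolist()
```

In a full system, the returned indices would map back to product IDs, and the embedding model's weights would be the part tuned by the neural-based optimizers the abstract mentions.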


A novel approach to identify Parkinson's disease and other similar neural stress by analysing keystrokes on modern active devices with ensemble classification

Motor-skill (motor nerve) and neurocognitive disorders affect humans' typing ability to an extent that is noticeable while using a keyboard, smartphone, or other electronic gadget. The two medical conditions considered here are Parkinson's disease (PD), caused by malfunction of the motor nerve, and neurocognitive disorder, caused by deficient organismic responses to stimuli. A mild early symptom of PD, changes in fine motor skills during typing, is reflected heavily in keystroke patterns. Similarly, emotional stress (ES) manifests as a neurocognitive disorder that also affects cognitive abilities, and its early symptoms appear in keystroke patterns according to their severity. As no pathological examination exists for these conditions, it is challenging to perceive and measure their development from human behaviour. Furthermore, since both are progressive illnesses, early screening is essential for future diagnosis and for preventing fatal consequences. A modest attempt is made here to detect these two neurodegenerative disorders from the way people type, formally known as keystroke dynamics (KD). In this study, a bootstrap-based homogeneous ensemble classification method is proposed to address uncertain performance and uneven class distribution when detecting such medical conditions from users' typing tendencies. Two recent benchmark datasets were used to validate the method qualitatively and quantitatively and to confirm its operational improvements. Sensitivity/specificity of 0.82/0.78 in detecting PD and 0.98/0.98 for ES were achieved, which is robust and accurate under a realistic evaluation. The proposed framework also explores the possibility of implementation in web-based systems, which would offer significant benefits: better diagnosis, early detection at home, future reference, and treatment management.
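A bootstrapped homogeneous ensemble (the same base learner trained on resampled data, combined by majority vote) could be sketched as follows. The base learner here is a 1-nearest-neighbour classifier chosen purely for illustration; the paper's actual base model, feature set, and resampling details are not given in the abstract.

```python
import numpy as np

def bagged_predict(X_train, y_train, X_test, n_models=11, seed=0):
    """Minimal sketch of a bootstrap-based homogeneous ensemble:
    each base model is a 1-NN classifier fit on a bootstrap sample
    of the training set; predictions are combined by majority vote,
    which helps with uneven class distribution and unstable learners."""
    rng = np.random.default_rng(seed)
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_test = np.asarray(X_test, dtype=float)
    votes = np.zeros((len(X_test), n_models), dtype=y_train.dtype)
    for m in range(n_models):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap
        Xb, yb = X_train[idx], y_train[idx]
        for i, x in enumerate(X_test):
            nearest = np.argmin(np.linalg.norm(Xb - x, axis=1))
            votes[i, m] = yb[nearest]
    # Majority vote across the ensemble for each test point.
    return np.array([np.bincount(row).argmax() for row in votes])
```

With keystroke-dynamics data, each row of `X_train` would be a vector of timing features (hold times, flight times) and the labels would mark PD/ES versus control.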


Music dynamics visualization for music practice and education

January 2025

·

7 Reads

Currently, musicians rely on their sense of hearing and on notation cues in sheet music to perform musical dynamics precisely. Other tools, however, can visualize music dynamics efficiently and enable musicians to create, emulate, and perform more effectively. Accordingly, we developed a Musical Dynamics Visualization Method (MDVM) interface program that visualizes the amplitude of both a pre-recorded track and the real-time sound of the user playing the score. The interface layers the recorded visuals over the real-time visuals to help the player precisely replicate the intended sound. It was developed for solo performances, allowing performers to focus carefully on amplitude changes. This novel MDVM method proved effective in enhancing the expression of musical dynamics: it reduced musical dynamic errors by approximately 256%, enhanced music education, and deepened students' emotional engagement. The method thus contributes to a more holistic and effective learning experience.
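The amplitude curve such an interface overlays could be computed as a per-frame RMS envelope of the waveform; the sketch below assumes a mono signal and an illustrative frame size, since the abstract does not describe the MDVM's signal processing.

```python
import numpy as np

def amplitude_envelope(signal, frame_size=1024):
    """Sketch of the dynamics curve an MDVM-style interface could draw:
    root-mean-square amplitude of each fixed-size frame of a mono
    waveform. Both the recorded track and the live input would be run
    through this and layered for comparison."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_size
    frames = signal[: n_frames * frame_size].reshape(n_frames, frame_size)
    return np.sqrt(np.mean(frames ** 2, axis=1))
```

Plotting the recorded envelope under the live one gives the player an immediate visual target for crescendos and decrescendos.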


On measuring the change in historical city centres: an attempt at comparing human perception and deep learning through visual quality of street space

January 2025

·

12 Reads

The quality of street space is a pivotal factor in overseeing how individuals preserve, develop, and use historic heritage sites. This study proposes a novel method for quantifying changes in historical environments by assessing visual space quality. The model integrates artificial intelligence (AI)-based image segmentation of street views, representing an indirect form of human perception, with diverse user opinions evoked by facade images, reflecting direct human perception. The aim is to evaluate visual space quality by comparing artificial intelligence and human perception within the proposed model, thereby harnessing the strengths of both approaches. Atatürk High Street in Bursa, situated within the Khans region, which is inscribed on the UNESCO World Heritage List, was used as the study area to validate the method. Workstations spaced 50 m apart were established on Atatürk Street, and 360-degree panoramic images were obtained from these stations via Google Street View and action-camera shots for the years 2014, 2018, 2020, and 2023. The images were analyzed with a deep-learning-based semantic segmentation technique to monitor changes in the visual quality indicators of greenery, openness, enclosure, imageability, walkability, and complexity. The facade images of the workstations were shown to experts and stakeholders through a survey application, and subjective and semi-subjective change was assessed over the same parameters. In the assessment of visual space quality, indicators that predominantly capture physical components, such as openness, greenery, and enclosure, yield objective and subjective results that closely align. Conversely, discrepancies between objective and subjective results emerge for indicators such as imageability and complexity, where human emotions and perception exert significant influence.
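Turning a semantic-segmentation mask into pixel-ratio indicators like greenery, openness, and enclosure could be sketched as below. The class IDs are placeholders, as a real segmentation model defines its own label map, and the study's exact indicator formulas are not given in the abstract.

```python
import numpy as np

# Illustrative class IDs; a real street-view segmentation model
# (e.g., one trained on an urban scene dataset) defines its own labels.
SKY, VEGETATION, BUILDING = 0, 1, 2

def visual_quality_indicators(seg_mask):
    """Sketch of pixel-ratio visual quality indicators from a
    per-pixel class mask: greenery (vegetation share), openness
    (sky share), and enclosure (building share)."""
    mask = np.asarray(seg_mask)
    total = mask.size
    return {
        "greenery": float(np.sum(mask == VEGETATION)) / total,
        "openness": float(np.sum(mask == SKY)) / total,
        "enclosure": float(np.sum(mask == BUILDING)) / total,
    }
```

Computing these ratios for each workstation in each survey year would yield the objective time series that the study compares against the survey-based, subjective assessments.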


Journal metrics


3.0 (2023)

Journal Impact Factor™


28 days

Submission to first decision


£2090.00 / $2990.00 / €2390.00

Article processing charge