Multimedia Tools and Applications

Published by Springer Nature
Online ISSN: 1573-7721
Recent publications
Article
Photometric stereo methods seek to reconstruct 3D objects using multiple images captured under varied illumination directions. Nevertheless, shadows remain among the most significant problems faced by photometric stereo, and most existing formulations disregard them even though eliminating shadows greatly improves the results. Usually, authors empirically define a threshold to eliminate pixels with low brightness. Accordingly, in this paper we present an improved approach to enhance photometric stereo. Our aim is to propose an improved formulation for solving the shadow problem and to determine the optimal solution. To define the threshold value used to solve the shadow problem, we improve an existing formulation: our formulation normalizes the error rate with respect to the threshold, which makes it possible to compare the error rates of different threshold values. A second contribution consists in finding the optimal normal vectors by adapting the Tabu search meta-heuristic, a local search method, to explore the neighborhood of the initial solution. We perform several tests on real objects of different complexity with different parameter values. To show the effectiveness of our proposal, a number of comparisons with recently published methods are made. Through these experiments, we show that the proposed method outperforms modern near-field photometric stereo approaches in terms of quality, and that its application does not require manual intervention.
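As a rough illustration of the shadow-thresholding idea (not the paper's exact formulation), the sketch below runs classic least-squares photometric stereo in NumPy and simply drops observations whose brightness falls below a threshold before solving for each pixel's normal; the threshold value, data layout, and synthetic data are placeholder assumptions.

```python
# Minimal least-squares photometric stereo with a brightness threshold for
# shadowed pixels (illustrative sketch, not the paper's formulation).
import numpy as np

def photometric_stereo(images, light_dirs, shadow_threshold=0.05):
    """images: (k, h, w) float array in [0, 1]; light_dirs: (k, 3) unit vectors."""
    k, h, w = images.shape
    I = images.reshape(k, -1)                  # (k, n) intensities per pixel
    normals = np.zeros((3, h * w))
    albedo = np.zeros(h * w)
    for p in range(h * w):
        valid = I[:, p] > shadow_threshold     # discard shadowed observations
        if valid.sum() < 3:
            continue                           # not enough lights left to solve
        L = light_dirs[valid]                  # (m, 3)
        b, *_ = np.linalg.lstsq(L, I[valid, p], rcond=None)
        rho = np.linalg.norm(b)
        if rho > 0:
            albedo[p] = rho
            normals[:, p] = b / rho
    return normals.reshape(3, h, w), albedo.reshape(h, w)

# Toy usage with random placeholder data
rng = np.random.default_rng(0)
lights = rng.normal(size=(8, 3)); lights /= np.linalg.norm(lights, axis=1, keepdims=True)
imgs = rng.random((8, 16, 16))
n, a = photometric_stereo(imgs, lights)
```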
 
Article
Multi-level secret image sharing is a scheme that combines hierarchical shadows with progressive reconstruction. However, in previous schemes the generated shadows are divided into only two hierarchies, and these hierarchy layers are fixed without considering the demands of dynamic environments. In this paper, we propose an improved secret image sharing scheme based on Lagrange interpolation and Birkhoff interpolation. The generated shadows can be divided into any number of hierarchies according to specific application needs. In the proposed scheme, the secret image is divided into sub-images of different quality, and the corresponding temporary shadows are generated using distinctive algorithms. Afterwards, final shadows with any hierarchy can be generated by combining those temporary shadows. During image reconstruction, sub-images can be partially reconstructed when the collected shadows come from different hierarchies, and the original secret image can be reconstructed progressively by combining sub-images. The higher the hierarchy and the greater the number of participating shadows, the higher the quality of the reconstructed image. When all requirements are met or n shadows have participated, the lossless secret image can be obtained. When parts of the shadows are lost, most existing schemes cannot reconstruct the secret completely; the proposed method, however, has a loss-recovery capability. Correctness and security analyses are also given in the experiments.
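For readers unfamiliar with the Lagrange building block that such schemes rest on, here is a minimal (k, n) threshold sharing of a single pixel value over GF(251); the paper's hierarchical Birkhoff construction is not reproduced, and the prime modulus and share indices are conventional choices, not the authors' parameters.

```python
# Minimal (k, n) threshold sharing of one pixel value via Lagrange interpolation
# over GF(251), in the spirit of polynomial secret image sharing.
import random

P = 251  # prime modulus; pixel values are assumed to be clipped below 251

def make_shares(secret, k, n):
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):          # Horner evaluation of the polynomial
            y = (y * x + c) % P
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret from k shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = make_shares(173, k=3, n=5)
assert reconstruct(shares[:3]) == 173       # any 3 of the 5 shares recover the pixel
```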
 
Article
Sparse Subspace Clustering (SSC) based methods have achieved great success because they can effectively explore the low-dimensional subspace structure embedded in the original data. However, most existing subspace clustering methods are designed for vectorial data from linear spaces and are thus unsuitable for high-dimensional data (such as image sets or video) with a non-linear manifold structure. In this paper, we propose a unified framework for kernelized sparse subspace clustering on Grassmann manifolds, which can learn the optimal affinity graph together with the best clustering index matrix. Experimental results on six public datasets show that the proposed method clearly outperforms most related clustering methods based on Grassmann manifolds.
 
Article
Stress and anger are two negative emotions that affect individuals both mentally and physically, and there is a need to tackle them as soon as possible. Automated systems are highly desirable for monitoring mental states and detecting early signs of emotional health issues. In the present work, a convolutional neural network is proposed for anger and stress detection using handcrafted features and deep learned features from the spectrogram. The objective of using a combined feature set is to gather information from two different representations of the speech signal, obtaining more prominent features and boosting recognition accuracy. The proposed method of emotion assessment is more computationally efficient than similar approaches. Preliminary results from the experimental evaluation of the proposed approach on three datasets, the Toronto Emotional Speech Set (TESS), the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and the Berlin Emotional Database (EMO-DB), indicate that categorical accuracy is boosted and cross-entropy loss is reduced to a considerable extent. The proposed convolutional neural network (CNN) obtains training (T) and validation (V) categorical accuracies of T = 93.7%, V = 95.6% for TESS; T = 97.5%, V = 95.6% for EMO-DB; and T = 96.7%, V = 96.7% for RAVDESS.
 
Article
The expressive ability of digital visual media, combined with the simplicity of its acquisition, processing, distribution, manipulation, and storage, means it is used to convey information more than other information carriers. Formerly and conventionally, there was credence in the authenticity of digital visual media. However, with the availability of inexpensive, easy-to-use digital devices with high-resolution cameras built into mobile phones, and of low-cost, user-friendly editing tools, visual media forgery is now ubiquitous. In general, forgery introduces certain artifacts into digital images; the recapturing process eliminates those artifacts and misleads the forensic system. The research work proposed in this paper presents a novel technique for detecting near-duplicate images by examining the edge profile obtained from the edge histogram descriptor. The difference between the numbers of grouped directional edges present in the singly captured and the near-duplicate image is used to build the feature vectors. Based on training with those feature vectors, a model is generated using an SVM classifier. The proposed method is tested on three datasets of high-resolution, high-quality near-duplicate images, namely NTU-ROSE, ICL, and Mturk. The results show that the proposed technique is comparatively better than state-of-the-art methods for near-duplicate detection. A feature vector of length 91 extracted from the image allows an SVM classifier to achieve a precision of 100% and a selectivity above 97%. Furthermore, our results show that the proposed method exceeds an overall accuracy of 99% on all three datasets.
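A toy sketch of the general pipeline (directional edge histogram, absolute feature differences, SVM) is shown below; the bin count, gradient threshold, and synthetic image pairs are illustrative assumptions and do not reproduce the paper's 91-dimensional descriptor.

```python
# Illustrative sketch: a simple directional edge histogram fed to an SVM on
# absolute feature differences between image pairs. All parameters are placeholders.
import numpy as np
from sklearn.svm import SVC

def edge_direction_histogram(img, bins=5, grad_threshold=0.1):
    """img: 2-D float array. Returns a normalized histogram of edge directions."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                       # in [-pi, pi]
    strong = mag > grad_threshold * mag.max()      # keep only strong edges
    hist, _ = np.histogram(ang[strong], bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)

# Synthetic "single capture" vs. "recaptured" pairs (label 1) and unrelated pairs (label 0)
rng = np.random.default_rng(1)
originals  = [rng.random((64, 64)) for _ in range(20)]
recaptured = [np.clip(o + rng.normal(0, 0.05, o.shape), 0, 1) for o in originals]
X  = [np.abs(edge_direction_histogram(a) - edge_direction_histogram(b))
      for a, b in zip(originals, recaptured)]
X += [np.abs(edge_direction_histogram(a) - edge_direction_histogram(b))
      for a, b in zip(originals, originals[::-1])]
y = [1] * 20 + [0] * 20
clf = SVC(kernel="rbf").fit(X, y)
```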
 
Article
Recently, visual animal biometrics have attracted much attention for identifying endangered species and animals based on their prominent biometric features. This study explores the possibility of using facial recognition as an inexpensive and user-friendly biometric modality to identify cattle. The paper proposes an automatic recognition system that identifies cattle from face and muzzle image features using a sparse Stacked Denoising Autoencoder (SDAE) and group sparse representation techniques. Discriminatory features are extracted from cattle face and muzzle images using sparse SDAE and Deep Belief Network (DBN) methods, mitigating the need to represent each biometric feature separately from the single face and muzzle modalities at various score levels. The single-modality features and multi-features are fused at score level for better representation and classification of individual cattle using the group sparse representation technique. The proposed system's performance is compared with holistic and handcrafted texture feature extraction and representation techniques and with current state-of-the-art methods. The authors show that the proposed recognition system yields 96.85% accuracy in identifying individual cattle based on the multi-feature representation of face and muzzle features.
 
Article
Malware classification continues to be exceedingly difficult due to the exponential growth in the number and variants of malicious files. It is crucial to classify malicious files based on their intent, activity, and threat in order to have a robust malware protection and post-attack recovery system in place. This paper proposes a novel deep learning-based model, S-DCNN, to classify malware binary files into their respective malware families efficiently. S-DCNN uses an image-based representation of the malware binaries and leverages the concepts of transfer learning and ensemble learning. The model incorporates three deep convolutional neural networks, namely ResNet50, Xception, and EfficientNet-B4. An ensemble technique combines these component models' predictions, with a multilayer perceptron serving as a meta-classifier. The ensemble fuses the diverse knowledge of the component models, resulting in high generalizability and low variance of S-DCNN. Further, it eliminates the feature engineering, reverse engineering, disassembly, and other domain-specific techniques previously used for malware classification. To establish S-DCNN's robustness and generalizability, the performance of the proposed model is evaluated on the Malimg dataset, a dataset collected from VirusShare, and packed-malware counterparts of both datasets. The proposed method achieves a state-of-the-art 10-fold accuracy of 99.43% on the Malimg dataset and an accuracy of 99.65% on the VirusShare dataset.
 
Article
A no-reference method for image quality assessment using shape-adaptive wavelet features and a neuro-wavelet model is proposed in this paper. Images usually consist of visual objects, and degradation of an image ultimately distorts the objects present in it; such distortions can change the shape of these objects. Quality assessment of an image is therefore not complete without assessing the quality of the individual objects it contains, so deviation in shape has to be quantified alongside overall image quality. The Shape Adaptive Discrete Wavelet Transform offers a solution to the shape identification problem. The variations in the magnitudes of the feature values are not proportional to the amount of degradation, owing to the presence of other artifacts, so wavelet decomposition is applied to capture the small variations observed in the extracted features. Separate back-propagation neural network models are trained for quality assessment of all kinds of images, ranging from pristine to bad. Results show improved accuracy independent of the image database: the predicted score correlates well with the mean opinion score, with 90% accuracy for the LIVE dataset and 93% and 95% for TID2008 and TID2013, respectively.
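As a hedged illustration of wavelet-based no-reference features (not the shape-adaptive transform used in the paper), the sketch below extracts per-subband statistics with PyWavelets and regresses placeholder opinion scores with a small neural network from scikit-learn.

```python
# Toy wavelet-feature regression for image quality: per-subband statistics from a
# 2-level DWT regressed against placeholder opinion scores. Feature choice and
# network size are assumptions, not the paper's model.
import numpy as np
import pywt
from sklearn.neural_network import MLPRegressor

def wavelet_features(img, wavelet="haar", levels=2):
    coeffs = pywt.wavedec2(img.astype(float), wavelet, level=levels)
    feats = []
    for detail in coeffs[1:]:                       # (cH, cV, cD) per level
        for band in detail:
            feats.extend([np.mean(band ** 2),       # subband energy
                          np.std(band),
                          np.mean(np.abs(band))])
    return np.array(feats)

rng = np.random.default_rng(0)
images = [rng.random((64, 64)) for _ in range(30)]
mos = rng.uniform(1, 5, size=30)                    # placeholder opinion scores
X = np.stack([wavelet_features(im) for im in images])
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(X, mos)
```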
 
Article
Garbage management is an essential task in the everyday life of a city. In many countries, dumpsters are owned and deployed by the public administration, and an up-to-date what-and-where list is at the core of the decision-making process when it comes to removing or renewing them. Moreover, it may provide extra information to other analytics in a smart city context. In this paper, we present a capsule network-based architecture to automate the visual classification of dumpsters. We propose different network hyperparameter settings, such as reducing the convolutional kernel size and increasing the number of convolution layers, and we try several data augmentation strategies, such as crop and flip image transformations. We succeed in reducing the number of network parameters by 85% with respect to the best previous method, thus decreasing the required training time and making the whole process suitable for low-cost and embedded software architectures. In addition, the paper provides an extensive experimental analysis, including an ablation study that illustrates the contribution of each component of the proposed method. Our proposal is compared with the state-of-the-art method, which is based on a Google Inception V3 architecture pretrained on ImageNet. Experimental results show that our proposal achieves 95.35% accuracy, 2.35% above the previous best method.
 
Article
In the recent past, Non-Destructive Testing (NDT) methods have played a significant role in welding defect detection. Phased array ultrasonic testing is an advanced NDT method used for assessing weld integrity, but qualified personnel are required to measure the geometrical features of weld defects. A full-fledged computer-based measuring system is needed to avoid manual interpretation, since manual interpretation always introduces errors when inferring Phased Array (PA) signals and images. This work proposes an artificial intelligence-based measuring approach that enhances the features of the image. A 2D Adaptive Anisotropic Diffusion Filter (2D AADF) is used to remove noise, and scattered pixels are corrected by applying a hexagonally sampled grid. An adaptive mean adjustment (AMA) algorithm enhances image features in terms of contrast and brightness. Hybrid clustering segmentation incorporates non-overlapping K-means and improved Fuzzy C-Means (FCM) pixel clustering: the seed point is selected automatically by K-means clustering, and high-intensity gradients are estimated by applying the Gradient Cluster Fuzzy C-Means (GCFCM) segmentation method. After segmenting the Region of Interest (ROI), features are estimated using the Gray Level Co-Occurrence Matrix (GLCM). The computed features of the segmented image are then passed to a deep learning model to classify different stages of welding defects. The experimental results show that the proposed algorithm is highly efficient and accurate. The technique is also applied to images acquired from the Omniscan MX2 instrument with a suitable linear array probe containing 64 transducer elements. The aim of the proposed work is an automatic measurement technique for identifying and characterizing defects.
 
Article
Diabetic retinopathy (DR), one of the complications of diabetes, is the leading cause of blindness in the 20–74 age group. Fortunately, 90% of these cases of blindness due to DR could be prevented by early detection and treatment via manual, regular screening by qualified physicians. The screening of DR is tedious; it can be subjective, time-consuming, and sometimes prone to misclassification. In terms of accuracy and time, many automated screening systems based on image processing have been developed to improve diagnostic performance. However, the accuracy and consistency of the developed systems are largely unaddressed, so manual screening remains the preferred option. The main contribution of this paper is an analysis of the accuracy and consistency of microaneurysm (MA) detection via image processing, focusing on Otsu's multi-thresholding, which has been shown to work very well in many applications. The analysis is based on Monte Carlo statistical analysis using synthetic retinal images under variation of all stages of DR and of the retinal and image parameters: the intensity difference between MAs and blood vessels (BVs), MA size, and measurement noise. The conditions, in terms of obtainable retinal and image parameters, that guarantee accurate and consistent MA detection via image processing are then extracted, and their validity is verified using real retinal images. The results show that MA detection via image processing is guaranteed to be accurate and consistent when the intensity difference between MAs and BVs is at least 50% and the size of MAs is between 5 and 20 pixels, depending on the measurement noise. These conditions are important as a guideline for MA detection in DR.
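The following toy Monte Carlo run conveys the flavour of such an analysis using scikit-image's Otsu multi-thresholding on a synthetic image with a dark MA-like patch and a bright vessel; all intensities, sizes, and noise levels are assumptions, not the paper's parameter grid.

```python
# Toy Monte Carlo check: does Otsu multi-thresholding isolate a dark MA-like
# patch under increasing measurement noise? All image parameters are placeholders.
import numpy as np
from skimage.filters import threshold_multiotsu

rng = np.random.default_rng(0)

def synthetic_retina(ma_intensity=0.30, vessel_intensity=0.80, noise_sigma=0.02):
    img = np.full((128, 128), 0.55)            # background
    img[:, 60:66] = vessel_intensity           # a bright vertical "vessel"
    img[20:40, 20:40] = ma_intensity           # a dark MA-like patch
    return np.clip(img + rng.normal(0, noise_sigma, img.shape), 0, 1)

for sigma in (0.01, 0.05, 0.10):
    hits = 0
    for _ in range(100):
        img = synthetic_retina(noise_sigma=sigma)
        t_low, _ = threshold_multiotsu(img, classes=3)
        hits += (img[20:40, 20:40] < t_low).mean() > 0.5   # darkest class covers the patch?
    print(f"noise sigma={sigma:.2f}: detection rate {hits / 100:.2f}")
```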
 
Article
Pen ink analysis is an essential step for establishing the integrity of a handwritten document. Traditional approaches for analyzing the ink are based on destructive techniques like thin layer chromatography, high performance liquid chromatography, etc. There are also several non-destructive techniques that focus on multi-spectral imaging at various wavelengths in the ultraviolet, visible, and infra-red ranges. In contrast, there have been very few techniques that focus on analyzing scanned handwritten documents captured with normal scanners in the visible light spectrum. In this regard, a novel method is proposed to discriminate pen inks on a handwritten document. The proposed technique considers a set of statistical (global) and motif (local) textural features of ink colors to differentiate pen inks in a pair of words; this is the first use of motif textural features together with statistical textural features for this task. In this work, the absolute differences between feature values from a pair of words form the feature vector, which is fed to a binary classifier to determine whether the two words were written with the same pen or different pens. Five different classifiers (decision tree, k-nearest neighbor, random forest, radial-basis-function kernel support vector machine, and multi-layer perceptron) have been experimentally tested for this task. Experimental results reveal that the proposed method with a random forest classifier outperforms the other classifiers, and comparative analysis confirms the efficacy of the proposed method w.r.t. state-of-the-art methods for pen ink discrimination in the visible light spectrum.
 
Article
Small obstacles can cause big accidents, even when a vehicle is equipped with an intelligent auxiliary system. To detect four kinds of small obstacles quickly and accurately, this paper proposes an optimized neural network algorithm based on YOLOv3. K-Means+ is used to determine the prior boxes and enhance the adaptability of the YOLO scales. To address the imbalance in the data samples, the YOLO loss function is improved to increase the precision of the prediction boxes. In addition, a special classification and counting algorithm is proposed to obtain results quickly and visually. The experimental results show that our method can classify and locate the four kinds of small obstacles more accurately and faster.
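For context, the sketch below shows plain k-means anchor clustering with a 1 - IoU distance, the standard prior-box selection that K-Means+-style variants refine; the box statistics and the number of anchors are placeholders.

```python
# Classic k-means anchor (prior box) clustering with 1 - IoU as the distance.
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (n, 2) box sizes and (k, 2) anchor sizes, anchored at origin."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, None, 0] * boxes[:, None, 1] + \
            anchors[None, :, 0] * anchors[None, :, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)   # nearest = max IoU
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else anchors[j] for j in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]          # sorted by area

rng = np.random.default_rng(1)
wh = rng.uniform(5, 80, size=(500, 2))   # (width, height) of labeled boxes, placeholder
print(kmeans_anchors(wh, k=6))
```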
 
Article
In modern communication systems, speech communication is used in a vast range of applications. During transmission of a speech signal, environmental interference usually degrades the signal; common interferences that affect speech quality are acoustic noise, acoustic reverberation, and white noise. This research work aims to estimate the noise in the speech signal using a Recurrent Function Network (RFN). The proposed technique is termed the Recurrent RATS Function Network (RRFN). The proposed network estimates the different noises present in the input noisy speech signal. Once the noises are identified, features are estimated using a novel radial-basis RATS (Robust Automatic Transcription of Speech) approach. To further enhance the clarity of the speech signal, a novel generalized recursive singular value technique integrated into an elliptic filter is used to effectively remove noise from the speech signal. Simulation analysis is performed for the proposed RFN and compared with existing techniques in terms of PESQ and STOI. The proposed method exhibits good performance improvement over existing techniques at different SNR levels.
 
Article
One of the primary clinical observations for screening the novel coronavirus is capturing a chest x-ray image. In most patients, a chest x-ray contains abnormalities, such as consolidation, resulting from COVID-19 viral pneumonia. In this study, research is conducted on efficiently detecting imaging features of this type of pneumonia using deep convolutional neural networks in a large dataset. It is demonstrated that simple models, alongside the majority of pretrained networks in the literature, focus on irrelevant features for decision-making. In this paper, numerous chest x-ray images from several sources are collected, and one of the largest publicly accessible datasets is prepared. Finally, using the transfer learning paradigm, the well-known CheXNet model is utilized to develop COVID-CXNet. This powerful model is capable of detecting the novel coronavirus pneumonia based on relevant and meaningful features with precise localization. COVID-CXNet is a step towards a fully automated and robust COVID-19 detection system.
 
Article
The feature pyramid network (FPN) has been an efficient framework for extracting multi-scale features in object detection. However, current FPN-based methods mostly suffer from the intrinsic flaw of channel reduction, which causes the loss of semantic information, and the miscellaneous feature maps may cause serious aliasing effects. In this paper, we present a novel channel enhancement feature pyramid network (CE-FPN) to alleviate these problems. Specifically, inspired by sub-pixel convolution, we propose sub-pixel skip fusion (SSF) to perform both channel enhancement and upsampling. Replacing the original 1 × 1 convolution and linear upsampling, it mitigates the information loss due to channel reduction. We then propose sub-pixel context enhancement (SCE) to extract stronger feature representations, which is superior to other context methods owing to the use of rich channel information by sub-pixel convolution. Furthermore, we introduce a channel attention guided module (CAG) to optimize the final integrated features on each level; it alleviates the aliasing effect with only a small computational burden. We evaluate our approach on the Pascal VOC and MS COCO benchmarks. Extensive experiments show that CE-FPN achieves competitive performance and is more lightweight than state-of-the-art FPN-based detectors.
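Sub-pixel convolution here refers to the pixel-shuffle operation; a minimal PyTorch sketch of sub-pixel upsampling (channel sizes are placeholders, and this is not the authors' SSF module itself) looks like this:

```python
# Minimal sub-pixel (pixel-shuffle) upsampling: channel-rich features are
# rearranged into a higher-resolution map instead of being squeezed by a
# 1x1 conv and then interpolated. Channel sizes are placeholders.
import torch
import torch.nn as nn

class SubPixelUp(nn.Module):
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        # produce out_ch * scale^2 channels, then rearrange them into space
        self.conv = nn.Conv2d(in_ch, out_ch * scale ** 2, kernel_size=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        return self.shuffle(self.conv(x))

c5 = torch.randn(1, 2048, 8, 8)          # a deep, low-resolution feature map
up = SubPixelUp(2048, 256, scale=2)
p4 = up(c5)                              # -> (1, 256, 16, 16), no linear upsampling
print(p4.shape)
```

Shuffling stacked channels into space keeps all channel information, which is the property that sub-pixel fusion exploits instead of discarding channels with a plain 1 × 1 convolution.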
 
Article
Rapid developments in swarm intelligence optimizers and computer processing power create opportunities to design more accurate, stable, and comprehensive methods for color image segmentation. This paper presents a new approach to unsupervised image segmentation that combines histogram thresholding methods (Kapur's entropy and Otsu's method) with different multi-objective swarm intelligence algorithms (MOPSO, MOGWO, MSSA, and MOALO) to threshold the 3D histogram of a color image. More precisely, the method first combines the objective functions of traditional thresholding algorithms to design comprehensive objective functions, and then uses multi-objective optimizers to find the best thresholds while optimizing the designed objective functions. The method uses a vector objective function in 3D space that can handle the segmentation of all of the image's color channels simultaneously with the same thresholds. To optimize this vector objective function, we employ multi-objective swarm optimizers that can optimize multiple objective functions at the same time. Therefore, our method considers dependencies between channels to find thresholds that simultaneously satisfy the objective functions of the color channels (which we call the vector objective function). Segmenting all color channels with the same thresholds also means the proposed method needs fewer thresholds to segment the image than other thresholding algorithms, and thus requires less memory to store them, which helps considerably when segmenting many images into many regions. The subjective and objective results show the superiority of this method over traditional thresholding methods that threshold the histograms of a color image separately.
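For reference, single-threshold, single-channel versions of the two underlying objective functions (Kapur's entropy and Otsu's between-class variance) can be written as below; the multi-objective swarm search over a 3D colour histogram is out of scope for this sketch, and the synthetic histogram is a placeholder.

```python
# Single-channel, single-threshold versions of the two thresholding objectives
# that multi-objective optimizers combine. Both are maximized.
import numpy as np

def kapur_entropy(hist, t):
    p = hist / hist.sum()
    p0, p1 = p[:t].sum(), p[t:].sum()
    h0 = -np.sum(p[:t][p[:t] > 0] / p0 * np.log(p[:t][p[:t] > 0] / p0)) if p0 > 0 else 0.0
    h1 = -np.sum(p[t:][p[t:] > 0] / p1 * np.log(p[t:][p[t:] > 0] / p1)) if p1 > 0 else 0.0
    return h0 + h1

def otsu_variance(hist, t):
    p = hist / hist.sum()
    bins = np.arange(len(hist))
    w0, w1 = p[:t].sum(), p[t:].sum()
    if w0 == 0 or w1 == 0:
        return 0.0
    mu0 = (bins[:t] * p[:t]).sum() / w0
    mu1 = (bins[t:] * p[t:]).sum() / w1
    return w0 * w1 * (mu0 - mu1) ** 2

rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(60, 10, 5000), rng.normal(180, 15, 5000)])
hist, _ = np.histogram(np.clip(pixels, 0, 255), bins=256, range=(0, 256))
best_t = max(range(1, 256), key=lambda t: otsu_variance(hist, t))
print(best_t, kapur_entropy(hist, best_t))
```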
 
Article
Segmentation of brain tumors from Magnetic Resonance Imaging (MRI) is a challenging and essential task for brain tumor detection. In this article, a new two-fold fuzzy and deep learning-based approach is proposed for the segmentation of three types of brain tumor. A total of eight CNN-based architectures are proposed, from the basic architecture (Brainet1.01) to the final architecture (DeepBrainet2.0), to find the optimal one for brain tumor segmentation. Brainet1.01 is upgraded to DeepBrainet2.0 via Brainet1.02 to Brainet1.04, Brainet2.0, Brainet3.0, and Deepbrainet1.0 by changing the number of layers, the number of neurons, and the type of connections between neurons. The best result (the final brain tumor mask) is achieved with DeepBrainet2.0, which utilizes an efficient skip-connection mapping plan to learn the brain tumor features. To make DeepBrainet2.0 effective, enhanced training and test instances are created using a fuzzy-logic-based method. A second, augmented dataset is also created by applying five types of augmentation to the images of the first dataset, making the system robust to changes in orientation, scale, and flipping. The proposed method is tested on three datasets, where the accuracy rates obtained are 94.3%, 96.7%, and 95.2%, which demonstrates the efficacy of the proposed approach.
 
Article
The electroencephalogram (EEG) is the key component in the analysis of brain activity and behavior. EEG signals are affected by artifacts in the recorded electrical activity, which hampers EEG analysis. To extract clean data from EEG signals and to improve detection efficiency during encephalogram recordings, a well-developed model is required. Although various methods have been proposed for artifact removal, research on this problem continues. While several types of artifacts from both subject and equipment interferences heavily contaminate EEG signals, the most common and important type of interference is ocular artifacts. Many applications, such as Brain-Computer Interfaces (BCI), need online, real-time processing of EEG signals, so it is best if artifact removal is performed online. The main intention of this work is a new deep learning-based ocular artifact detection and prevention model. In the detection phase, a 5-level Discrete Wavelet Transform (DWT) and Pisarenko harmonic decomposition are used to decompose the signals. Then, Principal Component Analysis (PCA) and Independent Component Analysis (ICA) are adopted to extract features. With the collected features, an optimized Deformable Convolutional Network (DCN) is used to detect ocular artifacts in the input EEG signal. Here, the optimized DCN is developed by tuning significant parameters with Distance Sorted-Electric Fish Optimization (DS-EFO). If artifacts are detected, mitigation is performed by applying Empirical Mean Curve Decomposition (EMCD), after which the optimized DCN is used to denoise the signals. Finally, the clean signal is generated by applying the inverse EMCD. On EEG data collected from diverse subjects, the proposed method achieves higher performance than conventional methods, demonstrating better ocular-artifact reduction.
 
Article
With the rapid development of the civil aviation industry and the continuous rise in passenger expectations, higher requirements for safety and comfort have been put forward in the design of civil aircraft. In order to develop a dedicated ergonomic simulation experimental system for civil aircraft cabin seats, this paper, taking the perspective of the human occupant, comprehensively considers the impact of the civil aircraft cabin seat environment on human physiology and psychology, and puts forward an evaluation method for a civil aircraft cabin ergonomic simulation experimental system based on passengers' perception of key design features. Based on the principles of ergonomics, the connection between the key design features of seat comfort and the requirements of user preference is considered, and a simulation experimental system based on comfort evaluation is constructed using subjective and objective evaluation methods. Moreover, according to the experimental requirements of the simulation system and the characteristics of the seat structure, an ergonomic simulation experimental system for the civil aircraft cabin is developed. The final experimental results prove the practicability and validity of the designed simulation experiment system.
 
Article
When the text content is known, the semantic information and speaker characteristics in a speech signal can be used for speech recognition and speaker verification, respectively, in text-prompted speaker recognition, which mitigates the problem of forged recordings in text-dependent settings. In practical applications, combining speech recognition and speaker recognition technologies achieves a double verification effect, which can also effectively improve security. There are few studies on combining speaker recognition and speech recognition in Tibetan; they mainly use non-end-to-end methods, and the performance of the models is not ideal. Building on that earlier research, this paper uses a mainstream end-to-end method to study the speaker verification part. The network models are ResNet-34 and ResNet-50, which are fine-tuned. Open-set speaker verification is essentially metric learning: the ideal embedding compresses frame-level features into a compact utterance-level representation, maximizing the inter-class distance and minimizing the intra-class distance. For the loss function, we use three classification objective losses and three metric learning objective losses to extensively evaluate the performance of the model. To further improve performance, we fuse the Softmax and Angular Prototype loss functions. The experimental results show that Fast ResNet-50 performs better than Fast ResNet-34, and the Angular Prototype loss performs better than the other single loss functions. The model with the fused loss function performs best, with an equal error rate of 4.25%.
 
Article
Audio Event Detection (AED) pertains to identifying the types of events in audio signals. AED is essential for applications requiring decisions based on audio, which can be critical, for example, in health, surveillance, and security applications. Despite the proven benefits of deep learning in obtaining the best representation for solving a problem, AED studies still generally employ hand-crafted representations even when deep learning is used for the AED task itself. Intrigued by this, we investigate whether or not hand-crafted representations (i.e., the spectrogram, mel spectrogram, log mel spectrogram, and mel frequency cepstral coefficients) are better than a representation learned using a Convolutional Autoencoder (CAE). To the best of our knowledge, our study is the first to ask this question and thoroughly compare feature representations for AED. To this end, we first find the best hop size and window size for each hand-crafted representation and compare the optimized hand-crafted representations with CAE-learned representations. Our extensive analyses on a subset of the AudioSet dataset confirm the common practice: hand-crafted representations do perform better than learned features by a large margin (∼30 AP). Moreover, we show that the commonly used window and hop sizes do not provide optimal performance for the hand-crafted representations.
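A minimal sweep of window and hop sizes for one hand-crafted representation, using SciPy's spectrogram on a toy signal, might look like the sketch below; the listed sizes and the synthetic signal are illustrative, not the grid evaluated in the paper.

```python
# Sweep window/hop sizes for a log-spectrogram representation of a toy signal.
import numpy as np
from scipy.signal import spectrogram

sr = 16000
t = np.arange(0, 1.0, 1 / sr)
signal = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.default_rng(0).normal(size=t.size)

for win in (256, 512, 1024):
    for hop in (win // 4, win // 2):
        f, times, S = spectrogram(signal, fs=sr, nperseg=win, noverlap=win - hop)
        log_S = 10 * np.log10(S + 1e-10)          # log spectrogram feature map
        print(f"window={win:4d} hop={hop:4d} -> shape {log_S.shape}")
```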
 
Article
Facial animation is a fundamental challenge that requires mathematical and computational strategies. In this paper, a novel facial animation technique using a numerically traced algorithm is introduced. The homotopy-based animation methodology (HAM) follows the homotopy curve path to generate intermediate frames for different λ values, thereby representing the deformations from a starting image to an ending image. These deformations use a system of equations embedded into a single homotopy equation to represent the intermediate frames. Moreover, a hyperspherical tracking method produces deformations with visually consistent and smooth changes. Experimental results reveal intermediate frames that can be interpreted as facial animation. Furthermore, histogram plots, homotopic trajectories, and pixel variation tables confirm that different pixel positions vary at different rates of change as the original image is transformed into the target image. These frames do not need external filters to correct visual interpretation errors, and therefore the homotopy-based animation method can be considered a useful alternative for animating facial images in different applications.
 
Article
An intolerable, unauthorized leak of information always implies inadequate or poor information security measures. Globally, the hospitality industry has recently been targeted by cybercrime. The management of cybercrime can be divided into facets such as security framework policies, cyber-threats, and management's appreciation of the value of Information Technology investment; these aspects have a notable influence on an organization's information security. The purpose of this study is to examine cybersecurity activities around network threats and electronic information and the techniques for preventing cybercrime in hotels. The main aim of the research is to help Chief Information Officers (CIOs) and directors of information technology improve electronic information security policy in the hospitality industry and to recommend tools and techniques to secure computer networks. Further, to protect the disclosure of visitors' personal information, an information-hiding technique is proposed that utilizes the least significant bits (LSBs) of each pixel of a grayscale image together with XOR features of the host image (HI) pixels. The proposed technique successfully withstands various steganalysis attacks, such as the regular and singular (RS) attack, the pixel difference histogram (PDH) attack, and subtractive pixel adjacency matrix (SPAM) steganalysis.
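To make the XOR-based LSB idea concrete, here is a toy embed/extract pair in NumPy that XORs each secret bit with the host pixel's most significant bit before writing it to the LSB; the exact bit plan of the proposed technique is not reproduced, and the choice of reference bit is an assumption.

```python
# Toy XOR-based LSB embedding in a grayscale cover image (illustrative only).
import numpy as np

def embed(host, bits):
    flat = host.flatten().astype(np.uint8)       # work on a copy of the cover
    assert len(bits) <= flat.size
    for i, b in enumerate(bits):
        ref = (flat[i] >> 7) & 1                 # MSB of the host pixel (assumed reference bit)
        flat[i] = (flat[i] & 0xFE) | (b ^ ref)   # write XORed bit to the LSB
    return flat.reshape(host.shape)

def extract(stego, n_bits):
    flat = stego.flatten()
    # LSB ^ MSB undoes the XOR applied at embedding time
    return [(int(flat[i]) & 1) ^ ((int(flat[i]) >> 7) & 1) for i in range(n_bits)]

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
secret = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed(cover, secret)
assert extract(stego, len(secret)) == secret
```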
 
Article
In recent years, data hiding in encrypted images has become a significant topic in data security. Owing to its capability to preserve confidentiality, Reversible Data Hiding (RDH) in the encrypted domain is helpful for cloud computing as an emerging technology. In this paper, an error-free, mean-value-based RDH scheme with high capacity on encrypted images is proposed. There are three parties: the image provider (content owner), the data hider, and the data receiver. Initially, the image provider keeps the averages to obtain the modified image, encrypts the modified image with the encryption key, and passes it to the data hider. The data hider uses the Gravitational Search Algorithm (GSA) to find the best pixel locations in the encrypted image for information hiding. Moreover, an alpha channel of the same size as the encrypted image is created and combined with the encrypted image for embedding the secret data, producing an encrypted image with embedded data. On the receiver side, with the encryption key the receiver recovers the image by decrypting the encrypted image, and the secret bits are extracted with the data hiding keys. The experimental results demonstrate that the proposed system performs better than other state-of-the-art strategies: the peak signal-to-noise ratio (PSNR) achieved by the proposed system is 8.36% and 18.3% better than that of other existing works.
 
Article
Gait is one of the most important biomarkers for Parkinson's disease (PD). Nonetheless, current clinical diagnosis quantifies locomotion patterns using coarse approximations from a reduced set of marker-based trajectories. This approximation, among other drawbacks, is restrictive and invasive, alters natural gait gestures, and leaves out relevant PD patterns. This paper introduces a new computational approach to quantify, classify, and explain Parkinsonian gait patterns using a markerless video strategy. The core of the work is a local volumetric covariance that codifies motion patterns during locomotion. This covariance codifies convolutional pre-trained features tracked along a set of dense trajectories that represent the subject's gait, and its computation uses an integral strategy to remain efficient in terms of computational cost. The proposed method was evaluated on 176 gait video sequences from a total of 22 subjects, both controls and patients diagnosed with PD. The proposed approach achieved a remarkable average accuracy of 96.59% (± 0.13), with a sensitivity of 98.86%, specificity of 94.31%, and precision of 94.56%. These results suggest that the proposed approach may support clinical PD diagnosis and analysis using ordinary videos.
 
Article
A Radio Base Station (RBS), part of the Radio Access Network, is a particular type of equipment that supports the connection between a wide range of cellular user devices and an operator's network access infrastructure. Nowadays, most RBS maintenance is carried out manually, making it a time-consuming and costly task. A suitable candidate for RBS maintenance automation is repairing faulty links between devices caused by missing or unplugged connectors. This paper proposes and compares two Deep Learning (DL) solutions for identifying attached RJ45 connectors on network ports: Connector Detection, the DL solution based on object detection, and Connector Classification, the one based on object classification. With Connector Detection, we achieve an accuracy of 0.934 and a mean average precision of 0.903. Connector Classification reaches a higher maximum accuracy of 0.981 and an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.989. Although Connector Detection was outperformed in this particular study, it is more flexible in scenarios where precise information about the environment and the possible devices is lacking, in contrast to Connector Classification, which requires such information to be well defined beforehand.
 
Article
Classifying a weapon based on its muzzle blast is a challenging task that has significant applications in various security and military fields. Most existing works rely on ad-hoc deployments of spatially diverse microphone sensors to capture multiple replicas of the same gunshot, which enables accurate detection and identification of the acoustic source. However, carefully controlled setups are difficult to obtain in scenarios such as crime scene forensics, making such techniques inapplicable and impractical. We introduce a novel technique that requires zero knowledge about the recording setup and is completely agnostic to the relative positions of both the microphone and the shooter. Our solution can identify the category, caliber, and model of the gun, reaching over 90% accuracy on a dataset of 3655 samples extracted from YouTube videos. Our results demonstrate the effectiveness and efficiency of applying a Convolutional Neural Network (CNN) to gunshot classification, eliminating the need for an ad-hoc setup while significantly improving the classification performance.
 
Article
Animal models, such as the Open Field Maze, are helpful for evaluating the effects of drugs in the treatment of brain diseases. Usually, these tests are recorded on video and analysed afterwards to produce manual annotations of the rat's activity and behaviour. The videos must typically be watched repeatedly to ensure correct annotations, which makes the task tedious and highly prone to human error. Existing commercial systems for automatic rat detection and behaviour classification may be inaccessible to research teams that cannot afford the license cost. Motivated by this, we propose a methodology for simultaneous rat detection and behaviour classification using inexpensive hardware. Our proposal is a Deep Learning-based two-step methodology that simultaneously detects the rat in the test arena and classifies its behaviour. In the first step, a single-shot detector network detects the rat; the system then crops the image using the bounding box to generate a sequence of six images that feed our BehavioursNet network, which classifies the rodent's behaviour. Finally, based on the results of these steps, the system generates an ethogram for the complete video, a trajectory plot, a heatmap of the most visited regions, and a video showing the rat's detection and its behaviours. Our results show that these tasks can be performed at a processing rate of 23 Hz, with a low detection error of 6 pixels, and provide a first approach to classifying ambiguous behaviours such as resting and grooming with an average precision of 60%, which is competitive with results reported in the literature.
 
Article
Segmentation of television news videos into programs and stories (after removing advertisements) is a necessary first step for news broadcast analysis. Existing methods have used manually defined presentation styles as an important feature for such segmentation, but manually defined presentation styles make algorithms channel-specific and hamper scalability to a large number of channels. In this work, we advocate the use of overlay text for automatic characterization of broadcast presentation styles. This automatic characterization minimizes the manual intervention required to develop scalable solutions for television news broadcast segmentation. To this end, we introduce three novel features derived solely from the position and content of overlay text bands: Bag of Bands (BoB), BoB Templates (BoBT), and Text-based Semantic Similarity (TSS). The BoB features characterize the on-screen distribution of text bands and are used with classifiers for advertisement detection. The BoBT features characterize the co-occurrence of text bands, thereby modeling the presentation styles of video shots; sequences of BoBT features are modeled using Conditional Random Fields (CRFs) to identify program boundaries. Sequences of features derived from the semantic similarity (TSS) between consecutive shots, together with the BoBT features, are used with CRFs for story segmentation. The performance of the proposed features is validated on 360 hours of broadcast data recorded from three Indian English news channels, and a benchmark against baseline methods shows the better performance of our proposal.
 
Article
The demand for security of shared digital and printed images is increasing year after year. There is a need for a robust watermarking scheme capable of offering high detection rates under very aggressive attacks, such as a print-scan process. However, high robustness of a watermarking method usually leads to perceptible visual artifacts. Therefore, in this paper, a novel DFT watermarking method with gray component replacement (GCR) masking is proposed. The watermark is additively embedded in the magnitude coefficients of the image in the Fourier domain and then masked using GCR masking to hide the artifacts introduced by embedding. Experimental results show that GCR masking is capable of completely hiding the introduced artifacts and increasing the visual quality of the watermarked image according to both objective and subjective measures. It also outperforms similar state-of-the-art methods with respect to robustness against image attacks. The method is especially suited for printing, since the watermark is hidden in the visible part of the electromagnetic spectrum but is detectable in the infra-red (IR) part of the spectrum using an IR-sensitive camera sensor.
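A bare-bones version of additive embedding in the Fourier magnitude (without the GCR masking step, and with placeholder strength and ring radius) can be sketched as follows:

```python
# Bare-bones additive watermarking in the Fourier magnitude; detection correlates
# the extracted ring of coefficients with the known pseudo-random mark.
import numpy as np

def ring_mask(shape, r_in, r_out):
    h, w = shape
    yy, xx = np.ogrid[:h, :w]
    d = np.hypot(yy - h / 2, xx - w / 2)
    return (d >= r_in) & (d < r_out)

def embed(img, key=0, strength=8.0, r_in=40, r_out=48):
    F = np.fft.fftshift(np.fft.fft2(img.astype(float)))
    mag, phase = np.abs(F), np.angle(F)
    mask = ring_mask(img.shape, r_in, r_out)
    mark = np.random.default_rng(key).standard_normal(mask.sum())
    mag[mask] += strength * mark                       # additive mark on mid frequencies
    out = np.fft.ifft2(np.fft.ifftshift(mag * np.exp(1j * phase))).real
    return np.clip(out, 0, 255), mark, mask

def detect(img, mark, mask):
    mag = np.abs(np.fft.fftshift(np.fft.fft2(img.astype(float))))
    v = mag[mask]
    return np.corrcoef(v - v.mean(), mark)[0, 1]       # high correlation = mark present

rng = np.random.default_rng(1)
cover = rng.integers(0, 256, size=(128, 128)).astype(float)
stego, mark, mask = embed(cover)
print(detect(stego, mark, mask), detect(cover, mark, mask))
```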
 
Article
Semantic segmentation involves extracting meaningful information from images or from video or recording frames. The extraction is performed pixel by pixel using a classification approach, which yields more accurate and finer details from the data needed for further evaluation. Formerly, only a few techniques existed, based on unsupervised learning perspectives or on conventional image processing. With time, techniques have improved, and we now have more efficient methods for segmentation. Image segmentation is technically somewhat simpler than semantic segmentation, because semantic segmentation is pixel-based: each detected part is masked according to its label, and the masked objects are referred to by a defined class name and a designated color. In this paper, we review the supervised and unsupervised learning algorithms for semantic segmentation, from early approaches to advanced and more efficient ones. As far as deep learning is concerned, many techniques have already been developed, and we have studied around 120 papers in this research area. We conclude how deep learning helps solve the critical issues of semantic segmentation and yields more efficient results, and we review and comprehensively discuss different surveys on semantic segmentation, specifically those using deep learning.
 
Article
Over the last decade, deep learning has revolutionized many traditional machine learning tasks, ranging from computer vision to natural language processing. Although deep learning has achieved excellent performance, it does not perform as well as expected on geometric (non-Euclidean) data. Recently, many studies extending deep learning approaches to graphs and manifolds have emerged. In this article, we aim to provide a comprehensive overview of geometric deep learning and comparative methods. First, we introduce the related work and history of the geometric deep learning field and its theoretical background. Next, we summarize and evaluate methods for graphs and manifolds. We further discuss the applications and benchmark datasets of these methods across various research domains. Finally, we propose potential research directions and challenges in this rapidly growing field.
 
Article
In this paper we present an indexing method for probably approximately correct nearest neighbor queries in high dimensional spaces, capable of improving the performance of any index whose performance degrades with the increased dimensionality of the query space. The basic idea of the method is quite simple: we use SVD to concentrate the variance of the inter-element distance in a lower dimensional space, Ξ. We do a nearest neighbor query in this space and then we "peek" forward from the nearest neighbor by gathering all the elements whose distance from the query is less than $d_{\Xi}(1+\zeta\sigma_{\Xi}^{2})$, where $d_{\Xi}$ is the distance from the nearest neighbor in Ξ, $\sigma_{\Xi}^{2}$ is the variance of the data in Ξ, and ζ is a parameter. All the data thus collected form a tentative set T, in which we do a scan using the complete feature space to find the point closest to the query. The advantages of the method are that (1) it can be built on top of virtually any indexing method and (2) we can build a model of the distribution of the error precise enough to allow designing a compromise between error and speed. We show the improvement that we can obtain using data from the SUN database.
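Read literally, the pipeline can be sketched in a few lines (this is an illustration, not the authors' code); the SVD dimensionality, ζ, and the brute-force rescan below are placeholder choices.

```python
# Sketch: project with SVD, query a low-dimensional index, peek forward by
# d_xi * (1 + zeta * sigma_xi^2), then rescan the tentative set in full space.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 128))                 # database in the full feature space
q = rng.normal(size=(1, 128))                    # query point

dims, zeta = 8, 0.5                              # placeholder parameters
U, S, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
P = Vt[:dims].T                                  # projection onto the space Xi
Xr, qr = (X - X.mean(0)) @ P, (q - X.mean(0)) @ P
sigma2 = Xr.var()                                # variance of the data in Xi

index = NearestNeighbors(n_neighbors=1).fit(Xr)
d_xi, _ = index.kneighbors(qr)
radius = d_xi[0, 0] * (1 + zeta * sigma2)        # peek-forward radius
cand = index.radius_neighbors(qr, radius=radius, return_distance=False)[0]

# rescan the tentative set T in the complete feature space
best = cand[np.argmin(np.linalg.norm(X[cand] - q, axis=1))]
exact = np.argmin(np.linalg.norm(X - q, axis=1))
print(best == exact)
```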
 
Article
To address the problem of redundant learning in remote sensing scene classification, a multi-space-scale frequency covariance pooling (MSFCP) method is proposed in this study. Specifically, a Gabor filter is introduced into the network, which reduces the redundant learning of ordinary convolution filters and enhances the robustness of the network to external interference. Secondly, redundant information in the low-frequency components is reduced by dividing the feature map output of the first layer into high and low frequencies and applying average pooling to the low-frequency information. Next, the Octave Convolution (OctConv) operation realizes the self-update and information interaction of the high- and low-frequency characteristics. Finally, global covariance pooling is performed on the output feature map to enhance the representation ability of the entire network and boost the classification result. Our method achieves an accuracy of 99.35 ± 0.28% on the UC Merced Land Use dataset. The experimental results demonstrate that the proposed MSFCP method achieves better classification performance and lower network model complexity than other methods, significantly reducing the demand for computing power. Hence, a good trade-off is achieved between accuracy and computational resource consumption.
 
Article
The evolution of social media in the recent past has drastically changed the way the affected community responds to any disaster. Within moments of a disaster, activity on social media platforms swiftly jumps many fold: the affected population starts posting messages to update their family and friends about the situation and to seek help. The high volume and variety of this social media content pose challenges for government authorities and rescue workers in responding to people's needs in time. Effective computational analysis of this information can serve as an efficient tool to boost response and rescue operations during such events. With the significant recent progress in artificial intelligence systems, many machine learning methods for automatically filtering relevant messages have been proposed. However, most of these methods follow supervised learning techniques that need abundant labeled data for training, which is hard to arrange at the onset of a disaster. To overcome this constraint for a disaster in progress, our study proposes a novel deep convolutional network based on Unsupervised Domain Adaptation (UDA) that can classify unlabeled data from a new disaster (the target domain) using labeled data available from a previous disaster (the source domain). Through experiments performed on images from seven disasters, the current work demonstrates that the proposed UDA method using the Maximum Mean Discrepancy (MMD) metric outperforms different state-of-the-art methods even without labeled data for the new disaster.
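The MMD term typically used as the domain-discrepancy loss in such UDA models can be written compactly; the sketch below uses a single RBF kernel with a placeholder bandwidth and random features standing in for the source and target batches.

```python
# Standard RBF-kernel MMD^2 estimator of the kind used as a domain-discrepancy loss.
import torch

def mmd_rbf(x, y, sigma=1.0):
    """Biased MMD^2 between batches x (n, d) and y (m, d) with an RBF kernel."""
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

source_feats = torch.randn(32, 256)          # features of labeled source images (placeholder)
target_feats = torch.randn(32, 256) + 0.5    # features of unlabeled target images (placeholder)
loss = mmd_rbf(source_feats, target_feats)
print(loss.item())                           # would be added to the classification loss
```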
 
Article
Early faulty gear diagnosis is essential in industry. In the current decade, with the tremendous growth of Artificial Neural Networks (ANN), researchers have turned to Deep Learning (DL) methods to detect gear faults at an early stage. Traditional gear fault diagnosis methods mostly apply deep neural networks to the time sequence of gathered signals; in this setting, feature extraction from the reversed time-domain signal is commonly ignored. To overcome this issue, this paper proposes Weighted Principal Component Analysis (WPCA) and a Bi-Directional Long Short Term Memory (BLSTM) network with the Swish activation function for faulty gear diagnosis from vibration signals. WPCA is used to extract multi-scale features related to faulty gears from the vibration signal, and the BLSTM classifies the extracted features to diagnose the fault at an early stage. Several experiments were conducted to evaluate the proposed approach to categorizing gear defects from the vibration signal, using three kinds of dataset to classify the type of faulty gear accurately. The proposed work proves its superiority in classifying gear faults more efficiently than existing methods.
 
Article
Having a system that can take an image of a natural scene and accurately classify the plants in it is of undeniable importance. However, the complexities of dealing with natural scene images and the vast diversity of plants in the wild make designing such a classifier a challenging task. Deep Learning (DL) lends itself as a viable solution to such a complex problem. However, advances in DL architectures and software (including DL frameworks) come with a high cost in terms of energy consumption, especially when employing Graphics Processing Units (GPUs). As data expands rapidly, the need to create energy-aware models increases in order to reduce energy consumption and move towards "Greener AI". Since the problem of designing energy-aware architectures for plant classification has not been studied significantly in the literature, our work starts to bridge this gap by focusing not only on the models' performance but also on their energy usage on both CPU and GPU platforms. We consider different state-of-the-art Convolutional Neural Network (CNN) architectures and train them on two challenging and well-known plant datasets: iNaturalist and Herbarium. Our experiments highlight the trade-off between accuracy and energy consumption. For example, the results show that while GPU-bound models can be about 40% faster in terms of training time than simple models running on a CPU, the latter's energy consumption is only two thirds of the former's. We hope that such findings will encourage the community to reduce its reliance on accuracy measures alone when comparing architectures and to start taking other factors into account, such as power consumption and simplicity.
 
Article
Congestive Heart Failure (CHF) is prevalent, expensive, and deadly, damaging or overloading the pumping power of the heart muscles. It leads to severe medical issues and contributes to a greater risk of death from numerous diseases at a later stage. Accurate and simpler techniques to detect these problems are needed in a world with a growing population, as they can prevent many diseases and reduce deaths. In this work, we develop a technique to diagnose CHF using the Electrocardiomatrix (ECM): the 1-D ECG signals are transformed into a colourful 3D matrix for diagnosis. The detections of CHF obtained with the ECM are then compared manually with annotated CHF Electrocardiogram (ECG) signals. It is found that the ECM is able to detect the affected CHF duration from the ECG signals. The ECM also reduces both false positives and false negatives, which in turn improves detection accuracy. The performance of the proposed approach is tested on the BIDMC CHF database. The proposed method achieves an accuracy of 97.6%, sensitivity of 98.0%, specificity of 97.0%, precision of 99.4%, and F1-score of 98.3%. This study reveals that the ECM technique allows accurate, intuitive, and efficient detection of CHF, and that with the ECM practitioners can diagnose CHF without sacrificing accuracy.
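A rough sketch of building an electrocardiomatrix-style beat matrix (detect R peaks, cut a fixed window around each beat, stack the aligned beats as rows) is given below; the sampling rate, window lengths, peak-detection settings, and the synthetic waveform are assumptions, not the paper's processing chain.

```python
# Rough electrocardiomatrix-style construction: stack aligned beats as matrix rows.
import numpy as np
from scipy.signal import find_peaks

fs = 250                                           # sampling rate (Hz), placeholder
t = np.arange(0, 20, 1 / fs)
ecg = np.sin(2 * np.pi * 1.2 * t) ** 63            # crude periodic "beats" for the demo

peaks, _ = find_peaks(ecg, height=0.5, distance=int(0.4 * fs))   # R-peak surrogates
pre, post = int(0.2 * fs), int(0.6 * fs)           # window around each peak
rows = [ecg[p - pre:p + post] for p in peaks if p - pre >= 0 and p + post <= ecg.size]
ecm = np.vstack(rows)                              # beats x samples matrix
print(ecm.shape)                                   # visualized as a heat map in practice
```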
 
Article
Effective stock market prediction can significantly assist individual and institutional investors in making better trading decisions and help governments stabilize the market. Therefore, a variety of methods have been proposed recently to tackle the issue of stock market prediction. However, it is still quite challenging to effectively extract the correlations and temporal information from multivariate time series of market data and to integrate various kinds of features and auxiliary information, which is important for improving the performance of stock market prediction. This paper proposes an entirely Transformer-based model, namely the Gated Three-Tower Transformer (GT³), to incorporate numerical market information and social text information for accurate stock market prediction. Firstly, we devise a Channel-Wise Tower Encoder (CWTE) to capture channel-wise features from transposed numerical data embeddings. Secondly, we design a Shifted Window Tower Encoder (SWTE) with Multi-Temporal Aggregation to extract and aggregate multi-scale temporal features from the original numerical data embeddings. We then adopt the encoder of the vanilla Transformer as a Text Tower Encoder (TTE) to obtain high-level textual features. Furthermore, we design a Cross-Tower Attention mechanism to help the model learn the trend-relevant significance of each daily text representation by leveraging the temporal features from SWTE. Finally, we unify CWTE, SWTE, and TTE into the GT³ model through a self-adaptive gate layer to perform end-to-end text-driven stock market prediction by fusing the three types of features effectively and efficiently. Extensive experimental results on a real-world dataset show that the proposed model outperforms state-of-the-art baselines.
 
Figure captions: (1) Face image inpainting results of our approach, which takes mask-wearing faces as input and facial attributes as guidance to obtain the mask-removal result. (2) Overview of our framework: the upper path is the reconstructive path, used only in training; the lower path is the generative path, used in both training and testing; face attributes are fed in to guide the inpainting results throughout. (3) Our mask dataset: since each face corresponds to a unique input mask, the masks in the dataset have different shapes and sizes. (4) Qualitative comparisons with existing state-of-the-art methods on CelebA (zoom in for a better view). (5) Qualitative comparisons in the ablation studies, from left to right: ground truth, input masked face, ours, ours without attributes, Zswap with attributes, and Zswap without attributes (zoom in for a better view).
Article
Due to the outbreak of the COVID-19 pandemic, wearing masks in public areas has become an effective way to slow the spread of disease. However, it also brings challenges to applications in daily life, as half of the face is occluded. This has motivated the idea of removing masks by face inpainting. Face inpainting has achieved promising performance but still fails to guarantee high fidelity. In this paper, we present a novel mask-removal inpainting network based on face attributes known in advance, including nose, chubby, makeup, gender, mouth, beard, and young, aiming to ensure the repaired face image is closer to the ground truth. To achieve this, we propose a dual-pipeline network based on GANs: one pipeline is a reconstructive path, used only in training, that exploits the missing regions of the ground truth to obtain a prior distribution, while the other is a generative path that predicts the information in the masked region. To establish the mask-removal process, we build synthetic facial occlusions that mimic real masks. Experiments show that our method not only generates faces better aligned with the real attributes but also ensures semantic and structural rationality compared with state-of-the-art methods.
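One common way to inject known attributes into an inpainting generator is to tile the attribute vector into extra input channels alongside the masked face and the occlusion mask, as in the hedged sketch below; the channel layout and the helper name condition_on_attributes are illustrative, not the paper's architecture.

```python
import torch

def condition_on_attributes(masked_face, mask, attrs):
    """Build an attribute-conditioned generator input.

    masked_face : (B, 3, H, W) face with the occluded region zeroed out
    mask        : (B, 1, H, W) binary occlusion mask
    attrs       : (B, A) binary attributes (e.g. nose, gender, beard, ...)
    """
    b, a = attrs.shape
    h, w = masked_face.shape[-2:]
    attr_maps = attrs.view(b, a, 1, 1).expand(b, a, h, w)    # tile each attribute as a channel
    return torch.cat([masked_face, mask, attr_maps], dim=1)  # (B, 3 + 1 + A, H, W)
```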
 
Article
Smart Healthcare (SHC) plays an increasingly important role in improving the quality of health care and has attracted wide attention from researchers, hospitals, and governments. In SHC, it is crucial that a patient’s health data be readily accessible to authorized nurses, doctors, and emergency services. To enable such easy access while protecting the privacy of patients’ data, ciphertext-policy attribute-based encryption (CP-ABE) has been widely used to achieve secure data sharing and support fine-grained access control. However, the existing CP-ABE schemes have three flaws for SHC. First, CP-ABE with partially hidden access policies may still leak users’ attribute privacy. Second, a malicious user may disclose a patient’s health records, and such disclosures cannot be traced. Third, it is inefficient for a data user without the right to access the data to download the whole ciphertext. In this paper, we design STEAC to address these problems. To solve the first problem, we introduce a garbled Bloom filter to fully hide the access policies. To solve the second, we use a transaction-based blockchain scheme to trace ciphertext storage and access. To overcome the third flaw, a decryption test is performed before the actual decryption. Finally, security analysis and a comprehensive performance evaluation demonstrate that STEAC is secure in the standard model and more efficient than previous schemes.
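For intuition only, the sketch below implements a plain (non-garbled) Bloom filter, which already lets a policy be probed for attribute membership without listing its attributes; the garbled Bloom filter used in STEAC additionally hides the stored shares and is not reproduced here. The filter size and number of hash functions are arbitrary illustrative choices.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: insert attribute strings, test membership with no false negatives."""
    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _positions(self, item):
        # k independent positions derived from salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

# policy = BloomFilter(); policy.add("cardiology_nurse")
# "cardiology_nurse" in policy  -> True; unrelated attributes are rejected (up to false positives).
```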
 
Article
White blood cells (WBCs) are widely present in the human body and play an important role in the immune system. Recently, the incidence of WBC-related blood diseases has increased. An accurate WBC count gives useful information for the diagnosis of blood diseases, and it has become a popular field of applied research. Hence, in this paper, a Deep Features based Convolutional Neural Network (DFCNN) is developed to identify the WBC count from an image database. The proposed method works in three phases: feature extraction, feature selection, and classification. In the feature extraction phase, a combined CNN structure is designed that brings together AlexNet, GoogLeNet, and ResNet-50. This combined CNN architecture is used to extract 3000 essential features from the image database. In the feature selection phase, a hybrid Mayfly Algorithm with Particle Swarm Optimization (HMA-PSO) is designed to select the essential features from the feature set; in HMA-PSO, the velocity update of the mayflies is carried out with the help of the PSO algorithm. The selected features are fed to the proposed classifier, a Recurrent Neural Network with Long Short-Term Memory (RNN-LSTM), which classifies the WBC types: neutrophils, eosinophils, monocytes, and lymphocytes. The proposed method is implemented in MATLAB, and performance is evaluated with statistical measures such as accuracy, precision, recall, specificity, and F-measure. The proposed method is compared with existing methods such as MA-RNN and PSO-RNN. The proposed methodology achieves the best performance metrics, with a recall of 0.98, a precision of 0.9, and an accuracy of 0.97.
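Since the hybrid selector drives the mayflies' velocity update with PSO, the sketch below shows a textbook binary-PSO step in which a sigmoid of the velocity gives the probability of keeping each feature; the inertia and acceleration coefficients are standard defaults, not the values used in HMA-PSO.

```python
import numpy as np

def pso_step(pos, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One binary-PSO update for feature-selection particles.

    pos, vel : (n_particles, n_features) arrays; pbest, gbest : best positions found so far.
    A sigmoid of the velocity gives the probability of selecting each feature.
    """
    rng = rng or np.random.default_rng()
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    prob = 1.0 / (1.0 + np.exp(-vel))                    # sigmoid transfer function
    pos = (rng.random(pos.shape) < prob).astype(float)   # 1 = keep feature, 0 = drop it
    return pos, vel
```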
 
Article
Nowadays, with advances in technology and the widespread use of digital devices, the number of digital images keeps increasing. Computer-aided classification of image types is widely applied in many fields such as medicine, security, and automation. As sub-stages of the pattern recognition process, feature extraction and feature selection are highly important for improving classification performance. Researchers apply different feature extraction methods depending on the requirements of their work. In this study, a novel pattern recognition framework is developed that combines diverse, large-scale handcrafted feature extraction methods (shape-based and texture-based) with a feature selection stage. Genetic algorithms are used for the feature selection stage. In the experimental studies, the Flavia leaf recognition and Caltech101 object classification image datasets and five supervised classification models (random forest, ECOC-SVM, k-nearest neighbor, AdaBoost, classification tree) with different parameter values are used. The experimental results show that the proposed method achieves accuracy rates of 98.39% and 82.77% on the Flavia and Caltech101 datasets, respectively, with the ECOC-SVM model. The proposed framework is also competitive with existing state-of-the-art methods in the related literature.
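A minimal sketch of genetic-algorithm feature selection in the spirit described above: each chromosome is a 0/1 mask over feature columns and its fitness is the cross-validated accuracy on the selected columns. The random-forest fitness classifier, truncation selection, one-point crossover, and all hyperparameters are illustrative stand-ins, not the framework's actual operators.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def fitness(chromosome, X, y):
    """Fitness of a 0/1 chromosome = CV accuracy on the selected feature columns."""
    if chromosome.sum() == 0:
        return 0.0
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, X[:, chromosome.astype(bool)], y, cv=3).mean()

def evolve(X, y, pop_size=20, generations=30, p_mut=0.02, rng=None):
    rng = rng or np.random.default_rng(0)
    pop = rng.integers(0, 2, size=(pop_size, X.shape[1]))
    for _ in range(generations):
        scores = np.array([fitness(c, X, y) for c in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]               # truncation selection
        cuts = rng.integers(1, X.shape[1], size=pop_size // 2)
        children = np.array([np.concatenate([parents[i % len(parents)][:c],
                                             parents[(i + 1) % len(parents)][c:]])
                             for i, c in enumerate(cuts)])               # one-point crossover
        children ^= (rng.random(children.shape) < p_mut).astype(int)     # bit-flip mutation
        pop = np.vstack([parents, children])
    best = pop[np.argmax([fitness(c, X, y) for c in pop])]
    return best.astype(bool)                                             # selected feature mask
```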
 
Article
A novel lossy RGB (Red, Green, Blue) colour still image compression algorithm is proposed. The method introduces a Legendre wavelet-based image transformation technique integrated with vector quantization and run-length encoding. High performance is achieved by limiting the degradation in picture quality at the desired compression level. The transformation (explicitly) and quantization (implicitly) phases focus on reducing the number of distinct pixel values and contribute to attaining a higher compression ratio. Of these two phases, the transformation phase is the one to make more effective, because its lossless nature does not perturb the quality of the reconstructed image. Image transformation via Legendre wavelet functions, combined with self-organizing-map-based quantization, the proposed scanning of the quantized values, and run-length encoding, tends to produce a much sparser matrix than Haar wavelet-based compression. Owing to the curvilinear nature of its component wavelets, the proposed Legendre wavelet-based transformation provides a comparatively much higher average PSNR of 225 with a satisfactory average compression of 0.41 bits per pixel. In this paper, image transformations are conducted using the Haar wavelet, Legendre wavelets, and the transformation method presented in [7]. Experimental results are analysed and compared in terms of qualitative and quantitative measures, namely PSNR (Peak Signal-to-Noise Ratio) and bpp (bits per pixel). The performance of the proposed algorithm is compared with the existing Haar wavelet transformation-based image compression algorithm, compression based on the transformation method of [7], DCT and adaptive-scanning-based compression [12], and JPEG compression [5]. The picture quality achieved in the experiments clearly shows that the proposed Legendre wavelet-based image compression technique remarkably outperforms the above-mentioned compression techniques.
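For reference, the two evaluation measures used above can be computed as in the short sketch below; the 8-bit peak value of 255 and the byte-count input for bpp are assumptions for illustration.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two images of identical shape."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def bits_per_pixel(compressed_size_bytes, height, width):
    """Average number of bits spent per pixel of the original image."""
    return 8.0 * compressed_size_bytes / (height * width)
```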
 
Article
Domain adaptation aims to leverage information from the source domain to improve classification performance in the target domain. It mainly relies on two schemes: sample reweighting and feature matching. While the first scheme allocates different weights to individual samples, the second matches the features of the two domains using global structural statistics. The two schemes are complementary and are expected to work jointly for robust domain adaptation. Several methods combine the two schemes, but the underlying relationships among samples are insufficiently analyzed because the hierarchy of samples and the geometric properties between samples are neglected. To better combine the advantages of the two schemes, we propose a Grassmannian graph-attentional landmark selection (GGLS) framework for domain adaptation. GGLS presents a landmark selection scheme using attention-induced neighbors of the graphical structure of the samples and performs distribution adaptation and knowledge adaptation over the Grassmann manifold. The former treats the landmarks of each sample differently, and the latter avoids feature distortion and achieves better geometric properties. Experimental results on different real-world cross-domain visual recognition tasks demonstrate that GGLS provides better classification accuracy than state-of-the-art domain adaptation methods.
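The sketch below gives a much simplified, Euclidean stand-in for attention-induced landmark selection: each sample distributes softmax attention over its k nearest neighbours, and the samples that receive the most attention are kept as landmarks. It ignores the Grassmann-manifold representation and the subsequent distribution and knowledge adaptation, so it should be read only as intuition for the graph-attentional step; all parameters are assumptions.

```python
import numpy as np

def select_landmarks(X, k=10, top_ratio=0.3):
    """Pick landmark samples that receive the most attention from their neighbours.

    X : (n, d) feature matrix. Attention here is a softmax over negative pairwise
    squared distances, restricted to each sample's k nearest neighbours.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)       # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                               # exclude self-matches
    nn_idx = np.argsort(d2, axis=1)[:, :k]                     # k nearest neighbours per sample
    incoming = np.zeros(len(X))
    for i, neigh in enumerate(nn_idx):
        att = np.exp(-(d2[i, neigh] - d2[i, neigh].min()))     # shifted for numerical stability
        att /= att.sum()                                       # attention of sample i over its neighbours
        incoming[neigh] += att                                 # accumulate attention received
    n_land = max(1, int(top_ratio * len(X)))
    return np.argsort(incoming)[-n_land:]                      # indices of the selected landmarks
```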
 
Article
The present paper focuses on the contrast enhancement of an image using linear regression-based recursive sub-histogram equalization. The histogram of an image is partitioned into two non-overlapping sub-histograms using the mean intensity of the image. A set of points is constructed for each sub-histogram, taking the gray level (intensity) as the abscissa and its corresponding count as the ordinate. The method of least squares is then used to fit a regression line to the points of each sub-histogram. With the help of the regression line and the histogram, intervals are created in each segmented partition. This process yields more intervals than the Recursive Sub-Image Histogram Equalization (RSIHE) and the Mean and Variance-based Sub-Image Histogram Equalization (MVSIHE) methods. For qualitative and quantitative analysis of the proposed method, experiments are performed on a set of test images, including medical and non-medical images, and the results are reported in terms of various evaluation metrics. For medical images, the mean opinion score is also evaluated for the proposed method and other recent methods. The comparison with state-of-the-art methods shows the efficacy of the proposed method for enhancement.
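A minimal sketch of the first step described above, assuming an 8-bit grayscale image: the histogram is split at the mean intensity and a least-squares line is fitted to the (gray level, count) points of each sub-histogram. Interval construction and the actual equalisation are omitted here.

```python
import numpy as np

def subhistogram_regression_lines(image):
    """Split the histogram at the mean intensity and fit a least-squares line to each part.

    Returns ((slope_lo, intercept_lo), (slope_hi, intercept_hi)) for the lower and
    upper sub-histograms of an 8-bit grayscale image.
    """
    img = image.ravel()
    hist = np.bincount(img, minlength=256).astype(float)   # gray-level counts
    mean_level = int(img.mean())                           # split point
    lines = []
    for levels in (np.arange(0, mean_level + 1), np.arange(mean_level + 1, 256)):
        counts = hist[levels]
        slope, intercept = np.polyfit(levels, counts, deg=1)   # least-squares line fit
        lines.append((slope, intercept))
    return tuple(lines)
```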
 
Figure captions: (1) Overview of the emotion and personality estimations in this study. (2) An example of the analysis of the ECG data: the horizontal axis shows the number of samples (20 s × 256 Hz sampling frequency) and the vertical axis the potential difference (μV); the blue line is the heartbeat waveform and the orange circles mark the peaks of the R wave. (3) An example of the analysis of the GSR data: the horizontal axis shows the number of samples (50 s × 128 Hz sampling frequency) and the vertical axis the skin conductance (μSiemens); the gray dashed line is the SC level, the blue line the SC time series, and the orange circles the SC responses (SCR). (4) Comparison of the BER with the accuracy of the binary classification models: panels (a, d) compare the accuracy of the LR, IW-LR, SVM, and IW-SVM models trained with ECG features to 1 − BER; panels (b, e) do the same for GSR features and panels (c, f) for the fusion models (V, valence; A, arousal; Ex, extraversion; Ag, agreeableness; Co, conscientiousness; Ne, neuroticism; Op, openness; Ave, average accuracy over the seven factors).
Article
For modeling human intelligence, understanding emotional intelligence as well as verbal and mathematical intelligence is an important and challenging issue. In affective and personality computing, it has been reported that not only visual and audio signals but also biosignals are useful for estimating emotion and personality. Biosignals are expected to provide additional and less biased information in implicit assessment, but estimation performance can degrade when there are physiological individual differences. In this paper, treating individual physiological differences as a covariate shift, we aim to improve biosignal-based emotion and personality estimation. For this purpose, we constructed importance-weighted logistic regression (IW-LR) and importance-weighted support vector machine (IW-SVM) models, which mitigate the accuracy degradation caused by physiological individual differences in the training data, and compared their estimation performance with conventional LR and linear SVM (L-SVM) models. As a result, most of the IW models outperform the conventional models based on electrocardiogram (ECG) and galvanic skin response (GSR) features in emotion estimation. In personality estimation, the IW method improves the macro-averaged F1-score for all SVM models, and the best-performing model (the GSR model) outperforms the best previously reported macro-averaged F1-score by 1.9%. These results indicate that importance weighting in machine learning models can reduce the effects of individual physiological differences in peripheral physiological responses and contribute to new models for biosignal-based emotion and personality estimation.
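Importance weighting of this kind typically reduces to passing precomputed density-ratio weights as per-sample weights when fitting the classifier, as in the hedged sketch below; estimating the weights themselves (e.g. with a separate density-ratio estimator) and the exact solvers used in the paper are outside this snippet.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

def fit_importance_weighted(X_train, y_train, weights, model="lr"):
    """Train an importance-weighted classifier.

    `weights` are per-sample density ratios p_test(x) / p_train(x) estimated beforehand;
    they down-weight training samples whose physiology is over-represented relative to
    the test subjects, mitigating the covariate shift.
    """
    clf = LogisticRegression(max_iter=1000) if model == "lr" else LinearSVC()
    clf.fit(X_train, y_train, sample_weight=weights)
    return clf
```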
 
Top-cited authors
Shuihua Wang
  • University of Leicester
Yu-Dong Zhang
  • University of Leicester
Khan Muhammad
  • Sungkyunkwan University
Amit Kumar Singh
  • National Institute of Technology Patna
Remco C. Veltkamp
  • Utrecht University