Article

Face recognition using Histograms of Oriented Gradients

Abstract

Face recognition has been a long-standing problem in computer vision. Recently, Histograms of Oriented Gradients (HOGs) have proven to be an effective descriptor for object recognition in general and face recognition in particular. In this paper, we investigate a simple but powerful approach to make robust use of HOG features for face recognition. The three main contributions of this work are: First, in order to compensate for errors in facial feature detection due to occlusions, pose and illumination changes, we propose to extract HOG descriptors from a regular grid. Second, fusion of HOG descriptors at different scales allows important structure for face recognition to be captured. Third, we identify the necessity of performing dimensionality reduction to remove noise and make the classification process less prone to overfitting. This is particularly important if HOG features are extracted from overlapping cells. Finally, experimental results on four databases illustrate the benefits of our approach.
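The three ideas in the abstract (HOG on a regular grid instead of at detected facial landmarks, fusion across scales, and dimensionality reduction) can be sketched in NumPy as below. This is not the authors' implementation: the cell sizes (4, 8, 16), nine unsigned orientation bins, per-scale L2 normalization, non-overlapping cells and the choice of PCA as the reduction step are all assumptions, and block normalization is omitted. Function names are illustrative.

```python
import numpy as np

# Assumed settings for illustration only (not taken from the paper).
CELL_SIZES = (4, 8, 16)
N_BINS = 9


def cell_histograms(img, cell, n_bins=N_BINS):
    """Magnitude-weighted, unsigned orientation histograms on a regular grid."""
    img = img.astype(float)
    gy, gx = np.gradient(img)                        # derivatives along rows, cols
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0     # unsigned: [0, 180)
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    rows, cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((rows, cols, n_bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(bins[sl].ravel(), weights=mag[sl].ravel(),
                                     minlength=n_bins)
    return hist


def multiscale_hog(img, cell_sizes=CELL_SIZES):
    """Fuse scales by concatenating per-scale, L2-normalized grid histograms."""
    feats = []
    for cell in cell_sizes:
        h = cell_histograms(img, cell).ravel()
        feats.append(h / (np.linalg.norm(h) + 1e-6))
    return np.concatenate(feats)


def pca_fit(X, n_components):
    """Mean and top principal axes of the rows of X (PCA via SVD, assumed)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, axes):
    """Project rows of X onto the retained principal axes."""
    return (X - mean) @ axes.T
```

In use, one row of X would be built per face image with multiscale_hog, the projection fitted on training rows only, and pca_transform applied to both training and test rows before classification.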

... Histogram of Oriented Gradients (HOG) descriptors compute gradient orientation histograms across image regions, encapsulating shape and edge information [22]. HOG features excel in object detection tasks, especially in pedestrian detection and recognition [23]. Local binary patterns (LBPs) encode local texture patterns invariant to illumination changes [14]. ...
... In the current study, we extracted and utilized the features that are presented in Table 1. Feature selection was based on relevancy, and we extracted features that are widely used in the literature and have been suggested in previous studies [14,17,20–23]. Those features were extracted from all the CRC images that were used as our dataset. ...
Article
The aim of this study was to explore the application of computational models to the analysis of histopathological images of colon cancer. A comprehensive dataset of colon cancer images, annotated into eight distinct categories according to the cancerous cell portions they represent, was used. The primary objective was to evaluate the efficacy of various image classification algorithms for cancer classification. Additionally, this study investigated feature extraction techniques for deriving meaningful data from the images, contributing to a more nuanced understanding of cancerous tissues, and compared the performance of different image classification algorithms for colon cancer image analysis. The findings suggest that XGBoost provides the highest accuracy (89.79%); these results could contribute to the growing body of knowledge in computational pathology. Other algorithms, such as random forest, SVM, and CNN, also provided satisfactory results, offering insights into the effectiveness of image classification algorithms in distinguishing between different categories of cancerous cells. This work holds implications for the development of more accurate and efficient tools, underscoring the potential of computational models to enhance the analysis of histopathological images and improve diagnostic capabilities in cancer research.
... The HOG feature descriptor was developed by Dalal and Triggs (2005) for human detection with a Support Vector Machine (SVM) classifier. HOG has been successfully applied in many research fields, such as word spotting (Rodríguez 2008; Terasawa et al., 2009), body part detection (Corvee et al., 2010), face recognition (Deniz et al., 2011; Shu et al., 2011), character recognition (Newell et al., 2011), text/non-text classification (Minetto et al., 2013) and vehicle detection in traffic video (Arrospide et al., 2013). Rodriguez, J. A. et al. (2008) proposed local gradient histogram features for word spotting in unconstrained handwritten documents. ...
Chapter
In this chapter, the authors present a segmentation-based word spotting method for handwritten documents using a bag of visual words (BoVW) framework based on co-occurrence histograms of oriented gradients (Co-HOG) features. The Co-HOG descriptor captures word image shape information and encodes local spatial information by counting the co-occurrences of gradient orientations of neighboring pixel pairs. The handwritten document images are segmented into words, and each word image is represented by a vector containing the frequencies of the visual words that appear in it. To incorporate spatial information into the BoVW framework, the authors adopted the spatial pyramid matching (SPM) method. The proposed method is evaluated with precision and recall metrics in experiments on popular datasets such as GW and IAM. The performance analysis confirmed that the method outperforms existing word spotting techniques.
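The co-occurrence idea behind Co-HOG can be sketched as follows: quantize each pixel's gradient orientation, then, for each pixel offset, count how often each orientation pair (o(p), o(p + offset)) occurs. The four offsets, eight signed bins and whole-image counting are assumptions (the chapter's descriptor may be computed per region and may skip low-magnitude pixels), and the BoVW and spatial pyramid matching stages are not shown.

```python
import numpy as np

# Assumed settings: four neighbor offsets (dy, dx) and eight signed orientation bins.
OFFSETS = ((0, 1), (1, 0), (1, 1), (1, -1))
N_BINS = 8


def quantized_orientations(img, n_bins=N_BINS):
    """Signed gradient orientation per pixel, quantized into n_bins sectors."""
    gy, gx = np.gradient(img.astype(float))
    ang = np.arctan2(gy, gx) % (2 * np.pi)            # signed: [0, 2*pi)
    return np.minimum((ang / (2 * np.pi / n_bins)).astype(int), n_bins - 1)


def co_hog(img, offsets=OFFSETS, n_bins=N_BINS):
    """One n_bins x n_bins co-occurrence matrix per offset, flattened and joined."""
    q = quantized_orientations(img, n_bins)
    h, w = q.shape
    feats = []
    for dy, dx in offsets:
        # Restrict to pixels p for which p + (dy, dx) is also inside the image.
        y0, y1 = max(0, -dy), min(h, h - dy)
        x0, x1 = max(0, -dx), min(w, w - dx)
        a = q[y0:y1, x0:x1]                          # orientation at p
        b = q[y0 + dy:y1 + dy, x0 + dx:x1 + dx]      # orientation at p + offset
        m = np.zeros((n_bins, n_bins))
        np.add.at(m, (a.ravel(), b.ravel()), 1)      # count each orientation pair
        feats.append(m.ravel())
    return np.concatenate(feats)
```

Each offset contributes n_bins² counts, so the descriptor length here is 4 × 64 = 256 regardless of image size.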
... Many recently developed systems use shallow machine learning models for video analytics. Hand-crafted features such as LTP (Yaseen …), … (Déniz et al. 2011) and Haar (Yaseen et al. 2018a; Zamani et al. 2017; Anjum et al. 2016) are used to build these shallow models. Aggregating these small local patches over subsequent video frames generates global attributes for vision- and motion-related data. ...
Article
Due to rapid population growth and the continued rise in the number of vehicles on the road, transportation congestion has become a serious issue. Internet-of-Things-aided smart transportation systems are doing promising work in this area. Because artificial intelligence (AI) and the Internet of Things (IoT) are combined in smart city scenarios, distributed mobile IoT devices and video cameras produce massive amounts of video streaming data at high speed. Real-time data processing applications demand efficient analysis of these data. This work focuses on improving cloud-based traffic video analytics systems through a two-step approach. First, edge-based pre-processing of the video stream reduces data transmission time ahead of cloud-based traffic video analytics. Second, Video Analytics and Sensor Fusion (VA/SF) are studied to ensure that the data the algorithms are trained on sufficiently cover the continuum of possible conditions, and to make the system efficient enough to provide high-accuracy or low-latency modes of service. We propose a YOLO-based deep learning video analytics system on the cloud to perform real-time object detection for traffic surveillance video. The proposed VA/SF model reduces the model's detection speed while improving object detection accuracy by 1.8% compared with no IoT sensor fusion. The experiments show that our traffic analytics model achieves higher accuracy and better detection under extreme weather conditions.
... The Histogram of Oriented Gradients (HOG) is a well-established technique for image feature extraction. It was introduced in (Dalal & Triggs, 2005) for pedestrian detection but has since been used for other applications such as face recognition (Déniz, Bueno, Salido, & De la Torre, 2011). The fundamental concept is to represent small image cells by accumulating a 1-D histogram of the gradient directions of their pixels, which characterises local object appearance in the image. ...
Article
Industry 4.0 aims for a digital transformation of manufacturing and production systems, producing what are known as smart factories, where information from Cyber-Physical Systems (core elements of Industry 4.0) is used in all manufacturing stages to improve productivity. Through their control and sensor systems, cyber-physical systems provide a global view of the process and generate large amounts of data that can be used, for instance, to build data-driven models of the processes. However, having data is not enough; we must be able to store, visualize and analyze them, and to integrate the induced knowledge into the whole production process. In this work, we present a solution to automate the quality control of manufactured parts through image analysis. In particular, we present a Deep Learning solution to detect defects in manufactured parts from thermographic images of a die casting machine at an aluminum foundry.
... The function also mapped each class name to an index and used this index as the label for each image. Histogram of Oriented Gradients (HOG) Feature Extraction [7,8]: The function extracted HOG features from an image. HOG features capture edge orientation and are particularly useful for object recognition. ...
Article
This study aims to improve the efficiency of automatic classification and quality control of fruits and vegetables through image recognition technology, in order to achieve efficient and accurate intelligent sorting in agricultural production, reduce labor costs and improve market competitiveness. A high-quality Kaggle image dataset covering 36 fruit and vegetable categories is used. The images were preprocessed to make the data suitable for the classification task and to set the stage for efficient training and evaluation of the model. Logistic regression was first used as a baseline against which to compare the Support Vector Machine (SVM) model. Hyperparameter tuning was then performed to achieve the best cross-validation accuracy, after which the SVM model was trained with the selected hyperparameters and its training time recorded. Model performance was evaluated in detail with a confusion matrix and classification reports, and the test set was used for final validation to check that the model also performs well on unseen data. The SVM model achieves 96% accuracy on both the validation and test sets. The hyperparameters selected by GridSearchCV (C = 10, gamma = 0.1, kernel = rbf) effectively improve model performance, confirming that sensible hyperparameter selection is crucial for SVM models. These results show that the model generalizes well and has potential for real-time classification tasks.
... On the other hand, techniques commonly used in computer vision extract a so-called feature vector to describe the structure and texture of an image. These techniques include Histogram of Oriented Gradients (HOG) [28], Scale-Invariant Feature Transform (SIFT) [29], Local Binary Patterns (LBP) [30] and Gray-Level Co-occurrence Matrix (GLCM) [31]. However, when applied to a speckle pattern, which is essentially a texture generated by laser interference, these techniques can produce a representation that is too abstract and difficult to interpret visually. ...
... Important areas receive larger weights, so the model pays more attention to them; unimportant areas receive smaller weights and less attention. Figure 4 shows the structure of the spatial attention mechanism [21–25]. Assume that the input feature map lies in $\mathbb{R}^{C \times H \times W}$, where C is the number of channels, H is the height, and W is the width. ...
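The citing paper's Figure 4 is not reproduced here, so as an assumption the sketch below follows the widely used CBAM-style formulation of spatial attention: average- and max-pool the C × H × W feature map across channels, convolve the two resulting H × W maps with a k × k kernel, and apply a sigmoid to obtain per-location weights in (0, 1) that rescale every channel. In a network the kernel is learned; here it is simply passed in.

```python
import numpy as np


def spatial_attention(feat, kernel):
    """CBAM-style spatial attention (assumed formulation) for a (C, H, W) map.

    kernel: (2, k, k) weights, k odd, applied to the stacked channel-average and
    channel-max maps with zero 'same' padding (learned in a real network).
    Returns the reweighted feature map and the (H, W) attention weights.
    """
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])    # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    _, h, w = feat.shape
    logits = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    weights = 1.0 / (1.0 + np.exp(-logits))                   # sigmoid: (0, 1)
    return feat * weights, weights                            # same weight for all channels
```

Large weights mark the "important areas" of the text; because one H × W map is broadcast over all C channels, this mechanism decides where to attend, while channel attention decides which feature maps to emphasize.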
Preprint
This study presents the Bayesian-Optimized Attentive Neural Network (BOANN), a novel approach that enhances image classification performance by integrating Bayesian optimization with channel and spatial attention mechanisms. Traditional image classification struggles with the extensive data of today's big data era. In recent years, Bayesian optimization has been integrated into neural networks to enhance model generalization, while channel and spatial attention mechanisms improve feature extraction. This paper introduces a model that combines Bayesian optimization with these attention mechanisms to boost image classification performance. Bayesian optimization guides hyperparameter selection, speeding up convergence and improving accuracy; the attention mechanisms strengthen feature extraction. Unlike traditional deep learning models, our model uses attention mechanisms for initial feature extraction, followed by a Bayesian-optimized neural network. On the CIFAR-100 dataset, our model outperforms classical models in accuracy, loss, precision, recall, and F1 score, achieving an accuracy of 77.6%. These technologies have potential for broader application in image classification and other computer vision domains.
... Therefore, Equation (5) is the union of the gradient histograms h(i), with i ranging up to n. Several bins are merged into one cell [20], [21]. ...
Article
A facial expression classification system is one application of machine learning (ML): it is trained on facial expression datasets and then uses the trained model to recognize facial expressions in new facial images. The recognized expressions are anger, contempt, disgust, fear, happiness, sadness, and surprise. Facial features are extracted with the histogram of oriented gradients (HOG). This study proposes an enhancement of HOG feature extraction that reduces the feature dimension by splitting it into multiple sub-features based on gradient orientation intervals, referred to as HOG channel (HOG-C). Two classifier configurations are compared: support vector machines (SVM) with HOG features and SVM with HOG-C features. The results show that SVM with HOG achieves an accuracy of 99.9% with an average training time of 18.03 minutes, while SVM with HOG-C attains 100% accuracy with an average training time of 18.09 minutes. Using HOG-C with SVM thus improves accuracy for facial expression classification.
... Facial feature extraction involves detecting feature points such as the centers of the eyes and the nose. These feature points then serve as input for local descriptors such as histograms of oriented gradients [7] and convolution filters [28]. The descriptors represent the extracted features numerically, either as one-dimensional feature vectors or as two- or three-dimensional feature maps. ...
Article
Facial identity is subject to two primary natural variations: time-dependent (TD) factors such as age, and time-independent (TID) factors including sex and race. This study aims to address a broader problem known as variation-invariant face recognition (VIFR) by exploring the question: “How can identity preservation be maximized in the presence of TD and TID variations?" While existing state-of-the-art (SOTA) methods focus on either age-invariant or race and sex-invariant FR, our approach introduces the first novel deep learning architecture utilizing multi-task learning to tackle VIFR, termed “multi-task learning-based variation-invariant face recognition (MTLVIFR)." We redefine FR by incorporating both TD and TID, decomposing faces into age (TD) and residual features (TID: sex, race, and identity). MTLVIFR outperforms existing methods by 2% in LFW and CALFW benchmarks, 1% in CALFW, and 5% in AgeDB (20 years of protocol) in terms of face verification score. Moreover, it achieves higher face identification scores compared to all SOTA methods. Open source code.
... Then, to find the dominant angle θd for each block in the image, the gradient angle corresponding to the maximum accumulated gradient magnitude Gd is selected [15]. Based on these criteria, the steps of the algorithm used to find the dominant edge direction are summarized in Algorithm 1. An illustrative example is a grayscale image containing a different edge direction in each BOI (Block Of Interest) [14]. ...
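The dominant-edge-direction step described above (for each block, keep the gradient angle whose accumulated magnitude Gd is largest) might look like this in NumPy. Algorithm 1 of the citing paper is not available, so the 8 × 8 block size, nine unsigned bins and reporting θd as the bin-center angle are assumptions.

```python
import numpy as np


def dominant_orientations(img, block=8, n_bins=9):
    """Per block, the orientation bin with the largest accumulated magnitude.

    Assumed settings: square blocks, unsigned angles in [0, 180), theta_d
    reported as the bin-center angle. Returns (theta_d, g_d), shape (rows, cols).
    """
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    width = 180.0 / n_bins
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((ang / width).astype(int), n_bins - 1)
    rows, cols = img.shape[0] // block, img.shape[1] // block
    theta_d = np.zeros((rows, cols))
    g_d = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * block, (r + 1) * block), slice(c * block, (c + 1) * block))
            acc = np.bincount(bins[sl].ravel(), weights=mag[sl].ravel(),
                              minlength=n_bins)      # accumulated magnitude per bin
            best = int(np.argmax(acc))
            theta_d[r, c] = (best + 0.5) * width    # bin-center angle in degrees
            g_d[r, c] = acc[best]
    return theta_d, g_d
```

For an image whose intensity increases left to right, every block's gradient points along the x-axis, so each θd falls in the first bin (centered at 10° with nine bins).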
Article
A micro-expression is an expression that appears for a very short time, lasting only a fraction of a second. It may arise from communicative activity between people during social interaction. Facial micro-expressions occur naturally and immediately, leaving little room for manipulation. However, because micro-expressions are transient and low in intensity, recognizing them is difficult and depends heavily on expert experience. Owing to their specificity and intrinsic complexity, micro-expression classification using two extraction methods, CAS and HOG, is interesting but challenging, and has recently become an active research area. Context-aware saliency (CAS) aims to detect the image regions that represent the scene; its goal is to detect dominant objects. The Histogram of Oriented Gradients (HOG) serves as an effective descriptor for object recognition and detection. The K-Nearest Neighbors (K-NN) method is used to classify micro-expressions based on HOG features of the saliency images. The dataset used in this study consists of sample data from 45 students of the Multimedia program at SMK Ma’arif NU Prambon, supplemented with data from AffectNet. The total dataset of 4116 images, divided into 6 micro-expressions (anger, disgust, fear, happy, sad and surprise) and split into 70% training and 30% testing data, yielded an accuracy above 80%.
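The classification stage in this abstract (K-NN on HOG features with a 70/30 train/test split) can be sketched as follows. The abstract does not give k or the distance measure; k = 3, Euclidean distance, majority voting with ties broken toward the smallest label, and the seeded shuffle are assumptions, and the feature rows stand in for HOG descriptors of the saliency images.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k=3):
    """Majority vote among the k nearest training vectors (Euclidean, assumed)."""
    d = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    preds = []
    for idx in nearest:
        labels, counts = np.unique(train_y[idx], return_counts=True)
        preds.append(labels[np.argmax(counts)])      # ties -> smallest label
    return np.array(preds)


def split_70_30(x, y, seed=0):
    """Shuffle, then split into 70% training and 30% testing data."""
    order = np.random.default_rng(seed).permutation(len(x))
    cut = int(round(0.7 * len(x)))
    tr, te = order[:cut], order[cut:]
    return x[tr], y[tr], x[te], y[te]
```

The brute-force distance matrix is fine for a few thousand images such as the 4116 used here; larger sets would usually call for a spatial index.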
... The presented case study suggests that such sequences can facilitate stress management conversations and encourage self-reflection. However, more diverse sequences and contextualized feedback are needed to improve conversational experiences and confirm empirical effects [10]. The article discusses the use of machine learning algorithms for detecting human emotions from facial expressions captured in videos, EEG signals, or images. ...
Article
The project aim was to develop an app that enables the recording and monitoring of behaviour related to specific aspects of wellness and supports the entertainment-related aspects of wellness. Our main goal was to envision and develop an app with users' well-being in mind. People's moods can be improved or changed by music, and music and mental health are tightly intertwined. Music is frequently used to complement or change an individual's mood. Although mood-appropriate music has advantages, it may also keep us in a depressed, angry, or nervous state. A survey was conducted to examine these aspects. After extensive research and interviews in this area, we found that 68% of those surveyed listen to music according to their mood or to change their mood. This inspired us to build an application that not only plays music but also recommends songs to users, removing the daily nuisance of selecting the right music, which can waste valuable time. Because mental balance is an essential component of a healthy life in today's hectic world, we also made the app more practical by including an AI chatbot that not only converses with users but also gives them suitable advice on their concerns.
... Recently, one of the most successful histogram features, the histograms of oriented gradients (HOG) [18]–[20], has attracted increasing attention. The HOG feature originates from the scale-invariant feature transform (SIFT) and can be viewed as a dense version of SIFT. ...
Article
In this study, we exploit the modified Jensen-Bregman LogDet (MJBLD) divergence to measure the dissimilarity between two region covariance descriptors extracted from an image, and design a target detection method based on this descriptor. In particular, MJBLD divergence, which considers the non-Euclidean geometric structure, is used as the measurement on the symmetric positive-definite (SPD) matrix manifold. The MJBLD divergence is a modified version of the Jensen-Bregman LogDet (JBLD) divergence, which has many properties similar to the affine invariant Riemannian metric. The MJBLD divergence is then applied to image target detection, where the image region of interest is represented as a covariance descriptor. The covariance descriptor is an SPD matrix constructed from the first and second gradients of intensity and the three-dimensional color information. Since SPD matrices naturally reside on a non-Euclidean Riemannian manifold and the MJBLD divergence can be treated as a manifold metric, applying the non-Euclidean distance to SPD matrices can yield better performance than the Euclidean distance. Experimental results show that our proposed method outperforms the state-of-the-art method.
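The abstract does not specify how MJBLD modifies JBLD, so the sketch below implements the standard Jensen-Bregman LogDet divergence, J(X, Y) = log det((X + Y)/2) − ½ log det(XY), together with a region covariance descriptor built from the features the abstract lists (RGB color plus first- and second-order intensity gradients). Using absolute derivative values, the channel mean as intensity, and a small diagonal regularizer are assumptions.

```python
import numpy as np


def covariance_descriptor(rgb, eps=1e-6):
    """7x7 covariance of per-pixel [R, G, B, |Ix|, |Iy|, |Ixx|, |Iyy|]."""
    rgb = rgb.astype(float)
    gray = rgb.mean(axis=2)                          # intensity (assumed: channel mean)
    iy, ix = np.gradient(gray)
    iyy = np.gradient(iy, axis=0)
    ixx = np.gradient(ix, axis=1)
    feats = np.stack([rgb[..., 0], rgb[..., 1], rgb[..., 2],
                      np.abs(ix), np.abs(iy), np.abs(ixx), np.abs(iyy)], axis=-1)
    cov = np.cov(feats.reshape(-1, feats.shape[-1]), rowvar=False)
    return cov + eps * np.eye(cov.shape[0])          # regularize to keep it SPD


def jbld(x, y):
    """Unmodified JBLD divergence between SPD matrices x and y."""
    _, ld_mid = np.linalg.slogdet((x + y) / 2.0)
    _, ld_x = np.linalg.slogdet(x)
    _, ld_y = np.linalg.slogdet(y)
    return ld_mid - 0.5 * (ld_x + ld_y)
```

JBLD is symmetric, zero only when the two matrices are equal, and needs only log-determinants (no eigendecompositions), which is part of its appeal over the affine-invariant Riemannian distance for detection scans over many regions.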
... In recent years, with the rapid development of artificial intelligence and deep learning, these technologies have been successfully applied to complex tasks such as face recognition [4,5], target detection [6,7], and human pose estimation [8,9]. At the same time, the fishery industry has also undergone change: the application of sensors, machine vision, pattern recognition, and other technologies in fisheries has brought technological solutions for tuna weight estimation. ...
Article
To address the large statistical errors and poor real-time performance of catch-weight recording in the ocean tuna fishing industry, an algorithm based on an improved YOLOv8-Pose is proposed for albacore tuna (Thunnus alalunga) fork length extraction and weight estimation, drawing on human pose estimation algorithms. First, a lightweight module built with a re-parameterization technique replaces the backbone network; second, a weighted bidirectional feature pyramid network (BiFPN) is used. Finally, the upper jaw, lower jaw and tail feature points of the albacore tuna (Thunnus alalunga) are extracted with the key point detection algorithm, and the weight of the albacore tuna (Thunnus alalunga) is estimated from the fitted relationship between fork length and weight. The experimental results show that, compared with the baseline model, the improved YOLOv8-Pose algorithm reduces the number of model parameters by 13.63% and the number of floating-point operations by 14.03% without decreasing target detection or key point detection accuracy, and improves model inference speed by 374%. It also reduces key point detection drift, and the error relative to the actual albacore tuna (Thunnus alalunga) body weight is no more than 10%. The improved key point detection algorithm has high detection accuracy and inference speed, provides accurate yield data for pelagic fishing, and is expected to solve existing statistical problems and improve the accuracy and real-time performance of data in the fishing industry.
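The abstract estimates weight from "the fitted relationship between fork length and weight" without giving its form. As an assumption, the sketch below uses the allometric model W = a·L^b that is standard in fisheries length-weight studies, fitted by least squares on log-log axes; the function names are illustrative and no coefficients from the paper are implied.

```python
import numpy as np


def fit_length_weight(lengths, weights):
    """Least-squares fit of the assumed model W = a * L**b on log-log axes."""
    b, log_a = np.polyfit(np.log(lengths), np.log(weights), 1)   # slope, intercept
    return float(np.exp(log_a)), float(b)


def predict_weight(length, a, b):
    """Weight predicted from fork length with fitted coefficients a and b."""
    return a * np.asarray(length, dtype=float) ** b
```

Fitting synthetic data generated exactly as W = 2e-5·L³ recovers b ≈ 3 and a ≈ 2e-5; with real measurements, b near 3 indicates roughly isometric growth.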
... The HOG method is a highly prevalent technique for feature extraction. Its favorable computational efficiency and robustness properties have made it useful in diverse domains, including medical applications [72], facial recognition [73], and fault detection in wind turbines [74]. The implementation steps of the HOG feature extraction algorithm are as follows: ...
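The snippet is cut off before the steps are listed. One step common to Dalal-Triggs-style pipelines is contrast normalization over overlapping blocks of cells; the sketch below applies L2-Hys (L2-normalize, clip, renormalize) to 2 × 2-cell blocks with a one-cell stride. The block size, stride and 0.2 clipping value are assumed defaults, and the input is an already computed (rows, cols, bins) array of cell histograms.

```python
import numpy as np


def l2_hys(v, clip=0.2, eps=1e-6):
    """L2-normalize, clip large components (assumed 0.2), then renormalize."""
    v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)
    v = np.minimum(v, clip)                  # histograms are non-negative
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)


def block_normalize(cell_hist, block=2):
    """Concatenate L2-Hys-normalized block x block groups of cells, stride 1."""
    rows, cols, _ = cell_hist.shape
    out = []
    for r in range(rows - block + 1):
        for c in range(cols - block + 1):
            out.append(l2_hys(cell_hist[r:r + block, c:c + block].ravel()))
    return np.concatenate(out)
```

Because blocks overlap, each cell appears in up to four blocks, each time normalized against a different neighborhood; this redundancy is also why dimensionality reduction matters when HOG features come from overlapping cells, as the abstract at the top of this page notes.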
Article
This paper presents a unique hybrid classifier that combines deep neural networks with a type-III fuzzy system for decision-making. The ensemble incorporates ResNet-18, Efficient Capsule neural network, ResNet-50, the Histogram of Oriented Gradients (HOG) for feature extraction, neighborhood component analysis (NCA) for feature selection, and Support Vector Machine (SVM) for classification. The inputs to the type-III fuzzy system are the outputs of these neural networks. The system's rule parameters are fine-tuned using the Improved Chaos Game Optimization algorithm (ICGO). The conventional CGO's simple random mutation is replaced with wavelet mutation to enhance the CGO algorithm while preserving non-parametricity and computational complexity. The ICGO was evaluated on 126 benchmark functions and 5 engineering problems, and its performance was compared with well-known algorithms. It achieved the best results on all but 2 of the benchmark functions. The introduced classifier is applied to seven malware datasets and consistently outperforms notable networks like AlexNet, ResNet-18, GoogleNet, and Efficient Capsule neural network in 35 separate runs, achieving over 96% accuracy. Additionally, the classifier's performance is tested on MNIST and Fashion-MNIST in 10 separate runs. The results show that the new classifier excels in accuracy, precision, sensitivity, specificity, and F1-score compared to other recent classifiers. Based on the statistical analysis, the ICGO and the proposed method exhibit significant superiority over the examined algorithms and methods. The source code for ICGO is available publicly at https://nimakhodadadi.com/algorithms-%2B-codes.
... Compared with other image detection algorithms, HOG quantizes the gradient directions of the image cell by cell to characterize the structural features of object edges [22]. Because the algorithm quantizes position and orientation space, it reduces the influence of object rotation and translation. ...
Article
Driving fatigue is a physiological phenomenon that often occurs during driving. Once a driver becomes fatigued, attention lapses, responses slow, and the ability to deal with emergencies is significantly reduced, which can easily cause traffic accidents. Studying driver fatigue detection methods is therefore important for ensuring safe driving. However, assessing the fatigue state of real drivers is easily disturbed by the external environment (glasses and lighting), which leads to problems such as poor reliability of fatigue driving detection. Moreover, fatigue is a slow process that first manifests in physiological signals and is then reflected in facial images. To improve the accuracy and stability of fatigue detection, this paper proposed a driver fatigue detection method based on image and physiological information, designed a fatigue driving detection device, built a simulated driving experiment platform, and collected facial and physiological information from drivers during driving. Finally, the effectiveness of the fatigue detection method was evaluated. Eye movement feature parameters and physiological signal features related to drivers' fatigue levels were extracted, and a driver fatigue detection model was trained on these features to classify fatigue and non-fatigue states. The accuracy rates of the image, electroencephalogram (EEG), and blood oxygen signals were 86%, 82%, and 71%, respectively. Information fusion theory was applied to improve fatigue detection: the fatigue features were fused using multiple kernel learning and canonical correlation analysis, increasing the detection accuracy to 94%. The fatigue driving detection method based on multi-source feature fusion thus detected driver fatigue effectively, with higher accuracy than any single information source.
In summary, fatigue driving monitoring has broad development prospects and can be used in traffic accident prevention and wearable driver fatigue recognition.
... Remote sensing image detection technology plays a crucial role in both military and civilian domains, including applications such as port and airport flow detection, traffic management, and maritime rescue. Limited by data and hardware conditions, traditional remote sensing image detection methods usually focus on hand-crafted feature extraction and description, such as histograms of oriented gradients [1], visual saliency detection [2], the scale-invariant feature transform [3], and other methods. However, these traditional methods have shortcomings, such as a lack of learning ability, weak transferability and cumbersome design. ...
Article
The limited semantic information that can be extracted from small objects and the difficulty of distinguishing similar targets make target detection in remote sensing scenarios highly challenging and result in poor detection performance. This paper proposes SEB-YOLO (SPD-Conv + ECSPP + Bi-FPN + YOLOv5), an improved YOLOv5 algorithm for remote sensing image target detection. First, a module consisting of a space-to-depth (SPD) layer followed by a non-strided convolution (Conv) layer (SPD-Conv) was used to reconstruct the backbone network, retaining global features and reducing feature loss. Meanwhile, a pooling module with an attention mechanism was designed for the final layer of the backbone network to help the network better identify and locate targets. Furthermore, a bidirectional feature pyramid network (Bi-FPN) with bilinear interpolation upsampling was added to improve bidirectional cross-scale connections and weighted feature fusion. Finally, a decoupled head was introduced to improve model convergence and resolve the conflict between the classification and regression tasks. Experimental results on the NWPU VHR-10 and RSOD datasets show that the mAP of the proposed algorithm reaches 93.5% and 93.9%, respectively, which is 4.0% and 5.3% higher than that of the original YOLOv5l algorithm. The proposed algorithm achieves better detection results on complex remote sensing images.
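The SPD-Conv idea (a space-to-depth layer followed by a non-strided convolution) replaces strided downsampling with a lossless rearrangement: each scale × scale neighborhood is moved into the channel dimension, so spatial resolution halves (for scale 2) without discarding any pixel. Below is a NumPy sketch on a single (C, H, W) map; the sub-map ordering and the use of a 1 × 1 convolution for the non-strided layer are assumptions.

```python
import numpy as np


def space_to_depth(x, scale=2):
    """(C, H, W) -> (C * scale**2, H / scale, W / scale), keeping every pixel.

    Sub-maps are stacked in row-major (i, j) offset order (assumed ordering).
    """
    c, h, w = x.shape
    assert h % scale == 0 and w % scale == 0, "H and W must be divisible by scale"
    subs = [x[:, i::scale, j::scale] for i in range(scale) for j in range(scale)]
    return np.concatenate(subs, axis=0)


def pointwise_conv(x, weight):
    """Non-strided 1x1 convolution (assumed kernel size); weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)
```

A stride-2 convolution would see only some of the pixel positions at its kernel centers, while space-to-depth hands every input value to the following convolution, which is the property the abstract credits with retaining features of small objects.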
... The histogram channels are evenly spread over 0-180° or 0-360°, depending on whether the gradient is 'unsigned' or 'signed'. The histogram counts are normalized to compensate for illumination [5]. The gradient magnitude can be computed with the following formula. ...
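The snippet ends before the formula. The standard definitions, assuming the centered [−1, 0, 1] derivative mask, are magnitude m = √(Gx² + Gy²) and orientation θ = atan2(Gy, Gx); θ is then folded into 0-180° (unsigned) or 0-360° (signed) before binning. A minimal NumPy sketch follows; leaving border gradients at zero is a simplification chosen here.

```python
import numpy as np


def gradient_magnitude_orientation(img):
    """Per-pixel magnitude sqrt(Gx^2 + Gy^2) and angle atan2(Gy, Gx) in degrees."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]     # centered [-1, 0, 1] mask
    gy[1:-1, :] = img[2:, :] - img[:-2, :]     # borders left at zero (simplification)
    return np.hypot(gx, gy), np.degrees(np.arctan2(gy, gx))


def orientation_bin(angle_deg, n_bins=9, signed=False):
    """Evenly spaced bins over 0-180 degrees (unsigned) or 0-360 degrees (signed)."""
    span = 360.0 if signed else 180.0
    a = np.mod(angle_deg, span)
    return np.minimum((a // (span / n_bins)).astype(int), n_bins - 1)
```

With unsigned binning, an edge and its contrast-reversed version (angles 10° and 190°) land in the same bin, which is why unsigned gradients are often preferred when foreground/background contrast varies.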
Article
Computer science and technology have developed rapidly in recent years. Today, technology digitises almost everything related to human life, including facial recognition. Various methods for recognising human faces have been developed in recent years, one of which uses the Histogram of Oriented Gradients (HOG). In this work, an image processing system is designed to recognise human faces using Histograms of Oriented Gradients (HOG) and machine learning methods such as Convolutional Neural Networks (CNN) and Support Vector Machines (SVM). The system detects winks using the eye-region points among the 68 facial landmarks, from which the distance between the upper and lower eyelids can be measured. If the distance (in pixels) is small enough, it is interpreted as a wink. Blink detection is also limited by how far the face is from the camera. Finally, when a recognised face is detected blinking, the time and date are recorded and a solenoid lock is opened via serial communication with an Arduino Uno, forming a security system. Across 100 facial photos and 207 blink tests, the computer detected a "True Positive" wink in 89.86% of cases. In addition, the recommended tolerance parameter for this facial recognition system is between 0.42 and 0.48.
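The eyelid-distance rule in this abstract can be sketched as follows. The landmark indices assume the common 0-based 68-point layout (eye contours at 36-41 and 42-47, with two upper-lid and two lower-lid points per eye); the abstract does not list which points it uses, and the 3-pixel threshold is hypothetical. Because the threshold is in pixels, such a rule depends on how large the face appears in the image, which may relate to the detection-distance limit the abstract mentions.

```python
import numpy as np

# Assumed 0-based 68-point layout: (upper-lid point, lower-lid point) pairs per eye.
EYELID_PAIRS = {"right_eye": [(37, 41), (38, 40)], "left_eye": [(43, 47), (44, 46)]}


def eyelid_distance(landmarks, pairs):
    """Mean pixel distance between paired upper- and lower-eyelid landmarks."""
    pts = np.asarray(landmarks, dtype=float)
    return float(np.mean([np.linalg.norm(pts[u] - pts[l]) for u, l in pairs]))


def is_closed(landmarks, threshold_px=3.0):
    """True when both eyes' lid distances fall below a hypothetical threshold."""
    return all(eyelid_distance(landmarks, p) < threshold_px
               for p in EYELID_PAIRS.values())


def count_blinks(closed_flags):
    """Count open -> closed transitions in a per-frame sequence of flags."""
    blinks, prev = 0, False
    for closed in closed_flags:
        if closed and not prev:
            blinks += 1
        prev = closed
    return blinks
```

Checking a single eye instead of `all` would distinguish a one-eye wink from a blink; dividing the lid distance by the eye width would make the threshold independent of face size.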
... Recently, one of the most successful histogram features, the histograms of oriented gradients (HOG) [18], has attracted increasing research attention. The HOG feature originates from the scale-invariant feature transform (SIFT) and can be viewed as a dense version of SIFT. ...
Preprint
In this paper, we exploit the modified Jensen-Bregman LogDet (MJBLD) divergence to measure the dissimilarity between two region covariance descriptors extracted from an image, and design a target detection method based on this descriptor. In particular, the MJBLD divergence, which takes into account the non-Euclidean geometric structure, is used as the measurement on the symmetric positive-definite (SPD) matrix manifold. The MJBLD divergence is a modified version of the Jensen-Bregman LogDet (JBLD) divergence, which has many properties similar to the affine invariant Riemannian metric. The MJBLD divergence is then applied to image target detection, where the image region of interest is represented as a covariance descriptor. The covariance descriptor is an SPD matrix constructed from the first and second gradients of intensity and the three-dimensional color information. Since SPD matrices naturally reside on a non-Euclidean Riemannian manifold and the MJBLD divergence can be treated as a manifold metric, applying the non-Euclidean distance to SPD matrices can yield better performance than the Euclidean distance. Experimental results show that our proposed method outperforms the state-of-the-art method.
... The specific steps for generating these features are detailed in Algorithm 3. The Histogram of Oriented Gradients (HOG) is a feature descriptor, similar to SIFT (scale-invariant feature transform) and SURF (speeded-up robust features), that is widely used in computer vision for feature extraction [54][55][56][57][58]. HOG is particularly adept at capturing an object's shape or structural characteristics. ...
Article
The advancement of machine learning in industrial applications has necessitated the development of tailored solutions to address specific challenges, particularly in multi-class classification tasks. This study delves into the customization of loss functions within the eXtreme Gradient Boosting (XGBoost) algorithm, which is a critical step in enhancing the algorithm’s performance for specific applications. Our research is motivated by the need for precision and efficiency in the industrial domain, where the implications of misclassification can be substantial. We focus on the drill-wear analysis of melamine-faced chipboard, a common material in furniture production, to demonstrate the impact of custom loss functions. The paper explores several variants of Weighted Softmax Loss Functions, including Edge Penalty and Adaptive Weighted Softmax Loss, to address the challenges of class imbalance and the heightened importance of accurately classifying edge classes. Our findings reveal that these custom loss functions significantly reduce critical errors in classification without compromising the overall accuracy of the model. This research not only contributes to the field of industrial machine learning by providing a nuanced approach to loss function customization but also underscores the importance of context-specific adaptations in machine learning algorithms. The results showcase the potential of tailored loss functions in balancing precision and efficiency, ensuring reliable and effective machine learning solutions in industrial settings.
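A weighted softmax cross-entropy objective of the kind discussed above needs its gradient and diagonal Hessian with respect to the raw class margins. The sketch below is a generic class-weighted softmax, not the paper's Edge Penalty or Adaptive Weighted Softmax loss; the weight vector (for example, larger weights on the first and last classes) is a hypothetical way of emphasizing edge classes. Such a function could be wrapped as the `obj` callback of `xgboost.train`, but the expected array shapes differ between XGBoost versions and should be checked.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def weighted_softmax_grad_hess(margins, labels, class_weights):
    """Gradient and diagonal Hessian of sum_i w[y_i] * (-log p_i[y_i]) w.r.t. the raw margins.

    margins: (n, k) raw scores; labels: (n,) integer classes; class_weights: (k,) weights.
    """
    p = softmax(np.asarray(margins, dtype=float))
    n, _ = p.shape
    onehot = np.zeros_like(p)
    onehot[np.arange(n), labels] = 1.0
    w = np.asarray(class_weights, dtype=float)[labels][:, None]  # weight of each sample's true class
    grad = w * (p - onehot)
    hess = np.maximum(w * p * (1.0 - p), 1e-16)  # floor keeps Newton steps finite
    return grad, hess
```

Because each row of the gradient sums to zero, the weights only rescale how strongly a sample pulls on the margins, not the direction of that pull.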
... Global features include contour representations, shape descriptors, and texture features, while local features represent the texture in an image patch. Shape matrices, invariant moments (Hu, Zernike) [3], Histogram of Oriented Gradients (HOG) [4], and Co-HOG [5] are some examples of global descriptors. SIFT [6], SURF [7], LBP [8], BRISK [9], MSER [10], and FREAK [10] are some examples of local descriptors. ...
Article
Retrieving similar images from an image dataset has always been challenging for researchers, and it becomes more challenging under difficult conditions such as illumination variation and different facial expressions. Every image comprises three types of content: color, shape, and texture. The texture content of an image plays a vital role in the image retrieval process. Many researchers have worked on the image retrieval problem and have proposed many local descriptors over the last two decades. This study gives a comparative analysis of existing local descriptors using different facial image datasets. Following the recent trend of retrieving similar images with deep learning techniques, this paper also discusses some existing deep learning methods for retrieving similar images.
... On the other hand, Zhou et al. [27] showed that the "histogram of oriented gradients (HOG) technique helps to extract features of objects following a change in intensity." Kok and Rajendran [20] also proposed applying HOG descriptors to different computer vision tasks [28]. The HOG feature is typically "extracted by counting the occurrences of gradient orientation based on the gradient angle and the gradient magnitude of local patches of an image" [29]. ...
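The quoted description, counting gradient orientations weighted by gradient magnitude within local patches, can be sketched in NumPy as follows. It uses centred [-1, 0, 1] differences, 9 unsigned orientation bins over 0 to 180 degrees, 8 × 8-pixel cells, and hard bin assignment without the interpolation used in full HOG implementations; these choices are assumptions for brevity.

```python
import numpy as np


def cell_orientation_histograms(gray, cell=8, bins=9):
    """Magnitude-weighted histograms of unsigned gradient orientation for each cell."""
    g = np.asarray(gray, dtype=float)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]              # centred difference along x
    gy[1:-1, :] = g[2:, :] - g[:-2, :]              # centred difference along y
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation in [0, 180)
    idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    ny, nx = g.shape[0] // cell, g.shape[1] // cell
    hist = np.zeros((ny, nx, bins))
    for i in range(ny):
        for j in range(nx):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            # Each pixel votes for its orientation bin with weight equal to its gradient magnitude.
            hist[i, j] = np.bincount(idx[rows, cols].ravel(),
                                     weights=mag[rows, cols].ravel(), minlength=bins)
    return hist
```

A vertical step edge, for instance, produces purely horizontal gradients, so all of its votes land in the 0-degree bin.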
Article
Maize is Ethiopia's dominant cereal crop in terms of area coverage and production level. There are different varieties of maize in Ethiopia, and they are classified based on morphological features such as shape and size. Because of the nature of maize seeds and their rotational variation, studies are still needed to identify Ethiopian maize seed varieties. Even for experts, identifying maize seed varieties by eye is difficult because of their similar morphological features and visual similarity. We propose a hybrid feature-based maize variety identification model to solve this problem. For training and testing the model, images of each maize variety were collected from the Adet Agriculture and Research Center (AARC), Ethiopia. A multi-class support vector machine (MCSVM) classifier was applied to a hybrid of handcrafted (i.e., Gabor and histogram of oriented gradients) and convolutional neural network (CNN)-based feature selection techniques and achieved an overall classification accuracy of 99%.
... HOG generates a feature vector that encapsulates the visual characteristics of objects by analyzing the local gradients within an image. The key steps for computing HOG features can be outlined as follows: gradient computation, orientation binning, feature description, and L2 normalization [27]. ...
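The last two steps listed above, grouping cell histograms into blocks and applying L2 normalization before concatenating them into the final descriptor, can be sketched as follows. The input is an array of per-cell orientation histograms; the 2 × 2-cell blocks, the one-cell stride (so blocks overlap), and `eps` are assumed values.

```python
import numpy as np


def block_normalize(cell_hist, block=2, eps=1e-6):
    """Concatenate L2-normalised block x block groups of cell histograms (stride of one cell)."""
    ny, nx, _ = cell_hist.shape
    feats = []
    for i in range(ny - block + 1):
        for j in range(nx - block + 1):
            v = cell_hist[i:i + block, j:j + block].ravel()
            # v / sqrt(||v||^2 + eps^2): unit length, but safe for all-zero blocks.
            feats.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(feats)
```

Because neighbouring blocks share cells, each cell histogram appears several times in the final vector, each time normalized against a different neighbourhood.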
Article
Aim: Breast cancer is a prominent cause of female mortality globally, underscoring the critical need for precise and efficient diagnostic techniques. This research contributes to breast cancer classification, especially with breast ultrasound (BUS) images, by introducing a novel method based on two-dimensional empirical mode decomposition (biEMD). The study evaluates classification performance using various texture features of breast ultrasound images and their corresponding biEMD sub-bands. Methods: A total of 437 benign and 210 malignant breast ultrasound images were analyzed, preprocessed, and decomposed into three biEMD sub-bands. A variety of features, including the Gray Level Co-occurrence Matrix (GLCM), Local Binary Patterns (LBP), and Histogram of Oriented Gradients (HOG), were extracted, and feature selection was performed using the least absolute shrinkage and selection operator (LASSO) method. The study employed GLCM, LBP, and HOG features with machine learning techniques, including artificial neural networks (ANN), k-nearest neighbors (kNN), the ensemble method, and statistical discriminant analysis, to classify benign and malignant cases. Classification performance, measured by the Area Under the Curve (AUC), accuracy, and F1 score, was evaluated using 10-fold cross-validation. Results: Using the ANN method with hybrid features (GLCM+LBP+HOG) from the biEMD sub-bands of BUS images led to excellent performance, with an AUC of 0.9945, an accuracy of 0.9644, and an F1 score of 0.9668. Conclusion: The results reveal the effectiveness of the biEMD method for classifying breast tumor types from ultrasound images, demonstrating high classification performance with the proposed approach.
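As an illustration of one of the texture features named above, a gray-level co-occurrence matrix (GLCM) and three common statistics can be computed as below. The sketch assumes the image has already been quantized to integer levels in [0, levels), uses a single non-negative pixel offset, and symmetrizes the counts; the study's exact GLCM settings are not given in the abstract.

```python
import numpy as np


def glcm(levels_img, dx=1, dy=0, levels=8):
    """Normalised, symmetric GLCM for a non-negative pixel offset (dy, dx).

    levels_img must already hold integer gray levels in [0, levels).
    """
    g = np.asarray(levels_img, dtype=int)
    h, w = g.shape
    ref = g[:h - dy, :w - dx]   # reference pixels
    nbr = g[dy:, dx:]           # neighbours at the given offset
    m = np.zeros((levels, levels))
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1.0)
    m = m + m.T                 # count each pair in both directions
    return m / m.sum()


def glcm_features(p):
    """Contrast, homogeneity and angular second moment of a normalised GLCM."""
    i, j = np.indices(p.shape)
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + np.abs(i - j)))),
        "asm": float(np.sum(p ** 2)),  # some libraries report sqrt(asm) as "energy"
    }
```

In practice several offsets and directions are usually computed and their statistics averaged or concatenated.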
... To develop a robust face detection system, a Single-Shot Multibox Detector with a ResNet-10 backbone is used [1]. In the literature, facial landmark detection generally focuses on finding 68 points using the distinctive textural features of the face [2][3]. Facial landmarks are found in order to localize the eyes, nose, mouth, and contour of the face. Landmark points are used to determine whether the eyes and mouth are open or closed. ...
Article
A portrait photo is one of the most important documents that many people need for official transactions with public and private organizations. Despite developing technologies and high-resolution imaging devices, people still rely on photography studios to have their photos taken. In this study, a Photo Capturing System has been developed to provide infrastructure for web and mobile applications. After detecting the person's face, facial orientation, and facial expression, the system automatically takes a photo and sends it to a graphical user interface developed for this purpose, through which the photo is then printed automatically. The proposed study is unique in using imaging technologies, deep learning, and vision transformer algorithms, which have become very popular image processing techniques in recent years. Face detection and facial expression recognition are performed with success rates of close to 100% and 95.52%, respectively. The performance of the Vision Transformer algorithm is also compared with state-of-the-art algorithms for facial expression recognition.
... We used Inovtaxon (Bambil et al., 2020) to extract 226 picture features (namely, 36 colour features, 135 shape features, and 55 texture features) that were used as input to feed the algorithms (see Bambil et al., 2020 and https://github.com/DeborahBambil/Inovtaxon for details; see also Hu, 1962; Zhao and Pietikainen, 2007; Flusser et al., 2009; Deniz et al., 2011; Nascimento et al., 2023). ...
Article
Our research, "Advanced Face Detection Using Machine Learning & AI Based Algorithm", presents a face recognition system using machine learning, specifically support vector machines (SVM). The first required step is face detection, which we achieve using a widely used method called the Viola-Jones algorithm. The Viola-Jones algorithm is very attractive because of its high detection rate and fast processing time. Once the face is detected, feature extraction is performed on the face using the histogram of oriented gradients (HOG), which essentially stores the edges of the face as well as the directionality of those edges. HOG is an effective form of feature extraction owing to its strong performance in normalizing local contrast. Finally, training and classification on the face databases are performed using a multi-class SVM in which each distinct face in the database is a class. We apply this face recognition system to two databases, the AT&T face database and the Yale B face database, and examine the results.
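Much of the Viola-Jones algorithm's speed comes from the integral image, which lets any rectangle sum, and therefore any Haar-like feature, be evaluated with four lookups. A minimal sketch of that building block follows; the function names are hypothetical and the boosted classifier cascade is omitted.

```python
import numpy as np


def integral_image(img):
    """ii[y, x] = sum of img[:y, :x]; the extra zero row/column removes edge cases."""
    img = np.asarray(img, dtype=float)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, y, x, h, w):
    """Sum of the h-by-w rectangle with top-left corner (y, x), using four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]


def haar_two_rect(ii, y, x, h, w):
    """Two-rectangle Haar-like feature: left half minus right half (w assumed even)."""
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)
```

The integral image is computed once per frame, after which every feature at every scale and position costs the same constant number of operations.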
Conference Paper
Free-space optics (FSO) is a data relaying technology that requires a direct line of sight between the transmitter and receiver units for reliable transmission. Wavelength Division Multiplexing (WDM) multiplexes numerous carrier signals onto a single fiber using different wavelengths, improving bandwidth efficiency and increasing the data rate. Multiple Input Multiple Output (MIMO) is implemented to improve the quality and performance of free-space optical communication in various atmospheric conditions. In this paper, a WDM-based FSO communication system is implemented under the MIMO concept, i.e., 1 × 1, 2 × 2, and 4 × 4. Factors such as the BER and Quality Factor are analyzed for WDM-based FSO communication with MIMO using OptiSystem v7.0 under different atmospheric conditions. The results also show that a transmit power of 15 dBm with 4 × 4 FSO gave the best performance and the highest correlation distance compared to a transmit power of 10 dBm and the other beam configurations. Moreover, WDM-FSO MIMO systems with a 20 cm receiving aperture perform better than those with a 10 cm aperture. In addition, systems with a data rate of 2.5 Gbps perform better than those at 5 Gbps.
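Under the usual Gaussian-noise approximation for binary on-off keying, the Quality Factor Q and the BER are linked by BER = 0.5 · erfc(Q / √2). A small sketch of that conversion, assuming this approximation applies to the simulated links:

```python
import math


def ber_from_q(q):
    """BER = 0.5 * erfc(Q / sqrt(2)), the Gaussian-noise approximation for on-off keying."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def q_factor_db(q):
    """Q expressed in decibels as 20 * log10(Q)."""
    return 20.0 * math.log10(q)
```

For example, Q = 6 corresponds to a BER of roughly 1e-9, and every increase in Q lowers the BER steeply.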
Conference Paper
Face recognition is a technology that enables faces to be identified from an image or a video clip. Although it has been around for a long time, developing a robust and reliable system is still challenging, partly because only a limited number of training images may be available for each face. In this paper, a new set of training samples is generated from the original samples, increasing the number of training samples by mirroring and by exploiting the symmetry property of the face. The Histogram of Oriented Gradients (HOG) and Local Binary Pattern (LBP) methods are used to extract features from the face images. In general, increasing the number of training images improves the performance of face recognition systems. The proposed method is tested and evaluated on the ORL dataset, which is widely used for testing and comparing the accuracy of face recognition systems. The experimental results show that the proposed method achieves higher recognition accuracy than the traditional methods.
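The mirror-and-symmetry augmentation described above might look like the sketch below, which assumes roughly centred, aligned face crops stored as 2-D arrays. The left- and right-symmetric faces (one half of the face joined to its own mirror image) are a hypothetical reading of "the symmetry property of the face"; for odd widths the synthetic faces lose one column.

```python
import numpy as np


def augment_with_symmetry(face):
    """Return [mirror, left-symmetric, right-symmetric] variants of a face crop."""
    f = np.asarray(face)
    w = f.shape[1]
    half = w // 2
    mirror = f[:, ::-1]                                         # horizontal flip
    left = f[:, :half]
    right = f[:, w - half:]
    left_sym = np.concatenate([left, left[:, ::-1]], axis=1)    # left half + its reflection
    right_sym = np.concatenate([right[:, ::-1], right], axis=1) # reflection + right half
    return [mirror, left_sym, right_sym]
```

Each original training image thus yields three extra samples, which are then passed through the same HOG or LBP feature extraction.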
Article
Sandification can degrade the strength and quality of dolomite and, to a certain extent, compromise the stability of the rock surrounding a tunnel by acting as an unfavorable geological boundary. Classifying the sandification degree of sandy dolomite is a non-trivial challenge for geotechnical engineering projects such as tunneling in complex geographical environments. Traditional methods, which quantitatively measure physical parameters or analyze visual features, are either time-consuming or inaccurate in practice. To address these issues, we introduce, for the first time, convolutional neural network (CNN)-based image classification methods to the dolomite sandification degree classification task. We establish a large-scale dataset of 5729 images classified into four sandification degrees of sandy dolomite, collected near a tunnel in the Yuxi section of the CYWD Project in China, and conduct comprehensive classification experiments on it. CNN-based models achieve an accuracy of up to 91.4%, demonstrating the value of this dataset and its potential for applications in complex geographical analyses.
Article
Dragon fruit stem disease significantly affects both the quality and yield of dragon fruit, so an efficient, high-precision intelligent detection method is urgently needed. To address the limitations of traditional methods, including slow detection and weak micro-integration capability, this paper proposes an improved YOLOv8-G algorithm. The algorithm reduces computational redundancy by introducing the C2f-Faster module. The loss function was changed to the structured intersection over union (SIoU), and the coordinate attention (CA) and content-aware reassembly of features (CARAFE) modules were incorporated. These enhancements increased the model's stability and improved its accuracy in recognizing small targets. Experimental results showed that the YOLOv8-G algorithm achieved a mean average precision (mAP) of 83.1% and an mAP50:95 of 48.3%, improvements of 3.3% and 2.3%, respectively, over the original model. The model size and the number of floating-point operations (FLOPs) were reduced to 4.9 MB and 6.9 G, reductions of 20% and 14.8%, respectively. The improved model achieves higher accuracy in disease detection while remaining lightweight, serving as a valuable reference for researchers in dragon fruit stem disease detection.
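Detection metrics such as mAP50 and mAP50:95 are built on the intersection over union (IoU) between predicted and ground-truth boxes. The sketch below computes plain IoU, not the SIoU loss used in the paper, and lists the ten IoU thresholds (0.50 to 0.95 in steps of 0.05) over which mAP50:95 averages precision.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2), assuming x2 > x1 and y2 > y1."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# mAP50 uses the single threshold 0.50; mAP50:95 averages AP over these ten.
MAP50_95_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction counts as a hit at this threshold if its IoU with the ground truth reaches it."""
    return box_iou(pred_box, gt_box) >= threshold
```

This is why mAP50:95 is always lower than mAP50: the stricter thresholds reject boxes that are only roughly aligned.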
Article
Facial recognition technology is transformative in security and human-machine interaction, reshaping societal interactions. Robust descriptors, essential for high precision in machine learning tasks like recognition and recall, are integral to this transformation. This paper presents a hybrid model enhancing local binary pattern descriptors for facial representation. By integrating rotation-invariant local binary pattern with uniform rotation-invariant grey-level co-occurrence, employing linear discriminant analysis for feature space optimization, and utilizing an artificial neural network for classification, the model achieves exceptional accuracy rates of 100% for Olivetti Research Laboratory, 99.98% for Maastricht University Computer Vision Test, and 99.17% for Extended Yale B, surpassing traditional methods significantly.
Article
In this paper, we present an optimal edge-weighted graph semantic correlation (EWGSC) framework for multi-view feature representation learning. Different from most existing multi-view representation methods, local structural information and global correlation in multi-view feature spaces are exploited jointly in the EWGSC framework, leading to a new and high quality multi-view feature representation. Specifically, a novel edge-weighted graph model is first conceptualized and developed to preserve local structural information in each of the multi-view feature spaces. Then, the explored structural information is integrated with a semantic correlation algorithm, labeled multiple canonical correlation analysis (LMCCA), to form a powerful platform for effectively exploiting local and global relations across multi-view feature spaces jointly. We then theoretically verified the relation between the upper limit on the number of projected dimensions and the optimal solution to the multi-view feature representation problem. To validate the effectiveness and generality of the proposed framework, we conducted experiments on five data sets of different scales, including visual-based (University of California Irvine (UCI) iris database, Olivetti Research Lab (ORL) face database and Caltech 256 database), text-image-based (Wiki database) and video-based (Ryerson Multimedia Lab (RML) audio-visual emotion database) examples. The experimental results show the superiority of the proposed framework on multi-view feature representation over state-of-the-art algorithms.
Research
A face recognition system, like any recognition process, depends heavily on the features extracted from face images, and the selected features play a major role in determining the recognition rate. In this paper, a two-phase feature extraction and selection process is used for face recognition. The process relies on Histogram of Oriented Gradients (HOG) feature extraction, with the window size used to control the similarity between classes. In the first recognition phase, a small number of features (large window size) is used to divide the classes into small groups of closely similar classes. Then, the best-matching class is found using a larger number of features, where the differences between classes are larger. The proposed method was applied to the Essex face dataset and compared with support vector machine (SVM) and Naïve Bayes (NB) methods. The proposed method achieved recognition rates 5% and 10% higher than SVM and NB, respectively.
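The coarse-to-fine idea can be sketched with nearest-template matching: a short (coarse) HOG vector shortlists the k most similar classes, and a longer (fine) HOG vector picks the winner among them. One template per class, Euclidean distance, and `k` are assumptions; the paper itself evaluates SVM and Naïve Bayes classifiers, and how the coarse and fine vectors are produced from different window sizes is left abstract here.

```python
import numpy as np


def two_phase_classify(coarse_probe, fine_probe, coarse_templates, fine_templates, k=3):
    """Return the index of the predicted class.

    coarse_templates / fine_templates: one row per class (e.g. class-mean HOG vectors)
    computed with a large and a small window size respectively.
    """
    coarse_templates = np.asarray(coarse_templates, dtype=float)
    fine_templates = np.asarray(fine_templates, dtype=float)
    # Phase 1: shortlist the k classes closest to the probe in the coarse feature space.
    coarse_d = np.linalg.norm(coarse_templates - np.asarray(coarse_probe, float), axis=1)
    shortlist = np.argsort(coarse_d)[:k]
    # Phase 2: among the shortlist, choose the class closest in the fine feature space.
    fine_d = np.linalg.norm(fine_templates[shortlist] - np.asarray(fine_probe, float), axis=1)
    return int(shortlist[np.argmin(fine_d)])
```

The design trade-off is that errors in phase 1 cannot be recovered in phase 2, so `k` must be large enough to keep the true class in the shortlist.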
Article
In India, countless children are reported missing every year, and a large percentage of them remain untraced. The proposed enhancement to missing-child identification supports the adoption of children who remain in the database after a period of time; the Child Welfare Centre is responsible for verifying the parents' authenticity. In the adoption module, registered parents log in and must enter the same child details that they provided in the parents' registration form. The Convolutional Neural Network (CNN), a highly effective deep learning technique for image-based applications, is adopted here for face recognition. Face descriptors are extracted from the images using a pre-trained CNN model, the VGG-Face deep architecture.
Article
High-resolution range profile (HRRP) provides abundant target information but is susceptible to external electromagnetic interference. While an infrared sensor has strong anti-jamming capability, it has a limited detection range and is vulnerable to weather conditions, which reduce imaging resolution. The integration of radar and infrared sensors can combine their respective strengths to not only improve the reliability and robustness of the system but also enhance the credibility and accuracy of the data. However, there are many challenges in fusing heterogeneous data such as 1D HRRP data and 2D infrared data. In this letter, a radar-infrared sensor fusion method based on hierarchical features mining (HFM) is proposed to solve these problems. The method is applied to multi-target recognition tasks to verify its effectiveness. The results demonstrate that the proposed method can enhance the information completeness of the target and improve the accuracy of target recognition.
Article
Due to the absorption and scattering of light by water, underwater images suffer from severe quality degradation, which greatly hinders underwater exploration and research. Quality assessment is therefore very important for underwater visual tasks. To evaluate the quality of underwater images effectively, a novel no-reference underwater image quality assessment (UIQA) method based on multiscale and antagonistic energy distribution, called MSAEQA, is proposed. Specifically, because light of different wavelengths attenuates at different rates in water, commonly causing a color cast in underwater images, a maximum chrominance map is constructed on the Laplacian pyramid to represent the color cast. The statistical distribution of the maximum chrominance map of the underwater image is then used as a multiscale statistical feature. Furthermore, because structure and texture are important attributes of underwater images, multiscale Laplacian-weighted local binary patterns and multiscale histogram of oriented gradient features are extracted as quality-aware features to capture multiscale structure and texture statistics. Finally, a Weibull distribution is used to model the energy distribution of the singular value decomposition in the CIELab color space, representing the underwater antagonistic energy distribution statistics. Experimental results show that MSAEQA exhibits the highest correlation with ground-truth scores among state-of-the-art UIQA methods.
Article
We develop a face recognition algorithm which is insensitive to large variation in lighting direction and facial expression. Taking a pattern classification approach, we consider each pixel in an image as a coordinate in a high-dimensional space. We take advantage of the observation that the images of a particular face, under varying illumination but fixed pose, lie in a 3D linear subspace of the high-dimensional image space, if the face is a Lambertian surface without shadowing. However, since faces are not truly Lambertian surfaces and do indeed produce self-shadowing, images will deviate from this linear subspace. Rather than explicitly modeling this deviation, we linearly project the image into a subspace in a manner which discounts those regions of the face with large deviation. Our projection method is based on Fisher's linear discriminant and produces well separated classes in a low-dimensional subspace, even under severe variation in lighting and facial expressions. The eigenface technique, another method based on linearly projecting the image space to a low-dimensional subspace, has similar computational requirements. Yet, extensive experimental results demonstrate that the proposed “Fisherface” method has error rates that are lower than those of the eigenface technique for tests on the Harvard and Yale face databases.
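A compact sketch of the Fisherface projection: PCA first reduces the data to N - c dimensions (N samples, c classes) so that the within-class scatter matrix is non-singular, and Fisher's linear discriminant is then solved in that space. This is a minimal NumPy rendering under those assumptions, not the authors' implementation.

```python
import numpy as np


def fisherfaces(X, y, n_components=None):
    """PCA to N - c dimensions, then Fisher LDA; returns (mean, W) so features = (x - mean) @ W."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, c = X.shape[0], len(classes)
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA step: keep at most N - c components so the within-class scatter is non-singular.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[: n - c].T
    Z = Xc @ W_pca
    d = Z.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for k in classes:
        Zk = Z[y == k]
        mk = Zk.mean(axis=0)
        Sw += (Zk - mk).T @ (Zk - mk)        # within-class scatter
        Sb += len(Zk) * np.outer(mk, mk)     # between-class scatter (Z has zero global mean)
    # Fisher directions: leading eigenvectors of Sw^-1 Sb (at most c - 1 are informative).
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[: (n_components or c - 1)]
    W_lda = np.real(evecs[:, order])
    return mean, W_pca @ W_lda
```

Recognition then compares projected probe and gallery images, for example with a nearest-neighbour rule in the c - 1 dimensional Fisher space.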
Article
This paper examines the classification capability of different Gabor representations for human face recognition. Usually, Gabor filter responses for eight orientations and five scales for each orientation are calculated and all 40 basic feature vectors are concatenated to assemble the Gabor feature vector. This work explores 70 different Gabor feature vector extraction techniques for face recognition. The main goal is to determine the characteristics of the 40 basic Gabor feature vectors and to devise a faster Gabor feature extraction method. Among all the 40 basic Gabor feature representations the filter responses acquired from the largest scale at smallest relative orientation change (with respect to face) shows the highest discriminating ability for face recognition while classification is performed using three classification methods: probabilistic neural networks (PNN), support vector machines (SVM) and decision trees (DT). A 40 times faster summation based Gabor representation shows about 98% recognition rate while classification is performed using SVM. In this representation all 40 basic Gabor feature vectors are summed to form the summation based Gabor feature vector. In the experiment, a sixth order data tensor containing the basic Gabor feature vectors is constructed, for all the operations.
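The standard 40-filter Gabor bank and the summation-based variant described above can be sketched as follows. The kernel size, wavelength spacing, bandwidth (`sigma = 0.56 * wavelength`), and aspect ratio `gamma` are assumed values, filtering uses a circular FFT convolution for brevity, and the image must be at least `size` × `size` pixels.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoid along direction theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * xr / wavelength)


def filter_fft(img, kernel):
    """Circular 2-D convolution via FFT, with the kernel centred on the origin."""
    H, W = img.shape
    kh, kw = kernel.shape
    padded = np.zeros((H, W), dtype=complex)
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(padded))


def gabor_features(img, n_orient=8, n_scale=5, size=31):
    """Return (concatenated 40-response vector, summation-based vector)."""
    img = np.asarray(img, dtype=float)
    mags = []
    for s in range(n_scale):
        wavelength = 4.0 * 2 ** (s / 2.0)   # assumed half-octave scale spacing
        sigma = 0.56 * wavelength            # assumed ~1-octave bandwidth
        for o in range(n_orient):
            theta = o * np.pi / n_orient
            mags.append(np.abs(filter_fft(img, gabor_kernel(size, wavelength, theta, sigma))))
    mags = np.stack(mags)                    # (n_scale * n_orient, H, W)
    return mags.reshape(-1), mags.sum(axis=0).reshape(-1)
```

The summed vector is 40 times shorter than the concatenated one, which is where the speed-up in the subsequent classification comes from.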
Article
The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.
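As a concrete companion to the linear SVM discussion, the sketch below trains a linear SVM in the primal with Pegasos-style stochastic sub-gradient descent on the regularized hinge loss. This is a swapped-in training method, not the dual quadratic programming formulation the tutorial develops; the regularization `lam`, the epoch count, and the bias-as-extra-feature trick are assumptions.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style SGD on (lam/2)||w||^2 + mean hinge loss; labels must be -1 or +1.

    A constant 1 feature is appended so the bias is learned (and, here, also regularised).
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])
    y = np.asarray(y, dtype=float)
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)                 # Pegasos step size
            violates = y[i] * (Xb[i] @ w) < 1.0   # margin test uses the current w
            w *= 1.0 - eta * lam                  # step on the regulariser
            if violates:
                w += eta * y[i] * Xb[i]           # hinge-loss sub-gradient step
    return w


def predict(w, X):
    """Sign of the linear decision function, returned as -1 / +1."""
    Xb = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])
    return np.where(Xb @ w >= 0.0, 1, -1)
```

Only samples that violate the margin contribute a data term to each update, which mirrors the role of support vectors in the dual view.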
Article
The CSU Face Identification Evaluation System includes standardized image preprocessing software, four distinct face recognition algorithms, analysis tools to study algorithm performance, and Unix shell scripts to run standard experiments. All code is written in ANSI C. The four algorithms provided are principal component analysis (PCA), a.k.a. eigenfaces, a combined principal component analysis and linear discriminant analysis algorithm (PCA + LDA), an intrapersonal/extrapersonal image difference classifier (IIDC), and an elastic bunch graph matching (EBGM) algorithm. The PCA + LDA, IIDC, and EBGM algorithms are based upon algorithms used in the FERET study contributed by the University of Maryland, MIT, and USC, respectively. One analysis tool generates cumulative match curves; the other generates a sample probability distribution for recognition rate at recognition rank 1, 2, etc., using Monte Carlo sampling to generate probe and gallery choices. The sample probability distributions at each rank allow standard error bars to be added to cumulative match curves. The tool also generates sample probability distributions for the paired difference of recognition rates for two algorithms. Whether one algorithm consistently outperforms another is easily tested using this distribution. The CSU Face Identification Evaluation System is available through our Web site and we hope it will be used by others to rigorously compare novel face identification algorithms to standard algorithms using a common implementation and known comparison techniques.
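The cumulative match curve (CMC) produced by the first analysis tool can be computed from a probe-by-gallery similarity matrix as sketched below. The sketch assumes exactly one correct gallery entry per probe and breaks ties optimistically; it is not the CSU system's code and omits the Monte Carlo error bars.

```python
import numpy as np


def cumulative_match_curve(similarity, true_gallery_idx):
    """cmc[r - 1] = fraction of probes whose correct gallery entry is within the top r.

    similarity: (n_probes, n_gallery) array, higher = more similar.
    true_gallery_idx: for each probe, the column of its correct gallery image.
    """
    S = np.asarray(similarity, dtype=float)
    true_scores = S[np.arange(S.shape[0]), np.asarray(true_gallery_idx)]
    # Rank 1 means no gallery entry scores strictly higher than the correct one (optimistic on ties).
    ranks = 1 + (S > true_scores[:, None]).sum(axis=1)
    return np.array([(ranks <= r).mean() for r in range(1, S.shape[1] + 1)])
```

The first entry is the familiar rank-1 recognition rate, and the curve is non-decreasing and reaches 1.0 at the gallery size.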
Conference Paper
In this paper we investigate improvements to the efficiency of human body detection using histograms of oriented gradients (HOG). We do this without compromising the performance significantly. This is especially relevant for embedded implementations in smart camera systems, where the on-board processing power and memory is limited. We focus on applications for indoor environments such as offices and living rooms. We present different experiments to reduce both the computational complexity as well as the memory requirements for the trained model. Since the HOG feature length is large, the total memory size needed for storing the model can become more than 50 MB. We use a feature selection based on Bayesian theory to reduce the feature length. Additionally we compare the performance of the full-body detector with an upper-body only detector. For computational complexity reduction we employ a ROI-based approach.
Conference Paper
The CSU Face Identification Evaluation System provides standard face recognition algorithms and standard statistical methods for comparing face recognition algorithms. The system includes standardized image preprocessing software, three distinct face recognition algorithms, analysis software to study algorithm performance, and Unix shell scripts to run standard experiments. All code is written in ANSI C. The preprocessing code replicates features of the preprocessing used in the FERET evaluations. The three algorithms provided are Principal Component Analysis (PCA), a.k.a. Eigenfaces, a combined Principal Component Analysis and Linear Discriminant Analysis algorithm (PCA+LDA), and a Bayesian Intrapersonal/Extrapersonal Classifier (BIC). The PCA+LDA and BIC algorithms are based upon algorithms used in the FERET study contributed by the University of Maryland and MIT, respectively. There are two analysis tools. The first takes as input a set of probe images, a set of gallery images, and a similarity matrix produced by one of the three algorithms, and generates a Cumulative Match Curve of recognition rate versus recognition rank. The second analysis tool generates a sample probability distribution for recognition rate at recognition rank 1, 2, etc. It takes as input multiple images per subject and uses Monte Carlo sampling in the space of possible probe and gallery choices. This procedure will, among other things, add standard error bars to a Cumulative Match Curve. The system is available through our website, and we hope it will be used by others to rigorously compare novel face identification algorithms to standard algorithms using a common implementation and known comparison techniques.
Conference Paper
Appearance models (AM) are commonly used to model appearance and shape variation of objects in images. In particular, they have proven useful to detection, tracking, and synthesis of people's faces from video. While AM have numerous advantages relative to alternative approaches, they have at least two important drawbacks. First, they are especially prone to local minima in fitting; this problem becomes increasingly problematic as the number of parameters to estimate grows. Second, often few if any of the local minima correspond to the correct location of the model error. To address these problems, we propose filtered component analysis (FCA), an extension of traditional principal component analysis (PCA). FCA learns an optimal set of filters with which to build a multi-band representation of the object. FCA representations were found to be more robust than either grayscale or Gabor filters to problems of local minima. The effectiveness and robustness of the proposed algorithm is demonstrated in both synthetic and real data.
Conference Paper
In this work, we present a novel approach to face recognition which considers both shape and texture information to represent face images. The face area is first divided into small regions from which Local Binary Pattern (LBP) histograms are extracted and concatenated into a single, spatially enhanced feature histogram efficiently representing the face image. The recognition is performed using a nearest neighbour classifier in the computed feature space with chi-square as the dissimilarity measure. Extensive experiments clearly show the superiority of the proposed scheme over all considered methods (PCA, Bayesian Intra/extrapersonal Classifier and Elastic Bunch Graph Matching) on FERET tests, which include testing the robustness of the method against different facial expressions, lighting and aging of the subjects. In addition to its efficiency, the simplicity of the proposed method allows for very fast feature extraction.
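The region-histogram scheme can be sketched in pure Python. This is a simplified illustration under stated assumptions: it uses the basic 3×3 LBP operator with 256 bins (the full method may use a different LBP variant and per-region weights), an illustrative 2×2 region grid, and unweighted chi-square distance; all function names are hypothetical.

```python
# Simplified sketch: basic 3x3 LBP (256 bins), illustrative 2x2 region grid,
# unweighted chi-square distance. Function names are hypothetical.

def lbp_image(img):
    """LBP code for every interior pixel of a 2-D list of grey values."""
    h, w = len(img), len(img[0])
    # The 8 neighbours, visited clockwise from the top-left; each sets one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            centre = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if img[y + dy][x + dx] >= centre:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def regional_histogram(codes, grid=(2, 2), bins=256):
    """Split the code image into grid regions and concatenate their histograms."""
    h, w = len(codes), len(codes[0])
    rows, cols = grid
    feature = []
    for i in range(rows):
        for j in range(cols):
            hist = [0] * bins
            for y in range(i * h // rows, (i + 1) * h // rows):
                for x in range(j * w // cols, (j + 1) * w // cols):
                    hist[codes[y][x]] += 1
            feature.extend(hist)
    return feature


def chi_square(a, b):
    """Chi-square dissimilarity between two histograms (empty bins skipped)."""
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)


def nearest_neighbour(probe, gallery):
    """gallery: dict label -> feature histogram; return the closest label."""
    return min(gallery, key=lambda label: chi_square(probe, gallery[label]))
```

Concatenating per-region histograms is what makes the descriptor "spatially enhanced": two faces with the same global texture statistics but different layouts produce different feature vectors.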
Conference Paper
Histograms of Oriented Gradients (HOG) are well-known features for object recognition. HOG features are calculated by taking orientation histograms of edge intensity in a local region. N. Dalal et al. proposed an object detection algorithm in which HOG features were extracted from all locations of a dense grid on an image region and the combined features were classified using a linear Support Vector Machine (SVM). In this paper, we employ HOG features extracted from all locations of a grid on the image as candidate feature vectors. Principal Component Analysis (PCA) is applied to these HOG feature vectors to obtain score (PCA-HOG) vectors. Then a proper subset of PCA-HOG feature vectors is selected using the Stepwise Forward Selection (SFS) algorithm or the Stepwise Backward Selection (SBS) algorithm to improve generalization performance. The selected PCA-HOG feature vectors are used as the input of a linear SVM to classify the given input as pedestrian or non-pedestrian. The improvement in recognition rates is confirmed through experiments on the MIT pedestrian dataset.
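The selection step can be illustrated with a generic greedy Stepwise Forward Selection loop. This is a sketch under assumptions: `score` is a hypothetical callable which, in this setting, would return something like the validation recognition rate of a linear SVM trained on the chosen PCA-HOG vectors; here it is left abstract, and the loop stops as soon as no remaining candidate improves the score. SBS would run the same idea in reverse, starting from all candidates and removing one at a time.

```python
# Generic greedy SFS sketch. `score` is a hypothetical user-supplied callable;
# in the PCA-HOG setting it might be a validation accuracy of a linear SVM.

def stepwise_forward_selection(candidates, score, max_features=None):
    """Repeatedly add the candidate whose inclusion most improves score(subset).

    `score` maps a list of selected candidates to a number (higher is better)
    and must also accept the empty list.
    """
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining and (max_features is None or len(selected) < max_features):
        # Evaluate every one-element extension of the current subset.
        trials = [(score(selected + [c]), c) for c in remaining]
        new_best, choice = max(trials, key=lambda t: t[0])
        if new_best <= best:
            break  # no candidate helps any more
        selected.append(choice)
        remaining.remove(choice)
        best = new_best
    return selected, best


# Toy usage: each hypothetical "block" contributes a fixed amount to the score.
toy_gain = {"block_a": 3.0, "block_b": 1.0, "block_c": -2.0}
print(stepwise_forward_selection(toy_gain, lambda s: sum(toy_gain[c] for c in s)))
```

Each round costs one score evaluation per remaining candidate, so with an expensive classifier in the loop the number of candidate blocks dominates the selection time.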
Conference Paper
We introduce an augmented histograms of oriented gradients (AHOG) feature for human detection from a nonstatic camera. We increase the discriminating power of the original histograms of oriented gradients (HOG) feature by adding human shape properties, such as contour distances, symmetry, and gradient density. Based on the biological structure of human shape, we impose the symmetry property on HOG features by computing the similarity between each feature and its symmetric pair and using it to weight the HOG features. The resulting features describe humans much better than the original ones, especially when the humans are moving across the scene. We also incorporate gradient density into the features to mitigate the influence of repetitive backgrounds. In the experiments, our method demonstrates the most reliable performance for targets seen from any view.
Conference Paper
We developed a novel learning-based human detection system that can detect people of different sizes and orientations against a wide variety of backgrounds, even in crowds. To overcome the effects of geometric and rotational variations, the system automatically assigns the dominant orientation of each block-based feature encoding by using rectangular- and circular-type histograms of oriented gradients (HOG), which are insensitive to varying lighting and noise in outdoor environments. Moreover, this work demonstrated that Gaussian weighting and trilinear interpolation in HOG feature construction can increase detection performance. In particular, a powerful feature selection algorithm, AdaBoost, is used to automatically select a small set of discriminative HOG features with orientation information in order to achieve robust detection results. The overall computation time is further reduced significantly, without any performance loss, by using a cascade-of-rejectors structure whose hyperplanes and per-stage weights are estimated with AdaBoost.
Article
As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past several years. This is evidenced by the emergence of face recognition conferences such as AFGR [1] and AVBPA [2], and systematic empirical evaluations of face recognition techniques, including the FERET [3, 4, 5, 6] and XM2VTS [7] protocols. There are at least two reasons for this trend; the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. This paper provides an up-to-date critical survey of still- and video-based face recognition research.
Footnotes: (1) The support of the Office of Naval Research under Grants N00014-95-1-0521 and N00014-00-1-0908 is gratefully acknowledged. (2) Vision Technologies Lab, Sarnoff Corporation, Princeton, NJ 08543-5300. (3) Center for Automation Research, University of Maryland, College Park...
Chapter
We present a system for recognizing human faces from single images out of a large database with one image per person. The task is difficult because of image variation in terms of position, size, expression, and pose. The system collapses most of this variance by extracting concise face descriptions in the form of image graphs. In these, fiducial points on the face (eyes, mouth, etc.) are described by sets of wavelet components (jets). Image graph extraction is based on a novel approach, the bunch graph, which is constructed from a small set of sample image graphs. Recognition is based on a straightforward comparison of image graphs. We report recognition experiments on the FERET database and the Bochum database, including recognition across pose.
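The "straightforward comparison of image graphs" can be sketched with the magnitude-only jet similarity commonly used in elastic bunch graph matching: the normalized dot product of the Gabor magnitude vectors at corresponding fiducial points. This is a simplification under stated assumptions: phase information and any edge-distortion term are ignored, graphs are given as aligned lists of jets, and the function names are hypothetical.

```python
import math

# Simplified sketch: magnitude-only jet similarity, no phase and no edge term.
# Graphs are aligned lists of jets (one jet per fiducial point). Names are hypothetical.


def jet_similarity(a, b):
    """Normalized dot product of two jets' Gabor magnitude vectors (phase ignored)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def graph_similarity(graph_a, graph_b):
    """Mean jet similarity over corresponding fiducial points."""
    sims = [jet_similarity(ja, jb) for ja, jb in zip(graph_a, graph_b)]
    return sum(sims) / len(sims)


def identify(probe_graph, gallery):
    """gallery: dict person -> image graph. Return the most similar person."""
    return max(gallery, key=lambda person: graph_similarity(probe_graph, gallery[person]))
```

Because each jet is normalized, a global change in contrast scales all magnitudes together and leaves the similarity unchanged.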
Article
In this paper we propose a human detection framework based on an enhanced version of Histogram of Oriented Gradients (HOG) features. These feature descriptors are computed with the help of a precalculated histogram of square blocks. This novel method outperforms the integral of oriented histograms, allowing a single feature to be calculated four times faster. Using AdaBoost for HOG feature selection and a Support Vector Machine as the weak classifier, we build a fast human classifier with an excellent detection rate.
Article
In this paper we propose a human detection framework based on an enhanced version of Histogram of Oriented Gradients (HOG) features. These feature descriptors are computed with the help of a precalculated histogram of square blocks. This novel method outperforms the integral of oriented histograms, allowing a single feature to be calculated four times faster. Using AdaBoost for HOG feature selection and a Support Vector Machine as the weak classifier, we build a real-time human classifier with an excellent detection rate.
Article
We derive a new self-organizing learning algorithm that maximizes the information transferred in a network of nonlinear units. The algorithm does not assume any knowledge of the input distributions, and is defined here for the zero-noise limit. Under these conditions, information maximization has extra properties not found in the linear case (Linsker 1989). The nonlinearities in the transfer function are able to pick up higher-order moments of the input distributions and perform something akin to true redundancy reduction between units in the output representation. This enables the network to separate statistically independent components in the inputs: a higher-order generalization of principal components analysis. We apply the network to the source separation (or cocktail party) problem, successfully separating unknown mixtures of up to 10 speakers. We also show that a variant on the network architecture is able to perform blind deconvolution (cancellation of unknown echoes and reverberation in a speech signal). Finally, we derive dependencies of information transfer on time delays. We suggest that information maximization provides a unifying framework for problems in "blind" signal processing.
Article
A number of current face recognition algorithms use face representations found by unsupervised statistical methods. Typically these methods find a set of basis images and represent faces as a linear combination of those images. Principal component analysis (PCA) is a popular example of such methods. The basis images found by PCA depend only on pairwise relationships between pixels in the image database. In a task such as face recognition, in which important information may be contained in the high-order relationships among pixels, it seems reasonable to expect that better basis images may be found by methods sensitive to these high-order statistics. Independent component analysis (ICA), a generalization of PCA, is one such method. We used a version of ICA derived from the principle of optimal information transfer through sigmoidal neurons. ICA was performed on face images in the FERET database under two different architectures, one which treated the images as random variables and the pixels as outcomes, and a second which treated the pixels as random variables and the images as outcomes. The first architecture found spatially local basis images for the faces. The second architecture produced a factorial face code. Both ICA representations were superior to representations based on PCA for recognizing faces across days and changes in expression. A classifier that combined the two ICA representations gave the best performance.
Conference Paper
We integrate the cascade-of-rejectors approach with the Histograms of Oriented Gradients (HoG) features to achieve a fast and accurate human detection system. The features used in our system are HoGs of variable-size blocks that capture salient features of humans automatically. Using AdaBoost for feature selection, we identify the appropriate set of blocks, from a large set of possible blocks. In our system, we use the integral image representation and a rejection cascade which significantly speed up the computation. For a 320 × 280 image, the system can process 5 to 30 frames per second depending on the density in which we scan the image, while maintaining an accuracy level similar to existing methods.
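The role of the integral image representation can be illustrated with an integral histogram: one cumulative table per orientation bin lets the histogram of any rectangular block be read with four lookups per bin, which is what makes scanning many variable-size blocks affordable. This sketch assumes the per-pixel orientation bin indices and gradient magnitudes have already been computed; the names and data layout are illustrative, not the authors' code.

```python
# Illustrative integral-histogram sketch (not the authors' code). Assumes each pixel
# already has an orientation bin index (bin_map) and a gradient magnitude (weight_map).

def integral_histogram(bin_map, weight_map, n_bins):
    """ii[b][y][x] = total weight of bin b over rows < y and columns < x."""
    h, w = len(bin_map), len(bin_map[0])
    ii = [[[0.0] * (w + 1) for _ in range(h + 1)] for _ in range(n_bins)]
    for b in range(n_bins):
        plane = ii[b]
        for y in range(h):
            row_sum = 0.0  # running sum of bin-b weights along row y
            for x in range(w):
                if bin_map[y][x] == b:
                    row_sum += weight_map[y][x]
                plane[y + 1][x + 1] = plane[y][x + 1] + row_sum
    return ii


def block_histogram(ii, top, left, height, width):
    """Orientation histogram of any rectangle using four lookups per bin."""
    bottom, right = top + height, left + width
    return [p[bottom][right] - p[top][right] - p[bottom][left] + p[top][left] for p in ii]


ii = integral_histogram([[0, 1], [1, 0]], [[1, 2], [3, 4]], n_bins=2)
print(block_histogram(ii, 0, 0, 2, 2))  # [5.0, 5.0]
```

Building the tables costs one pass per bin, after which every block query is constant time regardless of block size.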
Conference Paper
The error-correcting output coding (ECOC) approach to classifier design decomposes a multi-class problem into a set of complementary two-class problems. We show how to apply the ECOC concept to automatic face verification, which is inherently a two-class problem. The outputs of the binary classifiers define the ECOC feature space, in which it is easier to separate transformed patterns representing clients and impostors. We propose two different combining strategies as the matching score for face verification. The first uses the first-order Minkowski metric and requires a threshold to be set. The second is a kernel-based method and has no parameters to set. The proposed method exhibits better performance on the well-known XM2VTS data set than previously reported results.
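A minimal sketch of the first combining strategy, assuming the trained binary classifiers are available as callables and that a claim is scored by the smallest first-order Minkowski (L1) distance between the probe and the claimed client's stored templates in ECOC feature space. The template handling, the toy classifiers, and the threshold value are hypothetical, not the paper's setup.

```python
# Hypothetical sketch of threshold-based verification in ECOC feature space.
# The binary classifiers, templates and threshold are illustrative assumptions.

def ecoc_features(x, binary_classifiers):
    """Map a pattern into ECOC feature space: one output per binary classifier."""
    return [clf(x) for clf in binary_classifiers]


def l1_distance(u, v):
    """First-order Minkowski (city-block) distance."""
    return sum(abs(a - b) for a, b in zip(u, v))


def verify(probe, client_templates, binary_classifiers, threshold):
    """Accept the claim if the probe lies within `threshold` (L1) of the nearest
    stored template of the claimed client, measured in ECOC feature space."""
    f = ecoc_features(probe, binary_classifiers)
    score = min(l1_distance(f, ecoc_features(t, binary_classifiers)) for t in client_templates)
    return score <= threshold, score


# Hypothetical toy "binary classifiers" on 2-D patterns.
toy_classifiers = [lambda x: 1.0 if x[0] > 0 else -1.0,
                   lambda x: 1.0 if x[1] > 0 else -1.0]
print(verify((1, 1), [(2, 3)], toy_classifiers, threshold=1.0))   # (True, 0.0)
print(verify((-1, 1), [(2, 3)], toy_classifiers, threshold=1.0))  # (False, 2.0)
```

The threshold trades false acceptances against false rejections, which is the parameter the second, kernel-based strategy avoids.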
Article
Two of the most critical requirements in support of producing reliable face-recognition systems are a large database of facial images and a testing procedure to evaluate systems. The Face Recognition Technology (FERET) program has addressed both issues through the FERET database of facial images and the establishment of the FERET tests. To date, 14,126 images from 1,199 individuals are included in the FERET database, which is divided into development and sequestered portions of the database. In September 1996, the FERET program administered the third in a series of FERET face-recognition tests. The primary objectives of the third test were to 1) assess the state of the art, 2) identify future areas of research, and 3) measure algorithm performance.
Conference Paper
We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
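The stages named in this abstract (fine-scale gradients, orientation binning, coarse spatial cells, contrast normalization over overlapping blocks) can be sketched in pure Python. This is a simplified illustration, not the reference implementation: it assumes a grey-level image given as a 2-D list, centered [-1, 0, 1] derivatives, 9 unsigned orientation bins with hard assignment (no interpolation or Gaussian weighting), 8×8-pixel cells, 2×2-cell blocks with a stride of one cell, and plain L2 normalization in place of L2-Hys.

```python
import math

# Simplified HOG sketch: hard binning, unsigned orientations, plain L2 block norm.
# Parameter defaults are illustrative assumptions, not the reference settings.


def hog_descriptor(img, cell=8, bins=9, block=2, eps=1e-6):
    h, w = len(img), len(img[0])
    n_cy, n_cx = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(n_cx)] for _ in range(n_cy)]
    for y in range(n_cy * cell):
        for x in range(n_cx * cell):
            # Centered [-1, 0, 1] derivatives, clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            b = int(angle / (180.0 / bins)) % bins
            hist[y // cell][x // cell][b] += mag  # magnitude-weighted vote
    descriptor = []
    # Overlapping blocks of block x block cells, stepped one cell at a time.
    for by in range(n_cy - block + 1):
        for bx in range(n_cx - block + 1):
            v = [val for cy in range(by, by + block)
                 for cx in range(bx, bx + block) for val in hist[cy][cx]]
            norm = math.sqrt(sum(t * t for t in v) + eps * eps)
            descriptor.extend(t / norm for t in v)
    return descriptor


edge = [[0 if x < 8 else 100 for x in range(16)] for _ in range(16)]
d = hog_descriptor(edge)
print(len(d), round(d[0], 3))  # 36 0.5
```

Because blocks overlap, each cell histogram appears in several blocks, each time normalized against a different neighbourhood; this redundancy is part of what the abstract reports as important for good results.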
Book
Major strides have been made in face processing in the last ten years due to the fast growing need for security in various locations around the globe. A human eye can discern the details of a specific face with relative ease. It is this level of detail that researchers are striving to create with ever evolving computer technologies that will become our perfect mechanical eyes. The difficulty that confronts researchers stems from turning a 3D object into a 2D image. That subject is covered in depth from several different perspectives in this volume. This book begins with a comprehensive introductory chapter for those who are new to the field. A compendium of articles follows that is divided into three sections. The first covers basic aspects of face processing from human to computer. The second deals with face modeling from computational and physiological points of view. The third tackles the advanced methods, which include illumination, pose, expression, and more. Editors Zhao and Chellappa have compiled a concise and necessary text for industrial research scientists, students, and professionals working in the area of image and signal processing.
* Contributions from over 35 leading experts in face detection, recognition and image processing
* Over 150 informative images with 16 images in FULL COLOR illustrate and offer insight into the most up-to-date advanced face processing methods and techniques
* Extensive detail makes this a need-to-own book for all involved with image and signal processing.
Article
The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.
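The two kernels whose VC dimension is discussed can be written down directly, together with the SVM decision function. For the homogeneous quadratic kernel in two dimensions, the explicit map Φ(x) = (x1², √2·x1·x2, x2²) reproduces K(x, y) = (x·y)², which shows the kernel trick in miniature: the same value is obtained without ever forming Φ. Function names and the toy points are illustrative.

```python
import math

# Illustrative kernels and decision function; names and toy values are assumptions.


def poly_kernel(x, y, p=2):
    """Homogeneous polynomial kernel K(x, y) = (x . y)^p."""
    return sum(a * b for a, b in zip(x, y)) ** p


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def phi_quadratic(x):
    """Explicit feature map matching poly_kernel with p = 2 in two dimensions."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def svm_decision(x, support_vectors, alpha_y, b, kernel):
    """f(x) = sum_i alpha_i y_i K(s_i, x) + b; sign(f) is the predicted class."""
    return sum(c * kernel(s, x) for s, c in zip(support_vectors, alpha_y)) + b


x, y = (1.0, 2.0), (3.0, 1.0)
explicit = sum(u * v for u, v in zip(phi_quadratic(x), phi_quadratic(y)))
print(poly_kernel(x, y), explicit)  # both equal 25 up to floating-point rounding
```

For the Gaussian RBF kernel no finite explicit map exists, which is consistent with the tutorial's point that its VC dimension can be infinite.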
Article
This paper examines the classification capability of different Gabor representations for human face recognition. Usually, Gabor filter responses are calculated for eight orientations at each of five scales, and all 40 basic feature vectors are concatenated to assemble the Gabor feature vector. This work explores 70 different Gabor feature vector extraction techniques for face recognition. The main goal is to determine the characteristics of the 40 basic Gabor feature vectors and to devise a faster Gabor feature extraction method. Among the 40 basic Gabor feature representations, the filter responses acquired at the largest scale and the smallest relative orientation change (with respect to the face) show the highest discriminating ability for face recognition when classification is performed using three classification methods: probabilistic neural networks (PNN), support vector machines (SVM) and decision trees (DT). A summation-based Gabor representation that is 40 times faster achieves about 98% recognition rate when classification is performed using SVM. In this representation, all 40 basic Gabor feature vectors are summed to form the summation-based Gabor feature vector. In the experiments, a sixth-order data tensor containing the basic Gabor feature vectors is constructed for all the operations.
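The difference between the two representations can be sketched as follows, assuming the 40 basic feature vectors (5 scales × 8 orientations) have already been computed as equal-length lists of filter-response magnitudes. The concatenated vector is 40 times longer than the summed one, which plausibly accounts for much of the reported speed-up in classification; Gabor filtering itself is not shown, and the function names are hypothetical.

```python
# Hypothetical sketch: Gabor filtering is assumed done; `responses` holds the
# 40 basic magnitude vectors (5 scales x 8 orientations), each of length d.

def concatenated_gabor_feature(responses):
    """Classic representation: concatenate all basic vectors (40 * d values)."""
    return [value for vector in responses for value in vector]


def summation_gabor_feature(responses):
    """Summation-based representation: element-wise sum of the basic vectors (d values)."""
    return [sum(values) for values in zip(*responses)]


# Toy input with d = 2.
responses = [[scale * 8 + orientation, 1] for scale in range(5) for orientation in range(8)]
print(len(concatenated_gabor_feature(responses)), len(summation_gabor_feature(responses)))  # 80 2
```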
Article
Eye activity is one of the main sources of artefacts in EEG and MEG recordings. A new approach to the correction of these disturbances is presented using the statistical technique of independent component analysis. This technique separates components by the kurtosis of their amplitude distribution over time, thereby distinguishing between strictly periodical signals, regularly occurring signals and irregularly occurring signals. The latter category is usually formed by artefacts. Through this approach, it is possible to isolate pure eye activity in the EEG recordings (including EOG channels), and so reduce the amount of brain activity that is subtracted from the measurements, when extracting portions of the EOG signals.
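The kurtosis criterion can be illustrated on separated component time courses. Strictly periodic signals have negative excess kurtosis, while rare, large deflections such as blinks produce large positive values. The threshold of 5.0 below is a hypothetical cutoff chosen for illustration, not a value from the paper, and the ICA separation step itself is assumed to have been done already.

```python
# Illustrative sketch: ranks already-separated components by excess kurtosis.
# The 5.0 cutoff is a hypothetical value, not taken from the paper.

def excess_kurtosis(signal):
    """Sample excess kurtosis m4 / m2^2 - 3 (0 for a Gaussian)."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((s - mean) ** 2 for s in signal) / n
    if m2 == 0:
        return 0.0  # undefined for a constant signal; treated as non-spiky here
    m4 = sum((s - mean) ** 4 for s in signal) / n
    return m4 / (m2 * m2) - 3.0


def flag_artifact_components(components, threshold=5.0):
    """Indices of components whose excess kurtosis exceeds the cutoff."""
    return [i for i, c in enumerate(components) if excess_kurtosis(c) > threshold]


periodic = [1.0, -1.0] * 50        # strictly periodic: excess kurtosis -2
blink_like = [0.0] * 99 + [10.0]   # one rare, large deflection
print(flag_artifact_components([periodic, blink_like]))  # [1]
```

Flagged components would then be removed before projecting back to the channels, so that only the artefact, rather than the whole EOG-correlated signal, is subtracted.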
Article
We have developed a near-real-time computer system that can locate and track a subject's head, and then recognize the person by comparing characteristics of the face to those of known individuals. The computational approach taken in this system is motivated by both physiology and information theory, as well as by the practical requirements of near-real-time performance and accuracy. Our approach treats the face recognition problem as an intrinsically two-dimensional (2-D) recognition problem rather than requiring recovery of three-dimensional geometry, taking advantage of the fact that faces are normally upright and thus may be described by a small set of 2-D characteristic views. The system functions by projecting face images onto a feature space that spans the significant variations among known face images. The significant features are known as "eigenfaces," because they are the eigenvectors (principal components) of the set of faces; they do not necessarily correspond to features such as eyes, ears, and noses. The projection operation characterizes an individual face by a weighted sum of the eigenface features, and so to recognize a particular face it is necessary only to compare these weights to those of known individuals. Some particular advantages of our approach are that it provides for the ability to learn and later recognize new faces in an unsupervised manner, and that it is easy to implement using a neural network architecture.
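A toy sketch of the eigenface pipeline: compute principal components of mean-centred, flattened face vectors, project faces onto them to obtain weight vectors, and recognize by nearest neighbour in weight space. It assumes tiny vectors, so the full covariance matrix can be formed, and uses power iteration with deflation; for real images one would avoid the pixel-by-pixel covariance (for example by working with the much smaller image-by-image matrix) or use a linear-algebra library. All names are hypothetical.

```python
import math

# Toy sketch only: full covariance + power iteration is sensible for tiny vectors.
# Function names are hypothetical.


def top_eigenfaces(faces, k, iters=200):
    """Mean face and top-k principal components ("eigenfaces") of flattened faces."""
    n, d = len(faces), len(faces[0])
    mu = [sum(f[i] for f in faces) / n for i in range(d)]
    centred = [[f[i] - mu[i] for i in range(d)] for f in faces]
    cov = [[sum(c[i] * c[j] for c in centred) / n for j in range(d)] for i in range(d)]
    components = []
    for _ in range(k):
        # Power iteration from a fixed, normalized, non-uniform start vector.
        v = [1.0 + i for i in range(d)]
        s = math.sqrt(sum(t * t for t in v))
        v = [t / s for t in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break
            v = [t / norm for t in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        # Deflation: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return mu, components


def project(face, mu, components):
    """Eigenface weights: projections of the mean-centred face onto each component."""
    centred = [p - m for p, m in zip(face, mu)]
    return [sum(c * u for c, u in zip(centred, comp)) for comp in components]


def recognize(face, mu, components, known):
    """known: dict name -> weight vector. Nearest neighbour in weight space."""
    w = project(face, mu, components)
    return min(known, key=lambda name: sum((a - b) ** 2 for a, b in zip(w, known[name])))


faces = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
mu, comps = top_eigenfaces(faces, k=2)
known = {"alice": project([2.0, 0.0], mu, comps), "bob": project([0.0, 1.0], mu, comps)}
print(recognize([1.8, 0.1], mu, comps, known))  # alice
```

As the abstract notes, recognizing a face then reduces to comparing a handful of weights rather than whole images.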
Article
Human detection is the task of finding the presence and position of human beings in images. In this paper, we apply scale space theory to detecting humans in still images. By integrating scale space theory with histograms of oriented gradients (HOG), we designed a new feature descriptor called the scale space histogram of oriented gradients (SS-HOG). SS-HOG focuses on the multi-scale nature of object description. Using HOGs at multiple scales, SS-HOG encodes more information to discriminate human bodies from other object types than traditional single-scale HOGs. Experiments on the INRIA person dataset demonstrate the effectiveness of our method.
Conference Paper
This paper presents a complete method for pedestrian detection applied to infrared images. First, we study an image descriptor based on histograms of oriented gradients (HOG), associated with a support vector machine (SVM) classifier, and evaluate its efficiency. After tuning the HOG descriptor and the classifier, we include this method in a complete system that deals with stereo infrared images. This approach gives good results for window classification, and a preliminary test applied to a video sequence indicates that this approach is very promising.
Article
Humans detect and identify faces in a scene with little or no effort. However, building an automated system that accomplishes this task is very difficult. There are several related subproblems: detection of a pattern as a face, identification of the face, analysis of facial expressions, and classification based on physical features of the face. A system that performs these operations will find many applications, e.g. criminal identification, authentication in secure systems, etc. Most of the work to date has been in identification. This paper surveys the past work in solving these problems. The capability of the human visual system with respect to these problems is also discussed. It is meant to serve as a guide for an automated system. Some new approaches to these problems are also briefly discussed.
Article
We present an extensive study of the support vector machine (SVM) sensitivity to various processing steps in the context of face authentication. In particular, we evaluate the impact of the representation space and photometric normalisation technique on the SVM performance. Our study supports the hypothesis that the SVM approach is able to extract the relevant discriminatory information from the training data. We believe that this is the main reason for its superior performance over benchmark methods (e.g. the eigenface technique). However, when the representation space already captures and emphasises the discriminatory information content (e.g. the fisherface method), the SVMs cease to be superior to the benchmark techniques. The SVM performance evaluation is carried out on a large face database containing 295 subjects.
Article
This paper presents a new face recognition algorithm based on the well-known EBGM which replaces Gabor features by HOG descriptors. Recognition results on publicly available databases show better performance for our approach than for other face recognition approaches. This better performance is explained by the properties of HOG descriptors, which are more robust to changes in illumination, rotation and small displacements, and by the higher accuracy of the face graphs obtained compared to classical Gabor–EBGM ones.
Conference Paper
The purpose of this paper is to detect pedestrians from images. This paper proposes a method for extracting feature descriptors consisting of co-occurrence histograms of oriented gradients (CoHOG). Including co-occurrence with various positional offsets, the feature descriptors can express complex shapes of objects with local and global distributions of gradient orientations. Our method is evaluated with a simple linear classifier on two famous pedestrian detection benchmark datasets: “DaimlerChrysler pedestrian classification benchmark dataset” and “INRIA person data set”. The results show that proposed method reduces miss rate by half compared with HOG, and outperforms the state-of-the-art methods on both datasets.
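The co-occurrence idea can be sketched as follows: quantize each pixel's gradient orientation, then, for every positional offset, count how often the orientation pair (a, b) occurs between a pixel and its offset neighbour. This simplified illustration assumes 8 signed orientation bins, skips pixels with zero gradient, and computes one matrix per offset over the whole image rather than over sub-regions; the offsets and function names are illustrative.

```python
import math

# Simplified CoHOG sketch: 8 signed orientation bins, whole-image matrices,
# zero-gradient pixels skipped. Offsets and names are illustrative assumptions.


def quantize_orientations(img, n_bins=8):
    """Signed gradient orientation bin per interior pixel; None where the gradient is zero."""
    h, w = len(img), len(img[0])
    out = [[None] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            if gx == 0 and gy == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            out[y][x] = int(angle / (2 * math.pi / n_bins)) % n_bins
    return out


def cohog(orient, offsets, n_bins=8):
    """One n_bins x n_bins co-occurrence matrix per offset, flattened and concatenated."""
    h, w = len(orient), len(orient[0])
    feature = []
    for dy, dx in offsets:
        mat = [0] * (n_bins * n_bins)
        for y in range(h):
            for x in range(w):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    a, b = orient[y][x], orient[yy][xx]
                    if a is not None and b is not None:
                        mat[a * n_bins + b] += 1
        feature.extend(mat)
    return feature


img = [[0 if x < 3 else 10 for x in range(6)] for _ in range(5)]
f = cohog(quantize_orientations(img), offsets=[(0, 0), (0, 1)])
print(f[0], f[64])  # 6 3
```

With offset (0, 0) the matrix reduces to an ordinary orientation histogram on its diagonal, so plain HOG-style counts are a special case of this descriptor.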
Conference Paper
We describe new methods for multi-hop mobile wireless network design, so as to meet performance specifications. We introduce an implicitly defined approximate loss model that couples the physical, MAC and routing layers' effects. The model provides quantitative statistical relations between the loss parameters that are used to characterize multiuser interference and physical path conditions on the one hand and the traffic rates and types between origin-destination pairs on the other. The model considers effects of hidden nodes, node scheduling algorithms, MAC and PHY layer failures and unsuccessful packet transmission attempts at the MAC layer in arbitrary network topologies where multiple paths share nodes. We consider both contention-based as well as scheduling-based MAC protocols. The new methods are based on implicit loss network models developed using fixed point methods. We describe the application of Automatic Differentiation (AD) to these implicit performance models, and develop a methodology for sensitivity analysis and parameter optimization for wireless protocols. We also introduce new methods that utilize on-line feedback from the PHY layer to simplify the joint MAC and routing protocol design. We provide several examples, with realistic scenarios, that demonstrate the benefits of applying these new methods in an integrated manner. We close with current and future work that extends these methods and algorithms towards a Component-Based design, performance analysis and validation of distributed protocols for mobile wireless networks. We argue that this set of ideas provides a very promising approach to 'clean slate' design for such networks and the resulting hybrid broadband Internet.
Article
This paper discusses the application of a modern signal processing technique known as independent component analysis (ICA) or blind source separation to multivariate financial time series such as a portfolio of stocks. The key idea of ICA is to linearly map the observed multivariate time series into a new space of statistically independent components (ICs). This can be viewed as a factorization of the portfolio since joint probabilities become simple products in the coordinate system of the ICs. We apply ICA to three years of daily returns of the 28 largest Japanese stocks and compare the results with those obtained using principal component analysis. The results indicate that the estimated ICs fall into two categories: (i) infrequent but large shocks (responsible for the major changes in the stock prices), and (ii) frequent smaller fluctuations (contributing little to the overall level of the stocks). We show that the overall stock price can be reconstructed surprisingly well by using a small number of thresholded weighted ICs. In contrast, when using shocks derived from principal components instead of independent components, the reconstructed price is less similar to the original one. Independent component analysis is a potentially powerful method of analyzing and understanding driving mechanisms in financial markets. There are further promising applications to risk management since ICA focuses on higher-order statistics.
Article
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Conference Paper
This paper details the filtering subsystem of a tetra-vision-based pedestrian detection system. The complete system uses both visible and far-infrared cameras; in an initial phase it produces a list of areas of attention in the images that may contain pedestrians. This list is further refined using symmetry-based assumptions. The results are then fed to a number of independent validators that evaluate the presence of human shapes inside the areas of attention. Histograms of oriented gradients and Support Vector Machines are used as a filter and are shown to successfully classify up to 91% of pedestrians in the areas of attention.
Conference Paper
Between October 2000 and December 2000, we collected a database of over 40,000 facial images of 68 people. Using the CMU (Carnegie Mellon University) 3D Room, we imaged each person across 13 different poses, under 43 different illumination conditions, and with four different expressions. We call this database the CMU Pose, Illumination and Expression (PIE) database. In this paper, we describe the imaging hardware, the collection procedure, the organization of the database, several potential uses of the database, and how to obtain the database.
Conference Paper
Two of the most critical requirements in support of producing reliable face-recognition systems are a large database of facial images and a testing procedure to evaluate systems. The Face Recognition Technology (FERET) program has addressed both issues through the FERET database of facial images and the establishment of the FERET tests. To date, 14,126 images from 1199 individuals are included in the FERET database, which is divided into development and sequestered portions. In September 1996, the FERET program administered the third in a series of FERET face-recognition tests. The primary objectives of the third test were to (1) assess the state of the art, (2) identify future areas of research, and (3) measure algorithm performance on large databases.
Article
The goal of this paper is to present a critical survey of existing literature on human and machine recognition of faces. Machine recognition of faces has several applications, ranging from static matching of controlled photographs, as in mug shot matching and credit card verification, to surveillance video images. Such applications have different constraints in terms of complexity of processing requirements and thus present a wide range of different technical challenges. Over the last 20 years, researchers in psychophysics, neural sciences and engineering, image processing analysis and computer vision have investigated a number of issues related to face recognition by humans and machines. Ongoing research activities have been given a renewed emphasis over the last five years. Existing techniques and systems have been tested on different sets of images of varying complexities. But very little synergism exists between studies in psychophysics and the engineering literature. Most importantly, there exist no evaluation or benchmarking studies using large databases with the image quality that arises in commercial and law enforcement applications. In this paper, we first present different applications of face recognition in commercial and law enforcement sectors. This is followed by a brief overview of the literature on face recognition in the psychophysics community. We then present a detailed overview of more than 20 years of research done in the engineering community. Techniques for segmentation/location of the face, feature extraction and recognition are reviewed. Global transform and feature based methods using statistical, structural and neural classifiers are summarized.
Article
The development of face recognition over the past years allows an organization into three types of recognition algorithms, namely frontal, profile, and view-tolerant recognition, depending on the kind of imagery and the according recognition algorithms. While frontal recognition certainly is the classical approach, view-tolerant algorithms usually perform recognition in a more sophisticated fashion by taking into consideration some of the underlying physics, geometry, and statistics. Profile schemes as stand-alone systems have a rather marginal significance for identification. However, they are very practical either for fast coarse pre-searches of large face databases to reduce the computational load for a subsequent sophisticated algorithm, or as part of a hybrid recognition scheme. Such hybrid approaches have a special status among face recognition systems as they combine different recognition approaches in an either serial or parallel order to overcome the shortcomings of the indiv...