About
Publications: 223
Reads: 58,926
Citations: 3,964
Introduction
Received his Ph.D. and M.S. in EE (Texas Tech Univ.) and MBA (Shenandoah Univ.). He is a teacher, researcher, inventor, and entrepreneur. He is a professor and the director of the Robotic Vision Lab in the ECE Dept. at BYU. He co-founded Smart Vision Works in 2012 to build and market artificial intelligence-based smart cameras for visual inspection automation. His research focuses on artificial intelligence, high-performance visual computing, robotic vision, and visual inspection automation.
Publications (223)
The process of manually annotating sports footage is a demanding one. In American football alone, coaches spend thousands of hours reviewing and analyzing videos each season. We aim to automate this process by developing a system that generates comprehensive statistical reports from full-length football game videos. Having previously demonstrated t...
Facial expression recognition (FER) plays a crucial role in various applications, including human–computer interaction and affective computing. The joint training of an FER network with multiple datasets is a promising strategy to enhance its performance. Nevertheless, widespread annotation inconsistencies and class imbalances among FER da...
American football is one of the most popular team sports in the United States. There are approximately 16,000 high school and 890 college football teams, and each team plays around 10–14 games per football season. Contrary to most casual fans’ views, American football is more than speed and power; it requires preparation and strategies. Coaches ana...
Macro-expression spotting is an important prior step in many dynamic facial expression analysis applications. It automatically detects the onset and offset image frames of a macro-expression in the video. The state-of-the-art methods of macro-expression spotting characterize the movement of facial muscle through explicit analysis of the optical flo...
Facial recognition systems frequently exhibit high accuracies when evaluated on standard test datasets. However, their performance tends to degrade significantly when confronted with more challenging tests, particularly involving specific racial categories. To measure this inconsistency, many have created racially aware datasets to evaluate facial...
A camera with a fisheye or ultra-wide lens covers a wide field of view that cannot be modeled by the perspective projection. Serious fisheye lens distortion in the peripheral region of the image leads to degraded performance of the existing head pose estimation models trained on undistorted images. This paper presents a new approach for head pose est...
This work reviews the dataset-driven advancements that have occurred in the area of lip motion analysis, particularly visual lip-reading and visual lip motion authentication, in the deep learning era. We provide an analysis of datasets and their usage, creation, and associated challenges. Future research can utilize this work as a guide for selecti...
Modeling the interactions among individuals in a group is essential for group activity recognition (GAR). Various graph neural networks (GNNs) are regarded as popular modeling methods for GAR, as they can characterize the interaction among individuals at a low computational cost. The performance of the current GNN-based modeling methods is affected...
Facial motion representation learning has become an exciting research topic, since biometric technologies are becoming more common in our daily lives. One of its applications is identity verification. After recording a dynamic facial motion video for enrollment, the user needs to show a matched facial appearance and make a facial motion the same as...
Annotation and analysis of sports videos is a time-consuming task that, once automated, will provide benefits to coaches, players, and spectators. American football, as the most watched sport in the United States, could especially benefit from this automation. Manual annotation and analysis of recorded videos of American football games is an ineffi...
Research in social psychology has revealed an affective mechanism in human groups: group members spread their emotions to one another, the members' emotions form the group emotion, and the group emotion, as a powerful force, shapes the members' emotions. Current group emotion recognition methods focus...
The performance of all learning-based group emotion recognition (GER) methods depends on the number of labeled samples. Although many group emotion images are available on the Internet, labeling them manually is a labor-intensive and costly process. For this reason, datasets for GER are usually small in size, which limits the perfo...
Running a reliable network on resource-limited platforms for a low-resolution image is a great challenge for heatmap-based human pose estimation (HPE). Scale mismatch between the input image and heatmaps and the intrinsic quantization effect induced by the ‘argmax’ function hinder the performance of heatmap-based human pose estimation for low-resol...
Deep learning became an important image classification and object detection technique more than a decade ago. It has since achieved human-like performance for many computer vision tasks. Some of them involve the analysis of the human face for applications like facial recognition, expression recognition, and facial landmark detection. In recent years, r...
A model with capability for precisely predicting readmission is a target being pursued worldwide. The objective of this study is to design predictive models using artificial intelligence methods and data retrieved from the National Health Insurance Research Database of Taiwan for identifying high-risk pneumonia patients with 30-day all-cause readmi...
Using lightweight networks for facial expression recognition (FER) has become an important research topic in recent years. The key to the success of FER with lightweight networks is to explore the potential of expression features in distinct abstract levels and regions, and design robust features to characterize the facial appearance. This paper...
Head pose estimation is an important step for many human-computer interaction applications such as face detection, facial recognition, and facial expression classification. Accurate head pose estimation benefits these applications that require face images as the input. Most head pose estimation methods suffer from perspective distortion because the...
Group emotion recognition (GER) from images has attracted much attention in recent years. Networks using attention mechanisms for GER have shown great potential. However, the performance of the current attention-based GER networks suffers from the indistinctive features of individuals in the group, poor feature fusion weights, and the lack of semanti...
Estimating gaze from a low-resolution facial image is a challenging task. Most current networks for gaze estimation focus on using face images of adequate resolution. Their performance degrades when the image resolution decreases due to information loss. This work aims to explore more helpful face and gaze information in a novel way to alleviate th...
The state of Michigan, U.S.A., was awarded USD 1 million in March 2018 for the Great Lakes Invasive Carp Challenge. The challenge sought new and novel technologies to function independently of or in conjunction with those fish deterrents already in place to prevent the movement of invasive carp species into the Great Lakes from the Illinois River t...
Facial expression recognition (FER) accuracy is often affected by an individual's unique facial characteristics. Recognition performance can be improved if the influence from these physical characteristics is minimized. Using video instead of a single image for FER provides better results but requires extracting temporal features and the spatial stru...
There has been a recent surge in publications related to binarized neural networks (BNNs), which use binary values to represent both the weights and activations in deep neural networks (DNNs). Due to the bitwise nature of BNNs, there have been many efforts to implement BNNs on ASICs and FPGAs. While BNNs are excellent candidates for these kinds of...
Golf players spend hours perfecting their swing. It takes much practice and dedicated effort to train their body to make an effective swing. In order to train the body in such a way, golf players must be extremely mindful about the placement and motion of key body parts, such as wrists, elbows, shoulders, and torso. With correct placement and motio...
Identity verification is ubiquitous in daily life. Its applications range from unlocking a mobile device to accessing an online account, boarding an airplane or other types of transportation, recording times of arriving at and leaving work, controlling access to a restricted area, facility, or vault, and many more. The traditional and the most popular identity...
Annotation and analysis of sports videos is a challenging task that, once accomplished, could provide various benefits to coaches, players, and spectators. In particular, American Football could benefit from such a system to provide assistance in statistics and game strategy analysis. Manual analysis of recorded American football game videos is a t...
A golf swing requires full-body coordination and much practice to perform the complex motion precisely and consistently. The force from the golfer’s full-body movement on the club and the trajectory of the swing are the main determinants of swing quality. In this research, we introduce a unique motion analysis method to evaluate the quality of golf...
Feature description has an important role in image matching and is widely used for a variety of computer vision applications. As an efficient synthetic basis feature descriptor, SYnthetic BAsis (SYBA) requires low computational complexity and provides accurate matching results. However, the number of matched feature points generated by SYBA suffers...
Due to the increasing consumption of food products and demand for food quality and safety, most food processing facilities in the United States utilize machines to automate their processes, such as cleaning, inspection and grading, packing, storing, and shipping. Machine vision technology has been a proven solution for inspection and grading of foo...
Feature detection, description, and matching are crucial steps for many computer vision algorithms. These steps rely on feature descriptors to match image features across sets of images. Previous work has shown that our SYnthetic BAsis (SYBA) feature descriptor can offer superior performance to other binary descriptors. This paper focused on variou...
Capturing the dynamics of facial expression progression in video is an essential and challenging task for facial expression recognition (FER). In this paper, we propose an effective framework to address this challenge. We develop a C3D-based network architecture, 3D-Inception-ResNet, to extract spatial-temporal features from the dynamic facial expr...
This paper reports the development of an efficient evolutionary learning algorithm designed specifically for real-time embedded visual inspection applications. The proposed evolutionary learning algorithm constructs image features as a series of image transforms for image classification and is suitable for resource-limited systems. This algorithm r...
Deep neural networks have achieved great success in many tasks of pattern recognition. However, large model size and high cost in computation limit their applications in resource-limited systems. In this paper, our focus is to design a lightweight and efficient convolutional neural network architecture by directly training the compact network for i...
Erectile dysfunction (ED) affects millions of men worldwide. Men with ED generally complain of failure to attain or maintain an adequate erection during sexual activity. The prevalence of ED is strongly correlated with age, affecting about 40% of men at age 40 and nearly 70% at age 70. A variety of chronic diseases, including diabetes, ischemic heart...
Finding corresponding image features between two images is often the first step for many computer vision algorithms. This paper introduces an improved synthetic basis feature descriptor algorithm that describes and compares image features in an efficient and discrete manner with rotation and scale invariance. It works by performing a number of simi...
In this work, we review Binarized Neural Networks (BNNs). BNNs are deep neural networks that use binary values for activations and weights, instead of full precision values. With binary values, BNNs can execute computations using bitwise operations, which reduces execution time. Model sizes of BNNs are much smaller than their full precision counter...
This paper explores a set of learned convolutional kernels which we call Jet Features. Jet Features are efficient to compute in software, easy to implement in hardware and perform well on visual inspection tasks. Because Jet Features can be learned, they can be used in machine learning algorithms. Using Jet Features, we make significant improvement...
Food recognition is the first step for dietary assessment. Computer vision technology is being viewed as an effective tool for automatic food recognition for monitoring nutrition intake. Of the many food recognition algorithms in the literature, Bag-of-Features model is a proven approach that has shown impressive recognition accuracy. In this paper...
The structure of an image consists of two aspects: intensity of structure and distribution of structure. Image distortions that degrade image quality potentially affect both the intensity and distribution of image structure. Yet most structure-based image quality assessment methods focus only on the change of the intensity of structure. In this paper,...
Automatic detection of fabric defects is an important process for the textile industry. Besides the detection accuracy, an automatic fabric defect detection solution for a resource-limited system also requires superior performance in terms of processing time and simplicity. This paper proposes a compact convolutional neural network architecture for...
Development of advanced driver assistance systems has become an important focus for the automotive industry in recent years. Within this field, many computer vision–related functions require motion estimation. This article discusses the implementation of a newly developed SYnthetic BAsis (SYBA) feature descriptor for matching feature points to generate...
Maintenance of the catenary system is a crucial task for the safe operation of high-speed railway systems. Catenary system malfunction could interrupt railway service and threaten public safety. This article presents a computer vision algorithm that is developed to automatically detect the defective rod-insulators in a catenary system to ensure reliabl...
More than 1 billion people suffer from chronic respiratory diseases worldwide, accounting for more than 4 million deaths annually. Inhaled corticosteroid is a popular medication for treating chronic respiratory diseases. Its side effects include decreased bone mineral density and osteoporosis. The aims of this study are to investigate the associati...
The Antarctic nematode Plectus murrayi is an excellent model organism for the study of stress and molecular mechanisms. Biologists analyze its development and adaptation by measuring the body length and volume. This work proposes an edge detection algorithm to automate this labor-intensive task. Traditional edge detection techniques use predefined...
Many quality evaluation tasks that are complicated and unique to specialty crops are often carried out manually by human experts by visually inspecting product appearances. This labor-intensive process usually depends greatly on experienced workers and lacks verification efficiency. Automating these tasks not only reduces the processing time, impro...
Evolution-Constructed (ECO) Feature as a method to learn image features has achieved very good results on a variety of object recognition and classification applications. When compared with hand-crafted features, ECO-Feature is capable of constructing non-intuitive features that could be overlooked by human experts. Although the ECO features are ea...
Obesity is becoming a widespread health concern in most parts of the world. A computer vision-based recognition system has great potential to be an efficient tool to monitor food intake and cope with the growing problem of obesity. This paper proposes a food recognition algorithm based on sparse representation. The proposed algorithm learns over...
Segmenting Magnetic Resonance images plays a critical role in radiotherapy, surgical planning and image-guided interventions. Traditional differential filter-based segmentation algorithms are predefined independently of image features and require extensive post processing. Convolutional Neural Networks (CNNs) are regarded as a powerful visual model...
Feature point matching is a critical step to visual odometry (VO) computation and many other vision applications. Frame-to-frame ego-motion drift caused by feature mismatching is the main challenge for VO. This paper presents a VO algorithm that uses a newly developed feature descriptor called synthetic basis descriptor to obtain accurate feature m...
Invasive fish species are a growing threat worldwide, causing great harm to biodiversity and ecosystems, and leading to large economic losses. As the most introduced group of aquatic animals in the world, fish are also one of the most threatened. For species that are considered invasive, removing them is the best way to reduce the long-term cost of...
Feature matching is an important step for many computer vision applications. This paper introduces the development of a new feature descriptor, called SYnthetic BAsis (SYBA), for feature point description and matching. SYBA is built on the basis of the compressed sensing theory that uses synthetic basis functions to encode or reconstruct a signal....
Many computer vision applications need motion detection and analysis. In this research, a newly developed feature descriptor is used to find sparse motion vectors. Based on the resulting sparse motion field the camera motion is detected and analyzed. Statistical analysis is performed, based on polar representation of motion vectors. Direction of mo...
Evolution-Constructed (ECO) features have been shown to be effective for general object recognition. ECO features use evolution strategies to build series of transforms and thus can be generated automatically without human expert involvement. We improved on our successful ECO features algorithm by reducing their dimensions before putting them into...
Assessing the taxonomy of fish is important to manage fish populations, regulate fisheries, and remove the exotic invasive species. Automating this process saves valuable resources of time, money, and manpower. Current methods for automatic fish monitoring rely on a human expert to design features necessary for classifying fish into a taxonomy. Thi...
Tracking moving objects with a moving camera is a challenging task. For unmanned aerial vehicle applications, targets of interest such as humans and vehicles often change their location from image frame to frame. This paper presents an object tracking method based on accurate feature description and matching, using the SYnthetic BAsis descriptor, to...
This paper presents research work on the detection, tracking, and localization of the soccer ball in a broadcast soccer video and maps the ball locations to the global coordinate system of the soccer field. Because of the lack of reference points in these frames, the calculation of the global coordinates of the ball remains a very challenging task....
Many vision-based applications require a robust feature descriptor that works well with image deformations such as compression, illumination, and blurring. It remains a challenge for a feature descriptor to work well with image deformation caused by viewpoint change. This paper introduces, first, a new binary feature descriptor called SYnthetic BAs...
One common way humans show emotion is through changes in facial expression. In this paper, we propose a new method for emotion detection by analyzing facial expression images. Facial expression information is analyzed by using a new feature construction method called Evolution-COnstructed (ECO) Features. The proposed algorithm is ab...
This paper presents a novel feature descriptor called TreeBASIS that provides improvements in descriptor size, computation time, matching speed, and accuracy. This new descriptor uses a binary vocabulary tree that is computed using basis dictionary images and a test set of feature region images. To facilitate real-time implementation, a feature reg...
An efficient histogram analysis algorithm is proposed for real-time automated fruit surface quality evaluation. This approach, based on short-wave infrared imaging, provides excellent image contrast between the fruit surface and delaminated skin, which allows significant simplification of image processing algorithm and reduction of computational po...
This paper presents a monocular visual odometry algorithm that incorporates a wheeled vehicle model for ground vehicles. The main innovation of this algorithm is to use the single-track bicycle model to interpret the relationship between the yaw rate and side slip angle, which are the two most important parameters that describe the motion of a whee...
This paper presents the development of a new feature descriptor derived from previous work on the basis sparse-coding-inspired similarity descriptor that provides smaller descriptor size, simpler computations, faster matching speed, and higher accuracy. The TreeBASIS descriptor algorithm uses a binary vocabulary tree that is computed offline using b...
A new color grading method is proposed in this paper to provide an automatic and intuitive way of evaluating the maturity and quality of harvested dates. Different from other existing methods that rely on complicated machine learning or artificial intelligence algorithms, this method uses 2D histograms of colors in each grading category to determine...
In order to help the visually impaired as they navigate unfamiliar environments such as public buildings, this paper presents a novel smart phone, vision-based indoor localization, and guidance system, called Seeing Eye Phone. This system requires a smart phone from the user and a server. The smart phone captures and transmits images of the user fac...
A feature descriptor that is robust to a number of image deformations is a basic requirement for vision based applications. Most feature descriptors work well in image deformations such as compression artifacts, illumination changes, and blurring. To develop a feature descriptor that works well apart from these image deformations like transformatio...
A variety of platforms, such as micro-unmanned vehicles, are limited in the amount of computational hardware they can support due to weight and power constraints. An efficient stereo vision algorithm implemented on an FPGA would be able to minimize payload and power consumption in micro-unmanned vehicles, while providing 3D information and still lea...