Hau T. Ngo

United States Naval Academy, Annapolis, Maryland, United States

Publications (36) · 2.92 Total impact

  • ABSTRACT: Improvements in face detection performance would benefit many applications. The OpenCV library implements a standard solution, the Viola-Jones detector, with a statistically boosted rejection cascade of binary classifiers. Empirical evidence has shown that Viola-Jones underdetects in some instances. This research shows that a truncated cascade augmented by a neural network could recover these undetected faces. A hybrid framework is constructed, with a truncated Viola-Jones cascade followed by an artificial neural network, used to refine the face decision. Optimally, a truncation stage that captured all faces and allowed the neural network to remove the false alarms is selected. A feedforward backpropagation network with one hidden layer is trained to discriminate faces based upon the thresholding (detection) values of intermediate stages of the full rejection cascade. A clustering algorithm is used as a precursor to the neural network, to group significant overlappings. Evaluated on the CMU/VASC Image Database, comparison with an unmodified OpenCV approach shows: (1) a 37% increase in detection rates if constrained by the requirement of no increase in false alarms, (2) a 48% increase in detection rates if some additional false alarms are tolerated, and (3) an 82% reduction in false alarms with no reduction in detection rates. These results demonstrate improved face detection and could address the need for such improvement in various applications.
    IS&T/SPIE Electronic Imaging; 12/2013
  • ABSTRACT: In this paper, a real-time FPGA-based iris segmentation system is presented. The segmentation method implements the Canny edge detection algorithm and a circle search to detect an iris in an image or video frame. The proposed high-performance architecture utilizes on-chip memory to significantly improve the throughput of the pipelined and parallel structure. A data forwarding technique is incorporated in the design to efficiently utilize the FPGA's embedded resources. The proposed architecture demonstrates a high-speed processing capability that will facilitate the use of dedicated hardware to support an iris recognition application for large databases.
    Application-Specific Systems, Architectures and Processors (ASAP), 2012 IEEE 23rd International Conference on; 01/2012
  • ABSTRACT: FPGA devices with embedded DSP blocks, memory blocks, and high-speed interfaces are ideal for real-time video processing applications. In this work, a hardware-software co-design approach is proposed to effectively utilize FPGA features for a prototype of an automated video surveillance system. Time-critical steps of the video surveillance algorithm are designed and implemented in the FPGA's logic elements to maximize parallel processing. Other non-time-critical tasks are achieved by executing a high-level language program on an embedded Nios-II processor. Pre-tested and verified video and interface functions from a standard video framework are utilized to significantly reduce development and verification time. Custom and parallel processing modules are integrated into the video processing chain by Altera's Avalon Streaming video protocol. Other data control interfaces are achieved by connecting hardware controllers to a Nios-II processor using Altera's Avalon Memory Mapped protocol.
    Proc. SPIE; 05/2011
  • ABSTRACT: The human iris exhibits random and unique textural patterns that allow for identification with high accuracy. These patterns are evident in near-infrared (NIR) imagery, even for very dark irises. The authors investigate the information content of the iris contained in these patterns, and how it affects recognition performance. In this paper, iris templates are created from NIR iris imagery with the Ridge Energy Direction (RED) recognition algorithm, and using common biometric performance metrics we determine which portions of the iris contain the most distinctive information for recognition.
    01/2011;
  • ABSTRACT: Iris recognition is an important application in the Department of Defense and the Department of Homeland Security. An algorithm that is both accurate and fast, in a hardware design that is small and transportable, is crucial to the implementation of this tool. As part of an ongoing effort to meet these criteria, this paper improves a segment of the US Naval Academy's RED iris recognition algorithm, namely pupil isolation. We show a significant speed-up of pupil isolation by implementing this portion of the algorithm on a Field Programmable Gate Array (FPGA).
    Signals, Systems and Computers (ASILOMAR), 2010 Conference Record of the Forty Fourth Asilomar Conference on; 12/2010
  • Hau Ngo, V. Asari
    ABSTRACT: Window-based operations such as two-dimensional (2-D) convolution are commonly used in image and video processing applications. In this paper, a new design technique is presented that considers the neighboring pixels within the window to detect and eliminate redundant or unnecessary computations for power reduction. A novel on-chip detection technique is developed for the proposed neighborhood dependent approach (NDA) to reduce computations. In addition, a data partitioning methodology is employed in the on-chip buffer design to support real-time operations. This NDA method is applied to different window buffering schemes and experimental results are presented.
    Industrial Electronics and Applications, 2009. ICIEA 2009. 4th IEEE Conference on; 06/2009
  • ABSTRACT: A high performance digital architecture for the implementation of a non-linear image enhancement technique is proposed in this paper. The image enhancement is based on a luminance dependent non-linear enhancement algorithm which achieves simultaneous dynamic range compression, colour consistency and lightness rendition. The algorithm provides better colour fidelity, enhances less noise, prevents the unwanted luminance drop at the uniform luminance areas, keeps the ‘bright’ background unaffected, and enhances the ‘dark’ objects in ‘bright’ background. The algorithm contains a large number of complex computations and thus it requires specialized hardware implementation for real-time applications. Systolic, pipelined and parallel design techniques are utilized effectively in the proposed FPGA-based architectural design to achieve real-time performance. Estimation techniques are also utilized in the hardware algorithmic design to achieve faster, simpler and more efficient architecture. The video enhancement system is implemented using Xilinx’s multimedia development board that contains a VirtexII-X2000 FPGA and it is capable of processing approximately 67 Mega-pixels (Mpixels) per second.
    Microprocessors and Microsystems. 06/2009;
  • Hau T. Ngo, Vijayan K. Asari
    ABSTRACT: In this paper, we propose a partitioning and gating technique for the design of a high performance and low-power multiplier for kernel-based operations such as 2D convolution in video processing applications. The proposed technique reduces dynamic power consumption by analyzing the bit patterns in the input data to reduce switching activities. Special values of the pixels in the video streams such as zero, repeated values or repeated bit combinations are detected and data paths in the architecture design are disabled appropriately to eliminate unnecessary switching. Input pixels in the video stream are partitioned into halves to increase the possibility of detecting special values. It is observed that the proposed scheme helps to reduce dynamic power consumption in the 2D convolution operations by up to 33%.
    Microelectronics Journal. 01/2009; 40:1582-1589.
  • Hau T. Ngo, Vijayan K. Asari
    ABSTRACT: In this paper, a design and implementation of an efficient, low-power log-based 2D convolution unit (convolver) for video processing applications is proposed. The design of the proposed convolver utilizes an approximation method with an error correction technique to transform data to the logarithmic domain for reduced power consumption. A novel modular design and implementation of the leading-bit detection module that is used to compute the binary logarithm is presented. A partitioning and gating technique is also presented to reduce the switching activities based on detection of insignificant data bits. It is observed that the proposed logarithmic-domain multiplier reduces power consumption in two common image filtering operations by more than 50% compared to conventional linear-domain 2D convolvers.
    Sixth International Conference on Information Technology: New Generations, ITNG 2009, Las Vegas, Nevada, 27-29 April 2009; 01/2009
  • Proceedings of the ISCA 24th International Conference on Computers and Their Applications, CATA 2009, April 8-10, 2009, Holiday Inn Downtown-Superdome, New Orleans, Louisiana, USA; 01/2009
  • ABSTRACT: Modern advancements in configurable hardware, most notably field-programmable gate arrays (FPGAs), have provided an exciting opportunity to discover the parallel nature of modern image processing algorithms. On the other hand, PlayStation 3 (PS3) game consoles contain a multi-core heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at a high-performance level. All the while, image processing algorithms are still coded for off-the-shelf computers, such as the state-of-the-art Xeon-based computer systems. In this research project, we study the differences in performance of a modern image processing algorithm on three hardware platforms. We show that the heterogeneous Cell-based PS3 is able to outperform a state-of-the-art Xeon processor by 7.7 times. However, our results on an FPGA are 2.5 times better than the PS3. Although the Cell processor greatly outperforms the Xeon processor, the FPGA is the leader in performance.
    01/2009;
  • ABSTRACT: Design of an efficient modular architecture for detection of multiple faces in a video stream is presented in this paper. Face detection is the first step in many surveillance and security applications such as face recognition, face authentication for banking and security access control, monitoring and tracking, etc. The algorithm used for this hardware design is the Viola-Jones algorithm, which has proven to be very effective and fast. The hardware design employs a modular approach in an efficient memory management strategy for this memory-intensive application. The proposed design is targeted for a low-cost FPGA prototype board from Altera (DE2 board) for a cost-effective face detection system. The proposed approach utilizes the on-chip memory module to reduce accesses to the external memory chip for improved performance in the application.
    Information, Communications & Signal Processing, 2007 6th International Conference on; 01/2008
  • ABSTRACT: A high performance digital architecture for the implementation of a nonlinear image enhancement technique is proposed in this paper. The image enhancement is based on an illuminance-reflectance model which improves the visual quality of digital images and video captured under insufficient or non-uniform lighting conditions. The algorithm shows robust performance with appropriate dynamic range compression, good contrast, accurate and consistent color rendition. The algorithm contains a large number of complex computations and thus it requires specialized hardware implementation for real-time applications. Systolic, pipelined and parallel design techniques are utilized effectively in the proposed FPGA-based architectural design to achieve real-time performance. Approximation techniques are used in the hardware algorithmic design to achieve high throughput. The video enhancement system is implemented using Xilinx's multimedia development board that contains a VirtexII-X2000 FPGA and it is capable of processing approximately 63 Mega-pixels (Mpixels) per second.
    Integration. 01/2008; 41:474-488.
  • H.T. Ngo, V.K. Asari
    ABSTRACT: In this paper, we propose a neighborhood dependent approach (NDA) for the design of a high performance and low power radix-4 Booth multiplier for kernel-based operations such as 2D convolution in video processing applications. The proposed technique reduces dynamic power consumption by analyzing the bit patterns in the input data to reduce switching activities. Special values of the pixels in the video streams such as zero, repeated values or repeated bit combinations are detected and data paths in the architecture design are disabled appropriately to eliminate unnecessary switching in arithmetic units and data buses. Input pixels in the video stream are partitioned into halves to increase the possibility of detecting special values. It is observed that the proposed scheme helps to reduce operations and switching activities in the 2D convolution operations by up to 46% of the switching activity rate, which results in significant power reduction with low hardware overhead.
    Circuits and Systems, 2007. MWSCAS 2007. 50th Midwest Symposium on; 09/2007
  • ABSTRACT: In this paper, an efficient design for a high performance, power-aware architecture to extract skin-like regions in a video stream is presented. Skin segmentation is an important step in many image processing and computer vision applications such as face detection and hand gesture recognition. The design utilizes the high correlation and similarity of neighboring pixels in video streams to reduce switching activity (hence reducing dynamic power dissipation) in the arithmetic unit. The proposed design is implemented and fitted in Altera's Cyclone II FPGA, which is available in the DE2 development and educational board. The pipelined system is capable of performing the skin segmentation procedure in real time with a processing rate of 654 frames per second for video frames with a standard size of 640×480. It is observed that the proposed design helps to reduce operations and switching activities in the processing unit by up to 42 percent, which results in lower dynamic power dissipation with low hardware overhead.
    Application-specific Systems, Architectures and Processors, 2007. ASAP. IEEE International Conf. on; 08/2007
  • ABSTRACT: A VLSI-efficient multiplier-less architecture for real-time computation of multi-dimensional convolution is presented in this paper. The new architecture performs computations in the logarithmic domain by utilizing novel multiplier-less log2 and inverse-log2 modules which are capable of converting fractional numbers, a capability currently not available in the literature. An effective data handling strategy is developed in conjunction with the logarithmic modules to eliminate the necessity of multipliers in the architecture. The proposed approach reduces hardware resources significantly compared to other approaches while maintaining a high degree of accuracy. The architecture is developed as a combined systolic-pipelined design that produces an output in every clock cycle after an initial latency of 93.19 µs. The architecture is capable of operating with a clock frequency of 99 MHz based on Xilinx’s Virtex II 2v2000ff896-4 FPGA and the throughput of the system is observed as 99 MOPS (million outputs per second). Error analysis performed with the FPGA-based system in the image processing examples of edge detection and noise filtering shows that the proposed architecture produces outputs similar to those obtained by software simulation using Matlab.
    Microprocessors and Microsystems. 01/2007;
  • Hau T. Ngo, Vijayan K. Asari
    ABSTRACT: Design of a low power multiply-and-accumulator (MAC) unit for video processing systems exploiting insignificant bits in pixel values and the similarity of neighboring pixels in video streams is presented in this paper. The proposed technique reduces dynamic power consumption by analyzing the bit patterns in the input data to reduce switching activities. Special values of the pixels in the video streams such as zero, one, repeated values or repeated bit combinations are detected and data paths in the architecture design are disabled appropriately to eliminate unnecessary switching in arithmetic units and data buses. It is observed that the proposed scheme helps to reduce operations and switching activities in the MAC unit by up to 30%, which results in lower power consumption with low hardware overhead.
    Fourth International Conference on Information Technology: New Generations (ITNG 2007), 2-4 April 2007, Las Vegas, Nevada, USA; 01/2007
  • ABSTRACT: A design of a high performance digital architecture for a nonlinear image enhancement technique is presented in this paper. The image enhancement is based on an illuminance-reflectance model which improves the visual quality of digital images and video captured under insufficient or non-uniform lighting conditions [1]. Systolic, pipelined and parallel design techniques are utilized effectively in the proposed FPGA-based architectural design to achieve real-time performance. Estimation and folding techniques are used in the hardware algorithmic design to achieve a faster, simpler and more efficient architecture. The video enhancement system is implemented using Xilinx's multimedia development board that contains a VirtexII-X2000 FPGA and it is capable of processing approximately 66 Mega-pixels (Mpixels) per second.
    Circuits and Systems, 2006. MWSCAS '06. 49th IEEE International Midwest Symposium on; 09/2006
  • ABSTRACT: The concept of simultaneously processing different non-overlapping spatial regions of an image and combining the results to obtain a final image is used in this paper. We apply this concept to the domain of face recognition using Principal Component Analysis (PCA). We have shown in [1] that modular PCA improves the accuracy of face recognition when the face images have varying expression and illumination. In this work we design and implement the modular PCA algorithm for face recognition in a Field Programmable Gate Array (FPGA) environment. Since modular PCA processes non-overlapping regions of a face image to produce weight vectors, we design a parallel architecture where each parallel path has a processing element to process a predefined region of a face image. Each processing element computes a weight vector from a face image region and pre-computed eigenvectors; hence the processing element is also parallelized where each path works on one eigenvector and the face image region to compute one element in the weight vector. Each of these paths is pipelined to process the pixels from the face image region and corresponding elements from the eigenvectors in a faster manner. We name this design having pipelined parallel paths as multi-lane architecture. The architecture is able to recognize a face image from a database of 1000 face images in 11 ms.
    Microprocessors and Microsystems 01/2006; 30(4):216-224. · 0.55 Impact Factor
  • H. T. Ngo, V. K. Asari
    ABSTRACT: The radial lens distortion correction technique based on least squares estimation corrects a distorted image by expanding it nonlinearly so that straight lines in the object space remain straight in the image space. An absolute pipelined architecture is designed to correct radial lens distortion in images by partitioning the distortion correction algorithm into four main modules. The architecture includes a CORDIC based rectangular to polar coordinate transformation module, a back mapping module for nonlinear transformation of corrected image space to distorted image space, a CORDIC based polar to rectangular coordinate transformation module, and a linear interpolation module to calculate the intensities of four pixels simultaneously in the corrected image space. The system parameters include the expanded/corrected image size, distorted image size, the back mapping coefficients, distortion center and the center of the corrected image. The hardware architecture can sustain a high throughput rate of 30 4-megapixel frames per second (a total of 120 Mpixels per second). The pipelined architecture design will facilitate the use of dedicated hardware that can be mounted along with the camera unit.
    Midwest Symposium on Circuits and Systems 01/2006; 2:526-530.