Embedded Convolutional Face Finder.
ABSTRACT: In this paper, a high-level optimization methodology is applied to the implementation of the well-known Convolutional Face Finder (CFF) algorithm for real-time applications on cellular phones, such as teleconferencing, advanced user interfaces, picture indexing and security access control. This face detector is based on a feature extraction and classification technique that consists of a pipeline of convolution and subsampling operations. The design of embedded systems must find a good trade-off between performance and code size because of the limited resources available. We propose a methodology to cope with the main drawbacks of the original CFF implementation, such as floating-point computation and memory allocation, to allow parallelism exploitation, and to perform algorithm optimizations. Results show that our embedded face detection system can accurately locate faces with less computational load and memory cost. It runs on a 275 MHz Starcore DSP at 9 QCIF images/s with state-of-the-art detection rates and very low false alarm rates.
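To make the convolution/subsampling pipeline and the move away from floating-point computation more concrete, here is a minimal sketch of one fixed-point convolution stage followed by a 2x2 averaging subsampling stage. It is an illustration under stated assumptions, not the authors' implementation: the Q8 number format, the kernel size used in the example, the plain 2x2 average, and the function names `to_fixed`, `conv2d_valid_fixed` and `subsample2x2_fixed` are all hypothetical and do not come from the abstract.

```python
# Hypothetical fixed-point sketch. Q8 (8 fractional bits) is an assumption;
# the abstract does not state which number format the embedded CFF uses.
FRAC_BITS = 8
ONE = 1 << FRAC_BITS


def to_fixed(x):
    """Quantize a float to a Q8 integer (round to nearest)."""
    return int(round(x * ONE))


def conv2d_valid_fixed(image, kernel, bias):
    """'Valid' 2-D convolution in integer arithmetic only.

    image  : rows of integer pixel values (e.g. 0..255)
    kernel : rows of Q8 weights
    bias   : Q8 bias
    Returns rows of Q8 integers. As is usual in convolutional networks,
    the kernel is applied without flipping (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = bias  # integer pixel * Q8 weight stays in Q8
            for j in range(kh):
                for i in range(kw):
                    acc += image[y + j][x + i] * kernel[j][i]
            row.append(acc)
        out.append(row)
    return out


def subsample2x2_fixed(fmap):
    """Average each non-overlapping 2x2 block; divide by 4 with a shift."""
    out = []
    for y in range(0, len(fmap) - 1, 2):
        row = []
        for x in range(0, len(fmap[0]) - 1, 2):
            s = (fmap[y][x] + fmap[y][x + 1]
                 + fmap[y + 1][x] + fmap[y + 1][x + 1])
            row.append(s >> 2)
        out.append(row)
    return out


if __name__ == "__main__":
    image = [[10] * 6 for _ in range(6)]          # toy 6x6 input
    kernel = [[to_fixed(0.5)] * 3 for _ in range(3)]
    fmap = conv2d_valid_fixed(image, kernel, to_fixed(1.0))
    pooled = subsample2x2_fixed(fmap)
    print(len(pooled), len(pooled[0]), pooled[0][0] / ONE)
```

Keeping every intermediate value as an integer is what lets such a stage run on a DSP without a floating-point unit; the trade-off is quantization error, which depends on the number of fractional bits chosen.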
Available from: Roux Sébastien, Sep 05, 2014
ABSTRACT: In this paper, we present a parallel architecture for fast and robust face detection implemented on FPGA hardware. We propose the first implementation that meets both real-time requirements in an embedded context and face detection robustness within complex backgrounds. The chosen face detection method is the Convolutional Face Finder (CFF) algorithm, which consists of a pipeline of convolution and subsampling operations, followed by a multilayer perceptron. We present the design methodology of our face detection processor element (PE). This methodology was followed in order to optimize our implementation in terms of memory usage and parallelization efficiency. We then built a parallel architecture composed of a PE ring and a FIFO memory, resulting in a scalable system capable of processing images of different sizes. A ring of 25 PEs running at 80 MHz is able to process 127 QVGA images per second and to perform real-time face detection on VGA images (35 images per second).
IEEE Transactions on Circuits and Systems for Video Technology 05/2009; DOI:10.1109/TCSVT.2009.2014013 · 2.26 Impact Factor
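The throughput figures quoted in this abstract can be put on a common scale with a little arithmetic. The sketch below converts them into input pixels per second and into aggregate PE clock cycles available per input pixel. The PE count, clock rate and frame rates are taken from the abstract; the 320x240 and 640x480 resolutions are the standard QVGA and VGA sizes, and the helper names are hypothetical. It counts input pixels only, so it says nothing about how many cycles the detector actually spends per pixel (for instance on rescaled copies of the image).

```python
# Figures from the abstract: 25 PEs at 80 MHz, 127 QVGA fps, 35 VGA fps.
PES = 25
CLOCK_HZ = 80_000_000
QVGA = (320, 240)  # standard QVGA resolution
VGA = (640, 480)   # standard VGA resolution


def pixel_rate(size, fps):
    """Input pixels consumed per second at the given frame rate."""
    width, height = size
    return width * height * fps


def pe_cycles_per_pixel(size, fps, pes=PES, clock_hz=CLOCK_HZ):
    """Aggregate PE clock cycles available per input pixel."""
    return pes * clock_hz / pixel_rate(size, fps)


if __name__ == "__main__":
    for name, size, fps in (("QVGA", QVGA, 127), ("VGA", VGA, 35)):
        print(name, pixel_rate(size, fps),
              round(pe_cycles_per_pixel(size, fps), 1))
```

Under these assumptions the two operating points correspond to roughly 9.8 and 10.8 million input pixels per second, which is consistent with the abstract's description of a system that scales across image sizes.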
Conference Paper: A Parallel Face Detection System Implemented on FPGA.
ABSTRACT: In this paper, we introduce a methodology for designing a face detection system and its implementation on FPGA. The chosen face detection method is the well-known convolutional face finder (CFF) algorithm, which consists of a pipeline of convolution and subsampling operations. Our goal is to define a parallel architecture able to process this algorithm efficiently. We present a dataflow-based architecture algorithm adequation (AAA) methodology, implemented using the SynDEx software, in order to find the best compromise between the processing power and functionality requirements of each processor element (PE) and the efficiency of algorithm parallelization. We describe a first implementation of a PE on a Virtex 4 FPGA using the dedicated DSP48 blocks. This PE is able to run at a maximum frequency of 352 MHz and occupies only 2% of a Virtex 4 SX35 device.
International Symposium on Circuits and Systems (ISCAS 2007), 27-30 May 2007, New Orleans, Louisiana, USA; 01/2007