A high speed programmable focal-plane SIMD vision chip. Analog Integrated Circuits and Signal Processing
A high speed programmable focal-plane SIMD vision chip
Dominique Ginhac · Jérôme Dubois · Barthélemy Heyrman · Michel Paindavoine
Received: 24 March 2008/Revised: 9 March 2009/Accepted: 28 May 2009/Published online: 5 July 2009
© Springer Science+Business Media, LLC 2009
Abstract A high speed analog VLSI image acquisition and low-level image processing system is presented. The architecture of the chip is based on a dynamically reconfigurable SIMD processor array. The chip features a massively parallel architecture enabling the computation of programmable mask-based image processing in each pixel. Each pixel includes a photodiode, an amplifier, two storage capacitors, and an analog arithmetic unit based on a four-quadrant multiplier architecture. A 64 × 64 pixel proof-of-concept chip was fabricated in a 0.35 μm standard CMOS process, with a pixel size of 35 μm × 35 μm. The chip can capture raw images at up to 10,000 fps and runs low-level image processing at framerates of 2,000–5,000 fps.

Keywords CMOS image sensor · Parallel architecture · SIMD · High-speed image processing · Analog arithmetic unit
1 Introduction

Today, improvements in the growing digital imaging world continue to be made with two main image sensor technologies: charge-coupled devices (CCD) and CMOS sensors. The continuous advances in CMOS technology for processors and DRAMs have made CMOS sensor arrays a viable alternative to the popular CCD sensors. This has led to the adoption of CMOS image sensors in several high-volume products, such as webcams, mobile phones, and PDAs. New technologies provide the potential for integrating a significant amount of VLSI electronics into a single chip, greatly reducing the cost, power consumption, and size of the camera [1–4]. By exploiting these advantages, innovative CMOS sensors have demonstrated reduced fabrication cost, low power consumption, and smaller camera size [5–7].
The main advantage of CMOS image sensors is the flexibility to integrate processing down to the pixel level. As CMOS image sensor technologies scale to 0.18 μm processes and below, processing units can be realized at chip level (system-on-chip approach), at column level by dedicating processing elements to one or more columns, or at pixel level by integrating a specific unit in each pixel or each local group of neighboring pixels. Most research deals with chip- and column-level processing [8–11]. Indeed, pixel-level processing is generally dismissed because pixel sizes are often too large to be of practical use. However, as CMOS scales, integrating a processing element in each pixel or group of neighboring pixels becomes feasible. This offers the opportunity to increase the quality of imaging, in terms of resolution and noise for example, by integrating specific processing functions such as correlated double sampling, anti-blooming, high dynamic range, and even all basic camera functions (color processing, color correction, white balance adjustment, gamma correction)
onto the same camera-on-chip. Moreover, employing a processing element per pixel offers the ability to exploit the high-speed imaging capabilities of CMOS technology by achieving massively parallel computations [16–22]. Komuro et al. describe a new vision chip architecture for high-speed target tracking based on a hardware implementation of bit-serial and cumulative summation circuits. Rodriguez-Vazquez et al. [17, 18] present a chip based on arrays of mixed-signal processing elements conceived to cover the early stages of the visual processing path in a fully-parallel manner. Lindgren et al. present a multiresolution general-purpose high-speed machine vision sensor with on-chip image processing capabilities dedicated to high-speed multisense imaging. Sugiyama et al. have developed a specific imager performing both target tracking within a 512 × 512-pixel entire image area and acquisition of partial images, simultaneously and independently. Dudek and Hicks describe a smart-sensor VLSI circuit suitable for focal-plane low-level image processing applications, characterized by a small cell area, low power dissipation, and the ability to execute a variety of image processing algorithms in real time. Miao et al. present a programmable vision chip for real-time vision applications based on a pixel processing element array and row-parallel processors, able to implement mathematical morphology algorithms such as erosion and dilation.

D. Ginhac (✉) · J. Dubois · B. Heyrman · M. Paindavoine
LE2I—Université de Bourgogne, Aile des Sciences de l'Ingénieur, BP 47870, 21078 Dijon Cedex, France

Analog Integr Circ Sig Process (2010) 65:389–398
In this paper, we discuss hardware implementation issues of a high speed CMOS imaging system with per-pixel image processing. Embedding low-level tasks at the focal plane is attractive for several reasons. First, the key feature is the capability to operate in accordance with the principles of single instruction multiple data (SIMD) computing architectures. This enables massively parallel computations with processing times independent of the resolution of the sensor, leading to high framerates of up to thousands of images per second with a rather low power consumption [23–25]. Secondly, embedding hardware processing operators along with the sensor array removes the classical input/output bottleneck between the sensor and the external processors in charge of processing the pixel values. This can benefit the implementation of new complex applications at standard rates and can also improve the performance of existing video applications such as motion vector estimation [26, 27], multiple capture with dynamic range [28, 29], and pattern recognition.
To sum up, we designed, fabricated, and tested a proof-of-concept 64 × 64 pixel CMOS analog sensor with a per-pixel programmable processing element in a standard 0.35 μm double-poly quadruple-metal CMOS technology. The analog processing operators are fully programmable through dynamic reconfiguration; they can be viewed as a software-programmable image processor dedicated to low-level image processing. The main objectives of our design are: (1) to evaluate the potential for high speed snapshot imaging and, in particular, to reach a 10,000 fps rate, (2) to demonstrate a versatile and reconfigurable processing unit at pixel-level, and (3) to provide an original platform for experimenting with low-level image processing algorithms that exploit high-speed imaging.
The rest of the paper is organized as follows. The main characteristics of the sensor architecture are described in Sect. 2. Section 3 details the design of the circuit, with a full description of the photodiode structure, the embedded analog memories, and the arithmetic unit. In Sect. 4, we describe the test hardware platform and the chip characterization results. Finally, some experimental results of high speed image acquisition with pixel-level processing are described in the last section of this paper.
This paper is an extended and complementary version of a preliminary paper dedicated to the fundamental theoretical aspects and specificities of our image sensor. In this new paper, the focus is on image processing and the development and implementation of various low-level image processing applications on the chip.
2 Description of the architecture
The proof-of-concept chip presented in this paper is depicted in Fig. 1. The core includes a two-dimensional array of 64 × 64 identical processing elements (PE). Each PE follows the SIMD computing paradigm and is able to convolve the pixel value issued from the photodiode by applying a set of mask coefficients to the image pixel values located in a small neighborhood. The key idea is that a global control unit can dynamically reconfigure the convolution kernel masks and thus implement most low-level image processing algorithms [17, 18]. This confers the functionality of programmable processing devices to the PEs embedded in the circuit. Each individual PE includes the following elements:
– a photodiode dedicated to the optical acquisition of the visual information and the light-to-voltage transduction,
– two analog memory, amplifier and multiplexer structures, called [AM]2, which serve as intelligent pixel memories and are able to dissociate the acquisition of the current frame in the first memory from the processing of the previous frame in the second memory,
– an analog arithmetic unit, named A2U, based on four analog multipliers, which performs the linear combination of the four adjacent pixels using a 2 × 2 convolution kernel.
In brief, each PE includes 38 transistors integrating all the analog circuitry dedicated to the image processing algorithms. The global size of the PE is 35 μm × 35 μm (1,225 μm²). The active area of the photodiode is 300 μm², giving a fill factor of 25%. The chip has been realized in a standard 0.35 μm double-poly quadruple-metal CMOS technology and contains about 160,000 transistors on a 3.67 mm × 3.77 mm die (13.83 mm²). The chip also contains test structures on its bottom left, used for detailed characterization of the photodiodes and processing units.
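The pixel geometry figures above can be sanity-checked with plain arithmetic; a minimal sketch using only the numbers stated in the text:

```python
# Sanity check of the pixel geometry figures quoted above.
pixel_area = 35 * 35              # 35 μm x 35 μm pixel
photodiode_area = 300             # μm² active photodiode area
fill_factor = photodiode_area / pixel_area

assert pixel_area == 1225         # 1,225 μm² as stated
assert 0.24 < fill_factor < 0.25  # ≈ 25% fill factor as stated
```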
3 Circuit design
3.1 Pixel structure
Each pixel in the CMOS image sensor array includes a photodiode and a processing unit dedicated to low-level image processing based on neighborhoods. In our chip, the photodiodes are of one of the simplest types in CMOS image sensor technology, i.e., N-type photodiodes based on an n+-type diffusion in a p-type silicon substrate. In order to achieve good performance, the photodiodes have been carefully designed and optimized so as to improve critical parameters such as the dark current and the spectral response. Moreover, the shape and the layout of the photodiode have a significant influence on the performance of the whole imager [33, 34]. The active area of the photodiode absorbs the illumination energy and turns that energy into charge carriers. This active area must be as large as possible in order to absorb a maximum of photons. Meanwhile, the control circuitry required for the readout of the collected charges and the inter-element isolation area must be as small as possible in order to obtain the best fill factor. We have theoretically analyzed, designed, and benchmarked different photodiode shapes, and finally an octagonal shape based on 45° structures was chosen (see Fig. 1).
The second part of the pixel is the analog processing unit, dedicated to the implementation of various in situ image processing operations using local neighborhoods. This forces a rethinking of the spatial distribution of the processing resources, so that each computational unit can easily use a programmable neighborhood of pixels. For this purpose, the pixels are mirrored about the horizontal and vertical axes. As an example, a block of 2 × 2 pixels is depicted in Fig. 1. Such a distribution optimizes the compactness of the metal interconnections between pixels, giving a better fill factor.
3.2 Analog memory, amplifier and multiplexer [AM]2
In order to increase the algorithmic possibilities of the architecture, one potential solution is to separate the acquisition of the light inside the photodiode from the readout of the stored value at pixel-level. One of the main advantages of such structures is that the capture sequence can be performed in the first memory in parallel with a readout and/or processing sequence of the previous image stored in the second memory, as shown in Fig. 2.
Fig. 1 Overview of the image sensor with a processor-per-pixel array
Fig. 2 Parallelism between capture sequence and readout sequence
Such a strategy has several advantages:
1. The framerate can be increased (up to 2×) without reducing the exposure time,
2. The image acquisition is time-decorrelated from the image processing, implying that the architecture performance is always the highest, and the processing framerate is kept constant,
3. A new image is always available without spending any extra acquisition time.
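The alternation between the two analog memories behaves like a classical ping-pong (double-buffering) scheme; a minimal Python sketch of the idea (the function and list-based framing are illustrative, not the chip's actual control logic):

```python
# Ping-pong sketch: while one memory captures frame N, the other memory,
# holding frame N-1, is read out and processed.
def ping_pong(frames):
    mem = [None, None]   # the two analog memories of one pixel
    out = []
    for n, frame in enumerate(frames):
        capture, readout = n % 2, (n + 1) % 2
        mem[capture] = frame          # acquisition of frame N in one memory
        if mem[readout] is not None:  # parallel readout of frame N-1 from the other
            out.append(mem[readout])
    return out

# Frame N-1 is always available while frame N is being exposed.
assert ping_pong([1, 2, 3, 4]) == [1, 2, 3]
```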
So, for each pixel, we have designed and implemented
two specific circuits called analog memory, amplifier, and
multiplexer [AM]2, as shown in Fig. 3. The system has four
successive operation modes: acquisition, storage, amplifi-
cation, and readout. All these phases are externally con-
trolled by global signals common to the full array of pixels.
In each pixel, the photosensor is associated with a PMOS reset transistor. This switch resets the pixel to the fixed voltage Vdd. The pixel array is held in the reset mode until the init signal rises, turning the PMOS transistor off.
Then, the photodiode discharges for a fixed period,
according to the incidental luminous flow. The first NMOS
transistor acts as a voltage follower, producing the voltage
Vph, directly proportional to the incident light intensity.
The integrated voltage is biased around Vdd/2 by the second
NMOS transistor, with a positive reference bias voltage
(Vbias = 1.35 V). Following the acquisition stage, two identical subcircuits [AM]i2 (with i = 1, 2) realize the storage phase of Vph. Each [AM]2 includes three pairs of NMOS and PMOS transistors and a capacitor which acts as an analog memory. The subcircuit [AM]i2 is selected when the sti signal is turned on. Then, the associated analog switch is opened, allowing the corresponding capacitor Ci to be charged to a voltage level reflecting the integrated photocurrent. Consequently, the capacitors are able to store the pixel value during the frame capture through one of the two switches. The capacitors are implemented with double polysilicon, and their size is as large as possible in order to respect the fill-factor and pixel-size requirements. The capacitor values are about 40 fF; they are able to store the pixel value for 20 ms with an error lower than 4%. Behind the storage subcircuit, a basic CMOS inverter is integrated. This inverter serves as a linear high-gain amplifier around Vdd/2 with a gain of 12. Finally, the last phase consists in the readout of the values stored in the capacitors Ci, through one of the two switches, controlled by the ri signals.
3.3 Analog arithmetic unit (A2U)
Our analog arithmetic unit (A2U) is able to perform the convolution of the pixels with a 2 × 2 dynamic kernel. This unit is based on four-quadrant analog multipliers [36, 37] named M1, M2, M3, and M4, as illustrated in Fig. 4. Each A2U includes only 22 transistors, leading to a relatively small area and simplicity. Each multiplier Mi (with i = 1, ..., 4) takes two analog signals Vi1 and Vi2 and produces an output ViS which is their product. The outputs of the multipliers are all interconnected with a diode-connected transistor employed as load. Consequently, the global operation result at the VS point is a linear combination of the four products ViS. Image processing operations such as spatial convolution can easily be performed by connecting the inputs Vi1 to the kernel coefficients and the inputs Vi2 to the corresponding pixel values.
Considering the MOS transistors operating in the subthreshold region, the output node ViS of a multiplier can be expressed as a function of the two inputs Vi1 and Vi2 through an implicit relation (Eq. 1) involving the terms kr(VThN − Vi1), (Vi1 − ViS − VThN), (Vi1 − Vi2 − VThN), and (Vi2 − ViS − VThN − VThP), where kr represents the transconductance factor, and VThN and VThP are the threshold voltages of the NMOS and PMOS
Fig. 3 The [AM]2 structure
Fig. 4 The A2U structure
transistors. Around the operating point (Vdd/2), the variations of the output node mainly depend on the product Vi1·Vi2. Eq. 1 can therefore be simplified, and the output node ViS can finally be expressed as a simple first-order function of the two input voltages Vi1 and Vi2 (Eq. 2), with a coefficient M, depending on 2VThN + VThP, evaluated at 8.07 V⁻¹. The value of the coefficient M gives a primordial importance to the term Vi1·Vi2 in Eq. 1, limiting the impact of the second-order products. Consequently, the output ViS mainly depends on the input values Vi1 and Vi2 around the operating point Vdd/2. This leads to a good linearity of our multiplier design, which integrates only five transistors.
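To first order, the A2U output is simply the linear combination of the four coefficient-pixel products described above; a minimal numerical sketch of this idealized behavior (the coefficient M is normalized to 1, and the function name is illustrative, not the chip's transistor-level model):

```python
# Idealized first-order model of the A2U: the output VS is the linear
# combination of four products Vi1 * Vi2 (kernel coefficient x pixel value).
def a2u_output(coeffs, pixels):
    """Linear combination of four kernel coefficients with four pixel voltages."""
    assert len(coeffs) == len(pixels) == 4
    return sum(v1 * v2 for v1, v2 in zip(coeffs, pixels))

# Example: a 2x2 averaging kernel applied to four adjacent pixel voltages.
vs = a2u_output([0.25, 0.25, 0.25, 0.25], [1.0, 2.0, 3.0, 2.0])
assert vs == 2.0
```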
4 Chip characterization
An experimental 64 × 64 pixel image sensor has been developed in a 0.35 μm, 3.3 V, standard CMOS process with poly–poly capacitors. Its functional testing and characterization were performed using a specific hardware platform. The hardware part of the imaging system contains a one-million-gate Spartan-3 FPGA board with 32 MB of embedded SDRAM (the XSA-3S1000 from XESS Corporation). An interface acquisition circuit includes three ADCs from Analog Devices (AD9048), high-speed LM6171 amplifiers, and other elements such as the motor lens. Figure 5 shows the schematic and some pictures of the experimental platform.
4.1 Electrical characterization
The sensor was quantitatively tested for conversion gain, sensitivity, fixed pattern noise, thermal reset noise, output level disparities, voltage gain of the amplifier stage, linear flux, and dynamic range. Table 1 summarizes these imaging sensor characterization results.
To determine these values, the sensor includes specific test pixels in which some internal node voltages can be directly read. The test equipment is based on a light generator with wavelengths of 400–1100 nm. The sensor conversion gain was evaluated at 54 μV/e− RMS with a sensitivity of 0.15 V/lux·s, thanks to the octagonal shape of the photodiode and the fill factor of 25%. At 10,000 fps, the measured non-linearity is 0.12% over a 2 V range. These performances are similar to those of previously described sensors. According to the experimental results, the voltage gain of the amplifier stage of the two [AM]2 is Av = 12, with some disparities on the output levels (see Table 1).
4.2 Fixed pattern noise
Image sensors always suffer from technology related
nonidealities that can limit the performances of the vision
system. Among them, fixed pattern noise (FPN) is the variation in output pixel values, under uniform illumination, due to device and interconnect mismatches across the image sensor. FPN can be reduced by implementing correlated double sampling (CDS). To implement CDS, each

Fig. 5 Block diagram and pictures of the hardware platform including FPGA board and CMOS sensor
pixel output needs to be read twice, after reset and at the
end of integration time. The correct pixel signal is obtained
by subtracting the two values. A CDS can be easily
implemented in our chip. For this purpose, the first analog
memory stores the pixel value just after the reset signal and
the second memory stores the value at the end of integra-
tion. Then, at the end of the image acquisition, the two
values can be transferred to the FPGA, responsible for
producing the difference.
In Fig. 6, the two images show the fixed pattern noise with and without CDS, using a 1-ms integration time. In the left image, the FPN (225 μV RMS) is mainly due to random variations in the offset voltages of the pixel-level analog structures. In the right image, the FPN has been reduced by a factor of 34, to 6.6 μV RMS, after an analog CDS performed as described above.
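The CDS scheme described above reduces to a per-pixel subtraction; a minimal sketch of the principle (the offset and signal values below are hypothetical, in volts, and the framing as flat lists is illustrative):

```python
import random

# CDS sketch: the first analog memory stores the post-reset level, the second
# stores the end-of-integration level, and the FPGA subtracts the two.
random.seed(0)
offsets = [random.gauss(0.0, 0.01) for _ in range(64 * 64)]   # per-pixel offset mismatch
signal = [random.uniform(0.5, 1.5) for _ in range(64 * 64)]   # integrated photo-signal

reset_frame = offsets                                         # read just after reset
integrated_frame = [o + s for o, s in zip(offsets, signal)]   # read at end of integration

# Subtracting the two readouts cancels the offset component of the FPN.
cds_frame = [i - r for i, r in zip(integrated_frame, reset_frame)]
assert all(abs(c - s) < 1e-12 for c, s in zip(cds_frame, signal))
```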
5 High-speed image processing applications
5.1 Sample images
The prototype chip was used for the acquisition of raw images. First, sample raw images of stationary scenes were captured at different framerates, as shown in Fig. 7. In the three views, no image processing is performed on the video stream, except for amplification of the photodiode signals. From left to right, we can see a human face obtained at 1,000 fps, a static electric fan at 5,000 fps, and an electronic chip at 10,000 fps. The exploitation of the high-fps capability of the 64 × 64-pixel sensor is obtained with a simple sequencer in charge of transferring the analog pixel values to the external ADCs. For larger sensors, it could be achieved by integrating a dedicated output module able to cope with a gigapixel-per-second bandwidth. Another possible solution is to assemble 64 × 64-pixel modules with a dedicated output bus for each of them.
Figure 8 represents different frames of a moving object, namely, a milk drop splashing sequence. In order to capture the details of such a rapidly moving scene, the sensor operates at 2,500 fps and stores a sequence of 50 images. Frames 1, 5, 10, 15, 20, 25, 30, and 40 are shown in the figure.
5.2 Sobel operator
The Sobel operator estimates the gradient of a 2D image. It
is used for edge detection in the preprocessing stage of
computer vision systems. The classical algorithm is based
on a pair of 3 × 3 convolution kernels (see Eq. 3), one to
detect changes along the vertical axis (h1) and another to
detect horizontal contrast (h2). For this purpose, the algo-
rithm performs a convolution between the image and the
sliding convolution mask over the image. It manipulates nine pixels for each value produced, which corresponds to an approximation of the gradient centered on the processed pixel.

Table 1 Chip measurements

Conversion gain: 54 μV/e− RMS
Fixed pattern noise: 225 μV RMS
Thermal reset noise: 68 μV RMS
Output levels disparities:
Amplifier gain: 12
Linear flux: 98.5%
Dynamic range: 68 dB

Fig. 6 Images of fixed pattern noise a without CDS and b with CDS for an integration time of 1 ms

Fig. 7 Various raw images acquisition at 1,000, 5,000 and 10,000 fps
The structure of our architecture is well-adapted to the
evaluation of the Sobel algorithm. It leads to the result
directly centered on the photo-sensor and directed along
the natural axes of the image. The gradient is computed in
each pixel of the image by performing successive linear
combinations of the four adjacent pixels. For this purpose,
each 3 × 3 kernel mask is decomposed into two 2 × 2 masks that successively operate on the whole image. For the kernel h1, the corresponding 2 × 2 masks are m1, which applies a coefficient of −1 to the left column of the 2 × 2 neighborhood, and m2, which applies +1 to the right column.
Figure 9 represents the 3 × 3 mask centered on the pixel ph5. Each octagonal photodiode phi (i = 1, ..., 9) is associated with a processing element PEi, represented by a circle in the figure. Each PEi is positioned at the bottom right of its photodiode, as in the real layout of the circuit (see Fig. 1). The first mask m1 contributes to evaluating the following series of operations for the four PEis:
V11 = −(Vph1 + Vph4)
V12 = −(Vph2 + Vph5)
V14 = −(Vph4 + Vph7)
V15 = −(Vph5 + Vph8)
and the second mask m2 computes:
Fig. 8 A 2,500 fps video
sequence of a milk drop
V21 = +(Vph2 + Vph5)
V22 = +(Vph3 + Vph6)
V24 = +(Vph5 + Vph8)
V25 = +(Vph6 + Vph9)
with Vij corresponding to the result provided by the processing element PEj (j = 1, 2, ..., 9) with the mask mi (i = 1, 2), and Vphk (k = 1, 2, ..., 9) the voltages representing the incidental illumination on each photodiode phk. Then, the evaluation of the gradient at the center of the mask can be computed by summing the different values on the external FPGA. Note that V12 = −V21 and V15 = −V24, so the final sum can be simplified and written as Vh1 = V11 + V22 + V25 + V14. If we define a retina cycle as the time spent for the configuration of the kernel coefficients and the preprocessing of the image, the evaluation of the gradient along the vertical direction only spends one frame acquisition and two retina cycles. By generalization, the estimation of the complete gradient along the two axes spends four cycles because it involves four dynamic kernel reconfigurations.
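The cancellation argument above can be checked numerically; a small sketch of the decomposition (the 2 × 2 mask layouts are inferred from the V expressions, and the 3 × 3 neighborhood values are arbitrary test data):

```python
# Numerical check of the 2x2 decomposition of the Sobel kernel h1.
# ph holds a 3x3 neighborhood of photodiode voltages ph1..ph9 (row-major).
ph = [1.0, 2.0, 3.0,
      4.0, 5.0, 6.0,
      7.0, 8.0, 9.0]

# First mask m1 (coefficient -1 on the left column of each 2x2 neighborhood):
V11 = -(ph[0] + ph[3])   # -(Vph1 + Vph4)
V12 = -(ph[1] + ph[4])   # -(Vph2 + Vph5)
V14 = -(ph[3] + ph[6])   # -(Vph4 + Vph7)
V15 = -(ph[4] + ph[7])   # -(Vph5 + Vph8)

# Second mask m2 (coefficient +1 on the right column):
V21 = +(ph[1] + ph[4])   # +(Vph2 + Vph5)
V22 = +(ph[2] + ph[5])   # +(Vph3 + Vph6)
V24 = +(ph[4] + ph[7])   # +(Vph5 + Vph8)
V25 = +(ph[5] + ph[8])   # +(Vph6 + Vph9)

# V12 = -V21 and V15 = -V24 cancel, so the gradient reduces to:
Vh1 = V11 + V22 + V25 + V14

# Reference: direct 3x3 convolution with the corresponding Sobel kernel.
h1 = [-1, 0, 1,
      -2, 0, 2,
      -1, 0, 1]
assert Vh1 == sum(h * p for h, p in zip(h1, ph))
```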
In short, the dynamic assignment of coefficient values from the external processor gives the system some interesting dynamic properties. The system can easily be reconfigured by changing the internal mask coefficients between two successive computations. First, this makes it possible to dynamically change the image processing algorithms embedded in the sensor. Secondly, this enables the evaluation of some complex pixel-level algorithms involving different successive convolutions. The images can be captured at framerates higher than the standard framerate, processed by exploiting the analog memories and the reconfigurable processing elements, and output at a lower framerate depending on the number of dynamic reconfigurations. Moreover, the analog arithmetic units implementing these pixel-level convolutions drastically decrease the number of elementary operations, such as additions and multiplications, executed by an external processor (an FPGA in our case), as shown in Fig. 5. Indeed,
in the case of our experimental 64 × 64 pixel sensor, the peak performance is equivalent to four parallel signed multiplications per pixel at 10,000 fps, i.e., more than 160 million multiplications per second. With a VGA resolution (640 × 480), the performance level would increase by a factor of 75, leading to about 12 billion multiplications per second. Processing this data flow with external processors would imply significant hardware resources in order to cope with the temporal constraints.
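The peak-performance figures quoted above follow from simple arithmetic:

```python
# Back-of-the-envelope check of the peak-performance figures quoted above.
mults_per_s = 64 * 64 * 4 * 10_000       # 4 multiplications per pixel at 10,000 fps
assert mults_per_s > 160_000_000         # "more than 160 million multiplications/s"

vga_scale = (640 * 480) / (64 * 64)      # VGA has 75x as many pixels
assert vga_scale == 75

vga_mults_per_s = mults_per_s * vga_scale  # ≈ 12.3 billion multiplications/s
assert vga_mults_per_s > 12e9
```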
As an illustration of the Sobel algorithm, Fig. 10 shows an example sequence of 16 images of a moving object, namely, an electric fan. Two specific white markers are placed on the fan, i.e., a small circle near the rotor and a painted blade. The rotation speed of the fan is 3,750 rpm. In order to capture such a rapidly moving object, a short integration time (100 μs) was used for the frame acquisition. The Sobel algorithm allows the two white markers to be clearly distinguished even at a high framerate.
6 Conclusion and perspectives
An experimental pixel sensor implemented in a standard digital CMOS 0.35 μm process has been described in this paper. The architecture of the chip is based on a dynamically reconfigurable SIMD processor array, featuring a massively parallel architecture dedicated to programmable low-level image processing. Each 35 μm × 35 μm pixel contains 38 transistors implementing a circuit with photocurrent integration, two [AM]2, and an A2U. A 64 × 64 pixel proof-of-concept chip was fabricated. A dedicated
Fig. 9 3 × 3 kernel used by the four processing elements
Fig. 10 Sequence of 16 images with Sobel operator
embedded platform including an FPGA and ADCs has also been designed to evaluate the vision chip.
Experimental results reveal that raw image acquisition at 10,000 fps can easily be achieved using the parallel A2Us implemented at pixel-level. With basic image processing, the maximal framerate slows down to about 5,000 fps. The potential for dynamic reconfiguration of the sensor was also demonstrated in the case of the Sobel operator.
The next step in our research will be the design of a similar circuit in a modern 130 nm CMOS technology with a pixel size of less than 10 μm × 10 μm. In order to evaluate this future chip in realistic conditions, we would like to design a CIF sensor (352 × 288 pixels), which leads to a 3.2 mm × 2.4 mm die in a 130 nm technology. At the same time, we will focus on the development of a fast analog-to-digital converter (ADC). The integration of this ADC on future chips will allow us to provide new and sophisticated vision systems on chip (ViSOC) dedicated to digital embedded image processing at thousands of frames per second.

References
1. Fossum, E. (1993). Active pixel sensors: Are CCDs dinosaurs?
International Society for Optical Engineering (SPIE), 1900,
2. Fossum, E. (1997). CMOS image sensor: Electronic camera on a
CHIP. IEEE Transactions on Electron Devices, 44(10), 1689–
3. Seitz, P. (2000). Solid-state image sensing. Handbook of com-
puter Vision and Applications, 1, 165–222.
4. Litwiller, D. (2001). CCD vs. CMOS: Facts and fiction. Pho-
tonics Spectra, 35, 154–158.
5. Aw, C. H., & Wooley, B. (1996). A 128 × 128-pixel standard-CMOS image sensor with electronic shutter. IEEE Journal of Solid
State Circuits, 31(12), 1922–1930.
6. Loinaz, M., Singh, K., Blanksby, A., Inglis, D., Azadet, K., &
Ackland, B. (1998). A 200 mW 3.3 V CMOS color camera IC producing 352 × 288 24-b video at 30 frames/s. IEEE Journal of
Solid-State Circuits, 33(12), 2092–2103.
7. Smith, S., Hurwitz, J., Torrie, M., Baxter, D., Holmes, A.,
Panaghiston, M., Henderson, R., Murrayn, A., Anderson, S., &
Denyer P. (1998). A single-chip 306 9 244-pixel CMOS NTSC
video camera. In ISSCC digest of technical papers, San Fransi-
sco, CA (pp. 170–171).
8. Yadid-Pecht, O., & Belenky, A. (2003). In-pixel autoexposure
CMOS APS. IEEE Journal of Solid-State Circuits, 38(8), 1425–
9. Acosta-Serafini, P., Ichiro, M., & Sodini, C. (2004). ‘‘A 1/3’’
VGA linear wide dynamic range CMOS image sensor imple-
menting a predictive multiple sampling algorithm with overlap-
ping integration intervals. IEEE Journal of Solid-State Circuits,
10. Kozlowski, L., Rossi, G., Blanquart, L., Marchesini, R., Huang,
Y., Chow, G., Richardson, J., & Standley, D. (2005). Pixel noise
suppression via SoC management of target Reset in a 1920
9 1080 CMOS image sensor. IEEE Journal of Solid-State Cir-
cuits, 40(12), 2766–2776.
11. Sakakibara, M., Kawahito, S., Handoko, D., Nakamura, N.,
Higashi, M., Mabuchi, K., & Sumi, H. (2005). A high-sensitivity
CMOS image sensor with gain-adaptative column amplifiers.
IEEE Journal of Solid-State Circuits, 40(5), 1147–1156.
12. Nixon, R. H., Kemeny, S. E., Staller, C. O., & Fossum, E. R.
(1995). 128 9 128 CMOS photodiode-type active pixel sensor
with on-chip timing, control, and signal chain electronics. In M.
M. Blouke (ed.), Proceedings of the SPIE charge-coupled devices
and solid state optical sensors V, April (Vol. 2415, pp. 117–123).
13. Wuu, S., Chien, H., Yaung, D., Tseng, C., Wang, C., Chang, C.,
& Hsaio Y. (2001). A high performance active pixel sensor with
0.18 lm CMOS color imager technology. In Electron devices
meeting, 2001. IEDM technical digest. International (pp. 555–
14. Decker, S., McGrath, D., Brehmer, K., & Sodini, C. (1998). A
256 9 256 CMOS imaging array with wide dynamic range pixels
and column-parallel digital output. IEEE Journal of Solid-State
Circuits, 33(12), 2081–2091.
15. Yoon, K., Kim, C., Lee, B., & Lee, D. (2002). Single-chip cmos
image sensor for mobile applications. IEEE Journal of Solid-
State Circuits, December.
16. Komuro, T., Ishii, I., Ishikawa, M., & Yoshida, A. (2003). A
digital vision chip specialized for high-speed target tracking.
IEEE Transactions on Electron Devices, 50(1), 191–199.
17. Cembrano, G., Rodriguez-Vazquez, A., Galan, R., Jimenez-
Garrido, F., Espejo, S., & Dominguez-Castro, R. (2004). A 1000
FPS at 128 9 128 vision processor with 8-bit digitized I/O. IEEE
Journal of Solid-State Circuits, 39(7), 1044–1055.
18. Rodriguez-Vasquez, G., Cembrano, A., Carranza, L., Roca-Mo-
reno, E., Carmona, R., Jimenez-Garrido, F., Dominguez-Castro,
R., & Meana, S. (2004). Ace16k: The third generation of mixed-
signal SIMD-CNN ACE chips toward VSoCs. IEEE Transactions
on Circuits and Systems I: Regular Papers, 51(5), 851–863.
19. Lindgren, L., Melander, J., Johansson, R., & Mller, B. (2005). A
multiresolution 100-GOPS 4-Gpixels/s programmable smart
vision sensor for multisense imaging. IEEE Journal of Solid-State
Circuits, 40(6), 1350–1359.
20. Sugiyama, Y., Takumi, M., Toyoda, H., Mukozaka, N., Ihori, A.,
kurashina, T., Nakamura, Y., Tonbe, T., & Mizuno, S. (2005). A
high-speed CMOS image with profile data acquiring function.
IEEE Journal of Solid-State Circuits, 40, 2816–2823.
21. Dudek, P., & Hicks, P. (2005). A general-purpose processor-per-
pixel analog SIMD vision chip. IEEE Transactions on Circuits
and Systems I: Regular Papers, 52(1), 13– 20.
22. Miao, W., Lin, Q., Zhang, W., & Wu, N. (2008). A program-
mable SIMD vision chip for real-time vision applications. Solid-
State Circuits, IEEE Journal of, 43(6), 1470–1479.
23. Krymski, A., Van Blerkom, D., Andersson, A., Bock, N., Mans-
oorian, B., & Fossum, E. (1999). A high speed, 500 frames/s, 1024
9 1024 CMOS active pixel sensor. In VLSI circuits, 1999. Digest
of technical papers. 1999 symposium on (pp. 137–138).
24. Stevanovic, N., Hillebrand, M., Hosticka, B., & Teuner A.
(2000). A CMOS image sensor for high-speed imaging. In Solid-
state circuits conference, 2000. Digest of technical papers. IS-
SCC. 2000 IEEE international (Vol. 449, pp. 104–105).
25. Kleinfelder, S., Lim, S., Liu, X., & El Gamal, A. (2001). A 10000
frames/s CMOS digital pixel sensor. IEEE Journal of Solid-State
Circuits, 36(12), 2049–2059.
26. Handoko, D., Takokoro, K. S, Y., Kumahara, M., & Matsuzawa,
A. (2000). A CMOS image sensor for local-plane motion vector
estimation. In Symposium of VLSI circuits, June (Vol. 3650, pp.
Analog Integr Circ Sig Process (2010) 65:389–398397
27. Lim, S., & El Gamal, A. (2001). Integrating image capture and
processing—beyond single chip digital camera. In Proceedings of
the SPIE electronic imaging ’2001 conference, January, San Jose,
CA (Vol. 4306).
28. Yang, D., El Gamal, A., Fowler, B., & Tian, H. (1999). A 640
9 512 CMOS image sensor with ultra wide dynamix range
floating-point pixel-level ADC. IEEE Journal of Solid-State
Circuits, 34, 1821–1834.
29. Yadid-Pecht, O., & Fossum, E. (1999). CMOS APS with auto-
scaling and customized wide dynamic range. In IEEE workshop
on charge-coupled devices and advanced image sensors, June
(Vol. 3650, pp. 48–51).
30. Wu, C.-Y., & Chiang, C.-T. (2004). A low-photocurrent CMOS
retinal focal-plane sensor with a pseudo-BJT smoothing network
and an adaptative current Schmitt trigger for scanner applications.
IEEE Sensors Journal, 4(4), 510–518.
31. Dubois, J., Ginhac, D., Paindavoine, M., & Heyrman, B. (2008).
A 10000 fps cmos sensor with massively parallel image pro-
cessing. IEEE Journal of Solid-State Circuits, 43(3), 706.
32. Wu, C., Shih, Y., Lan, J., Hsieh, C., Huang, C., & Lu, J. (2004).
Design, optimization, and performance analysis of new photo-
diode structures for CMOS active-pixel-sensor (APS) imager
applications. IEEE Sensors Journal, 4(1), 135–144.
33. Shcherback, I., Belenky, A., & Yadid-Pecht, O. (2002). Empirical
dark current modeling for complementary metal oxide semicon-
ductor active pixel sensor. Optical Engineering, 41(6), 1216–
34. Shcherback, I., & Yadid-Pecht, O. (2003). Photoresponse analysis
and pixel shape optimization for CMOS active pixel sensors.
IEEE Transactions on Electron Devices, 50(1), 12–18.
35. Chapinal, G., Bota, S., Moreno, M., Palacin, J., & Herms, A.
(2002). A 128 9 128 CMOS image sensor with analog memory
for synchronous image capture. IEEE Sensors Journal, 2(2), 120–
36. Ryan, C. (1970). Applications of a four-quadrant multiplier. IEEE
Journal of Solid-State Circuits, 5(1), 45–48.
37. Liu, S., & Hwang, Y. (1995). CMOS squarer and four-quadrant
multiplier. IEEE Transactions on Circuits and Systems-I:Fun-
damental Theory and Applications, 42(2), 119–122.
Dominique Ginhac received his PhD in electronics and image processing from Clermont-Ferrand, France, in 1999. He is currently an associate professor at the University of Burgundy, France, and a member of LE2I UMR CNRS 5158 (Laboratory of Electronics, Computing and Imaging Sciences). His main research topic is embedded image processing on CMOS VLSI chips.
Jérôme Dubois is a Normalien of the 2001 promotion. He passed a competitive examination in Electrical Engineering for a post on the teaching staff of first-cycle universities in July, and received a degree in Image Processing in June 2005. He is currently a PhD student and instructor at the LE2I Laboratory, University of Burgundy. His research interests include the design, implementation, and testing of silicon retinas for multi-processing and high-speed image sensors.
Barthélémy Heyrman received his PhD in electronics and image processing from Burgundy University, France, in 2005. He is currently an associate professor at the University of Burgundy, France, and a member of LE2I UMR CNRS 5158 (Laboratory of Electronics, Computing and Imaging Sciences). His main research topics are smart camera systems-on-chip and embedded image processing.
Michel Paindavoine received his PhD in electronics and signal processing from Montpellier University, France, in 1982. He was with the Fairchild CCD Company for two years as an engineer working on CCD sensors. He joined Burgundy University in 1985 as maître de conférences and is currently a full professor at LE2I UMR CNRS 5158 (Laboratory of Electronics, Computing and Imaging Sciences), Burgundy University, France. His main research topics are image acquisition and real-time image processing. He is also one of the main managers of ISIS (a research group in signal and image processing of the French National Scientific Research Committee).