-
ABSTRACT: It is well known that image processing requires a huge amount of computation, mainly in low-level processing, where the algorithms deal with a large number of pixels. One way to estimate motion is to detect correspondences between two images. For normalised correlation criteria, previous experiments have shown that the result is not altered in the presence of nonuniform illumination. Usually, hardware for motion estimation has been limited to simple correlation criteria. The main goal of this paper is to propose a VLSI architecture for motion estimation using a matching criterion more complex than the Sum of Absolute Differences (SAD) criterion. Today's hardware devices provide many facilities for integrating increasingly complex designs, as well as the ability to communicate easily with general-purpose processors.
Oceans 2005 - Europe; 07/2005
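The contrast drawn in this abstract between SAD and a normalised correlation criterion can be illustrated with a small sketch. It is not the paper's architecture: the 1-D pixel rows, the function names, and the gain-and-offset brightness change (standing in for the illumination changes the abstract mentions) are all assumptions chosen for brevity.

```python
import math

def sad(a, b):
    """Sum of absolute differences between two equal-length pixel blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

def zncc(a, b):
    """Zero-mean normalised cross-correlation; returns 0.0 for flat blocks."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom

def best_shift(reference, signal, score, maximise):
    """Slide `reference` along `signal` and return the best-scoring offset."""
    n = len(reference)
    offsets = range(len(signal) - n + 1)
    pick = max if maximise else min
    return pick(offsets, key=lambda k: score(reference, signal[k:k + n]))

# Hypothetical data: the reference block reappears at offset 3 in the next frame.
reference = [10, 50, 20, 80]
frame = [12, 11, 13, 10, 50, 20, 80, 14, 12]
# Same frame under an assumed brightness change (gain 2, offset 30).
lit_frame = [2 * p + 30 for p in frame]

print(best_shift(reference, frame, sad, maximise=False))      # 3
print(best_shift(reference, lit_frame, sad, maximise=False))  # 0 (SAD is misled)
print(best_shift(reference, lit_frame, zncc, maximise=True))  # 3
```

In this toy case SAD finds the displaced block only while brightness is unchanged, whereas the normalised criterion still locates it after the change, at the price of multiplications, a square root, and a division per candidate.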
-
ABSTRACT: Many multimedia and telecommunications applications are modeled as multi-rate, parallel data flow systems. We present techniques to model and schedule such applications using structured systems of recurrence equations. We show that the schedule can be obtained first by computing the period of each component of the system, then by applying structured scheduling to the entire system. This method is implemented in the MMAlpha software, and it is applied to model a WCDMA uplink receiver.
Application-Specific Systems, Architectures and Processors, 2004. Proceedings. 15th IEEE International Conference on; 10/2004
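To give a feel for what "computing the period of each component" can mean in a multi-rate system, the sketch below solves the classic balance equations of synchronous dataflow. This is a swapped-in, simpler technique, not the recurrence-equation method of the paper or anything from MMAlpha; the component names `A`, `B`, `C` and their token rates are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def _lcm(a, b):
    return a * b // gcd(a, b)

def repetitions(edges):
    """Smallest firing counts that balance every edge.

    edges: list of (producer, consumer, tokens_produced, tokens_consumed).
    """
    rate = {edges[0][0]: Fraction(1)}
    pending = list(edges)
    while pending:
        progressed = False
        for edge in list(pending):
            src, dst, produced, consumed = edge
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
            elif src in rate and dst in rate:
                if rate[src] * produced != rate[dst] * consumed:
                    raise ValueError("inconsistent rates")
            else:
                continue  # neither end is known yet; retry on a later sweep
            pending.remove(edge)
            progressed = True
        if not progressed:
            raise ValueError("graph is not connected")
    # Scale the rational rates to the smallest integer solution.
    scale = reduce(_lcm, (r.denominator for r in rate.values()), 1)
    counts = {name: int(r * scale) for name, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {name: c // common for name, c in counts.items()}

def periods(counts):
    """Time between firings if one full iteration spans lcm(counts) ticks."""
    hyper = reduce(_lcm, counts.values(), 1)
    return {name: hyper // c for name, c in counts.items()}

# Hypothetical chain: A emits 2 tokens per firing, B consumes 3 and emits 1,
# C consumes 2.
edges = [("A", "B", 2, 3), ("B", "C", 1, 2)]
counts = repetitions(edges)
print(counts)           # {'A': 3, 'B': 2, 'C': 1}
print(periods(counts))  # {'A': 2, 'B': 3, 'C': 6}
```

Once each component has a period, a schedule for the whole system can be built on top of those periods, which is the two-step structure the abstract describes.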
-
ABSTRACT: The goal of the MOVIE very large-scale integration chip is to
facilitate the development of software-only solutions for real-time
video processing applications. This chip can be seen as a building block
for single-instruction, multiple-data processing, and its architecture
has been designed so as to facilitate high-level language programming.
The basic architecture building block associates a subarray of
computation processors with an I/O processor. A module can be seen as a
small linear, systolic-like array of processing elements, connected at
each end to the I/O processor. The module can communicate with its two
nearest neighbors via two communication ports. The chip architecture
also includes three 16-bit video ports. One important aspect in the
programming environment is the C-stolic programming language. C-stolic
is a C-like language augmented with parallel constructs, which allow the
differentiation between the array controller variables (scalar
variables) and the local variables in the array structure (systolic
variables). A statement operating on systolic variables implies a
simultaneous execution on all the cells of the structure. Implementation
examples of MOVIE-based architectures dealing with video compression
algorithms are given.
IEEE Transactions on Circuits and Systems for Video Technology; 10/1999 · 1.65 Impact Factor
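The distinction this abstract draws between scalar (controller) variables and systolic variables can be mimicked in a few lines of Python. This is only an emulation of the semantics as described: the abstract shows no C-stolic syntax, so the `SystolicVar` class and its methods are hypothetical, and Python evaluates the cells one after another rather than simultaneously.

```python
class SystolicVar:
    """One value per cell of a linear array; operations apply to every cell."""

    def __init__(self, values):
        self.cells = list(values)

    def _lift(self, other):
        # A scalar (controller) value is broadcast to every cell.
        if isinstance(other, SystolicVar):
            if len(other.cells) != len(self.cells):
                raise ValueError("array sizes differ")
            return other.cells
        return [other] * len(self.cells)

    def __add__(self, other):
        return SystolicVar(a + b for a, b in zip(self.cells, self._lift(other)))

    def __mul__(self, other):
        return SystolicVar(a * b for a, b in zip(self.cells, self._lift(other)))

    def shift_right(self, fill):
        """Each cell receives its left neighbour's value; `fill` enters cell 0."""
        return SystolicVar([fill] + self.cells[:-1])

gain = 3                              # scalar variable, held by the controller
pixels = SystolicVar([1, 2, 3, 4])    # systolic variable, one value per cell

scaled = pixels * gain                        # one statement, applied to all cells
neighbour_sum = pixels + pixels.shift_right(0)

print(scaled.cells)         # [3, 6, 9, 12]
print(neighbour_sum.cells)  # [1, 3, 5, 7]
```

The `shift_right` helper stands in for nearest-neighbour communication along the linear array; how the real language expresses that transfer is not stated in the abstract.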
-
ABSTRACT: This paper introduces a flexible code generation framework
dedicated to the design of application specific programmable processors.
This tool allows the user to build specific compilation flows, using a
library of modules, implementing flexible compilation passes such as
code generation, resource allocation, scheduling, etc. Retargeting is
performed at two levels: minor changes in the target processor
architecture are handled by a retargeting of the modules of the defined
compilation flow, while major modifications require a structural
modification of the flow. To build a compiler for a target processor,
the user selects modules from the library, and links them together.
While the global compiler structure is user-defined, the retargeting of
modules is automatically performed by the framework. Target processors
are described using ARMOR, a programmable processor modeling language
defined specifically for design space exploration. The proposed tool is
therefore suitable for a wide range of instruction set architectures.
Hardware/Software Codesign, 1999. (CODES '99) Proceedings of the Seventh International Workshop on; 02/1999
-
ABSTRACT: This paper shows how a real-time simulator of moving-picture compression algorithms can be rapidly assembled using a basic building block, here called MOVIE (MOdule for Video Experimentation). The internal architecture of the MOVIE VLSI chip can be compared to a small systolic machine made of a 32-bit I/O processor, a reduced linear array of 16-bit computation processors, and video data input/output mechanisms. Externally, the chip is provided with four 16-bit bidirectional data ports and three 16-bit bidirectional video data ports. Several MOVIE chips can easily be clustered to increase the size of the linear array of computation processors. The MOVIE chip is fully programmable in a high-level language in order to make program development easier.
Application Specific Array Processors, 1995. Proceedings., International Conference on; 08/1995
-
ABSTRACT: We present a new algorithmic framework that makes full use of
the large potential for data parallelism available on 2D processor
arrays when implementing nonlinear multigrid relaxation methods.
This framework leads to fast convergence towards quasi-optimal
solutions. It is demonstrated on two different low-level vision
applications.
Computer Vision and Pattern Recognition, 1994. Proceedings CVPR '94., 1994 IEEE Computer Society Conference on; 07/1994
-
ABSTRACT: Many tasks in computer vision and image analysis have recently been expressed as the minimization of global energy functions describing the local interactions between the observed data and the image features to be extracted. For the minimization of these global (often non-linear) energy functions, a variety of stochastic or deterministic relaxation algorithms have been described in the literature. In addition, multigrid techniques have significantly improved the convergence rate of iterative relaxation schemes. However, the major drawback of (iterative) relaxation algorithms remains the amount of computation required to update the image. For real-world applications the computation time quickly becomes prohibitive on workstations. On the other hand, the computations involved in these algorithms are regular and local, and lead naturally to massive data parallelism, which is well suited to parallel processing on array processor architectures. Standard parallel implementations of relaxati...
06/1994;
-
ABSTRACT: Many tasks in computer vision and image analysis have recently been expressed as the minimization of global energy functions describing the local interactions between the observed data and the image features to be extracted. For the minimization of these global (often non-linear) energy functions, a variety of stochastic or deterministic non-linear relaxation algorithms have been described in the literature. The major drawback of relaxation algorithms remains the amount of computation required to update the image. For real-world applications the computation time quickly becomes prohibitive on workstations. Several efficient approaches have been proposed to alleviate this computational burden. Among them, multigrid techniques [7, 23, 43] have been shown to significantly improve the convergence rate of linear and non-linear relaxation schemes. It is also well known that the computations involved in these algorithms are regular and local, and lead naturally to massive data parallelism, whic...
06/1994;
-
ABSTRACT: Systolic arrays for two connected speech recognition methods are presented. The first method is based on the dynamic time warping algorithm, which is applied directly to acoustic feature patterns. The second method is the probabilistic matching algorithm, which requires that the input sentence be preprocessed by a phonetic analyzer. It is shown that both methods may be implemented on either a two-dimensional or a linear systolic array. The advantages of each of these implementations are discussed. The architecture of a 12,000-transistor programmable NMOS prototype IC, which can be used as the basic processor of the probabilistic matching systolic arrays, is presented.
IEEE Transactions on Acoustics Speech and Signal Processing 09/1986;
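For readers unfamiliar with the dynamic time warping algorithm named in this abstract, a minimal sequential sketch follows. It uses scalar features and an absolute-difference local cost, both assumptions made for brevity; the paper's acoustic feature patterns and its systolic mapping are not reproduced here.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier element of b
                cost[i][j - 1],      # b[j-1] also matches an earlier element of a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]

# Hypothetical template and a time-stretched utterance of the same pattern.
template = [1, 2, 3]
stretched = [1, 1, 2, 2, 3]
print(dtw_distance(template, stretched))  # 0.0
print(dtw_distance(template, [3, 2, 1]))  # 4.0
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time. That locality is what makes the recurrence a natural fit for a two-dimensional or linear array of simple processors.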
-
ABSTRACT: Global optimization methods (deterministic or stochastic relaxation), generally used within the framework of Markov random field theory, remain costly in a number of applications but have properties that make them massively parallelizable. In this article we propose three general schemes for parallelizing such algorithms which, relying on a multiscale analysis of the image, prove particularly efficient compared with the classical parallel techniques used in this context.
14° Colloque sur le traitement du signal et des images, 1993 ; p. 1108-1111.