Nelson Luiz Passos

University of Notre Dame, South Bend, Indiana, United States

Publications (11), Total impact: 0

  • Nelson Luiz Passos, Edwin Hsing-mean Sha, Steven C. Bass
    ABSTRACT: Transformation techniques are usually applied to get optimal execution rates in parallel and/or pipeline systems. The retiming technique is a common and valuable tool in one-dimensional problems, represented by Data Flow Graphs (DFGs) such as DSP filters, where it can maximize the parallelism of a loop body represented by a DFG. Since most scientific or DSP applications are recursive or iterative, increasing the parallelism of the loop body can substantially decrease the overall computation time. Few results on retiming have been obtained for multi-dimensional problems. The previous result on multi-dimensional retiming applies only to a restricted class of Data Flow Graphs in which every total delay vector in a cycle has to be strictly non-negative. This paper develops a novel retiming technique that considers the final schedule as part of the process. To the authors' knowledge, this is the first retiming algorithm for general multi-dimensional Data Flow Graphs. The description and the correctness of our algorithm are presented in the paper. Experimental results show that our algorithm runs efficiently. Some DSP filters are used in the paper as examples of the application of our algorithm.
    09/2000;
  • Nelson Luiz Passos
    ABSTRACT: Innovative areas of Computer Science such as artificial intelligence and massive parallel processing are becoming almost extinct. Industry research focuses on very short-term goals that do not allow the freedom of long-term scientific research. Consequently, Computer Science professionals are not motivated to create optimized solutions or to search for more advanced and efficient techniques. This paper describes a research methodology that improves the motivation of undergraduate students to participate in new discoveries. This methodology is now being applied in a study focused on developing new computer architecture features and supporting theory. This creative experience aims to prepare future Computer Science professionals for a more innovative and successful career in industry or for the pursuit of an advanced degree. This paper reports a successful experience in this area, with a promising future, showing how the students were selected, motivated and coached to achieve the research goals.
    01/1999;
  • ABSTRACT: VLSI circuit manufacturing may result in devices with different propagation delays. Hence, the estimation of such delays during the design procedure may not prove totally accurate due to the fabrication process. This paper presents a new optimization methodology, called probabilistic retiming, which transforms a circuit based on statistical data gathered from the production history. Such circuits are modeled as graphs where each vertex represents a combinational element that has a probabilistic timing characteristic. A polynomial-time algorithm, applicable to such a graph, is developed which retimes a circuit in order to produce a design operating on a specified cycle time c within a given confidence level. In other words, the clock cycle of the retimed circuit is guaranteed to be less than or equal to c with a probability of at least that confidence level. Experiments show the effectiveness of the algorithm, subject to the designer requirements and to the manufacturing information, which is able to signif...
    05/1998;
  • Nelson Luiz Passos, Edwin Hsing-mean Sha
    ABSTRACT: Two-dimensional filters are commonly used in digital image processing applications. These filters have the characteristic of processing recursive sets of instructions requiring high computational speed. In this paper, these sets are modeled as cyclic two-dimensional data flow graphs, which are also used to represent the equivalent circuit design. In this new method, such graphs are submitted to a multidimensional retiming in order to reduce their cycle time. Such a reduction can achieve a cycle equal to the longest atomic operation in the filter, by inserting a fixed number of registers, independent of the size of the problem, into the circuit paths. Examples, description and the correctness of our algorithm are presented in the paper. 1. INTRODUCTION Digital image signal processing applications such as high definition television (HDTV) and medical imaging devices are known to require high computing power. Such computation intensive applications usually depend on time critical sectio...
    04/1997;
  • ABSTRACT: Most scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are usually applied to get optimal execution rates in parallel and/or pipeline systems. The retiming technique is a common and valuable transformation tool in one-dimensional problems, when loops are represented by data flow graphs (DFGs). In this paper, uniform nested loops are modeled as multidimensional data flow graphs (MDFGs). Full parallelism of the loop body, i.e., all nodes in the MDFG executed in parallel, substantially decreases the overall computation time. It is well known that, for one-dimensional DFGs, retiming cannot always achieve full parallelism. Other existing optimization techniques for nested loops also cannot always achieve full parallelism. This paper shows an important and counter-intuitive result, which proves that we can always obtain full parallelism for MDFGs with more than one dimension. This result is obtained by transforming the MDFG in...
    04/1997;
  • Ted Zhihong Yu, Nelson Luiz Passos, Edwin Hsing-mean Sha
    ABSTRACT: Loop pipelining is an effective technique to explore parallelism found in program loops, when designing high performance application specific systems. However, branches within loops may degrade the performance of pipelined architectures. It is an open question how applications with multiple branches can be loop pipelined in the most efficient way. This paper presents properties, theories, and experiments of a new loop pipelining algorithm, called Branch Anticipation. This new method, based on the rotation technique, carries out conditional resource sharing and reduces additional hardware requirements incurred by branches within the loops. The optimization may require propagation of additional branch decision signals along the schedule. We show that hardware constraints for such propagation must be considered during the scheduling process. We further demonstrate that the method is practical and achieves the same schedule length as the regular rotation scheduling with a smaller number of...
    06/1996;
  • Qingyan Wang, Nelson Luiz Passos, Edwin Hsing-mean Sha
    ABSTRACT: Massive uniform nested loops are broadly used in scientific and multi-dimensional Digital Signal Processing applications. Due to the amount of data handled by such applications, cache or on-chip memory is required to improve data access and overall system performance. Most existing application-specific systems do not efficiently optimize access to the different levels of the memory hierarchy. In this study, a static data scheduling method, carrot-hole data scheduling, is proposed for multi-dimensional applications represented by multi-dimensional data flow graphs, in order to control the data traffic between different levels of memory. Based on this data schedule, optimal partitioning and scheduling are selected. The partition size is also chosen so as to minimize memory access overhead. Experiments show that by using this technique, on-chip memory misses are significantly reduced as compared to results obtained from traditional methods. The carrot-hole data scheduling meth...
    05/1996;
  • Nelson Luiz Passos, Edwin Hsing-mean Sha, Steven C. Bass
    ABSTRACT: Multi-dimensional (MD) systems are widely used in scientific applications such as image processing, geophysical signal processing and fluid dynamics. Earlier scheduling methods in synthesizing MD systems do not explore loop pipelining across different dimensions. This paper explores the basic properties of MD loop pipelining and presents an algorithm, called multi-dimensional rotation scheduling, to find an efficient schedule based on the multidimensional retiming technique we developed. The description and the correctness of our algorithm are presented in the paper. The experiments show that our algorithm can achieve optimal results efficiently. 1 Introduction Computation intensive applications usually depend on time-critical sections consisting of a loop of instructions. To optimize the execution rate of such applications, the designer needs to explore the parallelism embedded in repetitive patterns of a loop. However, the existence of resource constraints makes the problem of sche...
    04/1996;
  • Nelson Luiz Passos, Edwin Hsing-mean Sha
    ABSTRACT: Transformation techniques are usually applied to get optimal execution rates in parallel and/or pipeline systems. The retiming technique is a common and valuable tool in one-dimensional problems. Most scientific or DSP applications are recursive or iterative. Uniform nested loops can be modeled as multi-dimensional data flow graphs (DFGs). Achieving full parallelism of the loop body, i.e., all the computational nodes executed in parallel, substantially decreases the overall computation time. It is well known that for one-dimensional DFGs retiming cannot always achieve full parallelism. This paper shows an important and counter-intuitive result, which proves that we can always obtain full parallelism for DFGs with more than one dimension. It also presents two novel multi-dimensional retiming techniques to obtain full parallelism. Examples, description and the correctness of our algorithms are presented in the paper. 1 Introduction Applications such as image processing, fluid mechani...
    04/1996;
  • ABSTRACT: The design of multi-dimensional systems using hardware/software codesign allows a significant improvement in the development cycle. This paper presents a technique that enables a design to have arbitrarily high throughput by using multi-dimensional retiming techniques while adjusting the composition of hardware and multiple software elements in order to satisfy the area requirements. A multi-dimensional graph representing the problem is transformed and scheduled such that all nodes are executed in a fully parallel way. The techniques presented are applicable to any problem which can be represented as a multi-dimensional data flow graph. Results are shown which illustrate the efficiency of the system as well as the savings achieved.
    01/1996;
  • Michael Sheliga, Nelson Luiz Passos, Edwin Hsing-mean Sha
    ABSTRACT: Applications such as image processing, fluid mechanics, and weather forecasting require high computer performance. Researchers and designers in those areas are looking for solutions to multi-dimensional problems through the use of parallel computers and/or specialized hardware. It is known that in highly parallel computers communication is often the bottleneck that limits execution speed. While the problem of calculating and minimizing communication costs due to loop data dependencies has been widely studied, such research has involved changing the way iterations are partitioned, not modifying the graph or loop data dependencies themselves. This paper uses algorithms that minimize loop data communication for multi-dimensional graphs by modifying the structure of the input graph and changing the distribution of the loop dependencies using multi-dimensional retiming. These algorithms are extended to large graphs and compared for effectiveness. Results are shown which illustrate the efficiency of the algorithms as well as the savings achieved for large graphs.