Alain Ketterlin's research while affiliated with University of Strasbourg and other places

Publications (32)

Conference Paper
In this paper, we propose Rec2Poly, a framework that automatically detects whether recursive programs can be transformed into affine loops that comply with the polyhedral model. If successful, the replacing loops can then take advantage of advanced loop-optimizing and parallelizing transformations such as tiling or skewing. Rec2Poly is made of two m...
Article
Full-text available
The task-based approach is a parallelization paradigm in which an algorithm is transformed into a directed acyclic graph of tasks: the vertices are computational elements extracted from the original algorithm and the edges are dependencies between those. During the execution, the management of the dependencies adds an overhead that can become signifi...
Conference Paper
Loop optimizations range from vectorization, scalar promotion, loop-invariant code motion, and software pipelining to loop fusion, skewing, tiling, and loop parallelization. These transformations are essential in the quest for automated high-performance code generation. Determining the validity of loop transformations at compile time requires analyzing a...
Article
Heterogeneous Many Cores (HMC) architectures that mix many simple/small cores with a few complex/large cores are emerging as a design alternative that can provide both fast sequential performance for single-threaded workloads and power-efficient execution for throughput-oriented parallel workloads. The availability of many small cores in an HMC pre...
Article
Full-text available
This paper describes an algorithm that takes a trace of a distributed program and builds a model of all communications of the program. The model is a set of nested loops representing repeated patterns. Loop bodies collect events representing communication actions performed by the various processes, like sending or receiving messages, and participat...
Conference Paper
Full-text available
X10 is a promising recent parallel language designed specifically to address the challenges of productively programming a wide variety of target platforms. The sequential core of X10 is an object-oriented language in the Java family. This core is augmented by a few parallel constructs that create activities as a generalization of the well known for...
Article
This paper deals with the binary analysis of executable programs, with the goal of understanding how they access memory. It explains how to statically build a formal model of all memory accesses. Starting with a control flow graph of each procedure, well-known techniques are used to structure this graph into a hierarchy of loops in all cases. The p...
Conference Paper
Full-text available
This paper describes a tool using one or more executions of a sequential program to detect parallel portions of the program. The tool, called Parwiz, uses dynamic binary instrumentation, targets various forms of parallelism, and suggests distinct parallelization actions, ranging from simple directive tagging to elaborate loop transformations. The...
Article
Many automatic software parallelization systems have been proposed in the past decades, but most of them are dedicated to source-to-source transformations. This paper shows that parallelizing executable programs is feasible, even if they require complex transformations, and in effect decouples parallelization from compilation, for example, for clos...
Conference Paper
Memory profiling is useful for a variety of tasks, most notably to produce traces of memory accesses for cache simulation. However, instrumenting every memory access incurs a large overhead, in the amount of code injected in the original program as well as in execution time. This paper describes how static analysis of the binary code can be used to...
Article
Full-text available
This paper describes a system that applies automatic parallelization techniques to binary code. The system works by raising x86-64 raw executable code to an intermediate representation that exhibits all memory accesses and relevant register definitions, but outlines detailed computations that are not relevant for parallelization. It then uses an o...
Article
Full-text available
Mining sequential data is an old topic that has been revived in the last decade, due to the increasing availability of sequential datasets. Most works in this field are centred on the definition and use of a distance (or, at least, a similarity measure) between sequences of elements. A measure called dynamic time warping (DTW) seems to be currently...
Article
This paper deals with the binary analysis of executable programs, with the goal of understanding how they access memory. It explains how to statically build a formal model of all memory accesses. Starting with a control-flow graph of each procedure, well-known techniques are used to structure this graph into a hierarchy of loops in all cases. The p...
Conference Paper
Full-text available
This paper describes an algorithm that takes a trace (i.e., a sequence of numbers or vectors of numbers) as input, and from that produces a sequence of loop nests that, when run, produces exactly the original sequence. The input format is suitable for any kind of program execution trace, and the output conforms to standard models of loop nests. The...
Conference Paper
Full-text available
Multi-date images present new challenges and new opportunities for image analysis. This paper considers the task of segmenting a multi-date image by clustering its pixels without requiring perfect time-based alignment. It first introduces the problem, and then proceeds with the definition of a similarity measure between sequences of observations, i...
Chapter
Most empirical learning algorithms describe objects as a list of attribute-value pairs. A flat attribute-value representation fails, however, to capture the internal structure of real objects. Mechanisms are therefore needed to represent the different levels of detail at which an object can be seen. A common structuring method is reviewed, and new...
Article
In this article, we study some of the forms that the classification process can take in object-based knowledge representation systems. We focus mainly on the operations of classifying classes and instances. We then discuss three applications that are particularly important within the...
Article
This paper is about the unsupervised discovery of patterns in sequences of composite objects. A composite object may be described as a sequence of other, simpler data. In such cases, not only the nature of the components is important, but also the order in which these components appear. The present work studies the problem of generalizing sequences...
Article
This paper examines the task of remote-sensing image analysis as an unsupervised learning task. Images are usually (very) large, and represent complex objects. Unsupervised learning, or clustering, may be of great help at several phases of the analysis. First, this paper describes a clustering algorithm. Then, the application of this algorithm to t...
Article
Full-text available
Many machine-learning (either supervised or unsupervised) techniques assume that data present themselves in an attribute-value form. But this formalism is largely insufficient to account for many applications. Therefore, much of the ongoing research now focuses on first-order learning systems. But complex formalisms lead to high computational compl...
Article
This paper focuses on the problem of clustering sets of objects. An extension of the traditional attribute-value formalism is proposed, which conforms to well-known database modeling techniques. The mechanism described allows clustering over one-to-many relationships, i.e. objects may be represented with a variable number of components. Several exp...
Article
Full-text available
Unsupervised empirical machine learning algorithms aim at discovering useful concepts in a stream of unclassified data. Since image segmentation is a particular instance of the problem addressed by these methods, one of these algorithms has been employed to automatically segment remote-sensing images. The region under study is the Nepalese Himalayas. B...
Article
Full-text available
This paper examines the task of remote-sensing image analysis as an unsupervised learning task. Images are usually (very) large, and represent complex objects. Unsupervised learning, or clustering, may be of great help at several phases of the analysis. First, this paper describes a clustering algorithm. Then, the application of this algorithm to t...
Article
Full-text available
Most empirical learning algorithms describe objects as a list of attribute-value pairs. A flat attribute-value representation fails, however, to capture the internal structure of real objects. Mechanisms are therefore needed to represent the different levels of detail at which an object can be seen. A common structuring method is reviewed, and ne...
Article
This paper examines the problem of clustering a sequence of objects that cannot be described with a predefined list of attributes (or variables). In many applications, such a crisp representation cannot be determined. An extension of the traditional propositional formalism is thus proposed, which allows objects to be represented as a set of comp...

Citations

... Polyhedral compilation methods have also been used for efforts on recognizing motifs, or well-known high-level computations [6]. The models have been extended beyond explicit loops to support recursive function calls [7], or used to target emerging memory technologies by exploiting the detailed information in the model [8], [9]. ...
... The resulting scheme also has nice parallelization possibilities. In a poloidal plane, the block-triangular linear systems resulting from the DG scheme are well solved by an optimized task-based implementation [2,5]. In the toroidal direction, the transport equations are solved by a simple shift operator. ...
... Skipping such instructions, which may appear inside and outside loops, our method allows the reduction of the profiling overhead for a wide range of programs. Another hybrid-analysis framework was proposed by Sampaio et al. [14]. Their goal is providing theoretical and practical foundations to apply aggressive loop transformations. ...
... Crafa et al. propose a Coq formalization of a subset of X10 and define a causality relation in [11], but only consider fork-join synchronization, and omit the dynamic barrier synchronization that we formalize. Feautrier uses an informal definition of the causality relation over clock operations to optimize X10 programs by reducing synchronization [12]. Tomofumi et al. use a causality relation to check for data races in the polyhedral subset of clocked X10 programs [13]. ...
... The compilers apply all the suitable optimization techniques based on the argument value. It is also possible to create more hardware specific versions from the LLVM IR [Hal+15]. The specialized version is then injected to the process using PADRONE and the value of Target Function in the corresponding table entry is changed to the specialized one as shown in Table 3. 3. ...
... Nandivada et al. [31] present techniques to reduce the overheads of X10 clock (and HJ phaser) operations by chunking parallel loops with synchronization operations. Feautrier et al. [22] propose a technique to transform code written using clocks-async-finish abstractions to code that does not use clocks. However, their scheme works for static-control programs; hence, it covers only a restricted class of parallel loops and unconditional advance operations. ...
... Fuchs et al. [20] designed a HW prefetcher for code block working sets that predicted the future memory accesses of stencil based codes. Swamy et al. [21] introduced a hardware/software framework to support efficient helper threading on heterogeneous manycores, where the helper thread would perform SW prefetching to achieve higher sequential performance for memory-intensive workloads. Zhao et al. [22] used a dynamic approach to pool memory allocations together and fine tuned the data prefetching based on the access patterns observed in the profiler. ...
... To assess the performance and accuracy of our method, we have extended the Padrone binary code modification system to implement the above analysis [7]. The main contributions of this article can be listed as: ...
... [Comparison table; headers: Target Architecture, End-to-End, Static, Dynamic, Multicore, Hetero. SoC; rows: Tensorflow [1], Halide [2], HPVM [3], Chi et al. [4], Parwiz [5], SD3 [6], Wang et al. [7], Kremlin [8], Ours] ... based on the state of the system resources. To harness this flexibility, programming models have been introduced where application developers or domain experts guide the compilation process by making task to PE mapping decisions based on offline profiling. ...
... Their tool, RELEASE, is a concolic execution based technique for generating inputs, whereas our approach is based on a pure static analysis technique. The work by Ketterlin and Clauss proposes a method of reducing instrumentation sites in binary tracing by introducing the idea of program skeletonization [24]. This is quite a different objective. ...