About
147 Publications · 22,819 Reads
2,039 Citations
Introduction
Bruno Raffin is currently the leader of the DataMove team at the National Institute for Research in Computer Science and Control (Inria).
His current focus includes in situ and stream processing, sensitivity analysis and data assimilation, dynamic parallel data structures, and task programming.
I do not have time to answer all article requests. All my publications are freely available on HAL or my personal web page; please go there to get the full papers.
Current institution: Inria (DataMove team)
Publications (147)
Artificial intelligence is transforming scientific computing with deep neural network surrogates that approximate solutions to partial differential equations (PDEs). Traditional off-line training methods face issues with storage and I/O efficiency, as the training dataset has to be computed with numerical solvers up-front. Our previous work, the Me...
The spatiotemporal resolution of Partial Differential Equations (PDEs) plays an important role in the mathematical description of the world's physical phenomena. In general, scientists and engineers solve PDEs numerically using computationally demanding solvers. Recently, deep learning algorithms have emerged as a viable alternative for obtai...
Particle filters are a group of algorithms to solve inverse problems through statistical Bayesian methods when the model does not comply with the linear and Gaussian hypothesis. Particle filters are used in domains like data assimilation, probabilistic programming, neural network optimization, localization and navigation. Particle filters estimate t...
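As a rough illustration of the propagate/weight/resample cycle that such particle filters iterate, the sketch below implements a minimal bootstrap particle filter for a toy scalar random-walk model; the function names and model parameters are illustrative, not the distributed implementation studied in this work.

```python
import numpy as np

def bootstrap_particle_filter(observations, n_particles=1000, sigma_model=1.0, sigma_obs=0.5):
    """Minimal bootstrap particle filter for a scalar random-walk model.

    State model:       x_t = x_{t-1} + N(0, sigma_model^2)
    Observation model: y_t = x_t + N(0, sigma_obs^2)
    """
    rng = np.random.default_rng(0)
    particles = rng.normal(0.0, 1.0, n_particles)   # initial ensemble
    estimates = []
    for y in observations:
        # Propagate each particle through the (stochastic) model.
        particles = particles + rng.normal(0.0, sigma_model, n_particles)
        # Weight particles by the likelihood of the observation.
        weights = np.exp(-0.5 * ((y - particles) / sigma_obs) ** 2)
        weights /= weights.sum()
        # Resample: duplicate likely particles, drop unlikely ones.
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]
        estimates.append(particles.mean())
    return np.array(estimates)

# Usage: filter noisy observations of a hidden random walk.
truth = np.cumsum(np.random.default_rng(1).normal(0, 1, 50))
obs = truth + np.random.default_rng(2).normal(0, 0.5, 50)
print(bootstrap_particle_filter(obs)[:5])
```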
Numerical simulations are ubiquitous in science and engineering. Machine learning for science investigates how artificial neural architectures can learn from these simulations to speed up scientific discovery and engineering processes. Most of these architectures are trained in a supervised manner. They require tremendous amounts of data from simul...
Prediction of chaotic systems relies on a floating fusion of sensor data (observations) with a numerical model to decide on a good system trajectory and to compensate non-linear feedback effects. Ensemble-based data assimilation (DA) is a major method for this concern depending on propagating an ensemble of perturbed model realizations. In this pap...
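For context, the analysis step of an ensemble-based DA method can be sketched as a toy stochastic ensemble Kalman filter update on synthetic data; this assumes a linear observation operator and is only meant to illustrate how a perturbed ensemble is corrected by observations, not the large-scale framework described in the paper.

```python
import numpy as np

def enkf_analysis(ensemble, obs, obs_operator, obs_cov, rng):
    """Toy stochastic EnKF analysis step.

    ensemble:     (n_members, n_state) forecast ensemble
    obs:          (n_obs,) observation vector
    obs_operator: (n_obs, n_state) linear observation operator H
    obs_cov:      (n_obs, n_obs) observation error covariance R
    """
    n_members = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)
    p = anomalies.T @ anomalies / (n_members - 1)          # sample covariance
    h = obs_operator
    kalman_gain = p @ h.T @ np.linalg.inv(h @ p @ h.T + obs_cov)
    # Perturb observations so the analysis ensemble keeps the right spread.
    perturbed = obs + rng.multivariate_normal(np.zeros(len(obs)), obs_cov, n_members)
    innovations = perturbed - ensemble @ h.T
    return ensemble + innovations @ kalman_gain.T

rng = np.random.default_rng(0)
forecast = rng.normal(size=(20, 4))                        # 20 members, 4 state variables
H = np.eye(2, 4)                                           # observe the first two variables
analysis = enkf_analysis(forecast, np.array([0.3, -0.1]), H, 0.1 * np.eye(2), rng)
print(analysis.mean(axis=0))
```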
Multi-run numerical simulations using supercomputers are increasingly used by physicists and engineers for dealing with input data and model uncertainties. Most of the time, the input parameters of a simulation are modeled as random variables, then simulations are run a (possibly large) number of times with input parameters varied according to a sp...
A widening performance gap is separating CPU performance and IO bandwidth on large scale systems. In some fields such as weather forecast and nuclear fusion, numerical models generate such amounts of data that classical post hoc processing is not feasible anymore due to the limits in both storage capacity and IO performance. In situ approaches are...
The ubiquity of fluids in the physical world explains the need to accurately simulate their dynamics for many scientific and engineering applications. Traditionally, well established but resource intensive CFD solvers provide such simulations. The recent years have seen a surge of deep learning surrogate models substituting these solvers to allevia...
The ubiquity of fluids in the physical world explains the need to accurately simulate their dynamics for many scientific and engineering applications. Traditionally, well-established but resource-intensive CFD solvers provide such simulations. Recent years have seen a surge of deep learning surrogate models substituting these solvers to alleviate t...
Prediction of chaotic systems relies on a floating fusion of sensor data (observations) with a numerical model to decide on a good system trajectory and to compensate nonlinear feedback effects. Ensemble-based data assimilation (DA) is a major method for this concern depending on propagating an ensemble of perturbed model realizations.In this paper...
In situ analysis and visualization have mainly been applied to the output of a single large-scale simulation. However, topics involving the execution of multiple simulations in supercomputers have only received minimal attention so far. Some important examples are uncertainty quantification, data assimilation, and complex optimization. In this posi...
Regardless of its origin, in the near future the challenge will not be how to generate data, but rather how to manage big and highly distributed data to make it more easily handled and more accessible by users on their personal devices. VELaSSCo (Visualization for Extremely Large-Scale Scientific Computing) is a platform developed to provide new vi...
The classical approach for quantiles computation requires availability of the full sample before ranking it. In uncertainty quantification of numerical simulation models, this approach is not suitable at exascale as large ensembles of simulation runs would need to gather a prohibitively large amount of data. This problem is solved thanks to an on-t...
With the goal of performing exascale computing, the importance of input/output (I/O) management becomes more and more critical to maintain system performance. While the computing capacities of machines are getting higher, the I/O capabilities of systems do not increase as fast. We are able to generate more data but unable to manage them efficiently...
Apache Hadoop is a widely used MapReduce framework for storing and processing large amounts of data. However, it presents some performance issues that hinder its utilization in many practical use cases. Although existing alternatives like Spark or Hama can outperform Hadoop, they require rewriting the source code of the applications due to API inc...
In this paper, an on-line parallel analytics framework is proposed to process and store in transit all the data being generated by a Molecular Dynamics (MD) simulation run using staging nodes in the same cluster executing the simulation. The implementation and deployment of such a parallel workflow with standard HPC tools, managing problems such as...
Quantiles are important order statistics for analysis tasks such as outlier detection or computation of non-parametric confidence intervals. Quantiles being order statistics, the classical approach for their computation requires availability of the full sample before ranking it. This approach is not suitable at exascale. Large ensembles would need...
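One way to estimate a quantile without keeping the full sample is a Robbins-Monro style stochastic update, sketched below; this is an illustrative single-stream version with a hypothetical learning_rate parameter, not the parallel, ensemble-scale algorithm of the paper.

```python
import numpy as np

def streaming_quantile(stream, alpha, learning_rate=1.0):
    """Robbins-Monro style estimate of the alpha-quantile of a data stream.

    Each value updates the running estimate and is then discarded, so the
    full sample never has to be stored or gathered.
    """
    estimate = None
    for i, x in enumerate(stream, start=1):
        if estimate is None:
            estimate = x
            continue
        step = learning_rate / i
        # Move the estimate up when the sample exceeds it, down otherwise;
        # the asymmetric steps converge to the alpha-quantile.
        estimate += step * (alpha - (x < estimate))
    return estimate

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, 100_000)
print(streaming_quantile(data, 0.95))   # should be close to the exact value below
print(np.quantile(data, 0.95))
```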
CALL FOR PAPERS
High Performance Machine Learning Workshop - HPML 2018
https://hpml2018.github.io/
To be held in conjunction with the 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2018)
September 24th, 2018 - Lyon, France
This workshop is intended to bring together the Machine Learning (ML), Artif...
The in situ paradigm proposes to co-locate simulation and analytics on the same compute node to analyze data while still resident in the compute node memory, hence reducing the need for post-processing methods. A standard approach that proved efficient for sharing resources on each node consists in running the analytics processes on a set of dedica...
Global sensitivity analysis is an important step for analyzing and validating numerical simulations. One classical approach consists in computing statistics on the outputs from well-chosen multiple simulation runs. Simulation results are stored to disk and statistics are computed postmortem. Even if supercomputers make it possible to run large studies, scien...
In situ workflows contain tasks that exchange messages composed of several data fields. However, a consumer task may not necessarily need all the data fields from its producer. For example, a molecular dynamics simulation can produce atom positions, velocities, and forces; but some analyses require only atom positions. The user should decide whethe...
Parallelizing industrial simulation codes like the EUROPLEXUS software dedicated to the analysis of fast transient phenomena, is challenging. In this paper we focus on the efficient parallelization on shared memory node coupling. We propose to have each thread gather the data it needs for processing a given iteration range before actually advan...
In situ processing proposes to reduce storage needs and I/O traffic by processing results of parallel simulations as soon as they are available in the memory of the compute processes. We focus here on computing in situ statistics on the results of N simulations from a parametric study. The classical approach consists in running various instances of...
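The kind of one-pass update that makes such in situ statistics possible can be illustrated with Welford's running mean and variance; the sketch below is a single-process toy with hypothetical names, not the distributed implementation used for the parametric studies described here.

```python
import numpy as np

class IterativeStats:
    """Welford-style running mean and variance, updated field by field.

    Each incoming simulation result updates the statistics and can then be
    discarded, which is the core idea behind in situ / in transit statistics:
    no raw results need to be written to disk.
    """
    def __init__(self, field_size):
        self.count = 0
        self.mean = np.zeros(field_size)
        self.m2 = np.zeros(field_size)

    def update(self, field):
        self.count += 1
        delta = field - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (field - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.count - 1) if self.count > 1 else np.zeros_like(self.m2)

# Usage: accumulate statistics over N simulation results without storing them.
rng = np.random.default_rng(0)
stats = IterativeStats(field_size=5)
for _ in range(1000):                      # stand-in for 1000 simulation results
    stats.update(rng.normal(2.0, 0.5, 5))
print(stats.mean, stats.variance)
```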
Voronoi diagrams are fundamental data structures in computational geometry, with applications in such areas as physics-based simulations. For non-Euclidean distances, the Voronoi diagram must be performed over a grid-graph, where the edges encode the required distance information. The major bottleneck in this case is a shortest path algorithm tha...
VELaSSCo (Visual Analysis for Extremely Large-Scale Scientific Computing) is an EC FP7 project involving a consortium of seven European partners (Fig. 1). VELaSSCo aims to provide new visual analysis methods for large-scale simulations serving the petabyte era. The main output of the project is the VELaSSCo platform which has been designed a...
Over the past few years, the increasing amounts of data produced by large-scale simulations have motivated a shift from traditional offline data analysis to in situ analysis and visualization. In situ processing began as the coupling of a parallel simulation with an analysis or visualization library, motivated primarily by avoiding the high cost of...
Numerical simulations using supercomputers are producing an ever growing amount of data. Efficient production and analysis of these data are the key to future discoveries. The In-Situ paradigm is emerging as a promising solution to avoid the I/O bottleneck encountered on the file system for both the simulation and the analytics by treating the data a...
The PetaFlow application aims to contribute to the use of high performance computational resources for the benefit of society. To this goal the emergence of adequate information and communication technologies with respect to high performance computing-networking-visualisation and their mutual awareness is required. The developed technology and algorit...
In this paper, we present a comparison of scheduling strategies for heterogeneous multi-CPU and multi-GPU architectures. We designed and evaluated four scheduling strategies on top of XKaapi runtime: work stealing, data-aware work stealing, locality-aware work stealing, and Heterogeneous Earliest-Finish-Time (HEFT). On a heterogeneous architecture...
While studied over several decades, the computation of boolean operations on polyhedra is almost always addressed by focusing on the case of two polyhedra. For multiple input polyhedra and an arbitrary boolean operation to be applied, the operation is decomposed over a binary CSG tree, each node being processed separately in quasilinear time. For l...
Voronoi diagrams are fundamental data structures in computational geometry with applications in different areas. Recent soft object simulation algorithms for real time physics engines require the computation of Voronoi diagrams over 3D images with non-Euclidean distances. In this case, the computation must be performed over a graph, where the edges...
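Computing a Voronoi partition over a graph amounts to a multi-source shortest-path computation; the sketch below uses a plain multi-source Dijkstra on a toy graph to illustrate the idea, whereas the papers above target large 3D grid-graphs and parallel/GPU implementations.

```python
import heapq

def graph_voronoi(adjacency, seeds):
    """Voronoi partition of a weighted graph via multi-source Dijkstra.

    adjacency: dict {node: [(neighbor, edge_weight), ...]}
    seeds:     list of seed nodes (Voronoi sites)
    Returns a dict mapping each reachable node to its closest seed.
    """
    dist = {node: float("inf") for node in adjacency}
    owner = {}
    heap = []
    for s in seeds:
        dist[s] = 0.0
        owner[s] = s
        heapq.heappush(heap, (0.0, s, s))
    while heap:
        d, node, seed = heapq.heappop(heap)
        if d > dist[node]:
            continue                         # stale queue entry
        for neighbor, w in adjacency[node]:
            nd = d + w
            if nd < dist[neighbor]:
                dist[neighbor] = nd
                owner[neighbor] = seed
                heapq.heappush(heap, (nd, neighbor, seed))
    return owner

# Usage on a tiny 1D chain graph with two seeds.
chain = {i: [(j, 1.0) for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(graph_voronoi(chain, seeds=[0, 9]))
```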
High performance computing systems are today composed of tens of thousands of processors and deep memory hierarchies. The next generation of machines will further increase the unbalance between I/O capabilities and processing power. To reduce the pressure on I/Os, the in situ analytics paradigm proposes to process the data as closely as possible to...
The amount of data generated by molecular dynamics simulations of large molecular assemblies and the sheer size and complexity of the systems studied call for new ways to analyse, steer and interact with such calculations. Traditionally, the analysis is performed off-line once the huge amount of simulation results have been saved to disks, thereby...
Combining molecular dynamics simulations with user interaction would have various applications in both education and research. By enabling interactivity the scientist will be able to visualize the experiment in real time and drive the simulation to a desired state more easily. However, interacting with systems of interesting size requires signifi...
This paper presents preliminary performance comparisons of parallel applications developed natively for the Intel Xeon Phi accelerator using three different parallel programming environments and their associated runtime systems. We compare Intel OpenMP, Intel CilkPlus and XKaapi together on the same benchmark suite and we provide comparisons betwee...
The paper presents X-KAAPI, a compact runtime for multicore architectures that brings multiple parallel paradigms (parallel independent loops, fork-join tasks and dataflow tasks) into a unified framework without performance penalty. Comparisons on independent loops with OpenMP and on dense linear algebra with QUARK/PLASMA confirm our design decisions. A...
Nowadays shared memory HPC platforms expose a large number of cores organized in a hierarchical way. Parallel application programmers struggle to express more and more fine-grain parallelism and to ensure locality on such NUMA platforms. Independent loops stand as a natural source of parallelism. Parallel environments like OpenMP provide ways of pa...
The Petaflow project aims to contribute to the use of high performance computational resources to the benefit of society. To this goal the emergence of adequate information and communication technologies with respect to high performance computing-networking-visualisation and their mutual awareness is required. The developed technology and algorithms...
Moving the simulation results produced by thousands of computing cores to the scientist's office is no longer an option. Remote visualization proposes to perform heavy duty postprocessing tasks at the computing center while transferring images to the scientist. To be effective, such an environment needs to be flexible and interactive. Data loading fr...
In this paper, we present a specification of a RESTful based networking interface for the efficient exchange and manipulation of visual computing resources. It is designed to include web applications by using modern web-technology such as Typed Arrays and WebSockets. The specification maps internal structures and data containers to two types, Eleme...
Most recent HPC platforms have heterogeneous nodes composed of multi-core CPUs and accelerators, like GPUs. Programming such nodes is typically based on a combination of OpenMP and CUDA/OpenCL codes; scheduling relies on a static partitioning and cost model. We present the XKaapi runtime system for data-flow task programming on multi-CPU and multi-...
Scientific simulations produce more and more memory consuming datasets. The required processing resources need to keep pace with this increase. Though parallel visualization algorithms with strong performance gains have been developed, there is a need for a parallel programming environment tailored for scientific visualization algorithms that would...
Neighbor identification is the most computationally intensive step in particle based simulations. To contain its cost, a common approach consists in using a regular grid to sort particles according to the cell they belong to. Then, neighbor search only needs to test the particles contained in a constant number of cells. During the simulation, a usu...
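The regular-grid (cell-list) approach mentioned above can be sketched as follows; this is a simple serial illustration with hypothetical names, not the optimized parallel version discussed in the paper.

```python
import numpy as np
from collections import defaultdict

def cell_list_neighbors(positions, cutoff):
    """Neighbor search with a regular grid (cell lists).

    Particles are binned into cells of size `cutoff`, so each particle only
    has to be tested against the particles of its own cell and of the 26
    surrounding cells, instead of against all N particles.
    """
    cells = defaultdict(list)
    cell_ids = np.floor(positions / cutoff).astype(int)
    for idx, cid in enumerate(map(tuple, cell_ids)):
        cells[cid].append(idx)

    neighbors = defaultdict(list)
    offsets = [(dx, dy, dz) for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)]
    for i, cid in enumerate(map(tuple, cell_ids)):
        for off in offsets:
            for j in cells.get(tuple(np.add(cid, off)), []):
                # Count each pair once and confirm the actual distance.
                if j > i and np.linalg.norm(positions[i] - positions[j]) < cutoff:
                    neighbors[i].append(j)
    return neighbors

# Usage: 1000 random particles in a unit box, cutoff 0.1.
pts = np.random.default_rng(0).random((1000, 3))
print(sum(len(v) for v in cell_list_neighbors(pts, 0.1).values()), "pairs")
```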
In this paper, we present two different FlowVR systems aiming to render remote data using a Particle-based Volume Renderer (PBVR). The huge size of irregular volume datasets has always been one of the major obstacles in the field of scientific visualization. We developed an application with the software FlowVR, using its functionalities of "modules...
The benefit of using the discrete element method (DEM) for simulations of fracture in heterogeneous media has been widely highlighted. However, modelling large structures leads to prohibitive computation times. We propose to use graphics processing units (GPUs) to reduce the computation time, taking advantage of the highly data parallel nature of D...
Ray casting on graphics processing units (GPUs) opens new possibilities for molecular visualization. We describe the implementation and calculation of diverse molecular representations such as licorice, ball-and-stick, space-filling van der Waals spheres, and approximated solvent-accessible surfaces using GPUs. We introduce HyperBalls, an improved...
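The core per-pixel operation behind such sphere ray casting is a ray-sphere intersection test; a small CPU-side sketch of that test (the paper itself describes GPU shader implementations) might look like this.

```python
import numpy as np

def ray_sphere_intersection(origin, direction, center, radius):
    """First intersection of a ray with a sphere.

    Solves ||origin + t*direction - center||^2 = radius^2 for the smallest
    t >= 0. Returns t, or None if the ray misses the sphere.
    """
    direction = direction / np.linalg.norm(direction)
    oc = origin - center
    b = np.dot(oc, direction)
    c = np.dot(oc, oc) - radius ** 2
    discriminant = b * b - c
    if discriminant < 0:
        return None                         # ray misses the sphere
    t = -b - np.sqrt(discriminant)
    return t if t >= 0 else None

# Usage: a ray along +z hitting a unit sphere centered 5 units away -> t = 4.0
print(ray_sphere_intersection(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                              np.array([0.0, 0.0, 5.0]), 1.0))
```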
Networked virtual environments like Second Life enable distant people to meet for leisure as well as work. But users are represented through avatars controlled by keyboards and mice, leading to a low sense of presence, especially regarding body language. Multi-camera real-time 3D modeling offers a way to ensure a significantly higher sense of pres...
Reordering instructions and data layout can bring significant performance improvement for memory bounded applications. Parallelizing such applications requires a careful design of the algorithm in order to keep the locality of the sequential execution. In this paper, we aim at finding a good parallelization of memory bounded applications on multico...
Today, it is possible to associate multiple CPUs and multiple GPUs in a single shared memory architecture. Using these resources efficiently in a seamless way is a challenging issue. In this paper, we propose a parallelization scheme for dynamically balancing work load between multiple CPUs and GPUs. Most tasks have a CPU and GPU implementation, so...
The Petaflow project aims to contribute to the use of high performance computational resources to the benefit of society. To this goal the emergence of adequate information and communication technologies with respect to high performance computing-networking-visualisation and their mutual awareness is required. The developed technology and algorithm...
The applications of virtual and augmented reality require high performance equipment normally associated with high costs, inaccessible for small organizations and educational institutions. By aggregating the computing power and storage capacity of several PCs interconnected by a network, we obtain a high-performance, low-cost alternative. The objective...
Reordering instructions and data layout can bring significant performance improvement for memory bounded applications. Parallelizing such applications requires a careful design of the algorithm in order to keep the locality of the sequential execution. On one hand, parallel computation tends to create concurrent tasks that work on independent data...
This paper focuses on the design of high performance VR applications. These applications usually involve various I/O devices and complex simulations. A parallel architecture or grid infrastructure is required to provide the necessary I/O and processing capabilities. Developing such applications faces several difficulties, two important ones being s...
This paper proposes to revisit isosurface extraction algorithms taking into consideration two specific aspects of recent multicore architectures: their intrinsic parallelism associated with the presence of multiple computing cores and their cache hierarchy that often includes private caches as well as caches shared between all cores. Taking advanta...
We present a multicamera real-time 3D modeling system that aims at enabling new immersive and interactive environments. This system, called Grimage, makes it possible to retrieve in real time a 3D mesh of the observed scene as well as the associated textures. This information enables a strong visual presence of the user in virtual worlds. The 3D shape infor...
The Vgate project introduces a new type of immersive environment that allows full-body immersion and interaction with virtual worlds. The project is a joint initiative between computer scientists from research teams in computer vision, parallel computing and computer graphics at the INRIA Grenoble Rhone-Alpes, and the 4D View Solutions company.
This project associates multi-camera 3D modeling, physical simulation, and tracked head-mounted displays for a strong full-body immersion and presence in virtual worlds. Three-dimensional modeling is based on the EPHV algorithm, which provides an exact geometrical surface according to input data. The geometry enables computation of full-body collis...
We present a framework for new 3D tele-immersion applications that allows collaborative and remote 3D interactions. This framework is based on a multiple-camera platform that builds, in real-time, 3D models of users. Such models are embedded into a shared virtual environment where they can interact with other users or purely virtual objects. 3D mod...
Real-time multi-camera 3D modeling provides full-body geometric and photometric data on the objects present in the acquisition space. It can be used as an input device for rendering textured 3D models, and for computing interactions with virtual objects through a physical simulation engine. In this paper we present a work in progress to build a col...
In this paper we propose a parallelization of interactive physical simulations. Our approach relies on task parallelism where the code is instrumented to mark tasks and shared data between tasks, as well as parallel loops even if they have dynamic conditions. Prior to running a simulation step, we extract a task dependency graph that is partitio...
Interactions are a key part of Virtual Reality systems and can lead to complex software assembly for multi-modal and multi-site collaborative environments. This is even harder when each participant is interacting in the same virtual world with very different hardware and software capabilities. This paper outlines a software architecture and interact...
The available computing power continues to grow exponentially, but by offering more parallelism. This growth in available power can be leveraged to make certain computations interactive. The structure and objectives of the application then differ significantly from those of traditional high-performance computing. The role of the...
This paper focuses on parallel interactive applications ranging from scientific visualization to virtual reality or computational steering. Interactivity makes them particular in three main respects: they are endlessly iterative, use advanced I/O devices, and must perform under strong performance constraints (latency, refresh rate). A data flow gra...
In the late 90’s, the emergence of high-performance 3D commodity graphics cards paved the way to the use of PC clusters for high-performance Virtual Reality (VR) applications. Today PC clusters are broadly used to drive multi-projector immersive environments, among other high-performance VR tasks such as tracking and sound synthesis. This survey pr...
One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cache-oblivious (CO) algorithms take into consideration the memory hierarchy to design cache efficient algorithms. CO approaches have the advantage of adapting to unknown and varying memory hierarchies. Recent CA and CO a...
This paper presents a new approach to collision detection and modeling between deformable volumetric bodies. It allows deep intersections while alleviating the difficulties of distance field update. A ray is shot from each surface vertex in the direction of the inward normal. A collision is detected when the first intersection belongs to an inward...
This paper focuses on parallel interactive applications ranging from scientific visualization to virtual reality or computational steering. Interactivity makes them particular in three main respects: they are endlessly iterative, use advanced I/O devices, and must perform under strong performance constraints (latency, refresh rate). In this paper,...
Grimage glues multi-camera 3D modeling, physical simulation and parallel execution for a new immersive experience. Put your hands or any object into the interaction space. It is instantaneously modeled in 3D and injected into a virtual world populated with solid and soft objects. Push them, catch them and squeeze them.
Physically-based computer graphics offers the potential of achieving high-fidelity virtual environments in which the propagation of light in real environments is accurately simulated. However, such global illumination computations for even simple scenes ...
We present a parallel octree carving algorithm applied to real time 3D modeling from multiple video streams. Our contribution is to propose a parallel adaptive algorithm for high performance width-first octree computation. It makes it possible to stop the algorithm at any time while ensuring a balanced octree exploration.
This paper introduces a dynamic work balancing algorithm, based on work stealing, for time-constrained parallel octree carving. The performance of the algorithm is proved and confirmed by experimental results where the algorithm is applied to a real-time 3D modeling from multiple video streams. Compared to classical work stealing, the proposed algo...
This paper presents an approach to recover body motions from multiple views using a 3D skeletal model. It takes, as input, foreground silhouette sequences from multiple viewpoints, and computes, for each frame, the skeleton pose which best fits the body pose. Skeletal models mostly encode motion information and therefore allow separating motio...
In the late 90s the emergence of high performance 3D commodity graphics cards opened the way to use PC clusters for high performance Virtual Reality (VR) applications. Today PC clusters are broadly used to drive multi projector immersive environments. In this paper, we survey the different approaches that have been developed to use PC clusters for...
We propose in this article a classification of the different notions of hybridization and a generic framework for the automatic hybridization of algorithms. Then, we detail the results of this generic framework on the example of the parallel solution of multiple linear systems.
In this paper, we present a scalable architecture to compute, visualize and interact with 3D dynamic models of real scenes. This architecture is designed for mixed reality applications requiring such dynamic models, tele-immersion for instance. Our system consists of three main parts: the acquisition, based on standard firewire cameras; the computation...
Figure 1: Coupling multiple codes such as a rigid body simulation (a) and a fluid solver (b) enables building complex worlds (c)-(d); different distribution and parallelization approaches can then be applied to achieve real-time user interactions (e)-(f).
We present a novel software framework for developing highly an...
Existing parallel or remote rendering solutions rely on communicating pixels, OpenGL commands, scene-graph changes or application-specific data. We propose an intermediate solution based on a set of independent graphics primitives that use hardware shaders to specify their visual appearance. Compared to an OpenGL based approach, it reduces the comp...