Ciprian Docan

Rutgers, The State University of New Jersey, New Brunswick, New Jersey, United States

Publications (26) · 4.16 Total impact

  • ABSTRACT: Complex coupled multiphysics simulations are playing increasingly important roles in scientific and engineering applications such as fusion, combustion, and climate modeling. At the same time, extreme scales, increased levels of concurrency, and the advent of multicores are making it challenging to program the high-end parallel computing systems on which these simulations run. Although partitioned global address space (PGAS) languages attempt to address the problem by providing a shared memory abstraction for parallel processes within a single program, the PGAS model does not easily support data coupling across multiple heterogeneous programs, which is necessary for coupled multiphysics simulations. This paper explores how multiphysics-coupled simulations can be supported by the PGAS programming model. Specifically, in this paper, we present the design and implementation of the XpressSpace programming system, which extends existing PGAS data sharing and data access models with a semantically specialized shared data space abstraction to enable data coupling across multiple independent PGAS executables. XpressSpace supports a global-view style programming interface that is consistent with the PGAS memory model, and provides an efficient runtime system that can dynamically capture the data decomposition of global-view data structures such as arrays, and enable fast exchange of these distributed data structures between coupled applications. In this paper, we also evaluate the performance and scalability of a prototype implementation of XpressSpace by using different coupling patterns extracted from real world multiphysics simulation scenarios, on the Jaguar Cray XT5 system at Oak Ridge National Laboratory. Copyright © 2013 John Wiley & Sons, Ltd.
    Concurrency and Computation: Practice and Experience 03/2014; 26(3). · 0.85 Impact Factor
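    To make the runtime's job concrete, the following is a minimal, hypothetical C sketch of one task the abstract describes: given two different block decompositions of the same global-view array, one per coupled executable, work out which producer blocks overlap each consumer's local region. It is illustrative only; XpressSpace itself works on multi-dimensional UPC arrays and performs the actual data movement.

      /* Toy illustration (not the XpressSpace API): a global 1-D array of
       * length N is block-distributed over P producer threads on one side
       * and Q consumer threads on the other; for each consumer, list the
       * producer blocks that overlap its local region. This index
       * translation is the decomposition-capture step a coupling runtime
       * must perform before it can move any data. */
      #include <stdio.h>

      #define N 100   /* global array length (made up for the example) */
      #define P 4     /* producer-side threads */
      #define Q 3     /* consumer-side threads */

      /* [lo, hi) owned by rank r when N_ elements are block-distributed over n ranks */
      static void block_range(int N_, int n, int r, int *lo, int *hi) {
          int base = N_ / n, rem = N_ % n;
          *lo = r * base + (r < rem ? r : rem);
          *hi = *lo + base + (r < rem ? 1 : 0);
      }

      int main(void) {
          for (int c = 0; c < Q; c++) {
              int clo, chi;
              block_range(N, Q, c, &clo, &chi);
              printf("consumer %d needs [%d,%d) from:", c, clo, chi);
              for (int p = 0; p < P; p++) {
                  int plo, phi;
                  block_range(N, P, p, &plo, &phi);
                  int lo = clo > plo ? clo : plo;   /* overlap of the two blocks */
                  int hi = chi < phi ? chi : phi;
                  if (lo < hi)
                      printf("  producer %d [%d,%d)", p, lo, hi);
              }
              printf("\n");
          }
          return 0;
      }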
  • ABSTRACT: Emerging scientific application workflows are composed of heterogeneous coupled component applications that simulate different aspects of the physical phenomena being modeled, and that interact and exchange significant volumes of data at runtime. With the increasing performance gap between on-chip data sharing and off-chip data transfers in current systems based on multicore processors, moving large volumes of data over the communication network fabric can significantly impact performance. As a result, minimizing the volume of inter-application data exchanged across compute nodes over the network is critical to achieving overall application performance and system efficiency. In this paper, we investigate the in-situ execution of the coupled components of a scientific application workflow so as to maximize the on-chip exchange of data. Specifically, we present a distributed data sharing and task execution framework that (1) employs data-centric task placement to map computations from the coupled applications onto processor cores so that a large portion of the data exchanges can be performed using the intra-node shared memory, and (2) provides a shared space programming abstraction that supplements existing parallel programming models (e.g., message passing) with specialized one-sided asynchronous data access operators and can be used to express coordination and data exchanges between the coupled components. We also present the implementation of the framework and its experimental evaluation on the Jaguar Cray XT5 at Oak Ridge National Laboratory.
    Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International; 01/2012
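    As a rough illustration of the data-centric placement idea, co-locating the most heavily communicating pairs of coupled tasks on the same node so that their exchanges stay in intra-node shared memory, here is a small greedy sketch in C. The exchange volumes, node count, and cores per node are invented for the example; the paper's framework operates on real application decompositions and is not claimed to use this particular heuristic.

      /* Toy data-centric placement (not the paper's algorithm): greedily
       * co-locate the most heavily communicating task pairs on the same
       * node so that their data exchange can use intra-node shared memory. */
      #include <stdio.h>

      #define T 6                 /* coupled tasks (hypothetical) */
      #define NODES 3             /* compute nodes */
      #define CORES 2             /* cores per node */

      /* hypothetical pairwise exchange volumes, vol[i][j] == vol[j][i] */
      static int vol[T][T] = {
          {0, 9, 1, 0, 2, 0},
          {9, 0, 0, 1, 0, 0},
          {1, 0, 0, 8, 0, 1},
          {0, 1, 8, 0, 1, 0},
          {2, 0, 0, 1, 0, 7},
          {0, 0, 1, 0, 7, 0},
      };

      int main(void) {
          int node_of[T], used[NODES] = {0};
          for (int i = 0; i < T; i++) node_of[i] = -1;

          /* Greedy: repeatedly take the heaviest pair of unplaced tasks and
           * put both on a node that still has two free cores. */
          for (;;) {
              int bi = -1, bj = -1, best = 0;
              for (int i = 0; i < T; i++)
                  for (int j = i + 1; j < T; j++)
                      if (node_of[i] < 0 && node_of[j] < 0 && vol[i][j] > best) {
                          best = vol[i][j]; bi = i; bj = j;
                      }
              if (bi < 0) break;
              for (int n = 0; n < NODES; n++)
                  if (used[n] + 2 <= CORES) {
                      node_of[bi] = node_of[bj] = n;
                      used[n] += 2;
                      break;
                  }
              if (node_of[bi] < 0) break;    /* no node has two free cores left */
          }
          /* place any leftover task on the first node with a free core */
          for (int i = 0; i < T; i++)
              if (node_of[i] < 0)
                  for (int n = 0; n < NODES; n++)
                      if (used[n] < CORES) { node_of[i] = n; used[n]++; break; }

          long intra = 0, total = 0;
          for (int i = 0; i < T; i++)
              for (int j = i + 1; j < T; j++) {
                  total += vol[i][j];
                  if (node_of[i] == node_of[j]) intra += vol[i][j];
              }
          printf("intra-node exchange: %ld of %ld units\n", intra, total);
          return 0;
      }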
  • ABSTRACT: As computing power increases exponentially, vast amounts of data are created by many scientific research activities. However, the bandwidth for storing data to disks and reading it back has been improving at a much slower pace. These two trends produce an ever-widening data access gap. Our work brings together two distinct technologies to address this data access issue: indexing and in situ processing. From decades of database research literature, we know that indexing is an effective way to address the data access issue, particularly for accessing a relatively small fraction of the data records. As data sets increase in size, more and more analysts need to use selective data access, which makes indexing even more important for improving data access. The challenge is that most implementations of indexing technology are embedded in large database management systems (DBMS), but most scientific datasets are not managed by any DBMS. In this work, we choose to include indexes with the scientific data instead of requiring the data to be loaded into a DBMS. We use compressed bitmap indexes from the FastBit software, which are known to be highly effective for the query-intensive workloads common to scientific data analysis. To use the indexes, we need to build them first. The index building procedure needs to access the whole data set and may also require a significant amount of compute time. In this work, we adapt in situ processing technology to generate the indexes, thus removing the need to read data from disks and enabling the indexes to be built in parallel. The in situ data processing system used is ADIOS, a middleware for high-performance I/O. Our experimental results show that the indexes can improve the data access time by up to 200 times, depending on the fraction of data selected, and that using the in situ data processing system can effectively reduce the time needed to create the indexes, by up to 10 times with our in situ technique under identical parallel settings.
    Large Data Analysis and Visualization (LDAV), 2011 IEEE Symposium on; 11/2011
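    To make the indexing side concrete, below is a small, self-contained C sketch of the basic bitmap-index idea: one bitmap per value bin, with a range query answered by OR-ing bin bitmaps instead of scanning the raw data. It is illustrative only; FastBit additionally compresses the bitmaps (WAH encoding), and in this work the indexes are built in situ through ADIOS rather than in a toy loop like this.

      /* Toy bitmap index (uncompressed) over binned values: a range query
       * is answered by OR-ing the bitmaps of the qualifying bins and
       * counting set bits, with no scan of the raw data. */
      #include <stdio.h>
      #include <stdint.h>

      #define NREC  64                    /* records (hypothetical) */
      #define NBINS 4                     /* value bins: [0,25) [25,50) ... */

      int main(void) {
          double data[NREC];
          uint64_t bitmap[NBINS] = {0};   /* one 64-bit bitmap per bin */

          /* make up some data and build the index */
          for (int i = 0; i < NREC; i++) {
              data[i] = (double)((i * 37) % 100);      /* pseudo values 0..99 */
              int bin = (int)(data[i] / 25.0);
              bitmap[bin] |= 1ULL << i;                /* set record i's bit */
          }

          /* query: value >= 50, i.e. bins 2 and 3 */
          uint64_t hits = bitmap[2] | bitmap[3];
          int count = 0;
          for (int i = 0; i < NREC; i++)
              if (hits & (1ULL << i)) count++;

          printf("records with value >= 50: %d of %d\n", count, NREC);
          return 0;
      }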
  • ABSTRACT: Managing the large volumes of data produced by emerging scientific and engineering simulations running on leadership-class resources has become a critical challenge. The data has to be extracted off the computing nodes and transported to consumer nodes so that it can be processed, analyzed, visualized, archived, etc. Several recent research efforts have addressed data-related challenges at different levels. One attractive approach is to offload expensive I/O operations to a smaller set of dedicated computing nodes known as a staging area. However, even using this approach, the data still has to be moved from the staging area to consumer nodes for processing, which continues to be a bottleneck. In this paper, we investigate an alternate approach, namely moving the data-processing code to the staging area rather than moving the data. Specifically, we present the Active Spaces framework, which provides (1) programming support for defining the data-processing routines to be downloaded to the staging area, and (2) run-time mechanisms for transporting binary codes associated with these routines to the staging area, executing the routines on the nodes of the staging area, and returning the results. We also present an experimental performance evaluation of Active Spaces using applications running on the Cray XT5 at Oak Ridge National Laboratory. Finally, we use a coupled fusion application workflow to explore the trade-offs between transporting data and transporting the code required for data processing during coupling, and we characterize the sweet spots for each option.
    25th IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2011, Anchorage, Alaska, USA, 16-20 May, 2011 - Conference Proceedings; 01/2011
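    The trade-off explored here, moving the data to the consumer versus moving the much smaller processing code to where the data is staged and returning only the result, can be sketched with a toy C example. The kernel below is just a function pointer inside one process; in Active Spaces the routine is a binary transported to and executed on the staging nodes.

      /* Toy illustration of moving the code to the data (not Active Spaces
       * itself): the staging area holds a large array; instead of shipping
       * the whole array to the consumer, the consumer ships a small
       * data-processing kernel and receives only the reduced result. */
      #include <stdio.h>
      #include <stddef.h>

      #define NELEM (1 << 20)                   /* data held in the staging area */

      static double staged_data[NELEM];          /* stands in for staged output */

      /* a data-processing routine a consumer might "download" to the staging
       * area; here it is only a function pointer within a single process */
      typedef double (*kernel_fn)(const double *, size_t);

      static double max_kernel(const double *d, size_t n) {
          double m = d[0];
          for (size_t i = 1; i < n; i++)
              if (d[i] > m) m = d[i];
          return m;
      }

      /* "execute in the staging area and return only the result" */
      static double active_space_exec(kernel_fn k) {
          return k(staged_data, NELEM);
      }

      int main(void) {
          for (size_t i = 0; i < NELEM; i++)
              staged_data[i] = (double)((i * 2654435761u) % 1000) / 10.0;

          double result = active_space_exec(max_kernel);
          printf("bytes moved back: %zu (vs %zu bytes of raw data)\n",
                 sizeof(result), sizeof(staged_data));
          printf("max = %.1f\n", result);
          return 0;
      }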
  • ABSTRACT: One of the most pressing issues with petascale analysis is the transport of simulation results data to a meaningful analysis. Traditional workflow prescribes storing the simulation results to disk and later retrieving them for analysis and visualization. However, at petascale this storage of the full results is prohibitive. A solution to this problem is to run the analysis and visualization concurrently with the simulation and bypass the storage of the full results. One mechanism for doing so is in transit visualization, in which analysis and visualization are run on I/O nodes that receive the full simulation results but write only information derived from the analysis or provide run-time visualization. This paper describes work in progress on three in transit visualization solutions, each using a different transport mechanism.
    01/2011;
  • ABSTRACT: Complex coupled multi-physics simulations are playing increasingly important roles in scientific and engineering applications such as fusion plasma and climate modeling. At the same time, extreme scales, high levels of concurrency, and the advent of multicore and many-core technologies are making the high-end parallel computing systems on which these simulations run hard to program. While Partitioned Global Address Space (PGAS) languages attempt to address the problem, the PGAS model does not easily support the coupling of multiple application codes, which is necessary for coupled multi-physics simulations. Furthermore, existing frameworks that support coupled simulations have been developed for fragmented programming models such as message passing, and are conceptually mismatched with the shared memory address space abstraction of the PGAS programming model. This paper explores how multi-physics coupled simulations can be supported within the PGAS programming framework. Specifically, in this paper, we present the design and implementation of the XpressSpace programming system, which enables efficient and productive development of coupled simulations across multiple independent PGAS Unified Parallel C (UPC) executables. XpressSpace provides a global-view style programming interface that is consistent with the memory model in UPC, and provides an efficient runtime system that can dynamically capture the data decomposition of global-view arrays and enable fast exchange of parallel data structures between coupled codes. In addition, XpressSpace provides the flexibility to define the coupling process in a specification file that is independent of the program source codes. We evaluate the performance and scalability of the XpressSpace prototype implementation using different coupling patterns extracted from real world multi-physics simulation scenarios, on the Jaguar Cray XT5 system at Oak Ridge National Laboratory.
    11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2011, Newport Beach, CA, USA, May 23-26, 2011; 01/2011
  • ABSTRACT: The effect of Type-I ELM activity on divertor plate heat load is a key component of the DOE OFES Joint Research Target milestones for this year. In this talk, we present simulations of kinetic edge physics, ELM activity, and the associated divertor heat loads in which we couple the discrete guiding-center neoclassical transport code XGC0 with the nonlinear extended MHD code M3D using the End-to-end Framework for Fusion Integrated Simulations, or EFFIS. In these coupled simulations, the kinetic code and the MHD code run concurrently on the same massively parallel platform and periodic data exchanges are performed using a memory-to-memory coupling technology provided by EFFIS. The M3D code models the fast ELM event and sends frequent updates of the magnetic field perturbations and electrostatic potential to XGC0, which in turn tracks particle dynamics under the influence of these perturbations and collects divertor particle and energy flux statistics. We describe here how EFFIS technologies facilitate these coupled simulations and discuss results for DIII-D, NSTX and Alcator C-Mod tokamak discharges.
    11/2010;
  • ABSTRACT: EFFIS is a set of tools developed for working with large-scale simulations. EFFIS is used by researchers in the Center for Plasma Edge Simulation, as well as in many other areas of science. EFFIS is composed of services including adaptable I/O, workflows, dashboards, visualization, code coupling, wide-area data movement, and provenance capturing. One of the unique aspects of EFFIS is that it transparently allows users to switch from code coupling on disk to coupling in memory, using the concept of a shared space in a staging area. The staging area is a small fraction of the compute nodes needed to run the large-scale simulation, but it is used for the construction of I/O pipelines and a code-coupling infrastructure. This allows the scientist to make minor changes for the code to work with ADIOS, and then, with no further changes, perform complex transformations and analytics, all of which occur in situ with the simulation. In this talk, we will focus on the technologies CPES uses, which are scalable and can be used on anything from workstations to petascale machines.
    11/2010;
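    A rough sketch of the "switch coupling methods without touching the science code" idea follows. The API below is invented for illustration and is not ADIOS; in EFFIS this role is played by ADIOS, where the I/O method is selected in an external configuration file rather than in the source.

      /* Toy sketch (invented API, not ADIOS): the simulation always calls
       * coupled_write(); whether that means "write a file for file-based
       * coupling" or "hand the buffer to an in-memory staging transport"
       * is decided by a configuration string, not by the science code. */
      #include <stdio.h>
      #include <string.h>

      static const char *method = "POSIX";   /* would come from a config file,
                                                e.g. "POSIX" or "STAGING" */

      static void coupled_write(const char *name, const double *buf, int n) {
          if (strcmp(method, "POSIX") == 0) {          /* disk-based coupling */
              FILE *f = fopen(name, "wb");
              if (f) { fwrite(buf, sizeof(double), (size_t)n, f); fclose(f); }
          } else if (strcmp(method, "STAGING") == 0) { /* memory-based coupling */
              /* in a real system: asynchronous push to staging nodes via RDMA */
              printf("pushing %d doubles of '%s' to the staging area\n", n, name);
          }
      }

      int main(void) {
          double field[8] = {0};
          coupled_write("edge_density.bin", field, 8);   /* same call either way */
          method = "STAGING";                            /* "switch" at run time */
          coupled_write("edge_density.bin", field, 8);
          return 0;
      }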
  • ABSTRACT: Peta-scale scientific applications running on High End Computing (HEC) platforms can generate large volumes of data. For high performance storage, and in order to be useful to science end users, such data must be organized in its layout, indexed, sorted, and otherwise manipulated for subsequent data presentation, visualization, and detailed analysis. In addition, scientists desire to gain insights into selected data characteristics 'hidden' or 'latent' in these massive datasets while the data is being produced by simulations. PreDatA, short for Preparatory Data Analytics, is an approach to preparing and characterizing data while it is being produced by large scale simulations running on peta-scale machines. By dedicating additional compute nodes on the machine as 'staging' nodes and by staging simulations' output data through these nodes, PreDatA can exploit their computational power to perform select data manipulations with lower latency than attainable by first moving data into file systems and storage. Such in-transit manipulations are supported by the PreDatA middleware through asynchronous data movement to reduce write latency, application-specific operations on streaming data that are able to discover latent data characteristics, and appropriate data reorganization and metadata annotation to speed up subsequent data access. PreDatA enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and inspection, as well as for data exchange between concurrently running simulations.
    Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on; 05/2010
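    One of the in-transit manipulations described, discovering simple "latent" characteristics of the data while it streams through the staging nodes, can be sketched as follows. This is a hypothetical, single-process C illustration; the real PreDatA operators run on dedicated staging nodes over asynchronously received buffers.

      /* Toy in-transit characterization (not PreDatA code): as output
       * chunks stream through a staging node, keep running min/max and a
       * coarse histogram so consumers can inspect the data before it ever
       * reaches the file system. */
      #include <stdio.h>
      #include <float.h>

      #define NBINS 8

      static double lo = DBL_MAX, hi = -DBL_MAX;
      static long hist[NBINS];

      /* called once per chunk as data passes through the staging node */
      static void characterize_chunk(const double *chunk, int n) {
          for (int i = 0; i < n; i++) {
              if (chunk[i] < lo) lo = chunk[i];
              if (chunk[i] > hi) hi = chunk[i];
              int b = (int)(chunk[i] / 100.0 * NBINS); /* values assumed in [0,100) */
              if (b < 0) b = 0;
              if (b >= NBINS) b = NBINS - 1;
              hist[b]++;
          }
      }

      int main(void) {
          double chunk[256];
          for (int step = 0; step < 4; step++) {       /* four incoming chunks */
              for (int i = 0; i < 256; i++)
                  chunk[i] = (double)((step * 256 + i) % 100);
              characterize_chunk(chunk, 256);
          }
          printf("min=%.1f max=%.1f\n", lo, hi);
          for (int b = 0; b < NBINS; b++)
              printf("bin %d: %ld\n", b, hist[b]);
          return 0;
      }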
  • ABSTRACT: The convergence of game technology, the Internet, and rehabilitation science forms the second-generation virtual rehabilitation framework. This paper presents the first pilot study designed to look at the feasibility of at-home use of gaming technology adapted to address hand impairments in adolescents with hemiplegia due to perinatal stroke or intraventricular hemorrhage. Three participants trained at home for approximately 30 min/day, several days a week, for six to ten months. During therapy, they wore a Fifth Dimension Technologies Ultra sensing glove and played custom-developed Java 3D games on a modified PlayStation 3. The games were designed to accommodate the participants' limited range of motion, and to improve finger range and speed of motion. Trials took place in Indiana, while monitoring/data storage took place at the Rutgers Tele-Rehabilitation Institute (New Jersey). Significant improvements in finger range of motion (as measured by the sensing glove) were associated with self- and family-reported improvements in activities of daily living. In online subjective evaluations, participants indicated that they liked the system's ease of use, the clarity of instructions, and the appropriate length of the exercise sessions. Other telerehabilitation studies are compared with this study and its technology challenges. Directions for future research are included.
    IEEE Transactions on Information Technology in Biomedicine (a publication of the IEEE Engineering in Medicine and Biology Society) 03/2010; 14(2):526-34. · 1.69 Impact Factor
  • Ciprian Docan, Manish Parashar, Scott Klasky
    ABSTRACT: Emerging high-performance distributed computing environments are enabling new end-to-end formulations in science and engineering that involve multiple interacting processes and data-intensive application workflows. For example, current fusion simulation efforts are exploring coupled models and codes that simultaneously simulate separate application processes, such as the core and the edge turbulence, and run on different high performance computing resources. These components need to interact, at runtime, with each other and with services for data monitoring, data analysis and visualization, and data archiving. As a result, they require efficient support for dynamic and flexible couplings and interactions, which remains a challenge. This paper presents DataSpaces, a flexible interaction and coordination substrate that addresses this challenge. DataSpaces essentially implements a semantically specialized virtual shared space abstraction that can be associatively accessed by all components and services in the application workflow. It enables live data to be extracted from running simulation components, indexes this data online, and then allows it to be monitored, queried and accessed by other components and services via the space using semantically meaningful operators. The underlying data transport is asynchronous, low-overhead and largely memory-to-memory. The design, implementation, and experimental evaluation of DataSpaces using a coupled fusion simulation workflow are presented.
    Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC 2010, Chicago, Illinois, USA, June 21-25, 2010; 01/2010 · 0.78 Impact Factor
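    Purely as an illustration of the associative, space-based access style described above, here is a toy C sketch with invented names, a single process, and 1-D index ranges; the actual DataSpaces space is distributed over staging nodes, versioned and multi-dimensional, and accessed via RDMA.

      /* Toy shared-space put/get keyed by (variable name, version, index
       * range). Everything lives in one process here; in DataSpaces the
       * space is spread over staging nodes and accessed over the network. */
      #include <stdio.h>
      #include <string.h>
      #include <stdlib.h>

      #define MAX_OBJS 16

      struct obj {
          char    name[32];
          int     version, lb, ub;     /* index range [lb, ub] */
          double *data;
      };
      static struct obj space[MAX_OBJS];
      static int nobjs;

      static void space_put(const char *name, int ver, int lb, int ub,
                            const double *data) {
          if (nobjs >= MAX_OBJS) return;            /* toy: fixed capacity */
          struct obj *o = &space[nobjs++];
          snprintf(o->name, sizeof(o->name), "%s", name);
          o->version = ver; o->lb = lb; o->ub = ub;
          o->data = malloc(sizeof(double) * (size_t)(ub - lb + 1));
          memcpy(o->data, data, sizeof(double) * (size_t)(ub - lb + 1));
      }

      /* retrieve a sub-range of a named, versioned object */
      static int space_get(const char *name, int ver, int lb, int ub, double *out) {
          for (int i = 0; i < nobjs; i++) {
              struct obj *o = &space[i];
              if (strcmp(o->name, name) == 0 && o->version == ver &&
                  lb >= o->lb && ub <= o->ub) {
                  memcpy(out, o->data + (lb - o->lb),
                         sizeof(double) * (size_t)(ub - lb + 1));
                  return 0;
              }
          }
          return -1;                                /* not (yet) in the space */
      }

      int main(void) {
          double field[10];
          for (int i = 0; i < 10; i++) field[i] = i * 1.5;

          space_put("edge_temperature", /*version=*/3, 0, 9, field);  /* producer */

          double part[4];
          if (space_get("edge_temperature", 3, 2, 5, part) == 0)      /* consumer */
              printf("got [2..5] of version 3: %.1f %.1f %.1f %.1f\n",
                     part[0], part[1], part[2], part[3]);
          return 0;
      }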
  • ABSTRACT: Scientific applications are striving to accurately simulate multiple interacting physical processes that comprise complex phenomena being modeled. Efficient and scalable parallel implementations of these coupled simulations present challenging interaction and coordination requirements, especially when the coupled physical processes are computationally heterogeneous and progress at different speeds. In this paper, we present the design, implementation and evaluation of a memory-to-memory coupling framework for coupled scientific simulations on high-performance parallel computing platforms. The framework is driven by the coupling requirements of the Center for Plasma Edge Simulation, and it provides simple coupling abstractions as well as efficient asynchronous (RDMA-based) memory-to-memory data transport mechanisms that complement existing parallel programming systems and data sharing frameworks. The framework enables flexible coupling behaviors that are asynchronous in time and space, and it supports dynamic coupling between heterogeneous simulation processes without enforcing any synchronization constraints. We evaluate the performance and scalability of the coupling framework using a specific coupling scenario, on the Jaguar Cray XT5 system at Oak Ridge National Laboratory.
    10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, CCGrid 2010, 17-20 May 2010, Melbourne, Victoria, Australia; 01/2010
  • Ciprian Docan, Manish Parashar, Scott Klasky
    ABSTRACT: As the complexity and scale of applications grow, managing and transporting the large amounts of data they generate are quickly becoming a significant challenge. Moreover, the interactive and real-time nature of emerging applications, as well as their increasing runtime, make online data extraction and analysis a key requirement in addition to traditional data I/O and archiving. To be effective, online data extraction and transfer should impose minimal additional synchronization requirements, should have minimal impact on the computational performance and communication latencies, maintain overall quality of service, and ensure that no data is lost. In this paper we present Decoupled and Asynchronous Remote Transfers (DART), an efficient data transfer substrate that effectively addresses these requirements. DART is a thin software layer built on RDMA technology to enable fast, low-overhead, and asynchronous access to data from a running simulation, and supports high-throughput, low-latency data transfers. DART has been integrated with applications simulating fusion plasma in a Tokamak, being developed at the Center for Plasma Edge Simulation (CPES), a DoE Office of Fusion Energy Science (OFES) Fusion Simulation Project (FSP). A performance evaluation using the Gyrokinetic Toroidal Code and XGC-1 particle-in-cell-based FSP simulations running on the Cray XT3/XT4 system at Oak Ridge National Laboratory demonstrates how DART can effectively and efficiently offload simulation data to local service and remote analysis nodes, with minimal overheads on the simulation itself. Copyright © 2010 John Wiley & Sons, Ltd.
    Concurrency and Computation: Practice and Experience 01/2010; 22:1181-1204. · 0.85 Impact Factor
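    The key behavior described, letting the simulation hand off its output locally and keep computing while the transfer proceeds asynchronously, can be sketched with a small double-buffered C example that uses a POSIX thread in place of DART's RDMA transport. This is an illustration of the overlap idea, not DART code. Compile with -pthread.

      /* Toy asynchronous data extraction (not DART itself): the "simulation"
       * copies each step's output into a buffer and continues computing
       * while a background drain thread, standing in for the transport,
       * moves the data off. */
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>
      #include <pthread.h>

      #define NELEM 4096

      static double outbuf[NELEM];
      static int    pending;                   /* 1 while a transfer is queued */
      static int    done;
      static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;

      static void *drain_thread(void *arg) {   /* plays the role of the transport */
          (void)arg;
          for (;;) {
              pthread_mutex_lock(&mtx);
              while (!pending && !done) pthread_cond_wait(&cv, &mtx);
              if (!pending && done) { pthread_mutex_unlock(&mtx); break; }
              pthread_mutex_unlock(&mtx);

              usleep(1000);                    /* pretend to move the data */

              pthread_mutex_lock(&mtx);
              pending = 0;
              pthread_cond_signal(&cv);
              pthread_mutex_unlock(&mtx);
          }
          return NULL;
      }

      int main(void) {
          pthread_t t;
          double field[NELEM];
          pthread_create(&t, NULL, drain_thread, NULL);

          for (int step = 0; step < 5; step++) {
              for (int i = 0; i < NELEM; i++) field[i] = step + i * 1e-3; /* compute */

              pthread_mutex_lock(&mtx);
              while (pending) pthread_cond_wait(&cv, &mtx); /* previous step drained? */
              memcpy(outbuf, field, sizeof(outbuf));        /* fast local hand-off */
              pending = 1;
              pthread_cond_signal(&cv);
              pthread_mutex_unlock(&mtx);
              /* the simulation continues immediately; the transfer overlaps it */
          }

          pthread_mutex_lock(&mtx);
          while (pending) pthread_cond_wait(&cv, &mtx);
          done = 1;
          pthread_cond_signal(&cv);
          pthread_mutex_unlock(&mtx);
          pthread_join(t, NULL);
          printf("all steps extracted asynchronously\n");
          return 0;
      }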
  • ABSTRACT: EFFIS is a set of tools developed for working with large-scale simulations. EFFIS is used by researchers in the Center for Plasma Edge Simulation, as well as in many other areas of science. EFFIS is composed of services including adaptable I/O, workflows, dashboards, visualization, code coupling, wide-area data movement, and provenance capturing. One of the unique aspects of EFFIS is that it transparently allows users to switch from code coupling on disk to coupling in memory, using the concept of a shared space in a staging area. The staging area is a small fraction of the compute nodes needed to run the large-scale simulation, but it is used for the construction of I/O pipelines and a code-coupling infrastructure. This allows the scientist to make minor changes for the code to work with ADIOS, and then, with no further changes, perform complex transformations and analytics, all of which occur in situ with the simulation. In this talk, we will focus on the technologies CPES uses, which are scalable and can be used on anything from workstations to petascale machines.
    Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, PDP 2010, Pisa, Italy, February 17-19, 2010; 01/2010
  • Li Zhang, Ciprian Docan, Manish Parashar
    12/2009: pages 283-309; ISBN: 9780470558027
  • C. Docan, M. Parashar, C. Marty
    ABSTRACT: This paper explores the effectiveness of using the Cell Broadband Engine (CBE) platform for Value-at-Risk (VaR) calculations. Specifically, it focuses on the design, optimization, and evaluation of pricing European and American stock options across Monte-Carlo VaR scenarios. This analysis is performed on two distinct platforms with CBE processors, i.e., the IBM QS22 blade server and the PlayStation 3 gaming console.
    Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on; 06/2009
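    For context, the inner kernel of such a VaR scenario run is essentially Monte Carlo option pricing. Below is a minimal C version for a European call under geometric Brownian motion with made-up parameters; on the CBE, this path loop is what gets vectorized and distributed across the SPEs, and American options additionally require an early-exercise method. Compile with -lm.

      /* Minimal Monte Carlo pricer for a European call under geometric
       * Brownian motion (Black-Scholes dynamics). Illustrative only; the
       * parameters below are hypothetical. */
      #include <stdio.h>
      #include <stdlib.h>
      #include <math.h>

      #ifndef M_PI
      #define M_PI 3.14159265358979323846
      #endif

      /* standard normal sample via Box-Muller */
      static double randn(void) {
          double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
          double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
          return sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
      }

      int main(void) {
          const double S0 = 100.0, K = 105.0;      /* spot, strike (hypothetical) */
          const double r = 0.05, sigma = 0.2, T = 1.0;
          const long   npaths = 1000000;

          const double drift = (r - 0.5 * sigma * sigma) * T;
          const double vol   = sigma * sqrt(T);

          double sum = 0.0;
          for (long i = 0; i < npaths; i++) {
              double ST = S0 * exp(drift + vol * randn()); /* terminal price */
              double payoff = ST > K ? ST - K : 0.0;       /* call payoff */
              sum += payoff;
          }
          double price = exp(-r * T) * sum / (double)npaths; /* discounted mean */
          printf("European call price ~ %.4f\n", price);     /* ~8.02 analytically */
          return 0;
      }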
  • ABSTRACT: In order to understand the complex physics of mother nature, physicists often use many approximations to understand one area of physics and then write a simulation to reduce these equations to ones that can be solved on a computer. Different approximations lead to different equations that model different physics, which can often lead to a completely different simulation code. As computers become more powerful, scientists can either write one simulation that models all of the physics, or they can produce several codes, each for a different portion of the physics, and then 'couple' these codes together. In this paper, we concentrate on the latter, where we look at our code coupling approach for modeling a full device fusion reactor. There are many approaches to code coupling. Our first approach was using Kepler workflows to loosely couple three codes via files (memory-to-disk-to-memory coupling). This paper describes our new approach, moving towards memory-to-memory data exchange to allow for a tighter coupling. Our approach focuses on a method which brings together scientific workflows with staging I/O methods for code coupling. Staging methods use additional compute nodes to perform additional tasks such as data analysis, visualization, and NxM transfers for code coupling. In order to transparently allow application scientists to switch from memory-to-memory coupling to memory-to-disk-to-memory coupling, we have been developing a framework that can switch between these two I/O methods and then automate other workflow tasks. Our hybrid approach allows application scientists to easily switch between in-memory coupling and file-based coupling on-the-fly, which aids in debugging these complex configurations.
    Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science, WORKS 2009, November 16, 2009, Portland, Oregon, USA; 01/2009
  • ABSTRACT: The computation of value at risk (VaR) can be parallelized to boost performance, but different parallel platforms entail different gains in performance, as well as different costs. This paper explores the cost and performance tradeoffs inherent in the computation of VaR when implemented on different parallel platforms.
    Proceedings of the 2nd Workshop on High Performance Computational Finance, WHPCF 2009, November 15, 2009, Portland, Oregon, USA; 01/2009
  • ABSTRACT: The convergence of game technology (software and hardware), the Internet, and rehabilitation science forms the second-generation virtual rehabilitation framework. Its reduced cost and patient/therapist familiarity facilitate adoption in clinical practice. This paper presents a PlayStation 3-based hand physical rehabilitation system for children with hemiplegia due to perinatal brain injury (hemiplegic cerebral palsy) or later childhood stroke. Unlike precursor systems aimed at providing hand training for post-stroke adults in a clinical setting, the experimental system described here was developed for in-home tele-rehabilitation on a game console for children and adults with chronic hemiplegia after stroke or other focal brain injury. Significant improvements in Activities of Daily Living function followed three months of training at home on the system. Clinical trials are ongoing at this time.
    Virtual Rehabilitation, 2008; 09/2008
  • Ciprian Docan, Manish Parashar, Scott Klasky
    ABSTRACT: Large scale simulations of complex physics phenomena have long run times and generate massive amounts of data. Saving this data to external storage systems or transferring it to remote locations for analysis is a costly operation that quickly becomes a performance bottleneck. In this paper, we present DART (Decoupled and Asynchronous Remote Transfers), an efficient data transfer substrate that effectively minimizes the data I/O overhead on the running simulations. DART is a thin software layer built on RDMA technology to enable fast, low-overhead and asynchronous access to data from a running simulation, and to support high-throughput, low-latency data transfers.
    Proceedings of the 17th International Symposium on High-Performance Distributed Computing (HPDC-17 2008), 23-27 June 2008, Boston, MA, USA; 01/2008