Claudia Fohry

Universität Kassel · FB 16

About

22 Publications
642 Reads
91 Citations
Since 2016: 16 research items, 77 citations

Publications

Article
Full-text available
With the advent of exascale computing, issues such as application irregularity and permanent hardware failure are growing in importance. Irregularity is often addressed by task-based parallel programming implemented with work stealing. At the task level, resilience can be provided by two principal approaches, namely checkpointing and supervision. F...
Article
Global spatially explicit land system models are important tools for gaining a better scientific understanding and for investigating future development trajectories in the form of scenarios. However, uncertainties, e.g. in input data and model structure, affect their simulation results. This article investigates the impact of model initialization on land change...
Preprint
Full-text available
While checkpointing is typically combined with a restart of the whole application, localized recovery permits all but the affected processes to continue. In task-based cluster programming, for instance, the application can then be finished on the intact nodes, and the lost tasks can be reassigned. This extended abstract suggests adapting a checkpointin...
Article
Fault tolerance is an important requirement for successful program execution on exascale systems. The common approach, checkpointing, regularly saves a program’s state, such that the execution can be restarted after permanent node failures. Checkpointing is often performed at system level, but its deployment at application level can reduce the runn...
Article
Full-text available
Since large parallel machines are typically clusters of multicore nodes, parallel programs should be able to deal with both shared memory and distributed memory. This paper proposes a hybrid work stealing scheme, which combines the lifeline-based variant of distributed task pools with the node-internal load balancing of Java’s Fork/Join framework....
Chapter
Since today’s clusters consist of nodes with multicore processors, modern parallel applications should be able to deal with shared and distributed memory simultaneously. In this paper, we present a novel hybrid work stealing scheme for the APGAS library for Java, which is a branch of the X10 project. Our scheme extends the library’s runtime system,...
Article
Full-text available
Fault tolerance is gaining importance in parallel computing, especially on large clusters. Traditional approaches handle the issue at system level. Application-level approaches are becoming increasingly popular, since they may be more efficient. This paper presents a fault-tolerant work stealing technique at application level, and describes its imp...
Conference Paper
Work stealing can be implemented in either a cooperative or a coordinated way. We compared the two approaches for lifeline-based global load balancing, which is the algorithm used by X10's Global Load Balancing framework GLB. We conducted our study with the APGAS library for Java, to which we ported GLB in a first step. Our cooperative variant rese...
Article
Scalability requires fault tolerance to be efficient. One approach handles permanent node failures at user level. It is supported by Resilient X10, a Partitioned Global Address Space language that throws an exception when a place fails. We consider task pools, which are a widely used pattern for load balancing of irregular applications, and refer...
Conference Paper
X10's Global Load Balancing framework GLB implements a user-level task pool for inter-place load balancing. It is based on work stealing and deploys the lifeline algorithm. A single worker per place alternates between processing tasks and answering steal requests. We have devised an efficient fault-tolerance scheme for this algorithm, improving on...
Article
Scalability requires fault tolerance to be effective. We consider a user-level fault tolerance technique to cope with permanent node failures. It is supported by X10, one of the major Partitioned Global Address Space (PGAS) languages. In Resilient X10, an exception is thrown when a place (node) fails. This paper investigates task pools, which are...
Conference Paper
The Partitioned Global Address Space (PGAS) model is a promising approach to combine programmability and performance in an architecture-independent way. Well-known representatives of PGAS languages include Chapel and X10. Both languages incorporate object orientation, but fundamentally differ in their way of accessing remote memory as well as in sy...
Article
Current processor architectures are diverse and heterogeneous. Examples include multicore chips, GPUs and the Cell Broadband Engine (CBE). The recent Open Compute Language (OpenCL) standard aims at efficiency and portability. This paper explores its efficiency when implemented on the CBE, without using CBE-specific features such as explicit async...
Conference Paper
Current processor architectures are diverse and heterogeneous. Examples include multicore chips, GPUs and the Cell Broadband Engine (CBE). The recent Open Compute Language (OpenCL) standard aims at efficiency and portability. This paper explores its efficiency when implemented on the CBE, without using CBE-specific features such as explicit asynchr...
