
Masoud Jami - Zuse Institute Berlin
About
6 Publications
759 Reads
27 Citations
Publications (6)
The performance of data-intensive applications is often dominated by their input/output (I/O) operations, but the I/O stack of systems is complex and depends heavily on system-specific settings and hardware components. This situation makes generic performance optimisation challenging and costly for developers as they would have to run their applica...
Non-functional properties of I/O streams are typically neither specified nor passed to the middleware or operating system in use. Knowing properties such as the expected access pattern, reliability, and visibility for data would allow a better storage resource selection by the infrastructure and thus could improve overall performance. With pragma annotati...
Checkpoint/restart (C/R) makes large-scale parallel jobs resilient against multiple node failures but typically takes considerable time and storage space. Efficient C/R strategies try to gain high levels of fault-tolerance while keeping the involved I/O and computation low. By combining XOR and partner checkpointing, two relatively weak C/R strateg...
The FFMK project designs, builds and evaluates a system-software architecture to address the challenges expected in Exascale systems. In particular, these challenges include performance losses caused by the much larger impact of runtime variability within applications, hardware, and operating system (OS), as well as increased vulnerability to failu...
User-defined and system-level checkpointing have contrary properties. While user-defined checkpoints are smaller and simpler to recover, system-level checkpointing has better knowledge of the global system state and of parameters like the expected mean time to failure (MTTF) per node. Both approaches lead to non-optimal checkpoint time, intervals, sizes, or I/...