Masoud Jami

  • Zuse Institute Berlin

About

6 Publications
759 Reads
27 Citations
Current institution
Zuse Institute Berlin

Publications (6)
Preprint
Full-text available
The performance of data-intensive applications is often dominated by their input/output (I/O) operations, but the I/O stack of systems is complex and depends heavily on system-specific settings and hardware components. This situation makes generic performance optimisation challenging and costly for developers as they would have to run their applica...
Article
Full-text available
Non-functional properties of IO-streams are typically not specified and passed to the used middleware or operating system. Knowing properties such as the expected access pattern, reliability, and visibility for data would allow a better storage resource selection by the infrastructure and thus could improve overall performance. With pragma annotati...
Conference Paper
Checkpoint/restart (C/R) makes large-scale parallel jobs resilient against multiple node failures but typically takes considerable time and storage space. Efficient C/R strategies try to gain high levels of fault-tolerance while keeping the involved I/O and computation low. By combining XOR and partner checkpointing, two relatively weak C/R strateg...
Chapter
Full-text available
The FFMK project designs, builds and evaluates a system-software architecture to address the challenges expected in Exascale systems. In particular, these challenges include performance losses caused by the much larger impact of runtime variability within applications, hardware, and operating system (OS), as well as increased vulnerability to failu...
Conference Paper
User-defined and system-level checkpointing have contrary properties. While user-defined checkpoints are smaller and simpler to recover, system-level checkpointing has better knowledge of the global system's state and of parameters like the expected mean time to failure (MTTF) per node. Both approaches lead to non-optimal checkpoint time, intervals, sizes, or I/...
