CCK: An Improved Coordinated Checkpoint/Rollback Protocol for Dataflow Applications in KAAPI

Authors

Xavier Besseron, Samir Jafar

Abstract

Fault tolerance protocols play an important role in today's long-running parallel scientific applications, because the probability of failure can be significant given the number of unreliable components involved in a simulation. In this paper we present our approach to, and preliminary results for, a new checkpoint/recovery protocol based on a coordinated scheme. This protocol is tightly coupled to the availability of an abstract representation of the execution.

Keywords

Parallel Application, Dataflow Graph, Checkpoint/Recovery

