Checkpoint-based rollback recovery has been widely used for fault tolerance in distributed computing systems. While a program runs, its processes exchange many messages, which creates many dependencies among them. Once a process fails, many processes that depend directly or indirectly on the faulty process are affected. These processes in turn roll back to some previously stored state. Worse, a rollback may repeatedly trigger further rollbacks in other dependent processes. This is known as the domino effect[11]. The main cause of the domino effect is Z-cycles[2]. So far there is no effective method for detecting Z-cycles of length greater than two. In this paper, we propose a distributed algorithm to detect long Z-cycles.