Distributed fault tolerance entails detecting errors, confining the damage caused, recovering from the errors, and providing continued service on a network of co-operating machines. Functional languages potentially offer benefits for distributed fault tolerance: many computations are pure, and hence have no side effects to be reversed during error recovery. Moreover, functional languages have
a high-level runtime system (RTS) in which computations and data are readily manipulated. We propose a new, RTS-level form of fault tolerance for distributed functional languages, and outline a design for its implementation in the GdH language. Glasgow distributed Haskell (GdH) is a small extension to the Haskell language, and the fault-tolerance design utilises existing distributed graph-reduction mechanisms. The design distinguishes between pure and impure computations: impure, or side-effecting, computations must be recovered using conventional exception-based techniques, but the RTS attempts implicit backward recovery of pure computations.
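The distinction between pure and impure recovery can be illustrated with a small sequential simulation. This is a sketch under stated assumptions, not the GdH RTS (which works at the level of distributed graph reduction): a "machine" is modelled as a boolean, and a failed machine is assumed to run its task and then die before reporting the result. The names `MachineFailure`, `run_on`, `recover_pure`, `run_impure_once`, `Account` and `deposit` are all hypothetical. Replaying a pure task on another machine is harmless, whereas naively replaying a side-effecting task applies its effect twice, which is why impure work is handed to an explicit exception handler instead.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class MachineFailure(Exception):
    """Raised when a (simulated) remote machine dies before replying."""


def run_on(machine_ok: bool, task: Callable[[], T]) -> T:
    # Hypothetical failure model: the task runs, then the machine may die
    # before its result reaches the caller, so the result is lost.
    result = task()
    if not machine_ok:
        raise MachineFailure("machine lost before replying")
    return result


def recover_pure(machines: List[bool], task: Callable[[], T]) -> T:
    """Implicit backward recovery: discard lost work and re-evaluate elsewhere.

    Only safe for pure tasks, which have no side effects to undo, so
    re-running them yields the same value.
    """
    for ok in machines:
        try:
            return run_on(ok, task)
        except MachineFailure:
            continue  # replay the same task on the next machine
    raise MachineFailure("no surviving machine")


def run_impure_once(machine_ok: bool, task: Callable[[], T],
                    on_error: Callable[[MachineFailure], T]) -> T:
    """Impure work is run once; failures go to an application-level handler."""
    try:
        return run_on(machine_ok, task)
    except MachineFailure as exc:
        return on_error(exc)


class Account:
    def __init__(self) -> None:
        self.balance = 0


def deposit(account: Account, amount: int) -> None:
    account.balance += amount  # side effect: replaying it double-counts


if __name__ == "__main__":
    # Pure: two machines fail, the third recomputes the same value.
    print(recover_pure([False, False, True], lambda: sum(i * i for i in range(10))))  # 285

    # Impure, replayed naively: the deposit is applied twice.
    acct = Account()
    recover_pure([False, True], lambda: deposit(acct, 10))
    print(acct.balance)  # 20

    # Impure, exception-based: run once and let the handler decide.
    acct2 = Account()
    print(run_impure_once(False, lambda: deposit(acct2, 10),
                          lambda exc: "failed: " + str(exc)))
    print(acct2.balance)  # 10
```

The naive replay leaving a balance of 20 is the hazard the design avoids by confining implicit re-evaluation to pure computations.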