System integration issues in Apollo 11
ABSTRACT “Houston, Tranquility Base here. The Eagle has landed.” Two obscure errors almost prevented these words from being spoken. The errors were not made by the crew of Apollo 11 or by the controllers in Houston, nor were they made during the mission. Rather, they were made by engineers and managers, years before the flight. How they happened, and how they went substantially undetected and effectively ignored, is a pair of lessons in system integration that avionics engineers must never forget.
The Apollo Program is justly famed as a giant leap for the techniques of management of complex system design and implementation. Nonetheless, these tools were used by human beings and so, necessarily, imperfectly. One of the most challenging tasks in any complex system is controlling and testing the interfaces between major components that are developed by different organizations. Among the management tools deployed by NASA were ICDs (Interface Control Documents). The author has not been able to determine whether this phrase was first coined for the Apollo program or the Mercury and Gemini programs that preceded it, but it was certainly a major tool in Apollo. One of the errors under discussion here was caused by a blatant failure to update an ICD in response to an engineering change, which can be classed as a management error of omission. The other is much subtler, involving a question of how previously unsuspected vulnerabilities (to crew procedures, in this case) should be communicated when they fall outside the scope of an ICD, yet turn out to have relevance to the way the interface is used. This becomes a problem because an ICD is a top-level document limited to specifying the design parameters of one subsystem insofar as they are of concern to one other subsystem.
It's not surprising that the symptoms caused by the latter problem have been totally misunderstood by almost everyone from President Nixon on down, and only partially understood even by Buzz Aldrin, who along with Neil Armstrong had to deal with them at the time. This misunderstanding is so widespread that almost everyone with any acquaintance with the Program Alarms during the Apollo 11 landing believes that the LM's Primary Guidance Navigation System (PGNS) “failed” in some way and had to be rescued by human intervention. That is the exact opposite of the truth, which is that performance margins built into this very robust system quarantined the effects of the errors so that the landing could proceed with the designed level of human involvement, specifically dodging the “field of boulders” that the PGNS could know nothing about.
This paper is largely a retelling of the higher-level parts of a paper, Tales from the Lunar Module Guidance Computer, by the author's colleague Don Eyles, but with the orientation changed from a historical narrative to a cautionary tale with recommendations for modern avionics development management. Results of more recent research by the author and two colleagues are also incorporated.