Most message passing parallel programs employ logical process topologies with regular characteristics to support their computation. Since process topologies define the relationship between processes, they present an excellent opportunity for debugging. The primary benefit is that process behaviours can be correlated, allowing expected behaviour to be abstracted and identified, and undesirable behaviour reported. However, topology support is inadequate in most message passing parallel programming environments, including the popular Message Passing Interface (MPI) and the Parallel Virtual Machine (PVM). Programmers are forced to implement topology support themselves, increasing the possibility of introducing errors.This paper proposes a trace- and topology-based approach to parallel program debugging, driven by four distinct types of specifications. Trace specifications allow trace data from a variety of sources and message passing libraries to be interpreted in an abstract manner, and topology specifications address the lack of explicit topology knowledge, whilst also facilitating the construction of user-consistent views of the debugging activity. Loop specifications express topology-consistent patterns of expected trace events, allowing conformance testing of associated trace data, and error specifications specify undesirable event interactions, including mismatched message sizes and mismatched communication pairs. Both loop and error specifications are simplified by having knowledge of the actual topologies being debugged.The proposed debugging framework enables a wealth of potential debugging views and techniques. Copyright © 2004 John Wiley & Sons, Ltd.
|Journal||CONCURRENCY AND COMPUTATION : PRACTICE & EXPERIENCE|
|Publication status||Published - 2004|