1 Debugging of Distributed Systems
2 Debugging of Distributed Systems
• Example of a tool for distributed systems
• Approach to fault search during testing
• Control and inspection of internal program execution at runtime
3 Debugging of Distributed Systems
Requirements
– User-friendliness
– Problem orientation (symbolic debugging): String c = "xyz" instead of "LOC FF 2243 AC 32 ..."
– Reproducibility (quasi-deterministic)
– Presentation of state information (variables, registers, ports, etc.: "show c")
– Modification of system state (set c = "ABC")
– Supervision mechanisms
[Figure: User, Debugger and Tested program; arrows labelled "Query / Modification" and "state information"]
4 Special problems
• Parallel processing
• Indeterminism
• Absence of a global state
• Absence of a common clock
• Interference between debugger and system
• Resulting information flooding
• Semantics of special constructs (breakpoints, break conditions)
• Extended functionality (inter-process communication)
5 Inter-process communication
• State information includes the communication state in addition to the process/object state
• Intervention that manipulates this state is desirable
• Separation into an intra-process layer (conventional) and an inter-process layer (special)
Functionality of the inter-process layer
• Access to messages:
  – insert <m> in <port>
  – read <m> from <port>
  – extract <m> from <port>
  – forward <m> to <port>
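
The four message-access commands can be illustrated with a minimal sketch. It assumes a port behaves like a FIFO queue and that each command acts on the message at the head of the queue; the class `Port` and its method names are hypothetical and only mirror the command names on the slide.

```python
from collections import deque


class Port:
    """Hypothetical message port modelled as a FIFO queue (an assumption)."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def insert(self, message):
        # insert <m> in <port>: append a message to the port queue
        self.queue.append(message)

    def read(self):
        # read <m> from <port>: inspect the next message without removing it
        return self.queue[0] if self.queue else None

    def extract(self):
        # extract <m> from <port>: remove and return the next message
        return self.queue.popleft() if self.queue else None

    def forward(self, target):
        # forward <m> to <port>: move the next message to another port
        message = self.extract()
        if message is not None:
            target.insert(message)
        return message


a, b = Port("a"), Port("b")
a.insert("m1")
a.insert("m2")
peeked = a.read()         # "m1" stays in the queue of port a
forwarded = a.forward(b)  # "m1" moves from port a to port b
```

The difference between `read` and `extract` is the point of interest for a debugger: reading only inspects the communication state, while extracting and inserting modify it.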
6 Inter-process communication
• Breakpoints
  – set break <port> <mtype> [send | receive]
  – set break <port1> ... <portn>
• Statistics and accounting records
• Access to operating system objects (semaphores, processes)
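
A port breakpoint of the form "set break <port> <mtype> [send | receive]" could be matched against communication events roughly as follows. This is a sketch under assumptions: the names `PortBreakpoint` and `should_break` are hypothetical, and an omitted message type or direction is taken to match everything.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PortBreakpoint:
    """Hypothetical record for 'set break <port> <mtype> [send | receive]'."""
    port: str
    mtype: Optional[str] = None      # None matches every message type
    direction: Optional[str] = None  # "send", "receive", or None for both

    def matches(self, port, mtype, direction):
        return (
            port == self.port
            and self.mtype in (None, mtype)
            and self.direction in (None, direction)
        )


def should_break(breakpoints, port, mtype, direction):
    """Return True if any breakpoint matches the communication event."""
    return any(bp.matches(port, mtype, direction) for bp in breakpoints)


breakpoints = [
    PortBreakpoint("p1", "request", "send"),  # set break p1 request send
    PortBreakpoint("p2"),                     # set break p2
]
```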
7 Consistent state representations
Problem: no common clock and no common storage, hence no consistent state representation
• Approaches
  – Clock synchronization (in the range of milliseconds)
  – Logical ordering of events
• Basis: Lamport approach
  – Partial order "pre-relation" (happened-before)
  – Events are ordered by causal context (sending before receiving)
  – Events are unordered if they are independent
8 Consistent state representations
• Rules
  – a and b in the same process, a before b: a → b
  – a is the sending and b the receiving of the same message: a → b
  – a → b and b → c imply a → c (transitivity)
All essential events of distributed processing can be ordered (consistent logical "snapshots")
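
The three rules can be checked mechanically. The sketch below assumes the direct relations (program order within a process, and send before receive) are given as a successor map, and derives the transitive rule by graph search. The event names and the function `happened_before` are hypothetical.

```python
def happened_before(edges, a, b):
    """Return True if a -> b follows from the direct edges by transitivity."""
    stack, seen = [a], set()
    while stack:
        node = stack.pop()
        for succ in edges.get(node, ()):
            if succ == b:
                return True
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return False


# Hypothetical events: a1, a2 in one process; b1, b2 in another.
edges = {
    "a1": ["a2"],  # rule 1: same process, a1 before a2
    "b1": ["b2"],  # rule 1: same process, b1 before b2
    "a2": ["b2"],  # rule 2: message sent at a2 is received at b2
}

ordered = happened_before(edges, "a1", "b2")  # follows by transitivity
unordered = not happened_before(edges, "a1", "b1") and not happened_before(edges, "b1", "a1")
```

Two events are independent exactly when neither reaches the other, as with `a1` and `b1` here.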
9 Lamport approach
Realization via the algorithm
– each process has an event counter Z (initially zero)
– each inter-process event E has a number N(E), and each message M carries a number N(M) (= N(E) of its sending event)
• Sending:
  – increment Z (Z := Z + 1)
  – mark the sending event: N(E) := Z
  – mark the message: N(M) := Z
• Receiving a message with number N(M):
  – if N(M) > Z (receiver), set Z := N(M) + 1
  – otherwise set Z := Z + 1
  – receiving event: N(E) := Z
• Intra-process event:
  – Z := Z + 1
  – N(E) := Z
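
The counter rules above can be sketched as a small class. The class name `LamportProcess` and its method names are hypothetical; the update rules follow the slide, with the number carried by a message passed in as `msg_number`.

```python
class LamportProcess:
    """Event counter Z of one process, following the rules on the slide."""

    def __init__(self, name):
        self.name = name
        self.z = 0  # Z is initially zero

    def local_event(self):
        # Intra-process event: Z := Z + 1, the event gets number Z
        self.z += 1
        return self.z

    def send(self):
        # Sending: Z := Z + 1; the event and the message both carry Z
        self.z += 1
        return self.z  # number attached to the message

    def receive(self, msg_number):
        # Receiving: if the message number > Z, jump past it; else Z := Z + 1
        if msg_number > self.z:
            self.z = msg_number + 1
        else:
            self.z += 1
        return self.z


p1, p2 = LamportProcess("P1"), LamportProcess("P2")
p1.local_event()     # P1 counter: 1
p1.local_event()     # P1 counter: 2
m = p1.send()        # P1 counter: 3, message carries 3
r1 = p2.receive(m)   # P2 counter: 4, because 3 > 0
m2 = p2.send()       # P2 counter: 5
r2 = p1.receive(m2)  # P1 counter: 6, because 5 > 3
```

Both branches of the receive rule amount to taking the larger of the two numbers and adding one, so a receive event always gets a higher number than the matching send event.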
10 Lamport approach
[Figure: space-time diagram of processes P1, P2 and P3 with events numbered by their counters]
• Causally related events are completely ordered
• Non-causal events are unordered (for instance, no. 12 in P2 and in P3)
11 Semantics of breakpoints
Problem: When does a breakpoint satisfy distributed conditions?
Approach:
– simple predicates (one process, "call proc")
– disjunctive predicates ("P1: call proc | P2: call xy")
– conjunctive predicates ("P1: call proc & P1: x=1"), only within a single process
– linked predicates: coupling of events in the pre-relation
[Figure: space-time diagram of Process 1, Process 2 and Process 3 with events such as t11, t12, t22, t23, t31, t32, t33; t31, t22: ordered; t11, t21: unordered]
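
The first three predicate kinds can be sketched as composable functions over per-process local states. Everything here is an illustrative assumption: the helper names, the shape of the `states` dictionary, and the sample conditions are hypothetical.

```python
def simple(process, condition):
    """Simple predicate: one condition on one process."""
    return lambda states: condition(states[process])


def disjunctive(*predicates):
    """Disjunctive predicate: holds as soon as any sub-predicate holds."""
    return lambda states: any(p(states) for p in predicates)


def conjunctive(process, *conditions):
    """Conjunctive predicate, restricted to a single process."""
    return lambda states: all(c(states[process]) for c in conditions)


# Hypothetical local states: current call and one variable per process.
states = {
    "P1": {"call": "proc", "x": 0},
    "P2": {"call": "main", "x": 1},
}


def calls_proc(s):
    return s["call"] == "proc"


def calls_xy(s):
    return s["call"] == "xy"


def x_is_one(s):
    return s["x"] == 1


bp_or = disjunctive(simple("P1", calls_proc), simple("P2", calls_xy))
bp_and = conjunctive("P1", calls_proc, x_is_one)
```

The conjunctive form takes a single process name because each process can evaluate its own state locally; a conjunction across processes would need a consistent global state, which a distributed system does not provide directly.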
12 Consistent stopping of processes
Problem: time delay after a halt command is issued
Approach: backtracking to a consistent state directly before the stopping event ("reset line")
Procedure: backtracking along the causal contexts according to the pre-relation of messages
[Figure: space-time diagram of Process 1, Process 2 and Process 3 with events t11 to t34]
t12: stop point event
Process 2: backtracking to t23
Process 3: backtracking to t32
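
One way to compute such a reset line is to roll back every process that has received a message whose sending event is no longer part of the state. The sketch below is not necessarily the procedure used by the tool. It assumes events are numbered per process, that the stopped process is cut just before its stop event, and that all messages are logged. The message pattern in the example is invented and not taken from the figure.

```python
def consistent_cut(cut, messages):
    """Roll processes back until no message is received but not yet sent.

    cut:      dict process -> index of the last event kept (0 = none)
    messages: list of (sender, send_index, receiver, receive_index)
    """
    cut = dict(cut)
    changed = True
    while changed:
        changed = False
        for sender, send_idx, receiver, recv_idx in messages:
            received = recv_idx <= cut[receiver]
            sent = send_idx <= cut[sender]
            if received and not sent:
                # Orphan message: undo the receive and everything after it.
                cut[receiver] = recv_idx - 1
                changed = True
    return cut


# Hypothetical run: P1 is stopped before its event 2; P2 and P3 ran on.
start = {"P1": 1, "P2": 5, "P3": 4}
messages = [
    ("P1", 2, "P2", 4),  # sent by the stop event, already received by P2
    ("P2", 5, "P3", 3),  # sent by P2 after that receive
    ("P3", 1, "P1", 1),  # lies entirely before the cut, stays untouched
]
line = consistent_cut(start, messages)
```

In this run P2 is rolled back to its event 3 and P3 to its event 2, while P1 stays at event 1. The loop terminates because cut indices only ever decrease.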
13 Distributed trace-steps
Basis: step mode of sequential debuggers (interactive)
– one trace-step means advancing to the next interaction point (inter-process event)
– local calculations form one unit
– sending operations are carried out on all participating processes
– receiving operations are carried out only if a message exists (possibly only after a sending step)
[Figure: distributed trace-steps 1, 2, 3 with calculation phases and interaction points]
14 Indeterminism handling
Indeterministic program behavior: race conditions
Options:
– testing of different possible execution sequences via distributed single-step
– re-execution / replay via output recording
Approach:
– recording of all inter-process events
– control of the repeated execution based on this record (re-execution)
– high storage requirements, reducible via checkpoints before which events need not be kept
– replay of a single process is also possible (important for technical processes as well)
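
The record and replay idea can be sketched for a single receiver. This is a minimal sketch under assumptions: indeterministic arrival order is simulated with a shuffle, and the function `run` and its parameters are hypothetical.

```python
import random


def run(arrivals, log=None, rng=None):
    """Deliver messages and return (delivery order, log).

    arrivals: list of (sender, payload) messages in flight
    log:      None to record a new run, or a recorded list to replay
    rng:      optional random.Random used to simulate the race
    """
    pending = list(arrivals)
    if log is None:
        # Recording run: arrival order is indeterministic (simulated).
        (rng or random).shuffle(pending)
        return pending, list(pending)
    # Replay run: deliver strictly in the recorded order.
    delivered = []
    for message in log:
        pending.remove(message)  # raises ValueError if the run diverges
        delivered.append(message)
    return delivered, log


in_flight = [("P1", "a"), ("P2", "b"), ("P3", "c")]
first, recorded = run(in_flight, rng=random.Random(1))
# The replay sees the messages arrive in a different order ...
replayed, _ = run(list(reversed(in_flight)), log=recorded)
# ... but delivers them exactly as in the recorded run.
```

Only the order of inter-process events is recorded here, not message contents or local computation, which is what keeps the log small compared with a full state trace.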
15 Handling of information flooding
Requirement: recorded / output information must be reduced
• Limitation to inter-process events
• Limitation to relevant time intervals
• Abstraction forms for
  – process groups
  – execution (timing diagram)
  – ports (abstract message flow)
• Graphics support (control windows, animation tools)
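
The first two limitations amount to a filter over the recorded trace. The sketch below assumes each trace entry is a dictionary with `time`, `process` and `kind` fields; that record layout and the function `reduce_trace` are hypothetical.

```python
def reduce_trace(trace, start, end, kinds=("send", "receive")):
    """Keep only inter-process events inside the time interval [start, end]."""
    return [
        event for event in trace
        if event["kind"] in kinds and start <= event["time"] <= end
    ]


trace = [
    {"time": 1, "process": "P1", "kind": "local"},
    {"time": 2, "process": "P1", "kind": "send"},
    {"time": 3, "process": "P2", "kind": "receive"},
    {"time": 9, "process": "P2", "kind": "send"},
]
reduced = reduce_trace(trace, start=0, end=5)
```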
16 Distributed debugging: concepts
Hierarchical levels of influence
• Level 1: "Free runtime"
  – no modification, only trace recording
  – minimal interference
• Level 2: "Self-responsibility"
  – freely modifiable execution
  – strong interference
  – full responsibility of the tester for execution control
• Level 3: "Pseudo-real-time"
  – "the best possible compensation for strong interference"
  – "private clock" per process
  – "private clock" runs except while debugger code executes
  – "private clocks" synchronized via, for instance, the Lamport algorithm on the partial order
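
A "private clock" that stands still inside debugger code could look roughly like this. The class `PrivateClock` and its method names are hypothetical, and the sketch covers only the per-process clock, not the synchronization between processes.

```python
import time


class PrivateClock:
    """Per-process clock that stands still while debugger code runs."""

    def __init__(self, now=time.monotonic):
        self._now = now        # time source, injectable for testing
        self._elapsed = 0.0    # time accumulated outside the debugger
        self._started = now()  # start of the current running phase
        self._running = True

    def enter_debugger(self):
        # Freeze the clock: bank the time that has passed so far.
        if self._running:
            self._elapsed += self._now() - self._started
            self._running = False

    def leave_debugger(self):
        # Resume the clock from the moment the program continues.
        if not self._running:
            self._started = self._now()
            self._running = True

    def read(self):
        # Program-visible time, excluding all time spent in the debugger.
        if self._running:
            return self._elapsed + (self._now() - self._started)
        return self._elapsed


fake_time = [0.0]
clock = PrivateClock(now=lambda: fake_time[0])
fake_time[0] = 5.0
before = clock.read()   # 5 units of program time
clock.enter_debugger()
fake_time[0] = 100.0    # 95 units spent in the debugger
during = clock.read()   # still 5
clock.leave_debugger()
fake_time[0] = 102.0
after = clock.read()    # 5 + 2 = 7
```

Timeouts and timestamps in the tested program would read this clock instead of the system clock, so that time spent at a breakpoint does not count as elapsed program time.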
17 Architecture principles
Alternatives:
1. Separate processes: program / debugger
2. Separate processes with common data (also lightweight processes)
3. Integrated processes with direct instrumentation
As a rule, alternatives 2 and 3 are the most common
18 Architecture proposal
[Figure: Computer A (Process 1, Process 2, local debugging control), Computer B (Process 3, Process 4, local debugging control), centralized dialogue process]