Failure injection / detection experiments
Marin Bertier & Sébastien Monnet
14.01.2022, GDS meeting - Rennes
Goal
  - Experiment with both:
      - Failure injection mechanisms
      - Failure detectors
  - Be able to control volatility
  - Evaluate the behavior of the failure detectors
Failure injection
  - Goal: control volatility
  - Principle
      - Dependencies are expressed in the JDF test.xml: <failure dep="profile.Name" grp="tag"/>
      - User parameters: TBF / MTBF, used by the failure schedule generator
      - Failure schedules are deployed by JDF to kill peers at the computed date (reusable)
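The schedule generation step can be illustrated with a minimal Python sketch. It is not the generator used in these experiments: the function name, the exponential inter-failure model, and the seeded random source are all assumptions made for illustration. The only elements taken from the slide are the MTBF parameter, the idea of precomputed kill dates, and the fact that schedules are reusable.

```python
import random


def generate_failure_schedule(peers, mtbf_s, duration_s, seed=0):
    """Return a list of (kill_time_s, peer) pairs, sorted by time.

    Assumptions (not from the slides): successive failures in the group
    are separated by exponentially distributed gaps whose mean is mtbf_s,
    and each failure kills one randomly chosen peer that is still alive.
    """
    rng = random.Random(seed)  # fixed seed: the same schedule can be replayed
    alive = list(peers)
    schedule = []
    now = 0.0
    while alive:
        now += rng.expovariate(1.0 / mtbf_s)  # gap with mean mtbf_s
        if now >= duration_s:
            break  # the experiment ends before the next failure
        victim = alive.pop(rng.randrange(len(alive)))
        schedule.append((now, victim))
    return schedule


if __name__ == "__main__":
    peers = [f"peer{i}" for i in range(8)]
    for kill_time, peer in generate_failure_schedule(peers, mtbf_s=60.0, duration_s=600.0):
        print(f"{kill_time:8.2f} s  kill {peer}")
```

Fixing the seed is one simple way to make a schedule reusable across runs: the same parameters always yield the same kill dates.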
Failure detectors
  - Heartbeat based
  - Adaptable
  - Factorisable
  - Hierarchical
      - All-to-all within clusters
      - Mandatory-to-mandatory among clusters
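As an illustration of a heartbeat-based detector with an adaptable timeout, here is a minimal Python sketch. It is not the detector evaluated in these experiments: the class name, the adaptation rule (mean of recent inter-arrival gaps plus a fixed margin), and all parameter names are hypothetical.

```python
class HeartbeatDetector:
    """Suspects a peer when its heartbeat is overdue.

    Assumption (not from the slides): the timeout for a peer is the mean
    of its last `window` heartbeat inter-arrival gaps plus `margin_s`.
    """

    def __init__(self, initial_timeout_s=1.0, margin_s=0.1, window=10):
        self.initial_timeout_s = initial_timeout_s
        self.margin_s = margin_s
        self.window = window
        self.last_seen = {}  # peer -> arrival time of its last heartbeat
        self.intervals = {}  # peer -> recent inter-arrival gaps

    def heartbeat(self, peer, now_s):
        """Record a heartbeat received from `peer` at time `now_s`."""
        if peer in self.last_seen:
            gaps = self.intervals.setdefault(peer, [])
            gaps.append(now_s - self.last_seen[peer])
            del gaps[:-self.window]  # keep only the most recent gaps
        self.last_seen[peer] = now_s

    def timeout(self, peer):
        """Current timeout for `peer`, adapted to its observed heartbeat rate."""
        gaps = self.intervals.get(peer)
        if not gaps:
            return self.initial_timeout_s
        return sum(gaps) / len(gaps) + self.margin_s

    def suspected(self, now_s):
        """Return the set of peers whose last heartbeat is older than their timeout."""
        return {p for p, t in self.last_seen.items() if now_s - t > self.timeout(p)}
```

A suspicion is revised as soon as a late heartbeat arrives, and the longer gap then enlarges the timeout for that peer.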
Experimental setup
  - Parasol cluster (64 nodes)
      - Dual 2.2 GHz Opteron / Gigabit Ethernet
      - NTP client on each node (for measurement purposes)
      - Logical partition into 8 groups of 8 nodes to emulate a cluster federation (without dummynet)
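To give a feel for what the logical partition implies for a hierarchical detector (all-to-all within clusters, mandatory-to-mandatory among clusters), the following Python sketch enumerates the directed monitoring links for 64 nodes in 8 groups of 8. The function name and the choice of the first node of each group as its mandatory are assumptions made for illustration only.

```python
from itertools import permutations


def monitoring_links(num_nodes=64, group_size=8):
    """Return (groups, intra_links, inter_links) for a hierarchical layout.

    Links are directed pairs (monitor, monitored). Hypothetical choice:
    the first node of each group acts as that group's mandatory.
    """
    groups = [list(range(start, start + group_size))
              for start in range(0, num_nodes, group_size)]
    mandatories = [group[0] for group in groups]
    # All-to-all monitoring inside each group.
    intra = [link for group in groups for link in permutations(group, 2)]
    # Mandatory-to-mandatory monitoring between groups.
    inter = list(permutations(mandatories, 2))
    return groups, intra, inter


if __name__ == "__main__":
    groups, intra, inter = monitoring_links()
    flat = 64 * 63  # directed links if all 64 nodes monitored each other
    print(len(intra), len(inter), len(intra) + len(inter), flat)
```

Under these assumptions there are 448 intra-group links and 56 inter-group links, 504 in total, compared with 4032 for a flat all-to-all scheme over the 64 nodes.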
FI validation
Correlated failures
Intra-cluster detection delays
Network load
Inter-cluster detection delays
New mandatory selection time
  - In case of mandatory failure
      - A new one is selected in 147 ms (average time)
Conclusions
  - We are able to control node volatility
      - Accurate / scalable / reproducible
  - The failure detectors work and are
      - Customizable
      - Efficient
Current work
  - Implementation
      - JXTA independence
      - Consistency protocol (CP) selection
      - Self-Organizing Group (SOG) selection
  - Experiments with failure detection and multiple CP / SOG (2 months)
Future work
  - Introspection + application needs
  - Automatic CP/SOG selection
  - Collaboration with Fabio / Marin / Julien?
  - C port… (cf. Mathieu's talk)