Sentomist Unveiling Transient Sensor Network Bugs via Symptom
Sentomist: Unveiling Transient Sensor Network Bugs via Symptom Mining. Yangfan Zhou, Xinyu Chen, Michael R. Lyu (Dept. of Computer Science & Engineering, The Chinese University of Hong Kong); Jiangchuan Liu (School of Computing Science, Simon Fraser University). The 30th International Conference on Distributed Computing Systems. Sentomist: Unveiling transient WSN bugs via symptom mining
Wireless sensor networks • Wireless Sensor Networks (WSNs) – For environmental data collection and monitoring – Networked wireless sensor nodes • Sensor nodes – Sensor + Processor + Wireless – Simple hardware, e.g., Crossbow MicaZ • RAM: 8 KB • Processor: Atmel ATmega128 series (16 MHz) – Cannot run large software
Wireless sensor networks • Most WSN applications look short and simple – With TinyOS, customizing fewer than 100 lines of code is enough for a sensor node to sense and forward data • But WSN deployments notoriously keep encountering various system failures – Caused by software bugs – A major barrier to their widespread application • Potential WSN users rank reliability as their top concern in adopting WSNs
Objective of this research • Fighting WSN bugs is of critical concern for maturing WSN applications
Contents • Preliminary: nature of WSN programs • A motivating example • Sentomist: our bug symptom mining approach • Case studies • Conclusions
Concurrency model of WSN applications Why is simple code still so buggy? • Special concurrency model – Event-driven – Multi-tasking • Not extensively tested before deployment
Concurrency model of WSN applications • Event-driven programming model • Purpose: energy saving – Enter a power-conserving sleep mode when there is no event – Wake up upon event arrivals • Events: interrupts – Packet arrival – Timer timeout • Event procedure – Specific application logic for handling an event
Concurrency model of WSN applications • Split an event procedure into two phases – Interrupt handler: triggered immediately – Tasks: placed in a queue and executed in FIFO order (deferred function calls) • Purpose – Multi-tasking: avoid resource monopolization • Rules – Interrupt handler: triggered only by its corresponding hardware interrupt – Interrupt handlers and tasks: all run to completion unless preempted by other interrupt handlers – Tasks are posted by interrupt handlers or other tasks and executed in FIFO order
Concurrency model of WSN applications • Definition (event procedure): an event procedure instance starts at the entry of its corresponding interrupt handler. If the interrupt handler posts tasks, the instance ends when its last task has been executed; otherwise it ends when the interrupt handler exits.
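The two-phase rules and the definition above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class `Node`, its methods, and the trace strings are hypothetical names invented for illustration, not TinyOS APIs, and nested preemption of one handler by another is not modeled.

```python
from collections import deque


class Node:
    """Toy model of a sensor node: immediate handlers plus a FIFO task queue."""

    def __init__(self):
        self.tasks = deque()  # FIFO queue of deferred tasks
        self.trace = []       # ordered log of what happened

    def post(self, name, fn):
        # Tasks are posted by interrupt handlers or by other tasks.
        self.trace.append(f"post {name}")
        self.tasks.append((name, fn))

    def interrupt(self, name, handler):
        # Phase 1: the interrupt handler runs immediately.
        self.trace.append(f"int {name}")
        handler(self)
        self.trace.append(f"reti {name}")

    def run_next_task(self):
        # Phase 2: run the oldest posted task to completion.
        if not self.tasks:
            return False
        name, fn = self.tasks.popleft()
        self.trace.append(f"run {name}")
        fn(self)
        return True


def adc_handler(node):
    # The ADC handler defers its real work to a task.
    node.post("send", lambda n: None)


node = Node()
node.interrupt("ADC", adc_handler)        # starts an ADC event procedure instance
node.interrupt("TIMER", lambda n: None)   # a handler that posts nothing
while node.run_next_task():
    pass
print(node.trace)
```

In this run the ADC instance spans from `int ADC` to `run send`, while the TIMER instance ends at its own `reti TIMER` because it posts no task.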
Concurrency model of WSN applications • Concurrency model of WSN applications – Random starting and random interleaving of event procedures – A new and complicated concurrency model • Resulting in transient bugs – Caused by occasional interleaving of event procedures that bear implicit dependencies – Hard to trigger with simple testing scenarios – Hard to identify their symptoms in a long-running system – Can cause fatal results
A motivating example: data pollution • A buggy function in an ADC event procedure, where packet->data will be polluted if the function is called again before the task prepareAndSendPacket runs.
Motivating example • Data-race bug – packet->data is not protected • Caused by the interleaving of event procedures – Triggered when a new data item arrives before the task prepareAndSendPacket runs • Hard to trigger – Needs a variety of random interleaving scenarios to hit the condition • Even if it is triggered, the symptom of the bug is not obvious • No way to detect the data pollution automatically – However, current testing and troubleshooting approaches rely on knowing whether an application runs correctly
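The race can be reproduced in a small simulation. This is a hypothetical reconstruction rather than the code on the slide: `Sampler`, `adc_data_ready`, and `run_tasks` are illustrative names, and the buffer layout (three readings per packet, written round-robin) is an assumption based on the case-study description.

```python
from collections import deque

BUF_LEN = 3  # assumed: three readings per packet


class Sampler:
    def __init__(self):
        self.data = [None] * BUF_LEN  # shared buffer, plays the role of packet->data
        self.count = 0
        self.tasks = deque()          # FIFO task queue
        self.sent = []                # payloads that were actually sent

    def adc_data_ready(self, reading):
        # Interrupt context: unprotected write into the shared buffer.
        self.data[self.count % BUF_LEN] = reading
        self.count += 1
        if self.count % BUF_LEN == 0:
            self.tasks.append(self.prepare_and_send_packet)

    def prepare_and_send_packet(self):
        # Deferred task: reads whatever is in the buffer when it finally runs.
        self.sent.append(list(self.data))

    def run_tasks(self):
        while self.tasks:
            self.tasks.popleft()()


# Normal interleaving: the task runs before the next reading arrives.
normal = Sampler()
for r in (1, 2, 3):
    normal.adc_data_ready(r)
normal.run_tasks()

# Buggy interleaving: reading 4 arrives before the task runs.
racy = Sampler()
for r in (1, 2, 3, 4):
    racy.adc_data_ready(r)
racy.run_tasks()

print(normal.sent, racy.sent)
```

The racy run still sends exactly one packet and raises no error, which is why the symptom is easy to miss: only the payload is silently wrong.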
Motivating example • A new approach for fighting transient bugs is critical for WSN applications • Observation – Most execution patterns are • ADC interrupt, interrupt exit • ADC interrupt, posting a task, interrupt exit, running the task – When the bug is triggered, the pattern is something like • ADC interrupt, posting a task, interrupt exit, ADC interrupt, interrupt exit, running the task • Outlier!!!
Notion of Sentomist • An important notion – Exploit the transient nature of such bugs – Although a tremendous number of testing scenarios is needed to trigger a bug, the application behaves normally in most of them – Hence, we can summarize the normal behaviors, since they are the dominant features – Outlier behaviors indicate transient bug symptoms
Sentomist: Sensor Application Anatomist 1. Generate a tremendous number of testing scenarios and run the tests 2. Anatomize the program runtime into event procedure instances 3. Quantify each event procedure instance with an instruction counter 4. Mine bug symptoms with an outlier detection approach
Sentomist design • Three critical issues in implementing Sentomist – How to decompose the program runtime into a set of time intervals • The program behaviors in the majority of the intervals should exhibit statistical similarity (normal system behaviors) • A natural granularity: the runtime of an event procedure instance • But, how? – How to select a set of good attributes to characterize each interval • Distinguishing normal system behaviors from outliers – We need a generic outlier detection algorithm to find the intervals containing bug symptoms
Task 1: Identify event procedures in TinyOS • Rules – Interrupt handler: triggered only by its corresponding hardware interrupt – Interrupt handlers and tasks: all run to completion unless preempted by other interrupt handlers – Tasks are posted by interrupt handlers or other tasks and executed in FIFO order • Track task functions and interrupt handlers – Post a task (put it into the task queue) – Run a task (get it from the task queue) – Interrupt entry and exit • Analyzing this sequence tells us when each event procedure instance starts and ends
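A minimal sketch of how the tracked sequence could be split into event procedure instances, using only the three rules above. The trace format (`int`, `reti`, `post`, `run` tuples) and the function name `split_event_procedures` are assumptions made for illustration; they are not Sentomist's actual log format or API. The sketch also assumes a well-formed trace and marks a task only by its start, not its end.

```python
from collections import deque


def split_event_procedures(trace):
    """Group trace events into event procedure instances.

    trace: list of (kind, name) with kind in {"int", "reti", "post", "run"}.
    Returns a list of dicts: {"irq": name, "events": [trace indices]}.
    """
    instances = []       # one entry per interrupt entry seen
    handler_stack = []   # instance ids of handlers currently executing
    pending = deque()    # owners of posted tasks, FIFO like the task queue
    current_task = None  # owner of the task that is currently running

    for i, (kind, name) in enumerate(trace):
        if kind == "int":
            # A new instance starts at interrupt handler entry.
            instances.append({"irq": name, "events": []})
            handler_stack.append(len(instances) - 1)
            owner = handler_stack[-1]
        elif kind == "reti":
            owner = handler_stack.pop()
        elif kind == "post":
            # A post belongs to the running handler, else to the running task.
            owner = handler_stack[-1] if handler_stack else current_task
            pending.append(owner)
        elif kind == "run":
            # FIFO rule: the k-th task run is the k-th task posted.
            owner = current_task = pending.popleft()
        else:
            raise ValueError(f"unknown trace event: {kind}")
        if owner is None:
            raise ValueError(f"event {i} has no owning instance")
        instances[owner]["events"].append(i)
    return instances


# The interleaving that triggers the data-pollution bug.
trace = [
    ("int", "ADC"), ("post", "send"), ("reti", "ADC"),
    ("int", "ADC"), ("reti", "ADC"),
    ("run", "send"),
]
instances = split_event_procedures(trace)
print(instances)
```

Here the first ADC instance owns events 0, 1, 2 and 5, so its lifetime encloses the whole second instance (events 3 and 4), which is exactly the interleaving of interest.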
Featuring event procedures • Instruction counter: the instruction counter of an event procedure instance consists of N elements, where N is the total number of instructions in the program's machine code. The ith element is the number of times the ith instruction is executed during the runtime of the event procedure instance. • It characterizes the system behavior during an event procedure instance well
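As a small illustration of this definition, the sketch below builds the N-element vector from the list of instruction indices executed during one instance. The function name and the toy index sequences are assumptions for illustration; a real trace would come from the emulator.

```python
def instruction_counter(executed, n_instructions):
    """Return a vector whose i-th element counts executions of instruction i."""
    counts = [0] * n_instructions
    for idx in executed:
        counts[idx] += 1
    return counts


N = 6  # assumed total number of instructions in the toy program

# Two hypothetical instances of the same event procedure.
typical = instruction_counter([0, 1, 2, 5], N)
unusual = instruction_counter([0, 1, 2, 3, 3, 3, 5], N)
print(typical, unusual)
```

Instances that follow the same control path produce identical or nearby vectors, while an instance disturbed by interleaving tends to land elsewhere in the N-dimensional space.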
Outlier detection approach • One-class support vector machine (SVM) – Assume all data belong to one class, the normal class, and that the origin belongs to another class, the outlier class – Model the majority characteristics of a set of unlabelled samples – Most input samples belong to the normal class • If a sample is on the normal side, the closer it is to the boundary, the more suspicious it is as an outlier • Otherwise, the farther it is from the boundary, the more certain it is to be an outlier • Rank the samples by how likely each is to contain bug symptoms – The ranking directs the order of human inspection to check whether a bug manifests
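For reference, the standard one-class SVM formulation that matches this description is sketched below. The slides do not state which formulation or parameters are used, so the symbols are assumptions: $x_i$ is the instruction-counter vector of the $i$-th event procedure instance, $n$ the number of instances, $\phi$ a feature map, and $\nu \in (0, 1]$ an upper bound on the fraction of samples treated as outliers.

```latex
\begin{aligned}
\min_{w,\;\xi,\;\rho}\quad
  & \tfrac{1}{2}\lVert w \rVert^{2}
    + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho \\
\text{subject to}\quad
  & \langle w, \phi(x_i) \rangle \ge \rho - \xi_i,
    \qquad \xi_i \ge 0, \qquad i = 1,\dots,n \\[4pt]
\text{score:}\quad
  & f(x) = \langle w, \phi(x) \rangle - \rho
\end{aligned}
```

The hyperplane separates the samples from the origin with maximal margin. A sample with $f(x) \ge 0$ lies on the normal side and one with $f(x) < 0$ on the outlier side, so sorting the instances by $f(x)$ in ascending order gives the inspection ranking described above: strongly negative scores first, then normal-side samples closest to the boundary.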
Emulation Environment • AVRORA – A state-of-the-art emulator for real WSN applications – Runs a binary WSN application at the instruction level – Provides cycle-accurate emulation of the sensor node hardware functionalities and their interactions • Exactly meets our requirements: we target transient bugs caused by interleaved executions of event procedures, where timing accuracy is a critical concern • Why emulation – To extensively explore the program execution space for triggering transient bugs – Real deployment is not cost-effective
Case study I: data pollution • The aforementioned motivating example – Each sensor node requests its sensor reading periodically with a timer – After collecting three sensor readings, post a task to send the three readings in a data packet • Testing scenario – Data sampling periods are 20 ms, 40 ms, 60 ms, 80 ms, and 100 ms • Collect 1099 instances of the ADC event procedure • Sentomist outputs [figure] • Cause: data race • Solution: data protection
Case study I: data pollution • Even for this simple application, the program trace of each testing run is very long – e.g., when the sampling period D = 20 ms, the size of the function-level log can reach tens of megabytes – Without Sentomist, it is labor-intensive to manually inspect whether the WSN application runs correctly in each testing run
Case study II: Packet loss • Testing program: a multi-hop packet forwarding protocol based on BlinkToRadio, distributed with TinyOS • Testing scenarios – Three motes are located in a straight line – Node 2 sends messages to node 0 via node 1 – The packet forwarding mechanism at node 1 • Obtain the packet content • Forward the packet to node 0 immediately – Randomize the packet sending rate of node 2 to inject a random sequence of packet arrival events for node 1
Case study II: Packet loss • Monitor each instance of the event procedure when node 1 forwards messages • Collect 195 instances • Sentomist output [figure]
Case study II: Packet loss • Cause – Improper design: attempting to send a packet immediately upon receiving it • Solution – Queue up received packets and send them when the busy flag is cleared
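The cause and the fix can be contrasted in a toy model. This is a sketch under assumptions, not the case-study code: `Forwarder`, `on_receive`, and `on_send_done` are hypothetical names, and the model assumes that a send attempted while the radio is busy simply fails and the packet is lost.

```python
from collections import deque


class Forwarder:
    """Toy relay node with a single radio that is busy while sending."""

    def __init__(self, queue_packets):
        self.queue_packets = queue_packets  # False: original design, True: fix
        self.busy = False
        self.in_flight = None
        self.queue = deque()
        self.forwarded = []
        self.dropped = []

    def _start_send(self, pkt):
        self.busy = True
        self.in_flight = pkt

    def on_receive(self, pkt):
        if not self.busy:
            self._start_send(pkt)
        elif self.queue_packets:
            self.queue.append(pkt)    # fix: hold the packet until the radio is free
        else:
            self.dropped.append(pkt)  # original: immediate send fails, packet lost

    def on_send_done(self):
        self.forwarded.append(self.in_flight)
        self.busy = False             # busy flag cleared
        if self.queue:
            self._start_send(self.queue.popleft())


def replay(fwd):
    fwd.on_receive("p1")
    fwd.on_receive("p2")  # arrives while p1 is still being sent
    while fwd.busy:
        fwd.on_send_done()
    return fwd


naive = replay(Forwarder(queue_packets=False))
fixed = replay(Forwarder(queue_packets=True))
print(naive.forwarded, naive.dropped, fixed.forwarded)
```

The loss only appears when a second packet arrives inside the short window in which the first is still being sent, which is why randomizing the sending rate of node 2 is needed to expose it.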
Conclusions • Transient bugs in WSN applications caused by random interleaving of event procedures are very difficult to identify – Long-term execution is needed to trigger the bugs – Identifying the bug symptoms in long-term system execution data is labor-intensive • We exploit the transient nature of such bugs – Most event procedure instances behave similarly – Outliers are indicators of bug symptoms • We design Sentomist – Anatomize the long-term system runtime data into a set of event procedure instances – Detect abnormal event procedure instances with a plug-in outlier detection algorithm • The effectiveness of Sentomist is demonstrated via representative case studies
Q&A • Sentomist is a GUI-based open-source tool • Download at – http://www.hkcloud.net/Sentomist – Including all case studies Thank you!