Failure in the PATHFINDER Mission
Chandan Kumar
EE 585: Fault Tolerant Computing, Case Study

Outline
• Background
• Simplified view of H/W architecture
• S/W architecture
• Failure
• Cause
• Correction

Background
• Launched Dec 4, 1996
• Landed July 4, 1997
• Mission objectives:
    • To prove that the development of "faster, better and cheaper" spacecraft is possible (with three years for development and a cost under US$ 150 million).
    • To show that it is possible to send a load of scientific instruments to another planet with a simple system and at one fifteenth the cost of a Viking mission.

Background Contd.
• To demonstrate NASA's commitment to low-cost planetary exploration by finishing the mission with a total expenditure of US$ 280 million, including the launch vehicle and mission operations.
• To demonstrate the mobility and usefulness of a micro rover on the surface of Mars.
• It carried a number of scientific instruments.
• Mars Pathfinder lander:
    • Imager for Mars Pathfinder (IMP), which includes a magnetometer and an anemometer
    • Atmospheric and meteorological sensors (ASI/MET)

Background Contd.
• Rover Sojourner:
    • Imaging system (three cameras: front B&W stereo pair, one rear color)
    • Laser striper hazard detection system
    • Alpha Proton X-ray Spectrometer (APXS)
    • Wheel Abrasion Experiment
    • Material Adherence Experiment
    • Accelerometers
    • Potentiometers
• Final transmission: Sept 27, 1997
• 16,500 images sent from the lander, 550 from the rover
• 15 analyses of rocks

Simplified view of Hardware Architecture
• A single CPU controls the spacecraft. It resides on a VME bus.
• Interface cards for the radio and the camera.
• An interface to a 1553 bus, which connects to the ‘cruise’ and ‘lander’ stages.
• H/W on the cruise stage controls thrusters, etc.
• H/W on the lander interfaces to instruments such as the accelerometer, radar altimeter and ASI/MET.

The Software Architecture

    |<------------------- 0.125 seconds ------------------->|
    |<***************|                    |*******|     |**>|
    |<- bus active ->|
                     |<- bc_dist active ->|
                                  bc_sched active |<--->|
----|----------------|--------------------|-------|-----|---|----
    t1               t2                   t3      t4    t5  t1

The *** are periods when tasks other than the ones listed are executing. There is some idle time.
• t1 - bus hardware starts via hardware control on the 8 Hz boundary. The transactions for this cycle had been set up by the previous execution of the bc_sched task.
• t2 - 1553 traffic is complete and the bc_dist task is awakened.
• t3 - bc_dist task has completed all of the data distribution.
• t4 - bc_sched task is awakened to set up transactions for the next cycle.
• t5 - bc_sched activity is complete.

The Failure
• The spacecraft began experiencing total system resets.
• Each reset reinitializes all of the hardware and software. It also terminates the execution of the current ground-commanded activities.
• The remainder of the activities for that day were not accomplished until the next day.

The Cause
• The failure was a case of priority inversion.
• In scheduling, priority inversion is the scenario where a low priority task holds a shared resource that is required by a high priority task. This causes the execution of the high priority task to be blocked until the low priority task has released the resource, effectively "inverting" the relative priorities of the two tasks.
• If some other medium priority task attempts to run in the interim, it will take precedence over both the low priority task (which it preempts) and the high priority task (which is still waiting for the resource).
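
The scenario can be reproduced with a small simulation. The sketch below is a minimal tick-based, fixed-priority preemptive scheduler with one shared lock and no priority inheritance. The Task fields, priority values, release times and work amounts are hypothetical and chosen only for illustration; they are not Pathfinder's actual task parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    priority: int             # larger number = higher priority (sketch convention)
    release: int              # tick at which the task becomes ready
    work: int                 # ticks of CPU time still needed
    needs_lock: bool = False  # True if the task holds the shared lock while running
    done_at: Optional[int] = None


def simulate(tasks, max_ticks=100):
    """Run the tasks to completion and return {name: finish tick}."""
    lock_owner = None
    for tick in range(max_ticks):
        ready = [t for t in tasks if t.release <= tick and t.work > 0]
        # A task that needs the lock is blocked while another task holds it.
        runnable = [
            t for t in ready
            if not (t.needs_lock and lock_owner is not None and lock_owner is not t)
        ]
        if not runnable:
            continue
        current = max(runnable, key=lambda t: t.priority)
        if current.needs_lock:
            lock_owner = current  # take (or keep) the lock
        current.work -= 1
        if current.work == 0:
            current.done_at = tick + 1
            if lock_owner is current:
                lock_owner = None  # release the lock on completion
    return {t.name: t.done_at for t in tasks}


def make_tasks(with_medium):
    tasks = [
        Task("low", priority=1, release=0, work=3, needs_lock=True),
        Task("high", priority=3, release=1, work=2, needs_lock=True),
    ]
    if with_medium:
        tasks.append(Task("medium", priority=2, release=2, work=5))
    return tasks


print(simulate(make_tasks(with_medium=False)))
print(simulate(make_tasks(with_medium=True)))
```

With the medium priority task present, the high priority task finishes after it, even though it was released earlier and has the highest priority.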

The Cause Contd.
• The failure was identified by the spacecraft as a failure of the bc_dist task to complete its execution before the bc_sched task started.
• The ASI/MET task is delivered its information via an interprocess communication (IPC) mechanism. The IPC mechanism is based on pipes.
• The higher priority bc_dist task was blocked by the much lower priority ASI/MET task, which was holding a shared resource.

The Cause Contd.
• The resource that caused this problem was a mutual exclusion semaphore used within the select() mechanism.
• The ASI/MET task had acquired this resource and had then been preempted by several of the medium priority tasks.
• The bc_dist task attempted to send the newest ASI/MET data via the IPC mechanism, which made a pipe call. That call blocked while trying to take the semaphore.

The Cause Contd.
• The medium priority tasks ran, still not allowing the ASI/MET task to run, until the bc_sched task was awakened.
• At that point, the bc_sched task determined that the bc_dist task had not completed its cycle (a hard deadline in the system) and declared the error that initiated the reset.
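
The hard-deadline check made when bc_sched wakes up can be sketched as follows. This is only an illustration of the rule described above, not the flight code: the function name, the return values and the time offsets in the usage lines are hypothetical. Only the 8 Hz cycle rate comes from the slides.

```python
CYCLE_HZ = 8
CYCLE_SECONDS = 1 / CYCLE_HZ  # 0.125 s between successive t1 boundaries


def bc_sched_check(bc_dist_finish, bc_sched_wakeup):
    """Decide the outcome of one cycle.

    Both arguments are offsets in seconds from t1. bc_dist_finish is None
    when bc_dist is still blocked at the moment bc_sched is awakened.
    """
    if bc_dist_finish is None or bc_dist_finish > bc_sched_wakeup:
        return "reset"  # hard deadline missed: declare the error
    return "ok"


# Hypothetical offsets: bc_dist done at 80 ms, bc_sched awakened at 100 ms.
print(bc_sched_check(0.080, 0.100))
# bc_dist still blocked on the semaphore when bc_sched is awakened.
print(bc_sched_check(None, 0.100))
```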

Correction
• The creation flags for the semaphore were changed so as to enable priority inheritance.
• Modifying the semaphore associated with the pipe used for bc_dist task to ASI/MET task communications corrected the problem.
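
What priority inheritance changes can be shown with a small model of a mutex. This is a sketch under stated assumptions, not the RTOS implementation: the class names, the inherit flag and the priority numbers are hypothetical, and larger numbers mean higher priority here. (On VxWorks, the corresponding creation option for a mutex semaphore is SEM_INVERSION_SAFE.)

```python
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority
        self.priority = priority  # effective priority seen by the scheduler


class InheritanceMutex:
    """Mutex model whose holder can inherit a blocked waiter's priority."""

    def __init__(self, inherit=True):
        self.inherit = inherit
        self.owner = None
        self.waiters = []

    def take(self, task):
        """Return True if the task got the mutex, False if it must block."""
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)
        if self.inherit and task.priority > self.owner.priority:
            self.owner.priority = task.priority  # boost the holder
        return False

    def give(self):
        """Release the mutex; must be called while a task owns it."""
        self.owner.priority = self.owner.base_priority  # drop the boost
        self.owner = None
        if self.waiters:
            nxt = max(self.waiters, key=lambda t: t.priority)
            self.waiters.remove(nxt)
            self.owner = nxt


def highest_priority(tasks):
    """Pick the task a fixed-priority scheduler would run next."""
    return max(tasks, key=lambda t: t.priority)


asi_met = Task("ASI/MET", 1)
medium = Task("medium", 5)
bc_dist = Task("bc_dist", 9)

mutex = InheritanceMutex(inherit=True)
mutex.take(asi_met)   # low priority task holds the mutex
mutex.take(bc_dist)   # high priority task blocks and boosts the holder
print(highest_priority([asi_met, medium]).name)
```

With inheritance enabled, the holder runs ahead of the medium priority task, releases the mutex sooner, and the high priority task is no longer starved.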

S/W modification on the spacecraft
• Patching is a specialised process.
• Send the difference between what you have onboard and what you want on the spacecraft.
• S/W on the spacecraft uses that difference to modify the onboard copy.
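
The idea of sending only the difference can be sketched with Python's standard difflib module. This is an illustrative delta format, not the one used on the spacecraft: the function names and the sample byte strings are hypothetical.

```python
import difflib


def make_patch(onboard: bytes, desired: bytes):
    """Ground side: list of (start, end, replacement) edits against onboard."""
    matcher = difflib.SequenceMatcher(None, onboard, desired, autojunk=False)
    return [
        (i1, i2, desired[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # unchanged regions are not transmitted
    ]


def apply_patch(onboard: bytes, patch):
    """Spacecraft side: rebuild the desired image from the onboard copy."""
    out = bytearray()
    cursor = 0
    for start, end, replacement in patch:
        out += onboard[cursor:start]  # keep the unchanged bytes
        out += replacement            # substitute the changed region
        cursor = end
    out += onboard[cursor:]
    return bytes(out)


# Hypothetical images that differ only in one flag setting.
onboard = b"flags=PRIORITY_QUEUE;"
desired = b"flags=PRIORITY_QUEUE|INHERIT;"

patch = make_patch(onboard, desired)
assert apply_patch(onboard, patch) == desired
print(patch)
```

Only the changed bytes and their positions need to be uplinked, which matters when the link to the spacecraft is slow.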

Questions?
