Rewind, Repair, Replay: Three R’s to improve dependability Aaron Brown and David Patterson ROC Research Group University of California at Berkeley SIGOPS European Workshop, 23 September 2002
What if computer systems could travel in time? • We could have retroactive repair – travel back and fix problems before they had a chance to corrupt data • We could eliminate human operator error – make a mistake? Just travel back and try it again. • Our systems could be more robust – we could eliminate the dangers of upgrades – we could better tolerate buggy software – we might even be able to tolerate viruses and hackers • We could make more dependable systems Slide 2
Sci-fi vs. computer time travel • Sci-fi time travel – our hero loses a loved one or lives through disaster – hero uses time machine to travel back in time – hero alters the past to avert the future disaster – hero returns to the present; past changes have been merged into the original timeline • Computer time travel – human error, software bug, or attack causes data loss – Rewind: roll system state backwards in time – Repair: make changes to avert foretold disaster – Replay: roll system state forward, merging the original timeline with the effects of repairs • Three R’s are the fundamental primitives of computer time travel Slide 3
Key properties of the 3R’s • Recovery from problems at any system layer – rewind, repair, replay cover OS through application • Recovery from unanticipated problems – arbitrary repair • No assumptions about correct application behavior – physical rewind • Integrated interface – provide “undo for sysadmins” Slide 4
What about existing approaches? Table columns: Approach; Rewind; Repair; Replay; Comments • Backups, checkpoints, snapshots, no-overwrite storage – comments: read-only view of history • RDBMS log replay – comments: application-level only; cannot alter committed transactions • Workflow w/ compensating transactions – comments: limited apps; mechanisms not usefully integrated for time travel • Timewarp (PARC collaborative productivity apps) – comments: application-level only; repair limited to well-understood history edits • Rewind/Repair/Replay cell marks, in source order: ✓ physical; ✓; ✓ physical; ✓ phys/log; ✓ logical; ✓; ✓ (limited); ✓; ✓; ✓ Slide 5
Designing a 3R system • Goals – application-neutrality – provide abstractions for reasoning about 3R behavior • Target domain: network services – accessed by remote users via well-defined interfaces – email, messaging, e-commerce, auctions, forums, web hosting, enterprise applications (J2EE, .NET), ... • Challenges, learned from first attempt – integrating history and repair during replay – managing inconsistency in externally-visible state Slide 6
Basic architecture • Application-independent undo manager – coordinates 3R cycle; manages external inconsistencies – linked via a set of APIs to application, time-travel storage, history log, and control UI [Figure: architecture diagram showing Control UI, App. Service (includes user state, application, operating system), time-travel storage layer, 3R API, Undo Manager, and History Log, with control links] Slide 7
Abstracting the application service • To the undo manager, the application is: – a collection of state – a history of events affecting the state » an event is typically a user interaction with the service – a model of acceptable external consistency • These are encoded into application-defined verbs – high-level encodings of user interactions (events) » records of intent to alter state, not actual state changes – reference application state by opaque UIDs – provide policies that define external consistency Slide 8
Verbs and the 3R cycle • Normal operation – undo manager logs application-provided verbs to disk [Figure: architecture diagram with the “User interaction” and “Verbs” flows labeled] Slide 9
Verbs and the 3R cycle • Rewind – time-travel storage layer reverts system hard state to rewind point – all changes since rewind point are discarded [Figure: architecture diagram] Slide 10
Verbs and the 3R cycle • Repair – operator edits logged history and/or makes arbitrary changes to system [Figure: architecture diagram with the “Repairs” and “Edits” flows labeled] Slide 11
Verbs and the 3R cycle • Replay – undo manager feeds verbs back to application for re-execution in the context of repaired system [Figure: architecture diagram with the “Verbs” flow labeled] Slide 12
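The four steps of the cycle (normal operation, rewind, repair, replay) can be sketched in a few lines. This is a toy illustration in Python, not the Java prototype: `Service`, `deposit_verb`, `UndoManager`, and the `buggy` flag are all hypothetical names, and an in-memory deep copy stands in for the time-travel storage layer.

```python
import copy


class Service:
    """Hypothetical application service containing a software bug."""

    def __init__(self):
        self.state = {"alice": 0}   # hard state, restored on rewind
        self.buggy = True           # stands in for the fault to be repaired

    def deposit(self, account, amount):
        # The bug doubles every deposit until it is repaired.
        self.state[account] += (amount * 2) if self.buggy else amount


def deposit_verb(account, amount):
    """A verb records intent (deposit N), not the resulting state change."""
    return lambda service: service.deposit(account, amount)


class UndoManager:
    """Toy undo manager: logs verbs, rewinds via snapshots, replays verbs."""

    def __init__(self, service):
        self.service = service
        self.history = []      # history log of verbs, in execution order
        self.snapshots = {}    # stand-in for the time-travel storage layer

    def checkpoint(self):
        """Create a rewind point and return its position in the history."""
        point = len(self.history)
        self.snapshots[point] = copy.deepcopy(self.service.state)
        return point

    def execute(self, verb):
        """Normal operation: log the verb, then let the service run it."""
        self.history.append(verb)
        verb(self.service)

    def rewind(self, point):
        """Revert hard state; all changes since the rewind point are dropped."""
        self.service.state = copy.deepcopy(self.snapshots[point])

    def replay(self, point):
        """Re-execute logged verbs in the context of the repaired system."""
        for verb in self.history[point:]:
            verb(self.service)


def demo():
    service = Service()
    manager = UndoManager(service)
    point = manager.checkpoint()
    manager.execute(deposit_verb("alice", 10))
    manager.execute(deposit_verb("alice", 5))
    corrupted = service.state["alice"]   # the bug doubled both deposits
    manager.rewind(point)                # Rewind
    service.buggy = False                # Repair
    manager.replay(point)                # Replay
    return corrupted, service.state["alice"]
```

Because the log holds intent rather than state changes, replay produces the balance the user meant to create (15) instead of restoring the corrupted one (30).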
The fundamental roles of verbs • Providing application-independence – verbs encapsulate application semantics, but remain semi-opaque to undo manager • Integration of repair into history – high-level specification of intent makes verbs relatively independent of system changes – verbs are re-executed, not restored, so they inherit effects of repairs • Scoping restored history – only changes logged as verbs will be preserved by 3R’s » effects of bugs, corruption, human error are discarded – can reason about what is preserved/lost in 3R cycle Slide 13
Managing external inconsistency • External inconsistency == time paradox? – system is internally-consistent after a 3R cycle – but external observers see inexplicable state changes – external inconsistency is OK unless affected state was externalized (observed) before the 3R cycle • Coping with external inconsistency – cannot eliminate – must manage: ignore, explain, compensate, encompass • Verbs let us manage external inconsistency Slide 14
Managing inconsistency with verbs • To detect inconsistencies: – verbs specify the state that they depend upon – undo manager tracks signatures of that state – if verb is altered or if signatures don’t match, there is an inconsistency » applications supporting relaxed consistency can replace signature-check with arbitrary consistency predicates • To detect state viewed externally: – verbs indicate what state they externalize » example: IMAP fetch verb externalizes email message • To handle externalized inconsistencies: – verb supplies compensation functions Slide 15
Email example: original timeline [Figure: timeline across the system boundary, with arrows labeled Deliver, Move, and Fetch; the input is “Hello”; system state shows message m as “olleH” in Inbox and Folder 1] • History log verbs, in time order – DeliverMsg, MoveMsg: Externalizes: —; ContentDep: —; ExistsDep: Inbox, Folder 1; + input “Hello” – FetchMsg: Externalizes: m; ContentDep: m; ExistsDep: m, Folder 1; + Signature(m)=“olleH” Slide 16
Email example: replay timeline [Figure: timeline across the system boundary, with arrows labeled Deliver, Move, and Fetch; “olleH” is crossed out and “Hello” appears in system state in Inbox and Folder 1] • History log verbs, in time order – DeliverMsg, MoveMsg: Externalizes: —; ContentDep: —; ExistsDep: Inbox, Folder 1; + input “Hello” – FetchMsg: Externalizes: m; ContentDep: m; ExistsDep: m, Folder 1; + Signature(m)=“olleH” (mismatch! => inconsistency) Slide 17
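The two timelines can be reproduced with a small simulation. It assumes the “olleH” of the original timeline came from a fault that reversed the message text, which is one reading of the figure rather than something the slides state. `MailStore`, `run`, and the log-entry layout are hypothetical, SHA-256 is an arbitrary choice of signature, and a fresh store stands in for a rewound one.

```python
import hashlib


def signature(content):
    """Signature of a piece of state; SHA-256 is an illustrative choice."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class MailStore:
    """Hypothetical mail store; reverse_bug models the assumed fault."""

    def __init__(self, reverse_bug):
        self.reverse_bug = reverse_bug
        self.folders = {"Inbox": {}, "Folder1": {}}

    def deliver(self, uid, text):
        self.folders["Inbox"][uid] = text[::-1] if self.reverse_bug else text

    def move(self, uid, src, dst):
        self.folders[dst][uid] = self.folders[src].pop(uid)

    def fetch(self, uid, folder):
        return self.folders[folder][uid]


def run(store, log, record):
    """Execute the logged verbs against store.

    On the original run (record=True) the signature of fetched state is
    saved in the log. On replay (record=False) it is compared against the
    saved value, and mismatching verbs are returned.
    """
    inconsistent = []
    for entry in log:
        name, args = entry["verb"], entry["args"]
        if name == "DeliverMsg":
            store.deliver(*args)
        elif name == "MoveMsg":
            store.move(*args)
        elif name == "FetchMsg":
            sig = signature(store.fetch(*args))
            if record:
                entry["signature"] = sig
            elif sig != entry["signature"]:
                inconsistent.append(entry["verb"])
    return inconsistent


def demo():
    log = [
        {"verb": "DeliverMsg", "args": ("m", "Hello")},
        {"verb": "MoveMsg", "args": ("m", "Inbox", "Folder1")},
        {"verb": "FetchMsg", "args": ("m", "Folder1")},
    ]
    # Original timeline: the fault stores "olleH", and the fetch exposes it.
    run(MailStore(reverse_bug=True), log, record=True)
    # Replay timeline: rewound, repaired store; the fetch now sees "Hello".
    return run(MailStore(reverse_bug=False), log, record=False)
```

Only `FetchMsg` is reported: `DeliverMsg` and `MoveMsg` have no content dependencies, so re-executing them on repaired state raises no inconsistency.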
Recap: 3R architecture • Goal: application-neutral implementation of 3R’s – verb abstraction couples generic undo manager to app. – verbs provide tools to reason about 3R behavior • Challenges – integrating history and repair during replay » re-executing verbs restores intent of history – managing inconsistency in externally-visible state » verbs track externalization, state dependencies, and define compensations Slide 18
Status • Prototype implementation of 3R primitives nearly complete – app-independent undo manager written in Java – all APIs defined as Java interfaces – Network Appliance filer as time-travel storage layer – BerkeleyDB as history log • First target app: web-based email service – 3R-enhanced JavaMail API provider classes » plus additional hooks to verb-ify operator maintenance tasks like account creation – JWebMail web front-end – RDBMS-based backend mail store (DB2 or MySQL) – implementation in progress Slide 19
Open issues & future work • Resource impact of the 3R’s – what are the performance/space penalties for the 3R’s? • Verb definition – can we specify verbs & consistency policy declaratively? • Providing the 3R’s at multiple granularities – can we track & manage cross-granularity dependencies? • Measuring the dependability benefit of 3R’s – how do we build recovery/dependability benchmarks? • Other uses for verb-based characterizations – easy georeplication? online self-checking? automatic verification of upgrades? Slide 20
Conclusions • We can build time travel for computers – using the 3R’s: Rewind, Repair, Replay • An architecture for the 3R primitives – generic undo manager coupled to application by verbs • Verbs are a useful abstraction for the 3R’s – can use to reason about effects of 3R’s on state – help address problem of external inconsistencies • Prototype 3R-enabled email system under construction – hope to demonstrate increased dependability and faster recovery from problems Slide 21
Rewind, Repair, Replay: Three R’s to improve dependability For more information: http://roc.cs.berkeley.edu/ abrown@cs.berkeley.edu
Backup slides Slide 23
Verbs vs. transactions • Both encapsulate state-altering events • But, unlike transactions: – verbs are higher-level, recording end-user intent, not specific state changes – verbs do not depend on internal data models (but do depend on external protocols) » transactions are the reverse – verbs do not necessarily conform to ACID consistency » verbs inherit consistency model provided by application at the external-protocol level Slide 24
Implementing verbs • Verbs are defined by a type hierarchy – base type defines interfaces for state dependencies, externalizations, predicates, compensations – applications subclass the base type for their verbs » additions to the type are opaque to the undo manager • Referencing state – all user-visible state named by time-invariant UIDs – undo manager requires signature method for all state • Consistency predicates and compensations are application-supplied functions – they encode the app’s external consistency model Slide 25
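The type hierarchy described on this slide might look roughly like the following. The prototype defines its APIs as Java interfaces; this Python sketch only mirrors the shape, and every method name (`content_deps`, `externalizes`, `is_consistent`, `compensate`) is a hypothetical stand-in rather than the prototype's actual interface.

```python
from abc import ABC, abstractmethod


class Verb(ABC):
    """Base type: the only part of a verb the undo manager understands."""

    @abstractmethod
    def content_deps(self):
        """UIDs of state whose content this verb depends on."""

    @abstractmethod
    def externalizes(self):
        """UIDs of state this verb exposes to external observers."""

    @abstractmethod
    def execute(self, store):
        """Apply the verb to the application state."""

    def is_consistent(self, logged_sigs, replay_sigs):
        """Default predicate: signatures of content dependencies must match."""
        return all(logged_sigs.get(uid) == replay_sigs.get(uid)
                   for uid in self.content_deps())

    def compensate(self, store):
        """Default compensation: do nothing."""
        return None


class FetchMsg(Verb):
    """Application-defined verb; its extra fields are opaque to the manager."""

    def __init__(self, msg_uid):
        self.msg_uid = msg_uid

    def content_deps(self):
        return [self.msg_uid]

    def externalizes(self):
        return [self.msg_uid]

    def execute(self, store):
        return store[self.msg_uid]

    def compensate(self, store):
        # Explain the change to the user who saw the old content.
        store[self.msg_uid] = "[Changed during recovery] " + store[self.msg_uid]
        return store[self.msg_uid]
```

An application supporting relaxed consistency would override `is_consistent` in its subclass, leaving the undo manager's code untouched.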
Defining verbs • Currently, verbs are defined procedurally – provide dependency information via lists of state IDs – provide functions for special consistency predicates – provide functions for compensation • Better: declarative specification – compile textual specification into verb code using libraries of predicates and compensation fns – reduces complexity of adding 3R’s to the application – increases confidence in undo system via easier testing Slide 26
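One way the declarative route could work is sketched below: a small textual specification is compiled into a verb description by looking up named predicates and compensations in libraries. The specification syntax, the library contents, and all names (`compile_spec`, `VerbSpec`, `PREDICATES`, `COMPENSATIONS`) are invented for illustration; the slides leave the specification format as an open question.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical libraries of reusable predicates and compensation functions.
PREDICATES = {
    "signature_match": lambda logged, replayed: logged == replayed,
    "always_ok": lambda logged, replayed: True,
}
COMPENSATIONS = {
    "none": lambda uid: None,
    "explain": lambda uid: f"Note: {uid} changed during recovery",
}


@dataclass
class VerbSpec:
    name: str
    content_deps: List[str]
    exists_deps: List[str]
    externalizes: List[str]
    predicate: Callable
    compensation: Callable


def compile_spec(text):
    """Compile 'key: value' lines into a VerbSpec."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()

    def id_list(key):
        # Comma-separated state IDs; a missing key means no dependencies.
        return [s.strip() for s in fields.get(key, "").split(",") if s.strip()]

    return VerbSpec(
        name=fields["verb"],
        content_deps=id_list("content_dep"),
        exists_deps=id_list("exists_dep"),
        externalizes=id_list("externalizes"),
        predicate=PREDICATES[fields.get("predicate", "signature_match")],
        compensation=COMPENSATIONS[fields.get("compensation", "none")],
    )


FETCH_SPEC = """
verb: FetchMsg
content_dep: msg
exists_dep: msg, folder
externalizes: msg
predicate: signature_match
compensation: explain
"""
```

Calling `compile_spec(FETCH_SPEC)` yields a `VerbSpec` whose predicate and compensation come from the shared libraries, so each one can be tested once and reused across verbs.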
External consistency policies • Verbs capture external consistency policies • Example: email – message order in folder is irrelevant » AppendMessage verb does not express dependency on content of target folder, only its existence – content of messages is relevant, except for headers » ReadMsg verb depends on hash of target message body; if changed, compensate by inserting explanatory text • Example: e-commerce – order total depends on item prices, not descriptions » Checkout verb depends on prices of items in cart, not their hash-values; if sum of prices changed, compensate by emailing customer for approval Slide 27
External consistency policies (2) • Example: auctions – new bid must be larger than prior bids » PlaceBid verb depends on content of all bids in bid set; if one is now larger than new bid, compensate by canceling new bid and informing bidder Slide 28
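The e-commerce and auction policies above replace a plain signature check with application-specific predicates. A minimal sketch of both follows; the function names, the cart layout (dicts with a `price` in integer cents), and the `notify` callback are assumptions made for the example.

```python
def checkout_consistent(logged_cart, replayed_cart):
    """Checkout depends on the sum of item prices, not on descriptions."""
    def total(cart):
        return sum(item["price"] for item in cart)
    return total(logged_cart) == total(replayed_cart)


def place_bid_consistent(new_bid, replayed_bids):
    """PlaceBid stays consistent only if it still exceeds every prior bid."""
    return all(new_bid > bid for bid in replayed_bids)


def compensate_bid(new_bid, bidder, notify):
    """Compensation: cancel the new bid and inform the bidder."""
    notify(bidder, f"Your bid of {new_bid} was cancelled: a higher bid exists")
    return None  # None means the bid is no longer in the bid set
```

For example, a cart whose item descriptions changed during repair still passes `checkout_consistent` as long as the prices sum to the same total, so the customer is not bothered with an approval email.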
Application implications • To support the 3R’s, an application must have: – a high-level, verb-structured interface/API for user, operator, and external actions – a state model where all user-visible state: » is nameable via the API » is tagged with GUIDs » supports a signature/hash method – a relaxed external consistency model that allows compensation for externalized inconsistent verbs Slide 29
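The state-model requirements (nameable state, GUID tags, a signature method) could be met by something as small as the registry below. `StateObject` and `StateRegistry` are hypothetical names, and random UUIDs and SHA-256 are illustrative choices rather than anything the slides prescribe.

```python
import hashlib
import uuid


class StateObject:
    """A piece of user-visible state with a stable GUID and a signature."""

    def __init__(self, content):
        self.guid = str(uuid.uuid4())   # never changes, even if content does
        self.content = content

    def signature(self):
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()


class StateRegistry:
    """Makes every state object nameable through one API."""

    def __init__(self):
        self._objects = {}

    def create(self, content):
        obj = StateObject(content)
        self._objects[obj.guid] = obj
        return obj.guid

    def lookup(self, guid):
        return self._objects[guid]

    def signatures(self, guids):
        """Snapshot the signatures of the state a verb depends on."""
        return {guid: self._objects[guid].signature() for guid in guids}
```

A verb would record `signatures([...])` for its dependencies when first executed, and the same call during replay gives the values to compare against.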
Example: a 3R email store [Figure: Transport (SMTP), Store (IMAP), WebUI (HTTP), and Directory/Auth. (LDAP) components, with internal links among them and verbs flowing to UndoMgr] • State – mailstores, folders, messages, user properties, aliases • Verbs – transport: create/delete/alter mapping; deliver msg – directory: create/alter/delete user-entry; create/alter/delete filter-rule; add/remove maildrop – store: create/delete store; create/rename/delete folder; expunge folder; list folder; set folder flags; copy msg; append msg; fetch msg; set msg flags Slide 30