CNF: An Algorithm for Readahead Amount Determination

Impact on Fastback
§ Added latency per each request
§ Outperformed the …

Block Prediction and Prefetching for Enhancing Instant Restore

Novel variants of CMiner( )
§ Identifies generic frequent delta rules
§ …

Simulations (cont.)
§ Simple delta rules were hard to beat
§ Cminer( …

Tivoli Software
Using Machine Learning Techniques to Enhance the Performance of an Automatic Backup and Recovery System
Amir Ronen, Dan Pelleg (Machine Learning Group, HRL)
Eran Raichstein (IBM Software Group)
© 2010 IBM Corporation

Motivation
IBM’s Fastback: automatic backup and recovery system
§ Incremental backup of disk volumes to a repository
§ Instant restore (IR): allows applications to start working immediately after recovery
§ Xpress mount: allows access to backed-up data without recovering it (e.g. for taking tape dumps)
Goal
§ Accelerate IR and mount via machine learning and algorithmic techniques
§ Minimum intervention in Fastback’s internals
Benefits: minimize bugs, easy upgrading, generality, …

Outline
§ The Fastback system
§ Algorithm for automatic determination of read-ahead
  – Basic observations
  – The algorithm
  – Experiments in the Fastback system
§ Prefetching
  – Theoretical model and observation
  – Basic prefetching algorithms
  – Frequent pattern based algorithms
  – Controlling and combining prefetch algorithms
§ Summary

FastBack’s Instant Restore and Mount
Instant Restore allows users to start using applications on the same disk to which the volume is being restored, while the restore operation is still in process. From an architectural perspective, mount is somewhat similar.
1. Activate Instant Restore
2. Read IOs from un-recovered areas trigger block fetch from the repository
3. All other reads are performed as usual
[Diagram: production server with a typical production disk; Xpress Restore Server with the repository; new production server with a new production disk]

The problem
§ A block is needed from the repository
§ Suppose that we are allowed to bring additional subsequent blocks
§ How many to bring?
  - Too many may slow down the system (in particular if they will not be used)
  - Too few will cause high total latency
[Diagram: Xpress Restore Server, repository, new production server]

Simple cost model: $T \approx T_1 + n T_2 + \varepsilon$
§ $T_1$: “fixed” latency
§ $T_2$: time to bring one block
§ $n$: number of blocks
§ $\varepsilon$: noise (assumed zero)
Key idea
Suppose that we choose $n$ such that $T_1 = n T_2$
§ The cost never more than doubles
§ In many settings $n$ can be large
The algorithm is 2-competitive
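A short derivation of why this choice at most doubles the cost of a request, with the noise term set to zero. The symbols $n^*$ (the chosen read-ahead amount) and $m$ (the number of blocks the system actually asked for) are introduced only for this sketch and do not appear in the slides:

```latex
\begin{align*}
n^* &= T_1 / T_2 \\
T(n^*) &= T_1 + n^* T_2 = 2\,T_1 \\
T(m) &= T_1 + m\,T_2 \;\ge\; T_1 \quad \text{for any } m \ge 0 \\
\Rightarrow\quad T(n^*) &\le 2\,T(m) \quad \text{whenever } m \le n^* \\
\frac{T(n^*)}{n^*} &= 2\,T_2
\end{align*}
```

The last line reads the same fact per block: at $n^*$ the time per block is twice $T_2$, the best per-block cost the model allows for very large requests.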

Problem 1
§ The latency $T_1$ and the block cost $T_2$ are not known
§ May vary over time
Solution
§ Hold a window of the last k requests (e.g. 200)
§ Use linear regression to estimate $T_1$ and $T_2$
§ Update can be done in O(1)
[Figure: example regression fit, latency ~ 6.5, block cost ~ 3]
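A minimal sketch of how such a windowed regression could be kept up to date in O(1) per request, by maintaining running sums over the window. The class and method names are hypothetical, and the slides do not say how Fastback implements this step:

```python
from collections import deque


class WindowedLinearFit:
    """Least-squares fit of latency = t1 + n * t2 over the last k requests."""

    def __init__(self, k=200):
        self.k = k
        self.window = deque()
        # Running sums of n, t, n*n and n*t over the current window.
        self.sn = self.st = self.snn = self.snt = 0.0

    def add(self, n, t):
        """Record a request for n blocks that took time t (O(1))."""
        self.window.append((n, t))
        self.sn += n
        self.st += t
        self.snn += n * n
        self.snt += n * t
        if len(self.window) > self.k:
            old_n, old_t = self.window.popleft()
            self.sn -= old_n
            self.st -= old_t
            self.snn -= old_n * old_n
            self.snt -= old_n * old_t

    def estimate(self):
        """Return (t1, t2), or None when the n-values do not vary enough."""
        m = len(self.window)
        denom = m * self.snn - self.sn ** 2
        if m < 2 or abs(denom) < 1e-9:
            return None
        t2 = (m * self.snt - self.sn * self.st) / denom
        t1 = (self.st - t2 * self.sn) / m
        return t1, t2
```

Running sums accumulate floating-point error over long runs, so recomputing them from the window now and then is a reasonable precaution.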

Problem 2
§ What if the n-values are too similar to allow estimation?
Sampling ideas
§ We only need a few samples
§ If mean(n) is large, we sample small values
§ If mean(n) is small, we sample 2*mean(n)
§ Low amortized cost
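The sampling rule can be sketched as below. The slides do not define "large" or "small", so `large_threshold` and `small_n` are assumed parameters, and the function name is hypothetical:

```python
import math


def exploratory_readahead(recent_n, large_threshold=8, small_n=1):
    """Pick an n-value that differs from the recent ones.

    recent_n: n-values of the recent requests (non-empty).
    large_threshold, small_n: assumed tuning knobs, not from the slides.
    """
    if not recent_n:
        raise ValueError("recent_n must not be empty")
    mean_n = sum(recent_n) / len(recent_n)
    if mean_n >= large_threshold:
        return small_n                       # mean is large: sample a small value
    return max(1, math.ceil(2 * mean_n))     # mean is small: sample 2 * mean(n)
```

Only a few such samples are needed to spread the n-values enough for the regression, which is why the amortized cost stays low.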

The Algorithm
§ Hold a window of the last k requests
§ At each step update the linear regression (refresh from time to time)
§ If regression is possible:
  – Estimate $T_1$, $T_2$
  – Compute desired n value
  – If the system asked for less, recommend readahead
§ Otherwise:
  – Sample as described
Additional heuristics: unreasonable values, smoothing, mis-estimation, …
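The decision step can be sketched as follows. The function name, the `max_n` cap, and the treatment of non-positive estimates as "unreasonable values" are assumptions for illustration; the slides name these heuristics without spelling them out:

```python
import math


def recommend_readahead(requested_n, estimate, sample_n, max_n=1024):
    """Return how many blocks to fetch for one request.

    requested_n: number of blocks the system asked for.
    estimate:    (t1, t2) from the regression, or None if not possible.
    sample_n:    exploratory n-value to use when no estimate is usable.
    max_n:       assumed upper bound on the read-ahead amount.
    """
    if estimate is None:
        return max(requested_n, sample_n)
    t1, t2 = estimate
    if t1 <= 0 or t2 <= 0:
        # Unreasonable estimate: fall back to sampling.
        return max(requested_n, sample_n)
    desired = min(max_n, max(1, math.floor(t1 / t2)))  # n with t1 ~ n * t2
    # Only ever recommend more than the system asked for, never less.
    return max(requested_n, desired)
```

With the example fit of latency ~ 6.5 and block cost ~ 3, the desired value is floor(6.5 / 3) = 2 blocks.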

Comments & open issues
§ The algorithm may be applicable elsewhere
§ Extensions to more complicated cost models
§ Analyzing executions of parallel copies of the algorithm

Motivation
§ IR needs to fetch blocks from the repository according to its workload
§ Ideally, blocks will be predicted and brought before they are needed
Comments
§ The network is not preemptive, so prefetching can also be harmful
§ Typical workloads are parallel processes, each with some locality of reference
[Diagram: Xpress Restore Server, repository, new production server]

A model for the prefetch problem
Workload is an unknown sequence of events $L_1, \dots, L_n$. Each $L_j$ is either:
§ An access to a block $B_j$
§ A process event
The system is composed of a CPU and a network that can run in parallel. At each step $j$ the system can do one of the following:
1. Process ($L_j$ is a process event, cost = 1 unit)
2. Access its local memory (if $L_j$ is an access event and $B_j$ is already in the local memory, cost = 1 unit)
3. Fetch a block from the repository (this occupies the network for $C$ time units and can be done in parallel with 1 or 2)

A model for the prefetch problem (cont.)
Slowdown
Let $L_1, \dots, L_n$ be a workload. The slowdown of the system on $L$ is the ratio between the total system time and the time to perform the workload locally, i.e. $T_{sys} / n$.
[Figure: example timeline with C = 2. Workload: B17, Process, B18, Process, … CPU row: Access, Process, Access, Process, … Network row (Delta): Fetch 17, Fetch 18, …]
§ Slowdown is ~1
§ Without prefetching, slowdown is around 2

Simple prefetch algorithms
Delta rule
§ Whenever $B_j$ is accessed, put $B_{j+1}$ in the queue
§ Whenever the network is idle, prefetch in LIFO order
§ Very effective rule, simple to implement
No prefetch
§ Can be shown to be 2-competitive!
Order by frequency
§ At training time, order blocks by their frequency
OPT
§ Hypothetical optimal offline algorithm
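A small simulation sketch of the delta rule under the cost model from the preceding slides (CPU steps cost 1 unit, a fetch occupies the non-preemptive network for C units and overlaps with CPU work). The encoding of the workload (an integer for a block access, `None` for a process event), the function name, and the restriction to integer C >= 1 are assumptions of this sketch, not part of the slides:

```python
def slowdown(workload, fetch_cost, use_delta_rule):
    """Return T_sys / n for a workload.

    workload:       list; an int b means "access block b", None means
                    "process event" (assumed encoding).
    fetch_cost:     integer C >= 1, time the network is busy per fetch.
    use_delta_rule: if False, blocks are fetched only on demand.
    """
    arrival = {}   # block -> time at which it is (or will be) in local memory
    stack = []     # prefetch candidates, served in LIFO order
    net_free = 0   # time at which the network becomes idle
    t = 0          # CPU clock
    for event in workload:
        if event is not None:
            block = event
            if block not in arrival:
                # Miss: demand fetch, queued behind any fetch in flight.
                start = max(t, net_free)
                net_free = start + fetch_cost
                arrival[block] = net_free
            t = max(t, arrival[block])      # stall until the block is local
            if use_delta_rule:
                stack.append(block + 1)     # delta rule: B_j -> B_{j+1}
        if use_delta_rule and net_free <= t:
            # Network idle: start one prefetch in parallel with this CPU step.
            while stack:
                candidate = stack.pop()
                if candidate not in arrival:
                    net_free = t + fetch_cost
                    arrival[candidate] = net_free
                    break
        t += 1                              # the CPU step itself costs 1 unit
    return t / len(workload)
```

On the alternating workload B17, Process, B18, Process, … with C = 2, this sketch gives a slowdown of exactly 2 without prefetching and a value approaching 1 with the delta rule, in line with the figures quoted for the model.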

Frequent pattern mining based algorithms
CMiner (Li et al., FAST 2004)
§ Identifies reoccurring block sub-sequences at training time
§ Problematic runtime and space complexity in our settings
[Figure labels: A, E, L; Z; hot item; B-tree]

Simulations
Setup
§ Used traces from OLTP financial transactions and from an SQL stress tool
§ Simulated the system under various parameters and measured slowdown at various time points

Summary and open issues
Automatic read-ahead determination
§ Highly effective
§ May be applicable elsewhere
§ Calls for more generalized cost models
Block prediction and prefetch
§ Simple delta rules seem hard to beat
§ Potential for improvement
§ Novel frequent pattern mining based algorithms; might be interesting in other contexts (e.g. caching)