Bitmap Indices for Fast End-User Physics Analysis in ROOT
Kurt Stockinger (1), Kesheng Wu (1), Rene Brun (2), Philippe Canal (3)
(1) Berkeley Lab, Berkeley, USA
(2) CERN, Geneva, Switzerland
(3) Fermi Lab, Batavia, USA
Int. Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2005), Zeuthen, Germany, May 2005

Contents
- Introduction to Bitmap Indices
- Integration of Bitmap Indices into ROOT
    - Support for TTree::Draw and TChain::Draw
    - Example Usage
- Experimental Results
    - Index Size
    - Performance of Bitmap Index vs. TTreeFormula
- Conclusions

Bitmap Indices
- Bitmap indices are efficient data structures for accelerating multi-dimensional queries:
    - E.g. pT > 195 AND nTracks < 4 AND muonTight1cm > 12.4
- Supported by most commercial database management systems and data warehouses
- Optimized for read-only data

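To illustrate how a multi-dimensional cut reduces to bitmap operations, here is a minimal C++ sketch. It is not the FastBit or ROOT implementation: the names `Bitmap`, `evaluatePredicate`, `bitmapAnd` and `selectedRows` are hypothetical, and the per-predicate bitmaps are produced by a plain scan only to keep the example short, whereas a real index would derive them from precomputed bitmaps.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// One bit per record: bit i is true when record i satisfies a predicate.
using Bitmap = std::vector<bool>;

// Bitmap for one predicate over one attribute column (hypothetical helper;
// a real bitmap index would look this up instead of scanning the column).
Bitmap evaluatePredicate(const std::vector<double>& column,
                         const std::function<bool(double)>& predicate) {
    Bitmap bits(column.size());
    for (std::size_t i = 0; i < column.size(); ++i) {
        bits[i] = predicate(column[i]);
    }
    return bits;
}

// A conjunction of predicates becomes a bitwise AND of their bitmaps.
Bitmap bitmapAnd(const Bitmap& a, const Bitmap& b) {
    assert(a.size() == b.size());
    Bitmap out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] && b[i];
    }
    return out;
}

// Row numbers of the records whose bit is set.
std::vector<std::size_t> selectedRows(const Bitmap& bits) {
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i]) rows.push_back(i);
    }
    return rows;
}
```

A cut such as pT > 195 AND nTracks < 4 would then be evaluated as `selectedRows(bitmapAnd(b1, b2))`, where `b1` and `b2` are the bitmaps of the two predicates; each additional predicate adds one more AND.
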
Equality Encoding vs. Range Encoding
[Figure: (a) list of attributes, (b) equality encoding with cardinality 10, (c) range encoding]
- Range encoding is optimized for one-sided range queries, e.g. a0 <= 3

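The difference between the two encodings can be sketched in a few lines of C++. This is an illustration under assumptions, not the code behind the slides: attribute values are taken to be integers in [0, cardinality), all function names are hypothetical, and the range-encoded index keeps its last (all-ones) bitmap for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Bitmap = std::vector<bool>;

// Equality encoding: bitmap v has bit i set when values[i] == v.
std::vector<Bitmap> buildEqualityEncoding(const std::vector<int>& values,
                                          int cardinality) {
    std::vector<Bitmap> index(cardinality, Bitmap(values.size(), false));
    for (std::size_t i = 0; i < values.size(); ++i) {
        assert(values[i] >= 0 && values[i] < cardinality);
        index[values[i]][i] = true;
    }
    return index;
}

// Range encoding: bitmap v has bit i set when values[i] <= v.
std::vector<Bitmap> buildRangeEncoding(const std::vector<int>& values,
                                       int cardinality) {
    std::vector<Bitmap> index(cardinality, Bitmap(values.size(), false));
    for (std::size_t i = 0; i < values.size(); ++i) {
        assert(values[i] >= 0 && values[i] < cardinality);
        for (int v = values[i]; v < cardinality; ++v) {
            index[v][i] = true;
        }
    }
    return index;
}

// a <= bound with equality encoding: OR together (bound + 1) bitmaps.
Bitmap lessEqualFromEquality(const std::vector<Bitmap>& index, int bound) {
    Bitmap result(index.front().size(), false);
    for (int v = 0; v <= bound; ++v) {
        for (std::size_t i = 0; i < result.size(); ++i) {
            result[i] = result[i] || index[v][i];
        }
    }
    return result;
}

// a <= bound with range encoding: a single bitmap already holds the answer.
const Bitmap& lessEqualFromRange(const std::vector<Bitmap>& index, int bound) {
    return index[bound];
}
```

For a query such as a0 <= 3 the equality-encoded index has to combine four bitmaps, while the range-encoded index reads exactly one, which is why range encoding suits one-sided range queries.
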
Bitmap Indices with Binning
- Simple bitmap indices work well for low-cardinality attributes, i.e. when the number of distinct values per attribute is low (< 10,000)
- For high-cardinality attributes, the bitmap index is often too large to be of practical use, even with good compression algorithms
- Solution:
    - Keep one bitmap per attribute range rather than per distinct attribute value (binning)
    - Requires an additional step for evaluating the candidates in a bin ("Candidate Check"); see the example on the next slide

Range Query on Bitmap Index with Binning
[Figure: range query on a binned bitmap index; operation shown: bitmap 3 XOR bitmap 4]
- The "Candidate Check" is performed on bitmap 4 to identify the attribute values where x < 63

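The binning idea and the Candidate Check can be sketched as follows. This is a hedged illustration, not the FastBit algorithm: the `BinnedIndex` struct, the function names and the bin edges are assumptions made for the example, and the bitmaps are uncompressed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical binned index: bin b covers [edges[b], edges[b + 1]).
struct BinnedIndex {
    std::vector<double> edges;
    std::vector<std::vector<bool>> bitmaps;  // bitmaps[b][i]: record i is in bin b
};

BinnedIndex buildBinnedIndex(const std::vector<double>& values,
                             const std::vector<double>& edges) {
    assert(edges.size() >= 2);
    BinnedIndex index{edges, std::vector<std::vector<bool>>(
                                 edges.size() - 1,
                                 std::vector<bool>(values.size(), false))};
    for (std::size_t i = 0; i < values.size(); ++i) {
        assert(values[i] >= edges.front() && values[i] < edges.back());
        // upper_bound finds the first edge above the value; its bin is one before.
        std::size_t bin = static_cast<std::size_t>(
            std::upper_bound(edges.begin(), edges.end(), values[i]) -
            edges.begin() - 1);
        index.bitmaps[bin][i] = true;
    }
    return index;
}

// Evaluate x < threshold. Bins entirely below the threshold are answered from
// the index alone. Only the bin straddling the threshold needs the raw values,
// and only when candidateCheck is true; otherwise that whole bin is accepted.
std::vector<bool> queryLessThan(const BinnedIndex& index,
                                const std::vector<double>& values,
                                double threshold, bool candidateCheck) {
    std::vector<bool> hits(values.size(), false);
    for (std::size_t b = 0; b + 1 < index.edges.size(); ++b) {
        if (index.edges[b] >= threshold) break;  // bin lies above the cut
        bool fullyInside = index.edges[b + 1] <= threshold;
        for (std::size_t i = 0; i < values.size(); ++i) {
            if (!index.bitmaps[b][i]) continue;
            hits[i] = fullyInside || !candidateCheck || values[i] < threshold;
        }
    }
    return hits;
}
```

With `candidateCheck` set to false the result is a superset of the exact answer: every record in the straddling bin is returned, but no matching record is lost.
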
Implementation Details
- FastBit:
    - Bitmap index software developed at Berkeley Lab
    - Includes a very efficient bitmap compression algorithm
- Bitmap indices are integrated to support:
    - TTree::Draw
    - TChain::Draw
- Each attribute to be indexed is stored as a separate branch
- The index is currently stored as a binary file

Example - Build Index

    // open ROOT file
    TFile f("data/root/data.root");
    TTree *tree = (TTree*) f.Get("tree");

    TBitmapIndex bitmapIndex;
    bitmapIndex.Init();

    // directory in which the index files are written
    char indexLocation[1024] = "/data/index/";
    bitmapIndex.ReadRootWriteIndexFile(tree, indexLocation);

    // build index for two attributes
    bitmapIndex.BuildIndex(tree, "a1", indexLocation);
    bitmapIndex.BuildIndex(tree, "a2", indexLocation);

Example - TTree::Draw with Index

    // open ROOT file
    TFile f("data/root/data.root");
    TTree *tree = (TTree*) f.Get("tree");

    TBitmapIndex bitmapIndex;
    bitmapIndex.Init();

    // draw a1 vs. a2 for the events that pass the cut, using the index
    bitmapIndex.Draw(tree, "a1:a2", "a1 < 200 && a2 > 700");

Performance Measurements
- Compare the performance of TTreeFormula with TBitmapIndex::EvaluateQuery
- Do not include the time for drawing histograms
- Run multi-dimensional queries (cuts with multiple predicates)

Experimental Setup
- Software/Hardware:
    - Bitmap index software is implemented in C++
    - Tests carried out on:
        - Linux CentOS
        - 2.8 GHz Intel Pentium IV with 1 GB RAM
        - Hardware RAID with SCSI disk
- Data:
    - BaBar data set
    - 7.6 million records with ~100 attributes each
- Bitmap Indices:
    - 10 out of ~100 attributes
    - 1000 equality-encoded bins
    - 100 range-encoded bins
    - Bitmap index compression algorithm: WAH (Word-Aligned Hybrid)

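To give a feel for word-aligned compression, the following C++ sketch implements a simplified scheme in the spirit of WAH. It is not the FastBit code: the bit layout (least significant bit first), the constant and function names, and the restriction to bitmaps whose length is a multiple of 31 are all assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified word-aligned scheme with 32-bit words (assumed layout):
//   literal word: top bit 0, low 31 bits hold 31 bitmap bits verbatim
//   fill word:    top bit 1, next bit is the fill value, low 30 bits count
//                 how many consecutive 31-bit groups hold only that value
constexpr std::uint32_t kGroupBits = 31;
constexpr std::uint32_t kAllOnes = 0x7FFFFFFFu;
constexpr std::uint32_t kFillFlag = 0x80000000u;
constexpr std::uint32_t kFillValue = 0x40000000u;
constexpr std::uint32_t kCountMask = 0x3FFFFFFFu;

std::vector<std::uint32_t> wahCompress(const std::vector<bool>& bits) {
    assert(bits.size() % kGroupBits == 0);  // sketch: no partial last group
    std::vector<std::uint32_t> words;
    for (std::size_t start = 0; start < bits.size(); start += kGroupBits) {
        std::uint32_t group = 0;
        for (std::uint32_t j = 0; j < kGroupBits; ++j) {
            if (bits[start + j]) group |= (1u << j);
        }
        if (group != 0 && group != kAllOnes) {
            words.push_back(group);  // mixed group: store as a literal word
            continue;
        }
        std::uint32_t fill = kFillFlag | (group ? kFillValue : 0u);
        // Extend the previous fill word if it has the same value and room left.
        if (!words.empty() && (words.back() & ~kCountMask) == fill &&
            (words.back() & kCountMask) < kCountMask) {
            ++words.back();
        } else {
            words.push_back(fill | 1u);
        }
    }
    return words;
}

std::vector<bool> wahDecompress(const std::vector<std::uint32_t>& words) {
    std::vector<bool> bits;
    for (std::uint32_t w : words) {
        if (w & kFillFlag) {
            bool value = (w & kFillValue) != 0;
            std::size_t n = static_cast<std::size_t>(w & kCountMask) * kGroupBits;
            bits.insert(bits.end(), n, value);
        } else {
            for (std::uint32_t j = 0; j < kGroupBits; ++j) {
                bits.push_back(((w >> j) & 1u) != 0);
            }
        }
    }
    return bits;
}
```

Long runs of identical bits, which dominate sparse bitmaps, collapse into single fill words, while mixed regions stay as literal words. Because every unit is a whole machine word, logical operations can work word by word on the compressed form.
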
Size of Compressed Bitmap Indices
[Figure: sizes of the compressed bitmap indices; plot not recoverable]
- EE-BMI: equality-encoded bitmap index
- RE-BMI: range-encoded bitmap index

Query Performance: TTreeFormula vs. Bitmap Indices
[Figure: query performance plot; not recoverable]
- Performance improvement of bitmap indices over TTreeFormula: up to a factor of 10

Approximate Answers
- For bitmap indices with binning, the exact answers are obtained during the Candidate Check phase
    - Certain records are read from disk to check whether they fulfill the query constraint
- Approximate answers are returned if the Candidate Check is omitted
- The error of the approximate answer depends on the number of bins:
    - Note: the query result includes more events
    - However, no correct events are dropped
- We used two different binning strategies:
    - Equality encoding with 1000 bins: error rate 0.1%
    - Range encoding with 100 bins: error rate 1%

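A rough estimate shows why the quoted error rates track the number of bins. It rests on two assumptions the slides do not state: the B bins hold roughly equal numbers of records, and the query is a single one-sided range predicate whose boundary falls inside one bin. Without the Candidate Check, at most the records of that one boundary bin can be wrongly included, so as a fraction of all N records:

```latex
\[
\text{error rate} \;\lesssim\; \frac{N_{\text{boundary bin}}}{N} \;\approx\; \frac{1}{B},
\qquad
B = 1000:\ \frac{1}{1000} = 0.1\%,
\qquad
B = 100:\ \frac{1}{100} = 1\%.
\]
```

Under these assumptions the estimate agrees with the two error rates listed above; with unevenly filled bins, or with several range predicates, the actual error can differ.
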
Query Performance - Approximate Answers (Error 0.1 - 1%)
[Figure: query performance plot for approximate answers; not recoverable]
- Performance improvement of bitmap indices over TTreeFormula: up to a factor of 30

Conclusions
- We integrated bitmap indices into ROOT to support:
    - TTree::Draw
    - TChain::Draw
- Bitmap indices significantly improve the performance of end-user analysis, by up to a factor of 10.
- With approximate answers (0.1 - 1% error), the performance improvement is up to a factor of 30.
- Bitmap indices are also used successfully in the STAR experiment at Brookhaven to access ROOT files with GridCollector.
- Future work:
    - Store bitmap indices as a ROOT tree.
    - Integrate with PROOF to support parallel index evaluation.