Module "Bit::Vector"
"Bit::Vector - more than the name suggests"
Steffen Beyer
YAPC::Europe, London, UK, ICA, September 22-24, 2000
sd&m software design & management GmbH & Co. KG, Thomas-Dehler-Straße 27, 81737 München
Phone (0 89) 6 38 12-0, Fax (0 89) 6 38 12-150, http://www.sdm.de
Agenda
• What does it do?
• Purpose(s)
• Summary of available methods
• Characteristics
• Alternatives
• Some Applications
• Questions & Answers, Suggestions
What does it do?
The Bit::Vector module implements bit arrays of arbitrary size.
Not very sexy, you may think. But bit vectors are actually the basis of all computations performed by a computer! Your CPU calls them "processor registers"...
By the way, is everybody familiar with two's complement binary representation and arithmetic?
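As a quick refresher on two's complement, here is a minimal sketch in Python (used for illustration only; the helper names to_twos_complement and from_twos_complement are hypothetical and not part of Bit::Vector). It encodes a signed integer in a fixed number of bits and decodes it again.

```python
def to_twos_complement(value: int, bits: int) -> str:
    """Return the two's complement bit pattern of `value` in `bits` bits."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit into the given number of bits")
    # Masking with 2**bits - 1 wraps negative numbers around modulo 2**bits.
    return format(value & ((1 << bits) - 1), "0{}b".format(bits))


def from_twos_complement(pattern: str) -> int:
    """Decode a two's complement bit pattern back into a signed integer."""
    raw = int(pattern, 2)
    # A set most significant bit means the value is negative.
    if pattern[0] == "1":
        raw -= 1 << len(pattern)
    return raw


print(to_twos_complement(5, 8))          # 00000101
print(to_twos_complement(-5, 8))         # 11111011
print(from_twos_complement("11111011"))  # -5
```

Negating a number amounts to inverting all bits and adding one, which is why addition and subtraction can share the same hardware.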
Purpose(s)
• Efficient storage and handling of bit arrays
• Extend your CPU to any desired number of bits
• Efficient set operations
• Efficient big integer arithmetic
Summary of available methods
(See file "BitVector.txt")
• Especially interesting methods:
  – "Interval_Substitute()" (is to bit vectors what "splice" is to Perl arrays)
  – "Interval_Scan_...()" (finds contiguous blocks of set bits)
  – "Chunk_...()" (gives access to packets of bits of selectable size)
  – "...Reverse()" (does for bit vectors what Perl's "reverse" does for strings)
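To make the idea behind interval scanning and chunk access concrete, here is a minimal Python sketch that uses a plain integer as the bit vector. It only mimics the concepts; the function names scan_intervals and read_chunk are hypothetical and do not reproduce the signatures or the implementation of the Bit::Vector methods.

```python
def scan_intervals(vector: int, size: int):
    """Yield (low, high) index pairs for each contiguous run of set bits."""
    index = 0
    while index < size:
        if (vector >> index) & 1:
            low = index
            # Advance to the end of the current run of set bits.
            while index + 1 < size and (vector >> (index + 1)) & 1:
                index += 1
            yield (low, index)
        index += 1


def read_chunk(vector: int, chunk_size: int, offset: int) -> int:
    """Return `chunk_size` bits of `vector`, starting at bit `offset`."""
    return (vector >> offset) & ((1 << chunk_size) - 1)


bits = 0b0111001100  # bits 2-3 and 6-8 are set (bit 0 is the rightmost)
print(list(scan_intervals(bits, 10)))  # [(2, 3), (6, 8)]
print(read_chunk(bits, 4, 2))          # 3
```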
Characteristics (1/3)
• Written in C internally (and therefore fast)
• Relies on the CPU's machine-word operations for maximum speed
• Adapts automatically to the size of the machine word at runtime
• Uses efficient algorithms (mostly "divide-and-conquer"); time complexity of many functions is O(1) or O(n ld n), where ld is the base-2 logarithm
• The C library at the core can also be used stand-alone (without Perl)
• Free Software (GPL + Artistic); the C library is also available under the LGPL
Characteristics (2/3) - Efficient Algorithms
• Example: Exponentiation (x^k)
  E.g. 27^13 (base 10), k = 13
  = 27 * 27 * ... * 27 (13 factors)
  = 11011^1101 (base 2), n = int(ld k) = 3
  = (11011^8)^1 * (11011^4)^1 * (11011^2)^0 * (11011^1)^1
  Worst case: 2n multiplications = O(n) = O(ld k) instead of k - 1 = O(k)
  – here: only 5 instead of 12 (3 squarings plus 2 multiplications)
• Example: Conversion to decimal representation
  Divides the bit vector by the largest power of 10 that fits into a machine word, then uses machine-word arithmetic to break each remainder down into digits
• Example: Bit counting (number of set bits)
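The three examples can be sketched in Python as follows. This is an illustration under stated assumptions, not the module's C code: the function names are hypothetical, the 32-bit word size is assumed, and the slide does not say which bit-counting algorithm is used, so the divide-and-conquer field-summing technique shown here is a common choice substituted for illustration.

```python
def power_by_squaring(base: int, exponent: int):
    """Return (base**exponent, multiplications used) for exponent >= 1."""
    if exponent < 1:
        raise ValueError("exponent must be at least 1")
    result = None   # product of the selected powers so far
    square = base   # base**(2**i) for the current exponent bit i
    multiplications = 0
    while exponent:
        if exponent & 1:
            if result is None:
                result = square          # first factor costs nothing
            else:
                result *= square
                multiplications += 1
        exponent >>= 1
        if exponent:                     # another bit follows: square
            square *= square
            multiplications += 1
    return result, multiplications


def to_decimal(value: int, word_bits: int = 32) -> str:
    """Convert a non-negative integer to decimal in word-sized chunks."""
    # Find the largest power of 10 that fits into one machine word.
    digits = 1
    while 10 ** (digits + 1) < (1 << word_bits):
        digits += 1
    chunk = 10 ** digits
    parts = []
    while value >= chunk:
        value, remainder = divmod(value, chunk)
        # Each remainder fits into one word; pad it to the full width.
        parts.append(str(remainder).zfill(digits))
    parts.append(str(value))
    return "".join(reversed(parts))


def count_set_bits(word: int) -> int:
    """Count set bits in a 32-bit word by summing ever wider bit fields."""
    word &= 0xFFFFFFFF
    word = (word & 0x55555555) + ((word >> 1) & 0x55555555)  # 2-bit sums
    word = (word & 0x33333333) + ((word >> 2) & 0x33333333)  # 4-bit sums
    word = (word & 0x0F0F0F0F) + ((word >> 4) & 0x0F0F0F0F)  # 8-bit sums
    word = (word & 0x00FF00FF) + ((word >> 8) & 0x00FF00FF)  # 16-bit sums
    return (word & 0x0000FFFF) + (word >> 16)                # final sum


value, used = power_by_squaring(27, 13)
print(used)                 # 5
print(to_decimal(value) == str(27 ** 13))  # True
print(count_set_bits(27))   # 4
```

For k = 13 the sketch performs 3 squarings and 2 combining multiplications, matching the count of 5 on the slide.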
Characteristics (3/3)
• Object-oriented interface, e.g.
  $vec1->Intersection($vec2, $vec3);
• Optionally(*) provides overloaded operators
  – one set of operators for set operations, e.g.
    $set1 = $set2 & $set3;
  – one set of operators for big integer math, e.g.
    $bigsum += $bigint;
(*) Will be optional in version 6.0 (to improve the loading speed of the "plain" module); currently it is always loaded.
Alternatives (1/2)
• vec()
  – confusing
  – insufficiently powerful for many applications
• PDL
  – complicated
  – designed primarily for astronomical data analysis and heavy-duty number crunching (written in C internally)
• Math::PARI
  – very powerful
  – requires the separate C library "PARI"
• Math::BigInt (part of the Perl 5.6 core)
  – slow (written entirely in Perl, stores digits in Perl arrays)
• Math::BigInteger
  – unmaintained, doesn't compile (uses XS and a C library)
Alternatives (2/2)
• Set::Bag - implements multisets
• Set::IntSpan - optimized for .newsrc-style sets (also supported by Bit::Vector, but needing more memory there)
• Set::Object - implements sets of arbitrary objects (can be simulated with Bit::Vector using a lookup table; set operations will then be faster)
• Set::Scalar - similar to Set::Object (?), but also allows recursion (sets of sets)
• Set::Window - optimized for intervals of integers (needs much less memory than Bit::Vector, but is of limited use since the whole interval is either in or out)
Simulating Set::Object using a lookup table
• See file "SetObject.pl"
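The referenced file is not reproduced here. As a rough idea of the approach, here is a minimal Python sketch in which a lookup table maps each object to a bit index, so that set operations reduce to bitwise operations. The class ObjectSet and its methods are hypothetical, and the sketch assumes that all sets being combined share the same universe in the same order.

```python
class ObjectSet:
    """Sets of arbitrary hashable objects stored as bits of an integer."""

    def __init__(self, universe):
        # Lookup table: object -> bit index (the list gives the reverse).
        self.objects = list(universe)
        self.index = {obj: i for i, obj in enumerate(self.objects)}
        self.bits = 0

    def add(self, obj):
        self.bits |= 1 << self.index[obj]

    def __contains__(self, obj):
        return obj in self.index and bool((self.bits >> self.index[obj]) & 1)

    def members(self):
        return [o for i, o in enumerate(self.objects) if (self.bits >> i) & 1]

    def _combine(self, bits):
        result = ObjectSet(self.objects)
        result.bits = bits
        return result

    def __and__(self, other):  # intersection: one bitwise AND
        return self._combine(self.bits & other.bits)

    def __or__(self, other):   # union: one bitwise OR
        return self._combine(self.bits | other.bits)


universe = ["apple", "pear", "plum", "fig"]
a = ObjectSet(universe)
a.add("apple")
a.add("plum")
b = ObjectSet(universe)
b.add("plum")
b.add("fig")
print((a & b).members())  # ['plum']
print((a | b).members())  # ['apple', 'plum', 'fig']
```

The cost of the lookup is paid once per insertion or membership test; union and intersection then work on whole machine words at a time.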
Some Applications
• Set::IntRange - sets of integers (universe = some interval)
• Math::MatrixBool - useful for graph algorithms (e.g. shortest paths / Kleene's Algorithm)
• Slice (multiple document version generator)
• Parse table generators for compiler-compilers à la "yacc" (calculating first, follow & lookahead character sets)
• Cryptography
• Easy manipulation of data (files), any number of bits at a time
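To illustrate why boolean matrices built from bit vectors suit graph algorithms, here is a minimal Python sketch of Warshall's transitive closure, the boolean special case of Kleene's algorithm. Each matrix row is one integer, so a single OR updates a whole row. The function name transitive_closure is hypothetical and this is not the Math::MatrixBool API.

```python
def transitive_closure(rows):
    """Warshall's algorithm on a boolean matrix with one integer per row.

    Bit j of rows[i] is set if there is an edge from node i to node j.
    """
    closure = list(rows)
    n = len(closure)
    for k in range(n):
        for i in range(n):
            # If i reaches k, then i also reaches everything k reaches.
            if (closure[i] >> k) & 1:
                closure[i] |= closure[k]  # one OR handles a whole row
    return closure


# Chain of edges 0 -> 1 -> 2 -> 3 (bit 0 is the rightmost bit).
edges = [0b0010, 0b0100, 0b1000, 0b0000]
print(transitive_closure(edges))  # [14, 12, 8, 0]
```

In the result, node 0 reaches nodes 1, 2 and 3 (0b1110 = 14), node 1 reaches 2 and 3, and node 2 reaches 3.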
Application "Slice"
• See
  – homepage screenshot "Slice.bmp"
  – file "file.in"
  – file "Slice.txt"
  – file "file.html.en.OK"
  – file "file.html.de.OK"
  – URL http://www.engelschall.com/sw/slice/
Application "Date::Calc" v5.0 (coming soon)
• Stores years in bit vectors (one year = one bit vector, one day = one bit)
• A bit is "on" if the corresponding day is a holiday
• Performs calculations taking holidays into account
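The storage scheme can be sketched in Python as follows, with one integer per year and one bit per day. This is only an illustration of the idea: the function names, the sample holiday list, and the rule that Saturdays and Sundays are non-working days are all assumptions and do not describe the actual Date::Calc interface.

```python
import datetime


def year_vector(year: int, holidays) -> int:
    """Build one integer per year: bit d is set if day d+1 is a holiday."""
    vector = 0
    for date in holidays:
        if date.year == year:
            vector |= 1 << (date.timetuple().tm_yday - 1)
    return vector


def business_days_between(start, end, vector: int) -> int:
    """Count days in [start, end] (same year) that are neither a weekend
    day nor a holiday according to `vector`."""
    count = 0
    day = start
    while day <= end:
        is_holiday = (vector >> (day.timetuple().tm_yday - 1)) & 1
        if day.weekday() < 5 and not is_holiday:  # Monday=0 ... Friday=4
            count += 1
        day += datetime.timedelta(days=1)
    return count


# Sample holiday list for the year 2000 (illustrative only).
sample = [datetime.date(2000, 1, 1),
          datetime.date(2000, 12, 25),
          datetime.date(2000, 12, 26)]
vec = year_vector(2000, sample)
print(business_days_between(datetime.date(2000, 12, 22),
                            datetime.date(2000, 12, 29), vec))  # 4
```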
Questions & Answers, Suggestions
• Please feel free to ask!
• Suggestions are welcome.