Performing Advanced Bit Manipulations Efficiently in General-Purpose Processors
Yedidya Hilewitz and Ruby B. Lee
Princeton Architecture Lab for Multimedia and Security (PALMS), Department of Electrical Engineering, Princeton University
18th IEEE Symposium on Computer Arithmetic (ARITH-18), Montpellier, France, June 25-27, 2007
Background and Motivation
- Advanced bit manipulations are not well supported by commodity microprocessors
  - These operations are performed using "programming tricks" (cf. Hacker's Delight)
- Bit manipulations play a role in applications of increasing importance
- We propose a new shifter architecture in which the conventional shifter is replaced by a unit that directly supports bit manipulation operations
Outline
- Background and motivation
- Advanced bit manipulation operations
  - Delineation and example usage
- New shift-permute functional unit
- Summary and conclusions
Advanced Bit Manipulation Instructions
- Bit Permutation
  - Butterfly (bfly) and Inverse Butterfly (ibfly)
- Bit Gather and Bit Scatter
  - Parallel Extract and Parallel Deposit
Any of the n! permutations of n bits can be done with one pass of bfly and ibfly instructions
- bfly + ibfly = general permutation circuit
- [Figure: 8-bit butterfly and 8-bit inverse butterfly circuits]
- Each circuit has lg(n) stages of n 2:1 MUXes, split into n/2 pairs that pass through or swap their inputs
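The stage structure described above can be modeled in a few lines of Python. This is a minimal software sketch, not the authors' hardware design: the function names `bfly`, `ibfly` and `_stage`, the list-of-bits representation (index 0 is the least significant position), and the ordering of control bits within a stage are all assumptions made for illustration.

```python
def _stage(bits, dist, ctrl):
    """One stage of n/2 MUX pairs: pair (i, i + dist) swaps when its control bit is 1."""
    out = list(bits)
    k = 0
    for i in range(len(bits)):
        if (i & dist) == 0:  # i is the lower input of a pair
            if ctrl[k]:
                out[i], out[i + dist] = bits[i + dist], bits[i]
            k += 1
    return out


def ibfly(bits, controls):
    """Inverse butterfly: pair distances 1, 2, ..., n/2 (assumed stage order)."""
    dist = 1
    for ctrl in controls:
        bits = _stage(bits, dist, ctrl)
        dist *= 2
    return bits


def bfly(bits, controls):
    """Butterfly: pair distances n/2, ..., 2, 1 (assumed stage order)."""
    dist = len(bits) // 2
    for ctrl in controls:
        bits = _stage(bits, dist, ctrl)
        dist //= 2
    return bits


# Swapping every pair in every stage reverses the 8 inputs.
all_swap = [[1] * 4 for _ in range(3)]
print(ibfly(list("abcdefgh"), all_swap))  # ['h', 'g', 'f', 'e', 'd', 'c', 'b', 'a']
```

Each of the lg(n) = 3 stages takes n/2 = 4 control bits, so one pass through either circuit is configured by n/2 × lg(n) bits in total.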
Bit Gather (Parallel Extract) and Bit Scatter (Parallel Deposit)
- Parallel Extract
  - pex r1 = r2, r3
  - Extracts the bits of r2 flagged by 1's in r3, then compresses and right-justifies them in the result register
  - Parallel extract maps to the ibfly datapath
- Parallel Deposit
  - pdep r1 = r2, r3
  - Deposits the right-justified bits of r2 into the result register at the positions flagged by 1's in r3
  - Parallel deposit maps to the bfly datapath
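The semantics of the two instructions can be pinned down with a bit-by-bit reference model. The sketch below is a plain software loop, not the ibfly/bfly datapath mapping the slide refers to; the function signatures and the default width `n=64` are assumptions for illustration.

```python
def pex(src, mask, n=64):
    """Gather the bits of src selected by 1's in mask and right-justify them."""
    result, k = 0, 0
    for i in range(n):
        if (mask >> i) & 1:
            result |= ((src >> i) & 1) << k
            k += 1
    return result


def pdep(src, mask, n=64):
    """Scatter the low-order bits of src to the positions selected by 1's in mask."""
    result, k = 0, 0
    for i in range(n):
        if (mask >> i) & 1:
            result |= ((src >> k) & 1) << i
            k += 1
    return result


print(bin(pex(0b10110110, 0b11001100)))   # 0b1001
print(bin(pdep(0b1001, 0b11001100)))      # 0b10000100
```

The two operations are inverse in the sense that `pdep(pex(x, m), m)` equals `x & m` for any mask `m`.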
Example Usage: Bioinformatics - DNA Sequence Reversal
- DNA bases A, C, G and T are represented by two-bit codes
- Reversing a DNA sequence is equivalent to reversing the order of bit pairs
  - bfly or ibfly permutation
- 1 ibfly instruction is equivalent to 11-23 ALU and shifter instructions
  - 2×(and, shift, or) + byte reverse instruction, at minimum
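For comparison, the software sequence that a single ibfly instruction would replace can be sketched as follows. This assumes a 64-bit word holding 32 bases; the helper names are hypothetical, and the byte reverse is emulated with Python's `int.to_bytes` / `int.from_bytes` rather than a machine instruction. The exact instruction count depends on the ISA, so the sketch does not try to reproduce the 11-23 figure.

```python
M2 = 0x3333333333333333  # selects the low 2-bit code of every nibble
M4 = 0x0F0F0F0F0F0F0F0F  # selects the low nibble of every byte


def reverse_dna_64(x):
    """Reverse the order of the 32 two-bit codes in a 64-bit word."""
    x = ((x & M2) << 2) | ((x >> 2) & M2)   # swap the two codes in each nibble
    x = ((x & M4) << 4) | ((x >> 4) & M4)   # swap the two nibbles in each byte
    return int.from_bytes(x.to_bytes(8, "little"), "big")  # byte reverse


def reverse_dna_naive(x, bases=32):
    """Reference model: peel off one two-bit code at a time."""
    out = 0
    for _ in range(bases):
        out = (out << 2) | (x & 3)
        x >>= 2
    return out


word = 0x0123456789ABCDEF
assert reverse_dna_64(word) == reverse_dna_naive(word)
```

On the inverse butterfly circuit the same permutation corresponds to passing every pair through in the distance-1 stage and swapping every pair in all later stages, which keeps the two bits of each code in order while reversing the codes.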
Advanced Bit Manipulation Functional Unit
- We propose adding a new functional unit to directly perform advanced bit manipulations
- To minimize the cost, we intend for this new functional unit to replace the shifter unit
  - The shifter currently performs basic bit manipulation operations
- Our new functional unit represents an evolution of shifter designs
Basic Bit Manipulation Operations
- shift r1 = r2, s
- rotate r1 = r2, s
- extract r1 = r2, pos, len
- deposit r1 = r2, pos, len
- mix r1 = r2, r3
Parallel Extract and Parallel Deposit
- [Figures: Parallel Extract; Parallel Deposit]
Evolution of Shifter Designs
- Barrel Shifter
- Log Shifter
- Our proposed design?
New Shifter Design
- An inverse butterfly (or butterfly) circuit enhanced with an extra multiplexer stage is the basis of the new shifter design
- We will show that either the butterfly or the inverse butterfly circuit can individually perform rotations
- Rotations are the basic operation underlying shift, extract, deposit and mix
  - Model the other basic bit manipulation operations as rotate + zeroing, sign bit propagation, or merging
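The "rotate + post-processing" decomposition can be illustrated in software. The sketch below assumes an 8-bit word and assumes field semantics for `extract` and `deposit` (a right-justified field of `length` bits at bit position `pos`, with `pos + length <= N`); these semantics and all function names are illustrative assumptions, and `mix` is not modeled.

```python
N = 8                    # word width, chosen small for illustration
FULL = (1 << N) - 1


def rotr(x, s):
    """Right rotate an N-bit value by s positions."""
    s %= N
    return ((x >> s) | (x << (N - s))) & FULL


def shr(x, s):
    """Logical right shift = rotate + zeroing of the wrapped-around bits."""
    return rotr(x, s) & (FULL >> s)


def sar(x, s):
    """Arithmetic right shift = rotate + sign bit propagation."""
    keep = FULL >> s
    fill = (FULL & ~keep) if (x >> (N - 1)) & 1 else 0
    return (rotr(x, s) & keep) | fill


def extract(x, pos, length):
    """Extract = rotate the field down to bit 0 + zeroing of everything above it."""
    return rotr(x, pos) & ((1 << length) - 1)


def deposit(x, pos, length, background=0):
    """Deposit = rotate the field up to pos + merging with a background word."""
    field = ((1 << length) - 1) << pos
    rotl = rotr(x, N - pos)
    return (rotl & field) | (background & ~field & FULL)


print(bin(sar(0b10110110, 3)))        # 0b11110110
print(bin(extract(0b10110110, 2, 3))) # 0b101
```

In each case the rotation does the data movement and a final mask-or-merge step selects between rotated bits, zeros, sign bits, or bits of a second operand, which is the role the slide assigns to the extra multiplexer stage.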
New Shift-Permute Functional Unit Implementation
Configuring Inverse Butterfly for Rotations
- Hard problem: generating the control bits for rotations on the inverse butterfly circuit
- We derive an expression for the control bits based on a recursive function of the shift amount, s, and the stage number, j
Example: Right Rotation by 5 on 8-bit Inverse Butterfly Circuit
- After each stage j, the input is right rotated by 5 (mod 2^j) within each 2^j-bit subcircuit
Example: Right Rotation by 5 on 8-bit Inverse Butterfly Circuit
- After stage 1, the input is right rotated by 5 (mod 2) = 1 within each 2-bit subcircuit
Example: Right Rotation by 5 on 8-bit Inverse Butterfly Circuit
- After stage 2, the input is right rotated by 5 (mod 4) = 1 within each 4-bit subcircuit
- Bits that wrapped at the output of the previous stage are swapped
Example: Right Rotation by 5 on 8-bit Inverse Butterfly Circuit
- After stage 3, the input is right rotated by 5
- Bits that wrapped at the output of the previous stage are passed through
Rotations in general on n-bit Inverse Butterfly Circuit
- Shift amount s < n/2 → swap bits that wrapped
- Shift amount s ≥ n/2 → pass through bits that wrapped
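The swap/pass-through rule can be checked in software by applying it at every stage j, using s mod 2^j as the shift amount and 2^j as the subcircuit width. The sketch below is one formulation consistent with the rule and the worked example on these slides; it is not claimed to be the authors' recursive expression or their control bit generator circuit. Function names, the list-of-bits representation (index 0 is the least significant position), and the control bit ordering are assumptions.

```python
def rotation_controls(n, s):
    """Per-stage control bits (1 = swap) for a right rotation by s.

    Stage j pairs position i with i + m (m = 2**(j-1)) in each 2m-bit
    subcircuit; the same m control bits are reused by every subcircuit.
    """
    stages = []
    m = 1
    while m < n:
        r_prev = s % m          # rotation already applied within each m-bit half
        r_now = s % (2 * m)     # rotation required within each 2m-bit subcircuit
        # A bit "wrapped" in the previous stage if i + r_prev >= m.
        # r_now < m: swap the wrapped bits; r_now >= m: swap the others.
        stages.append([int((i + r_prev >= m) != (r_now >= m)) for i in range(m)])
        m *= 2
    return stages


def ibfly_rotate(bits, s):
    """Right rotate a list of n bits (n a power of two) on a modeled inverse butterfly."""
    n = len(bits)
    cur = list(bits)
    m = 1
    for ctrl in rotation_controls(n, s):
        nxt = list(cur)
        for base in range(0, n, 2 * m):
            for i in range(m):
                if ctrl[i]:
                    lo, hi = base + i, base + i + m
                    nxt[lo], nxt[hi] = cur[hi], cur[lo]
        cur = nxt
        m *= 2
    return cur


print(rotation_controls(8, 5))            # [[1], [0, 1], [1, 1, 1, 0]]
print(ibfly_rotate(list("abcdefgh"), 5))  # ['f', 'g', 'h', 'a', 'b', 'c', 'd', 'e']
```

For s = 5 on 8 bits this reproduces the example: stage 1 swaps its pair, stage 2 swaps only the bit that wrapped, and stage 3 passes the wrapped bit through while swapping the other three.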
Circuit Implementation of Rotation Control Bit Generator
Comparison to Barrel and Log Shifters

| | Barrel | Log | IBFLY |
|---|---|---|---|
| # of Gates | n^2 | n×log4(n) | n×lg(n) |
| Control Lines | n | lg(n) | n/2×lg(n) |
| Gate Delay (of datapath) | 1 | log4(n) | lg(n) |
| Mux Width (Capacitance) | n | 4 | 2 |
| Relative Delay (Logical Effort) | 1.16× | 1 | 1.19× |
| Bit Manipulation Capabilities | basic | basic | basic + advanced |
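To make the asymptotic entries concrete, the formulas in the table can be evaluated for a specific width. The sketch below simply plugs n into the table's expressions; the function name and dictionary layout are hypothetical, the results are formula values rather than measured gate counts, and the relative delay row is not modeled because it comes from the authors' logical effort analysis.

```python
import math


def shifter_costs(n):
    """Evaluate the table's gate, control line, and gate delay formulas for width n."""
    lg = int(math.log2(n))   # lg(n), assuming n is a power of two
    log4 = lg / 2            # log4(n) = lg(n) / 2
    return {
        "barrel": {"gates": n * n, "control_lines": n, "gate_delay": 1, "mux_width": n},
        "log": {"gates": n * log4, "control_lines": lg, "gate_delay": log4, "mux_width": 4},
        "ibfly": {"gates": n * lg, "control_lines": n // 2 * lg, "gate_delay": lg, "mux_width": 2},
    }


for name, row in shifter_costs(64).items():
    print(name, row)
```

For n = 64 the gate formulas give 4096 (barrel), 192 (log) and 384 (inverse butterfly), and the inverse butterfly needs 192 control lines, which is why generating those control bits is a central part of the design.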
Summary and Conclusions
- We proposed evolving the shifter to a new design using butterfly and inverse butterfly datapaths
  - The new shifter subsumes the basic shifter, the multimedia shift-permute unit and the advanced bit manipulation unit
- We have shown how to perform basic shifter operations on these datapaths
  - Rotation control bit generator
  - Extra multiplexer stage for masking and merging
- Use of the new shifter design in future microprocessor implementations allows for increased capabilities at only marginal cost