MAP ART Mapping Architectural Properties to an Algorithm
MAP ART: Mapping Architectural Properties to an Algorithm for Redundant Triangulation
Chris Savarese, Yashesh Shroff, Greg Lawrence
Advisor: Dr. Jan Rabaey
April 27, 2000
CS 252

Outline
• Introduction
• Background
• Time and Energy Profiling
• Parallel Architectures
• Conclusions: Our Dream Architecture
• Future Work

Introduction
• Goal: Given a basic localization algorithm, explore architectural alternatives that minimize energy consumption.
• The concept of localization
• Energy-saving techniques
• What we did…

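The Localization Algorithm
• Reference nodes: N_1(x_1, y_1, z_1), N_2(x_2, y_2, z_2), N_3(x_3, y_3, z_3), …
• Unknown node: U(x, y, z)
• Linear system [A_{m×3}] · [U_{3×1}] = [b_{(n-1)×1}]:

$$
\begin{bmatrix}
x_1 - x_n & y_1 - y_n & z_1 - z_n \\
\vdots & \vdots & \vdots \\
x_{n-1} - x_n & y_{n-1} - y_n & z_{n-1} - z_n
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
=
\begin{bmatrix} b_1 \\ \vdots \\ b_{n-1} \end{bmatrix}
$$

• QRdcmp(): [A_{m×3}] = [Q_{m×3}] · [R_{3×3}]
• Solve: U = R^{-1} Q^T b
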
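The solve step on this slide can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the slide does not define b, so the right-hand side below assumes the common linearization in which the range equation of the last reference node is subtracted from the others. The function names `qr_decompose`, `locate`, and `distance` are hypothetical, and the QR step uses modified Gram-Schmidt, which may differ from what QRdcmp() does internally.

```python
import math


def qr_decompose(a):
    """Thin QR of an m x n matrix (m >= n) by modified Gram-Schmidt.

    Returns (q, r): q is m x n with orthonormal columns, r is n x n
    upper triangular, and a = q * r.
    """
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for k in range(j):
            # Remove the component of v along the already-built column k.
            r[k][j] = sum(q_cols[k][i] * v[i] for i in range(m))
            v = [v[i] - r[k][j] * q_cols[k][i] for i in range(m)]
        r[j][j] = math.sqrt(sum(x * x for x in v))
        if r[j][j] == 0.0:
            raise ValueError("reference nodes are degenerate (rank-deficient A)")
        q_cols.append([x / r[j][j] for x in v])
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r


def locate(anchors, ranges):
    """Estimate (x, y, z) from reference positions and measured ranges.

    The last anchor plays the role of node n on the slide.
    """
    xn, yn, zn = anchors[-1]
    dn = ranges[-1]
    a, b = [], []
    for (xi, yi, zi), di in zip(anchors[:-1], ranges[:-1]):
        a.append([xi - xn, yi - yn, zi - zn])
        # ASSUMPTION: b_i from subtracting the n-th range equation.
        b.append(0.5 * ((xi * xi + yi * yi + zi * zi)
                        - (xn * xn + yn * yn + zn * zn)
                        - (di * di - dn * dn)))
    q, r = qr_decompose(a)
    # y = Q^T b
    y = [sum(q[i][j] * b[i] for i in range(len(b))) for j in range(3)]
    # Back-substitution solves R u = y, i.e. u = R^{-1} Q^T b.
    u = [0.0, 0.0, 0.0]
    for j in (2, 1, 0):
        s = y[j] - sum(r[j][k] * u[k] for k in range(j + 1, 3))
        u[j] = s / r[j][j]
    return tuple(u)


def distance(p, q):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


if __name__ == "__main__":
    # Hypothetical layout: five reference nodes, one more than the minimum.
    anchors = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (0, 0, 10), (10, 10, 10)]
    target = (3.0, 4.0, 5.0)
    ranges = [distance(p, target) for p in anchors]
    print(locate(anchors, ranges))
```

With more reference nodes than unknowns the system is overdetermined, and the QR solve returns the least-squares estimate, which is where the redundancy in the title comes in.
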
The StrongARM Architecture
• Power: 200 mW, 0.25 µm, 1.5 V
• Clock Speed: 200 MHz
• Cache:
  - 16 KB I-cache
  - 8 KB D-cache
  - 32-way set-associative, round-robin replacement
  - 512 B, 2-way Minicache
• 31/16 GPR (32-bit)
• Auto-increment addressing
• No FP processor
• MAC

The Tensilica Xtensa Architecture
Processor Configuration
• Power: 200 mW, 0.25 µm, 1.5 V
• Clock Speed: 170 MHz
• Cache:
  - 16 KB I-cache
  - 16 KB D-cache
  - Direct mapped
• 32 Registers (32-bit)
• Xtensibility
  - Use of TIE instructions
• No FP processor
• Zero overhead loops

Profiling Results
Profiler Output:
-----------------------
_fmul     18.21%   0.00%   188000
-----------------------
lubksb    15.27%   5.17%   10.10%   10000
_fneq      0.37%   0.00%   14000
_fdiv      4.23%   0.00%   30000
_fmul      5.03%   0.00%   52000
_frsb      0.46%   0.00%   52000
Floating Point
• StrongARM Processor: 68 J
• Xtensa Processor: 144 J
Energy = nom. core power × #cycles × clock period

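The energy formula on this slide can be written out as a small helper. This is only a sketch: the function name `energy_joules` and the cycle count in the example are hypothetical, while the 200 mW core power and the 200 MHz and 170 MHz clock speeds are the figures quoted on the processor slides.

```python
def energy_joules(core_power_w, cycles, clock_hz):
    """Energy = nominal core power x number of cycles x clock period."""
    clock_period_s = 1.0 / clock_hz
    return core_power_w * cycles * clock_period_s


if __name__ == "__main__":
    cycles = 1_000_000  # hypothetical cycle count, not a measured value
    # Same nominal power and cycle count, different clock speeds.
    print(energy_joules(0.200, cycles, 200e6))  # StrongARM-style clock
    print(energy_joules(0.200, cycles, 170e6))  # Xtensa-style clock
```

At equal nominal power and equal cycle counts, the slower clock yields the higher energy because each cycle lasts longer, so differences in cycle count and clock speed both feed into the per-processor totals.
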
Fixed Point Arithmetic
• Floating Point vs. Fixed Point
  - Floating point: S (1 bit) | E (8 bits) | Mantissa (23 bits)
  - Fixed point: 16 bits | 16 bits
• Add / Sub are straightforward
• Multiply / Divide require shifting
• Why can we use it for localization?
  - Low accuracy requirements
  - Limited range in measurements (< 10 m)
  - Small matrices → small error propagation

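The points about add/subtract and multiply/divide can be illustrated with a short sketch. It assumes the 16 | 16 split means 16 integer bits and 16 fractional bits (often written Q16.16); the helper names are hypothetical. Python integers do not overflow, so the 32-bit wraparound of real hardware is not modeled here.

```python
FRAC_BITS = 16          # assumed number of fractional bits (Q16.16)
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0


def to_fixed(x):
    """Convert a float to the nearest fixed-point integer."""
    return int(round(x * ONE))


def to_float(f):
    """Convert a fixed-point integer back to a float."""
    return f / ONE


def fx_add(a, b):
    # Both operands share the same scale, so plain integer add works.
    return a + b


def fx_sub(a, b):
    return a - b


def fx_mul(a, b):
    # The raw product carries 2 * FRAC_BITS fractional bits;
    # shift right to return to FRAC_BITS (rounds toward minus infinity).
    return (a * b) >> FRAC_BITS


def fx_div(a, b):
    # Pre-shift the dividend so the quotient keeps FRAC_BITS fractional bits.
    return (a << FRAC_BITS) // b


if __name__ == "__main__":
    a, b = to_fixed(1.5), to_fixed(2.0)
    print(to_float(fx_add(a, b)), to_float(fx_mul(a, b)), to_float(fx_div(a, b)))
```

Under this assumed split the smallest representable step is 2^-16, which gives a rough sense of why low accuracy requirements and a limited measurement range make the format workable.
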
Fixed Point Profiling Results
Profiler Output:
-----------------------
_fmul     18.21%   0.00%   188000
-----------------------
lubksb    15.27%   5.17%   10.10%   10000
_fneq      0.37%   0.00%   14000
_fdiv      4.23%   0.00%   30000
_fmul      5.03%   0.00%   52000
_frsb      0.46%   0.00%   52000
Floating Point
• StrongARM Processor: 68 J
• Xtensa Processor: 144 J
Fixed Point
• StrongARM Processor: 43 J (37% less)
• Xtensa Processor: 69 J (52% less)
Energy = nom. core power × #cycles × clock period

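The quoted savings follow directly from the energy figures on this slide. The check below is a small sketch with a hypothetical helper name; because the units cancel in the ratio, it uses the numbers exactly as printed.

```python
def percent_saving(baseline, improved):
    """Relative reduction from baseline to improved, in percent."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # Floating point vs. fixed point, figures as shown on the slide.
    print(round(percent_saving(68, 43)))    # StrongARM
    print(round(percent_saving(144, 69)))   # Xtensa
```
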
Parallel Architectures
- Write sequential code in Matlab
- Extract data-dependencies
- Workload analysis
[Figure: P, CP 1, CP 2, CP 3]

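One way to picture the dependency-extraction and workload-analysis steps is to group tasks into levels that could run concurrently. The sketch below is purely illustrative and is not the tool flow used in the project: the task names and the dependency table are hypothetical, loosely modeled on the steps of the localization solve.

```python
def schedule_levels(deps):
    """Group tasks into levels; tasks in one level have no mutual dependencies.

    deps maps each task name to the set of tasks it depends on.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    done = set()
    levels = []
    while remaining:
        # A task is ready once everything it depends on has completed.
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cyclic dependency")
        levels.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return levels


# Hypothetical dependency table for a QR-based least-squares solve.
DEPS = {
    "build_A": set(),
    "build_b": set(),
    "qr_decompose": {"build_A"},
    "apply_Qt": {"qr_decompose", "build_b"},
    "back_substitute": {"qr_decompose", "apply_Qt"},
}

if __name__ == "__main__":
    for i, level in enumerate(schedule_levels(DEPS)):
        print(i, level)
```

The number of levels approximates the critical path, and the widest level bounds how many processing elements could be kept busy at once under these assumed dependencies.
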
Our Dream Architecture
• Floating point hardware
• MAC hardware
• Zero overhead loops
• Auto increment
• Register file size
• Cache
  - Direct mapped

Future Work
• FPGA implementation
• Xtensa customizations
• TIE instructions
• Floating Point Coprocessor
• Realistic algorithm for PicoRadio

Many Thanks To…
• Dr. Bart Kienhuis, EECS Post Doc
  - Ptolemy and other tools: Parallel issues
• Fred Burghardt, BWRC Technical Staff
  - PicoRadio Testbed
• Marlene Wan, BWRC Student
  - StrongARM Energy Profiling
• Vandana Prabhu, BWRC Student
  - Tensilica Tools
• The Berkeley Wireless Research Center