MAP ART: Mapping Architectural Properties to an Algorithm for Redundant Triangulation
Chris Savarese, Yashesh Shroff, Greg Lawrence
Advisor: Dr. Jan Rabaey
April 27, 2000
CS 252

Outline
• Introduction
• Background
• Time and Energy Profiling
• Parallel Architectures
• Conclusions: Our Dream Architecture
• Future Work

Introduction
• Goal: Given a basic localization algorithm, explore architectural alternatives for the minimization of energy consumption.
• The concept of localization
• Energy saving techniques
• What we did…

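The Localization Algorithm

Figure: unknown node U(x, y, z) and neighboring nodes N1(x1, y1, z1), N2(x2, y2, z2), N3(x3, y3, z3).

$$
\underbrace{\begin{bmatrix}
x_1 - x_n & y_1 - y_n & z_1 - z_n \\
\vdots & \vdots & \vdots \\
x_{n-1} - x_n & y_{n-1} - y_n & z_{n-1} - z_n
\end{bmatrix}}_{A_{m \times 3}}
\underbrace{\begin{bmatrix} x \\ y \\ z \end{bmatrix}}_{U_{3 \times 1}}
=
\underbrace{\begin{bmatrix} b_1 \\ \vdots \\ b_{n-1} \end{bmatrix}}_{b_{(n-1) \times 1}}
$$

QRdcmp(): $A_{m \times 3} = Q_{m \times 3} \cdot R_{3 \times 3}$

Solve: $U = R^{-1} Q^{T} b$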
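The QR-based solve above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the slide does not define the entries of b, so the sketch assumes the usual linearization b_i = 0.5 (|N_i|^2 - |N_n|^2 - d_i^2 + d_n^2), where d_i is the measured range to neighbor N_i. The function names (build_system, qr_decompose, solve_position) are hypothetical, and the factorization uses modified Gram-Schmidt rather than whatever QRdcmp() does internally.

```python
import math


def build_system(anchors, dists):
    """Build A (rows N_i - N_n) and b by subtracting the last range equation.

    Assumption (not on the slide):
    b_i = 0.5 * (|N_i|^2 - |N_n|^2 - d_i^2 + d_n^2).
    """
    last, d_last = anchors[-1], dists[-1]

    def norm2(p):
        return sum(c * c for c in p)

    A, b = [], []
    for p, d in zip(anchors[:-1], dists[:-1]):
        A.append([pc - lc for pc, lc in zip(p, last)])
        b.append(0.5 * (norm2(p) - norm2(last) - d * d + d_last * d_last))
    return A, b


def qr_decompose(A):
    """Thin QR via modified Gram-Schmidt: A (m x n) = Q (m x n) * R (n x n).

    Returns Q as a list of columns and R as a list of rows.
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for k in range(j):
            R[k][j] = sum(q * x for q, x in zip(q_cols[k], v))
            v = [x - R[k][j] * q for x, q in zip(v, q_cols[k])]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        if R[j][j] == 0.0:
            raise ValueError("rank-deficient geometry: cannot solve")
        q_cols.append([x / R[j][j] for x in v])
    return q_cols, R


def solve_position(anchors, dists):
    """Least-squares position: U = R^-1 Q^T b, using back-substitution."""
    A, b = build_system(anchors, dists)
    q_cols, R = qr_decompose(A)
    y = [sum(q * bi for q, bi in zip(col, b)) for col in q_cols]  # Q^T b
    n = len(y)
    u = [0.0] * n
    for i in reversed(range(n)):
        s = y[i] - sum(R[i][j] * u[j] for j in range(i + 1, n))
        u[i] = s / R[i][i]
    return u
```

With at least four non-coplanar neighbors and their measured ranges, solve_position(anchors, dists) returns the estimated [x, y, z]. Extra neighbors make the system overdetermined, which is where the redundancy in the triangulation comes from.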

The StrongARM Architecture
• Power: 200 mW, 0.25 µm, 1.5 V
• Clock Speed: 200 MHz
• Cache:
  - 16 KB I-cache
  - 8 KB D-cache
  - 32-way set-associative, round-robin replacement
  - 512 B, 2-way Minicache
• 31/16 GPR (32-bit)
• Auto-increment addressing
• No FP processor
• MAC

The Tensilica Xtensa Architecture
Processor Configuration
• Power: 200 mW, 0.25 µm, 1.5 V
• Clock Speed: 170 MHz
• Cache:
  - 16 KB I-cache
  - 16 KB D-cache
  - Direct mapped
• 32 registers (32-bit)
• Xtensibility: use of TIE instructions
• No FP processor
• Zero overhead loops

Profiling Results

Profiler Output:
-----------------------
_fmul   18.21%  0.00%  188000
-----------------------
lubksb  15.27%  5.17%  10.10%  10000
_fneq   0.37%  0.00%  14000
_fdiv   4.23%  0.00%  30000
_fmul   5.03%  0.00%  52000
_frsb   0.46%  0.00%  52000

Floating Point
• StrongARM Processor: 68 J
• Xtensa Processor: 144 J

Energy = nominal core power × #cycles × clock period
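The energy estimate above is a single product, which can be sketched as follows. The function name and the cycle count in the usage note are hypothetical; only the formula and the nominal 200 mW / 200 MHz StrongARM figures come from the slides.

```python
def energy_joules(core_power_w, cycles, clock_hz):
    """Energy = nominal core power x #cycles x clock period.

    core_power_w: nominal core power in watts
    cycles:       profiled cycle count for the workload
    clock_hz:     clock frequency in hertz (clock period = 1 / clock_hz)
    """
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    clock_period_s = 1.0 / clock_hz
    return core_power_w * cycles * clock_period_s
```

For example, energy_joules(0.200, 1_000_000, 200e6) evaluates a made-up one-million-cycle workload at the StrongARM's nominal operating point. The model treats core power as constant, so any difference in energy between two runs on the same processor comes entirely from the cycle count.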

Fixed Point Arithmetic
• Floating Point vs. Fixed Point
  - Floating point: 1 bit S (sign) | 8 bits E (exponent) | 23 bits Mantissa
  - Fixed point: 16 bits | 16 bits
• Add / Sub are straightforward
• Multiply / Divide require shifting
• Why can we use it for localization?
  - Low accuracy requirements
  - Limited range in measurements (< 10 m)
  - Small matrices → small error propagation
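The add/shift behavior described above can be sketched as follows, assuming the 16 | 16 layout means 16 integer bits and 16 fractional bits (Q16.16). That reading is an assumption, and all names here are hypothetical. Python integers do not overflow, so this shows only the scaling logic, not the 32-bit wraparound a real implementation must guard against.

```python
FRAC_BITS = 16           # assumed: 16 fractional bits (Q16.16)
ONE = 1 << FRAC_BITS     # fixed-point representation of 1.0


def to_fixed(x):
    """Convert a float to Q16.16 by scaling and rounding."""
    return int(round(x * ONE))


def to_float(q):
    """Convert a Q16.16 value back to a float."""
    return q / ONE


def fx_add(a, b):
    """Addition needs no correction: both operands share the same scale."""
    return a + b


def fx_sub(a, b):
    """Subtraction is likewise a plain integer subtract."""
    return a - b


def fx_mul(a, b):
    """The raw product carries 32 fractional bits; shift right to restore 16."""
    return (a * b) >> FRAC_BITS


def fx_div(a, b):
    """Pre-shift the dividend left so the quotient keeps 16 fractional bits.

    Note: Python's // floors toward negative infinity, whereas C integer
    division truncates toward zero, so negative results can differ by 1 LSB.
    """
    if b == 0:
        raise ZeroDivisionError("fixed-point division by zero")
    return (a << FRAC_BITS) // b
```

Under this assumed format the resolution is 1/65536 of a unit, so measurements below 10 m are represented to well under a millimeter, which fits the low accuracy requirement listed above.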

Fixed Point Profiling Results

Profiler Output:
-----------------------
_fmul   18.21%  0.00%  188000
-----------------------
lubksb  15.27%  5.17%  10.10%  10000
_fneq   0.37%  0.00%  14000
_fdiv   4.23%  0.00%  30000
_fmul   5.03%  0.00%  52000
_frsb   0.46%  0.00%  52000

Floating Point
• StrongARM Processor: 68 J
• Xtensa Processor: 144 J

Fixed Point
• StrongARM Processor: 43 J (37% less)
• Xtensa Processor: 69 J (52% less)

Energy = nominal core power × #cycles × clock period

Parallel Architectures
- Write sequential code in Matlab
- Extract data-dependencies
- Workload analysis

Figure labels: P, CP1, CP2, CP3

Our Dream Architecture
• Floating point hardware
• MAC hardware
• Zero overhead loops
• Auto increment
• Register file size
• Cache
  - Direct mapped

Future Work
• FPGA implementation
• Xtensa customizations
  - TIE instructions
  - Floating Point Coprocessor
• Realistic algorithm for PicoRadio

Many Thanks To…
• Dr. Bart Kienhuis, EECS Post Doc
  - Ptolemy and other tools: Parallel issues
• Fred Burghardt, BWRC Technical Staff
  - PicoRadio Testbed
• Marlene Wan, BWRC Student
  - StrongARM Energy Profiling
• Vandana Prabhu, BWRC Student
  - Tensilica Tools
• The Berkeley Wireless Research Center