Adaptive beamforming using QR in FPGA
Richard Walke, Real-time System Lab, Advanced Processing Centre, S&E Division
Contents
1 Architecture of adaptive beamformer
2 FPGA components
  • Digital receiver
  • QR processor – for adaptive weight calculation
3 Design methodology
4 Demonstration overview
5 Conclusions
Section 1 Architecture of adaptive beamformer
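Architecture of adaptive beamformer
[Figure: adaptive beamformer. An antenna array feeds analogue receivers (Rx) and then digital receivers (DRx). In the spatial filter each channel is multiplied by a weight W1, W2, ..., Wn and the products are summed to give the beamformed signal. An adaptive weight calculation block supplies the weights.]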
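The spatial filter in this architecture is a weighted sum across channels. A minimal sketch in Python follows, assuming NumPy and the common convention y = w^H x (the slides do not say whether the weights are conjugated). The function name `beamform`, the 3-channel array, the half-wavelength spacing and the arrival angle are all illustrative assumptions, and the weights here are fixed rather than adaptive.

```python
import numpy as np

def beamform(weights, snapshots):
    """Weighted sum across channels: y[t] = sum_n conj(w[n]) * x[n, t]."""
    return np.conj(weights) @ snapshots

# Hypothetical 3-channel array, half-wavelength spacing, signal from 20 degrees.
n_chan = 3
theta = np.deg2rad(20.0)
steering = np.exp(1j * np.pi * np.arange(n_chan) * np.sin(theta))
signal = np.exp(1j * 2 * np.pi * 0.05 * np.arange(8))
snapshots = np.outer(steering, signal)   # shape (channels, samples)

weights = steering / n_chan              # non-adaptive weights, for illustration only
y = beamform(weights, snapshots)         # recovers `signal` for this matched case
```

With weights matched to the arrival direction the output equals the transmitted signal; an adaptive weight calculation would replace `weights` with values derived from the received data.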
Section 2 FPGA components
FPGA Components: Software configurable FIR
[Figure: a radar receiver specification (image reject, bandwidth control, complex equaliser) is turned into runtime configuration parameters for an array of programmable 'tap-processors'.]
FPGA Components: Software configurable FIR
• Software programmable parameters include:
  – filter length
  – decimation ratio
  – complex/real arithmetic
  – number of channels
  – time varying filtering (inter & intra-pulse)
• Performance
  – 20-30 GOPS on XC2V6000-5
  – 100 GOPS on Virtex-II Pro (2003)
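To make the filter length and decimation ratio parameters concrete, here is a minimal software sketch of a decimating FIR in Python with NumPy. It is not the FPGA tap-processor design: the function name `fir_decimate`, the 4-tap moving-average coefficients and the test input are assumptions chosen for illustration.

```python
import numpy as np

def fir_decimate(x, taps, decimation):
    """Filter x with the FIR taps, then keep every `decimation`-th output."""
    y = np.convolve(x, taps)[: len(x)]   # causal FIR, truncated to input length
    return y[::decimation]

taps = np.ones(4) / 4.0                  # hypothetical 4-tap moving average
x = np.ones(16, dtype=complex)           # complex input; real input works the same way
y = fir_decimate(x, taps, decimation=4)
```

This version computes every filter output and then discards most of them, which is simple to read. An efficient implementation would compute only the retained outputs.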
FPGA Components: Weight calculation using QR
• Building block for a range of adaptive algorithms
  – Sample matrix inversion (SMI)
  – Soft constraints
  – ESMI
  – STAP
[Figure: blocks labelled antenna data, constraints, pre-processor, QR decomposition, post-processing and beamforming weights, partitioned between RISC/DSP and FPGA.]
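FPGA Components: QR decomposition
[Figure: triangular array. Input data x1, x2, x3, x4 and y enter the array, which stores r1,1 ... r4,4 and u1 ... u4.]

Vectorise cell (11 real floating-point operations): (r, x) becomes (r', 0), with
$$\cos\theta = \frac{r}{\sqrt{r^2 + x^2}}, \qquad \sin\theta = \frac{x}{\sqrt{r^2 + x^2}}$$

Rotate cell (16 real floating-point operations): (u, y) becomes (u', y'), with
$$\begin{pmatrix} u' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} u \\ y \end{pmatrix}$$

For SMI: $R w = u$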
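The two cell types can be sketched in software as follows. This is a minimal Python/NumPy illustration using real-valued data for brevity (the operation counts on the slide suggest complex data in the actual design). The function names, the random test data and the use of a made-up reference signal to build the right-hand side u are assumptions; the slides only state that SMI solves Rw = u.

```python
import numpy as np

def boundary_cell(r, x):
    """Vectorise: rotate (r, x) to (r', 0); return r', cos, sin."""
    r_new = np.hypot(r, x)
    if r_new == 0.0:
        return 0.0, 1.0, 0.0
    return r_new, r / r_new, x / r_new

def internal_cell(c, s, u, y):
    """Rotate: apply the boundary cell's rotation to stored u and input y."""
    return c * u + s * y, -s * u + c * y

def qr_update(R, u, x, y):
    """Absorb one snapshot x and one reference sample y into R and u, in place."""
    x = x.copy()
    n = len(x)
    for i in range(n):
        R[i, i], c, s = boundary_cell(R[i, i], x[i])
        for j in range(i + 1, n):
            R[i, j], x[j] = internal_cell(c, s, R[i, j], x[j])
        u[i], y = internal_cell(c, s, u[i], y)
    return y

def back_substitute(R, u):
    """Solve the upper-triangular system R w = u."""
    n = len(u)
    w = np.zeros(n)
    for i in range(n - 1, -1, -1):
        w[i] = (u[i] - R[i, i + 1:] @ w[i + 1:]) / R[i, i]
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4))   # 20 snapshots of 4 channels (made-up data)
d = rng.standard_normal(20)        # made-up reference samples
R, u = np.zeros((4, 4)), np.zeros(4)
for x_row, y_in in zip(X, d):
    qr_update(R, u, x_row, y_in)
w = back_substitute(R, u)
```

Because each rotation is orthogonal, the resulting w matches the least-squares solution of X w ≈ d without ever forming or inverting the covariance matrix, which is the source of the good numerical behaviour.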
FPGA Components: Features of QR
• Good numerical properties. Arithmetic choices:
  – CORDIC: shift-add
  – Fixed-point: multiply-add
  – Floating-point: higher dynamic range, allows algorithms with fewer operations & lower wordlength. Smallest!
• Highly parallel (Givens rotations)
  – Suits FPGA
  – Need to reduce parallelism for many applications!
FPGA Components: Obtaining lower levels of parallelism
[Figure: triangular array of cells 1,1 to 5,6 mapped onto fewer processors in two ways.]
1. Mixed mapping: processor utilisation 100%, 83%, 67%, 50%, 33%; 67% utilisation overall
2. Discrete mapping: processor utilisation 20%, 40%, 60%, 80%, 100%; 67% utilisation overall
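The utilisation percentages on this slide can be reproduced with a short calculation. The sketch below assumes a triangular array with cells (i, j) for i ≤ j, 5 rows and 6 columns, and assumes that one mapping assigns each row to a processor and the other assigns each column to a processor. The slide does not state which projection corresponds to which named mapping, so that pairing is left open here.

```python
def utilisation(cell_counts):
    """Busy fraction of each processor when the busiest one is at 100%."""
    busiest = max(cell_counts)
    return [count / busiest for count in cell_counts]

rows, cols = 5, 6
# Cell (i, j) exists for i <= j, matching the labels 1,1 ... 5,6 on the slide.
cells = [(i, j) for i in range(1, rows + 1) for j in range(i, cols + 1)]

per_row = [sum(1 for i, _ in cells if i == r) for r in range(1, rows + 1)]
per_col = [sum(1 for _, j in cells if j == c) for c in range(1, cols + 1)]

row_util = utilisation(per_row)          # one processor per row
col_util = utilisation(per_col)          # one processor per column
mean_row = sum(row_util) / len(row_util)
mean_col = sum(col_util) / len(col_util)
```

Both projections average about 67%, which is why a simple projection of the triangular array leaves roughly a third of the hardware idle.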
FPGA Components: Novel mapping of QR
Linear systolic array
FPGA Components: FPGA implementation
Device: XCV3200E-8
[Figure: linear array of a vectorise processor (11 operations) and rotate processors (16 operations each). Component labels: 8 x FP multipliers, 2 x complex multiplier, complex adder, FP add, FP divider.]
Number of ops: 139 14-bit FP operators @ 160 MHz = 22 GFLOPS
FPGA Components: QR processor - main features

Size                       Mantissa   Clock (5)     Operations    Power
1 boundary, 3 internal     14-bit     101 MHz       6 GFLOPS      2.24 W (4)
1 boundary, 12 internal    14-bit     100 MHz (2)   20.3 GFLOPS   8 W (6)
1 boundary, 9 internal     17-bit     97 MHz        15 GFLOPS     7 W (6)
Pentium(TM) 4, 2 GHz (1)                            4 GFLOPS      70 W

Utilisation (XC2V6000), columns Mults, RAMs (3), LUTs, FFs, in slide order: 32, 34, 15K, 16K; 82% (2); 74%, 22%, 23%
[Slide annotation: x 50]

(1) Estimated (based on data from Richard Linderman)
(2) Estimated (design too large for PC)
(3) Also depends upon number of inputs
(4) Obtained via Xpower
(5) For XC2V6000-5
(6) Extrapolated
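The operation rates quoted for the FPGA configurations are consistent with a simple peak-rate estimate: operations per clock cycle multiplied by clock frequency. The sketch below assumes every cell completes its full operation count each cycle (11 for a boundary cell, 16 for an internal cell, as given on the QR decomposition slide) and assumes the three configurations pair with the clocks 101, 100 and 97 MHz in the order listed. The function name `peak_gflops` is illustrative.

```python
VECTORISE_OPS = 11   # real FP operations per boundary (vectorise) cell, from the slides
ROTATE_OPS = 16      # real FP operations per internal (rotate) cell, from the slides

def peak_gflops(n_boundary, n_internal, clock_mhz):
    """Peak rate assuming every cell completes its operations each clock cycle."""
    ops_per_cycle = n_boundary * VECTORISE_OPS + n_internal * ROTATE_OPS
    return ops_per_cycle * clock_mhz / 1000.0

# (boundary cells, internal cells, clock in MHz); pairing is an assumption.
configs = [(1, 3, 101), (1, 12, 100), (1, 9, 97)]
rates = [peak_gflops(*cfg) for cfg in configs]
```

Under these assumptions the three rates come out close to the quoted 6, 20.3 and 15 GFLOPS.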
Section 3 Heterogeneous design methodology
Heterogeneous design methodology: GEDAE™
• Graphically specify system
  – primitive functions in C
  – executable specification
• Auto-code generation
  – parallel programme constructed by GEDAE
• Currently no support for FPGA
  – highly compatible model
Heterogeneous design methodology: Core based methodology
• Cores used for key functions
  – FFT, QR, FIR filter...
  – Build in parallelism (manually)
  – Parameterised
• Automatically generated system
  – communications inserted
• Architectural exploration
  – Compaan gives Matlab NLP (nested loop program) to VHDL
  – RTL output in future version
[Figure: an FFT core from the library placed on an FPGA alongside processors.]
Section 4 Adaptive beamformer demonstration overview
Demonstration overview: System mapping
[Figure: system mapping.
  Host PC (laptop): environment synthesis and system configuration; verification and display
  PowerPC (DNA VQG4 VME): delay, sample, back-substitution
  FPGA (TransTech PMC): DRx, QR
  Interfaces: Virtual Channel API, PCI IF, virtual channel IF, Chan 0 to Chan 3]
Conclusion
• FPGAs
  – Performance dependent upon level of optimisation
  – Floating-point is realistic
  – 10x compute improvement
  – 5-20x power improvement
• Design is main issue
  – Hardware design: high levels of parallelism required
  – Core-based design approaches offer interim solution
  – Architectural synthesis tools are emerging
Acknowledgements
• The project to develop a core-based design methodology is a collaboration between:
  – QinetiQ Ltd
    • poc: rlwalke@qinetiq.com
  – BAE SYSTEMS ATC, Gt Baddow
    • poc: ian.alston@baesystems.com
• Contributions have been made by John McAllister under contract with the Queen's University of Belfast.
• This work was sponsored by the United Kingdom Ministry of Defence Corporate Research Programme.