Channel Equalization in MIMO Downlink and ASIP Architectures
Predrag Radosavljevic
Rice University
March 29, 2004

Wireless System
- Downlink transmission in MIMO wireless system
- Physical layer of the mobile handset
- Linear channel equalization
- Hardware implementation using ASIP architectures

Motivation: MIMO Downlink and Equalization
- MIMO: high data rate and high spectral efficiency
  - Interference from each antenna introduces multiple-access interference (MAI)
  - DS-CDMA signals in a multipath environment: user orthogonality is destroyed, which causes inter-symbol interference (ISI)
- Solution: powerful channel equalization that mitigates ISI and MAI to restore the users' orthogonality
- Chip-level channel equalization based on iterative CG and adaptive LMS algorithms

Motivation: ASIP Hardware Implementation
- Future generations of mobile handsets: high speed, flexibility and low power
- Traditional approaches: ASIC and DSP processors
- ASIC:
  - No flexibility: a family of ASICs is needed
  - High probability of design errors, high design cost
- DSP:
  - Not optimized for a given application
  - Often limited instruction and data level parallelism
- ASIP:
  - Tradeoff between efficiency of ASICs and flexibility of DSPs

Thesis Contributions
- Channel equalization in broad range of environments
  - 16-bit fixed point implementation
- Flexible ASIP architecture design
  - Same hardware - different equalization (slow/fast fading, CG/LMS)
  - Extension of ASIP instruction set with application-specific operations
- Customized architecture:
  - Real-time requirements for 1xEV-DV standard (1.2288 Mc/s)
  - Reasonable clock frequency (up to 150 MHz) and power dissipation
- Automatic hardware design: from C to gate level
  - Hardware synthesis for FPGA and CMOS libraries
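The real-time requirement and the clock frequency quoted on this slide together imply a per-chip cycle budget. The sketch below is only illustrative arithmetic on those two quoted numbers; the function name `cycles_per_chip` is an assumption, and the slides do not state how the budget is actually split between filter update, filtering and detection.

```python
# Rough cycle budget per chip (illustrative arithmetic, not a figure from the slides).
CHIP_RATE_HZ = 1.2288e6   # 1xEV-DV chip rate quoted on the slide
CLOCK_HZ = 150e6          # upper clock frequency quoted on the slide


def cycles_per_chip(clock_hz, chip_rate_hz):
    """Clock cycles that elapse between two consecutive chips."""
    return clock_hz / chip_rate_hz


budget = cycles_per_chip(CLOCK_HZ, CHIP_RATE_HZ)
print(f"about {budget:.0f} clock cycles per chip")
```

At 150 MHz this gives roughly 122 cycles per chip, which is the window in which all per-chip work has to fit if one processor handles the whole stream.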

Outline
- Data model
- Channel equalization
- ASIP hardware implementations
- Conclusions and future work

Data Model: Transmission Side
- Alternating symbols over transmit antennas
- Spreading: orthogonality between users
- Scrambling: reduction of inter-cell interference
- Transmission over multipath correlated channels
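The spreading and scrambling steps can be sketched as follows. This is a minimal illustration, assuming Walsh-Hadamard spreading codes, QPSK-like symbols and a fixed +/-1 scrambling pattern; none of these specifics, nor the helper names, come from the slides. It shows only that orthogonal codes let each user be separated again when the channel is ideal, which is the property that multipath destroys.

```python
def walsh_codes(n):
    """Return n mutually orthogonal Walsh-Hadamard codes of length n (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    codes = [[1]]
    while len(codes) < n:
        codes = [c + c for c in codes] + [c + [-x for x in c] for c in codes]
    return codes


def spread(symbols, code):
    """Replace every symbol by the symbol multiplied with each chip of the code."""
    return [s * c for s in symbols for c in code]


def despread(chips, code):
    """Correlate each code-length group of chips with the code."""
    n = len(code)
    return [sum(chips[i + k] * code[k] for k in range(n)) / n
            for i in range(0, len(chips), n)]


codes = walsh_codes(4)
user_a = [1 + 1j, -1 - 1j]                 # hypothetical symbols of user A
user_b = [-1 + 1j, 1 - 1j]                 # hypothetical symbols of user B
scrambling = [1, -1, -1, 1, 1, 1, -1, 1]   # hypothetical +/-1 scrambling pattern

# Sum both users' spread chips, then scramble chip by chip.
combined = [a + b for a, b in zip(spread(user_a, codes[1]), spread(user_b, codes[2]))]
transmitted = [c * s for c, s in zip(combined, scrambling)]

# Ideal (distortion-free) channel: descramble, then despread per user.
descrambled = [c * s for c, s in zip(transmitted, scrambling)]
recovered_a = despread(descrambled, codes[1])
recovered_b = despread(descrambled, codes[2])
```

With a multipath channel between `transmitted` and `descrambled`, the delayed copies of the codes are no longer orthogonal, which is why an equalizer is placed before the despreader.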

Receiver Implementations
- RAKE receiver, multiuser detector, Kalman filter, LMMSE equalization
- RAKE:
  - Deteriorated performance in highly loaded system
  - Not appropriate for MIMO environments
- Multiuser detectors:
  - High computational complexity
  - Limited knowledge about the activity of other users
- Kalman filter:
  - Optimal solution in the sense of MSE
  - Prohibitive complexity in MIMO environments

LMMSE Equalization
- Lower complexity in comparison with other receivers
  - Independent of the number of users
  - Iterative solutions
  - Good performance in highly scattered environments
- Figure: LMMSE receiver

LMMSE Equalization
- Linear system to be solved:
- Covariance: block Toeplitz and positive definite
  - A and B: Toeplitz Hermitian matrices
  - C: Toeplitz matrix

LMMSE Approaches
- LMMSE solution:
- Cholesky decomposition
  - More complex hardware primitives
- Conjugate Gradient (CG)
  - Iterative solution, fast convergence
  - Block algorithm: modifications for fast fading channels
- Least Mean Square (LMS)
  - Adaptive algorithm
  - Sensitivity to learning step

Equalization in Time-Varying Channels
- Spatially correlated, frequency-selective (multipath) fading channels
- Data rate: 1.2288 Mchips/s
- Antenna correlation:
  - Base station: 50.18%
  - Mobile: 43.99%

Channel Equalization: CG Algorithm
- N samples: 4096 in slow fading channels
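The algorithm itself is not preserved in this transcript. As a reference point, a textbook conjugate gradient solver for a Hermitian positive definite system can be sketched as below. The names `A` (standing in for the covariance matrix estimated over a block of samples), `b` (the right-hand side) and the small 2x2 example are assumptions for illustration, not the formulation used in the thesis.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def conjugate_gradient(A, b, iterations=None, tol=1e-12):
    """Solve A x = b for Hermitian positive definite A, starting from x = 0."""
    n = len(b)
    x = [0j] * n
    r = list(b)                      # residual b - A x for the zero start
    p = list(r)                      # first search direction
    rs_old = inner(r, r).real
    for _ in range(iterations or n):
        if rs_old < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / inner(p, Ap).real          # step size along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = inner(r, r).real
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 2x2 Hermitian positive definite system.
A = [[4 + 0j, 1 - 1j],
     [1 + 1j, 3 + 0j]]
b = [1 + 0j, 2j]
x = conjugate_gradient(A, b)
residual = max(abs(v - w) for v, w in zip(matvec(A, x), b))
```

Each iteration needs one matrix-vector product plus a few inner products and vector updates, so the cost per filter update is set by the iteration count rather than by a full factorization.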

CG Equalization in Vehicular A, 30 km/h
- Sliding Window (SW) approach
- Faster variations: more frequent update of filter coefficients

CG Equalization: Velocity of 120 km/h
- Multiple sub-blocks instead of two blocks
- Partial channel estimation for each sub-block
- Apply weights for global channel estimation:
  - Weights are adjusted according to the channel variations
  - The faster the channel fading, the faster the coefficients drop to 0
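The weighting formula is not preserved in this transcript. One way to realize the idea is sketched below, assuming exponentially decaying weights controlled by a hypothetical `forgetting` factor and a normalized weighted sum; both choices are assumptions rather than the method from the slides.

```python
def combine_subblock_estimates(estimates, forgetting):
    """Weighted average of partial channel estimates, oldest sub-block first.

    The newest sub-block gets weight 1 and each older one is scaled by
    `forgetting` once more, so a smaller factor drops old estimates faster.
    """
    if not 0.0 < forgetting <= 1.0:
        raise ValueError("forgetting must be in (0, 1]")
    count = len(estimates)
    weights = [forgetting ** (count - 1 - k) for k in range(count)]
    total = sum(weights)
    taps = len(estimates[0])
    return [sum(w * est[t] for w, est in zip(weights, estimates)) / total
            for t in range(taps)]


# Hypothetical single-tap estimates from two sub-blocks.
slow = combine_subblock_estimates([[0.0], [2.0]], forgetting=1.0)   # plain mean
fast = combine_subblock_estimates([[0.0], [2.0]], forgetting=0.5)   # favors newest
```

With `forgetting=1.0` every sub-block counts equally, which suits a slowly varying channel; lowering it shifts the global estimate toward the most recent sub-block.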

Architectural Alternative: LMS Equalization
- Adaptive LMS:
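The update equation is not preserved in this transcript. A standard complex LMS equalizer trained on known pilot chips can be sketched as follows; the tap count, the step size `mu`, the two-tap test channel and the random pilot sequence are all assumptions chosen for illustration.

```python
import random


def lms_equalize(received, pilot, num_taps=8, mu=0.01):
    """Adapt equalizer taps chip by chip; returns final taps and squared errors."""
    w = [0j] * num_taps
    window = [0j] * num_taps          # most recent received chip first
    errors = []
    for r, d in zip(received, pilot):
        window = [r] + window[:-1]
        y = sum(wi * xi for wi, xi in zip(w, window))   # equalizer output
        e = d - y                                       # error against the pilot
        # Stochastic gradient step: w <- w + mu * e * conj(x)
        w = [wi + mu * e * xi.conjugate() for wi, xi in zip(w, window)]
        errors.append(abs(e) ** 2)
    return w, errors


rng = random.Random(0)
pilot = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(3000)]
channel = [1.0, 0.4]                  # hypothetical two-path channel, no noise
received = [sum(h * pilot[n - k] for k, h in enumerate(channel) if n - k >= 0)
            for n in range(len(pilot))]

taps, errors = lms_equalize(received, pilot)
final_mse = sum(errors[-200:]) / 200
```

The step size is the sensitive parameter: too large and the taps diverge, too small and the equalizer cannot track a fading channel, which matches the "sensitivity to learning step" noted for LMS earlier in the deck.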

Performance: Slow Fading Environments
- Figure panels: Pedestrian A – 3 km/h; Pedestrian B – 10 km/h
- From 32-bit floating point to 16-bit fixed point
- Control of quantization error

Performance: Vehicular A 30 km/h
- CG with sliding window (CG-SW): improvement in comparison with basic CG

CG–SW Approach: Fixed Point
- Figure: Vehicular A – 30 km/h
- 32-bit floating point and 16-bit fixed point
- About 1% BER difference
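The move from floating point to 16-bit fixed point can be illustrated with a small quantization sketch. The Q15 format (1 sign bit, 15 fractional bits), the saturation behavior and the rounding in the multiply are assumptions; the slides give only the 16-bit word length.

```python
Q15_ONE = 1 << 15          # scale factor 32768
Q15_MAX = Q15_ONE - 1      # largest representable value, 32767
Q15_MIN = -Q15_ONE         # smallest representable value, -32768


def saturate(value):
    """Clamp an integer to the signed 16-bit range."""
    return max(Q15_MIN, min(Q15_MAX, value))


def to_q15(x):
    """Quantize a float in roughly [-1, 1) to Q15 with saturation."""
    return saturate(int(round(x * Q15_ONE)))


def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / Q15_ONE


def q15_mul(a, b):
    """Multiply two Q15 values: wide product, round, shift back to Q15."""
    return saturate((a * b + (1 << 14)) >> 15)


half = to_q15(0.5)
quarter = q15_mul(half, half)
quantization_error = abs(from_q15(to_q15(0.3)) - 0.3)
```

The rounding error of a single value is bounded by half a least significant bit (2^-16 here); controlling how such errors accumulate through the iterations is what keeps the fixed-point BER close to the floating-point one.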

Performance: Velocity of 120 km/h
- Figure panels: Pedestrian A - 120 km/h; Vehicular A - 120 km/h
- CG with sliding window and weights averaging (CG-SW-WA)
  - CG-SW-WA with different numbers of sub-blocks
  - Performance improvement if weights are applied

Computational Complexity
- Number of operations per chip in 1 second
- CG filter update is less complex
  - Reason: block-level filter update algorithm

Directions for Architecture Implementation
- Equalization in different environments
  - Block CG, adaptive LMS for slow fading environments
  - Modifications of CG for fast fading channels
- Different computational complexity and amount of parallelism
- Flexible hardware for different equalizations and CG modifications
  - Programmable architecture
  - Application specific

ASIP Architecture for Equalization: Required Features
- Flexible architecture able to operate in different channel environments
  - Slow/fast fading
  - Low/high scattering
- Architecture customization
  - Implementation of application-specific operations
- Instruction and data level parallelism
  - Fast execution of complex algorithms
- Automatic hardware-software co-design
  - Fast processor design starting from C/C++ code of application

ASIP Architecture Based on TTA
- Flexible architecture
  - No limitations to add new FUs, buses, registers
- Customizable architecture
  - Implementation of Special Function Units (SFUs)
- Instruction and data level parallelism
  - VLIW architecture principle
  - Efficient and parallel data flow
- Fast processor design
  - Automatic search for best processor
  - VHDL processor representation

General Structure of TTA
- Transport of operands triggers the appropriate operation as a side effect
  - Only one instruction: "move"
- 32-bit architecture
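The transport-triggered principle can be illustrated with a toy simulation. This is only a sketch: the class `AdderFU`, the port names `operand`, `trigger` and `result`, and the tuple-based program format are hypothetical and do not reflect the MOVE framework's actual instruction format.

```python
class AdderFU:
    """Toy function unit: writing the trigger port fires the addition."""

    def __init__(self):
        self.operand = 0
        self.result = 0

    def write(self, port, value):
        if port == "operand":
            self.operand = value                  # latched, nothing happens yet
        elif port == "trigger":
            self.result = self.operand + value    # operation as a side effect
        else:
            raise KeyError(port)


def run(program, registers, fu):
    """Execute a list of (source, destination) moves, the only instruction type."""
    for source, destination in program:
        if isinstance(source, int):
            value = source                        # immediate
        elif source == "add.result":
            value = fu.result
        else:
            value = registers[source]
        if destination.startswith("add."):
            fu.write(destination.split(".", 1)[1], value)
        else:
            registers[destination] = value
    return registers


program = [
    ("r1", "add.operand"),    # move first operand into the adder
    ("r2", "add.trigger"),    # moving the second operand triggers the add
    ("add.result", "r3"),     # move the result back to a register
]
regs = run(program, {"r1": 5, "r2": 7, "r3": 0}, AdderFU())
```

In a real TTA several such moves are issued in parallel, one per bus, so the number of buses bounds how many transports one instruction word can carry.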

TTA Design Flow: MOVE Tool
- Design space exploration for optimal architecture

Customization of ASIP
- Implementation of application-specific operations
  - User-defined Special Function Units (SFUs)
  - Sacrificing architecture generality for optimization and performance improvement
- Designed SFUs:
  - Real multiplication with shifting ability
  - Complex multiplication with shifting
  - Sub-word arithmetic operations
  - Sign-test and add/subtract

SFU: Complex Multiplication
- Reduction of data transports between FUs
  - Fewer buses and smaller interconnection network
  - Smaller instruction word
- Instruction and data parallelism is placed inside CXMUL
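What such a unit computes can be sketched in a few lines. The function name `cxmul_shift`, its argument order and the use of an arithmetic right shift for rescaling are assumptions; the internal structure of the actual CXMUL unit is not given in the slides.

```python
def cxmul_shift(a_re, a_im, b_re, b_im, shift):
    """Complex multiply on integer parts, then arithmetic right shift of both results.

    Four real multiplications and two additions happen inside one operation,
    so only the operands and the two results have to travel over the buses.
    """
    re = (a_re * b_re - a_im * b_im) >> shift
    im = (a_re * b_im + a_im * b_re) >> shift
    return re, im


# (3 + 4j) * (1 + 2j) = -5 + 10j, no rescaling.
plain = cxmul_shift(3, 4, 1, 2, 0)

# 0.5 * 0.5j in a hypothetical Q15 scaling, rescaled by 15 bits.
scaled = cxmul_shift(16384, 0, 0, 16384, 15)
```

Built from separate multiplier, adder and shifter units, the same computation would need every intermediate product moved between units, which is the transport overhead the SFU removes.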

Performance Improvement with SFUs
- Bus reduction of 50%
- Instruction word length reduction of about 50%

TTA Processors for MIMO Equalization
1. Two co-processors (CG equalization)
   - Co-processor for updating equalizer coefficients
   - Co-processor for filtering and user detection
2. Single processor for all parts of equalization algorithm (CG/LMS equalization)
   - Identical architectures for slow and fast fading environments

Single Processor vs. Two Coprocessors
- Single processor
  - Smaller area and power dissipation
  - Higher clock frequency

Processor Flexibility
- Identical customized processor for broad range of channel environments
- Identical processor for LMS and CG equalization

Example of Designed Processor
- Coprocessor for CG filter update

Hardware Synthesis Design Flow
- MOVEGen: generates VHDL representation of processor core
- Xilinx tools for fast FPGA prototyping
- Mentor Graphics tools for CMOS gate level design

VHDL Template of TTA Processor
- Automatic VHDL generation of processor core, control and interconnection
- FUs, SFUs, peripherals: pre-designed or defined by user

MoveProc Synthesis on Xilinx FPGA
- CG/LMS equalizer including user detection
  - No SFUs
  - 32 buses
- Xilinx FPGA part: XC2V8000
  - Slices: 38,757 out of 46,592
  - BRAMs: 148 out of 168
  - IOBs: 263 out of 1108
  - MULT18x18s: 24 out of 168

MoveProc Synthesis on Xilinx FPGA
- Customized CG/LMS equalizer including user detection
  - With SFUs
  - 16 buses
- Xilinx FPGA part: XC2V6000
  - Slices: 21,126 out of 33,792
  - BRAMs: 107 out of 144
  - IOBs: 229 out of 1104
  - MULT18x18s: 11 out of 144

Gate Level CMOS Synthesis
- Mentor Graphics tools
- 0.5 µm CMOS library
  - Customized CG/LMS equalizer including user detection (with SFUs)
  - Synthesis estimate of processor core: 182,887 gates

Conclusions
- Equalization algorithms for broad range of channel environments
  - Slow fading: CG/LMS
  - Fast fading: modifications of basic CG equalization
- ASIP architecture design based on TTA
  - Same architecture - different equalization algorithms
  - Optimization with application-specific operations
  - Reasonable frequency and power dissipation for 3GPP data rate
- Fast processor design
  - VHDL representation of optimal processor
  - FPGA synthesis and CMOS gate level synthesis

Future Work
- Processor layout synthesis
  - IC Station software tool from Mentor Graphics
  - Precise timing, area, and power analysis
- Implementation of hybrid word length
  - Reduced precision for filter application part
- Implementation on C5x DSP for comparison

Acknowledgements
- Thanks to:
  - Professor Cavallaro
  - Dr. De Baynast
  - Professor Aazhang
  - Dr. Dabak
  - Dr. Sabharwal
  - Texas Instruments
  - Nokia