Software Defined Radio – A High Performance Embedded Challenge
Hyunseok Lee, Yuan Lin, Yoav Harel, Mark Woh, Scott Mahlke, Trevor Mudge, and Krisztian Flautner (1)
University of Michigan; (1) ARM Ltd
Contents
- Software defined radio
- Categories of wireless networks
- Core technologies for future networks
- Case study: W-CDMA network
  - Major algorithms
  - Workload characterization
  - Architectural implications
Advanced Computer Architecture Laboratory, University of Michigan
Software Defined Radio
Wireless Communication System
[Figure: protocol stack of a wireless communication system. The upper protocol layers (application, transport: TCP/UDP, network: IP, link: PPP, MAC) handle packets; the physical layer (PHY) handles bits through baseband processing and the analog front-end, which connects to the "air".]
Anatomy of a Cellular Phone
Protocol on Wireless Platform
[Figure: mapping of protocol functions onto platform hardware.]
- Application processor: source coding (audio: AMR/QCELP, video: MPEG) on a GPP or DSP/accelerator (software)
- Baseband processor: upper layers (transport, network, link, MAC) on a GPP (software); physical layer (PHY) on an ASIC (hardware)
Software Defined Radio (SDR)
- Use software routines running on programmable hardware, instead of ASICs, for the physical layer (PHY) operations of a wireless communication system.
- Both the analog front-end and the digital baseband fall within the scope of SDR.
Levels of SDR
- Tier 0, Hardware Radio (HR): implemented using hardware components; cannot be modified.
- Tier 1, Software Controlled Radio (SCR): only control functions are implemented in software (interconnects, power levels, etc.).
- Tier 2, Software Defined Radio (SDR): software control of a variety of modulation techniques, wide-band or narrow-band operation, security functions, etc.
- Tier 3, Ideal Software Radio (ISR): programmability extends to the entire system, with analog conversion only at the antenna.
- Tier 4, Ultimate Software Radio (USR): defined for comparison purposes only.
Source: http://www.sdrforum.org
Why Do We Need SDR?
- Seamless wireless connection (end user)
  - Widely different wireless protocols
    - TDMA: GSM, AMPS
    - CDMA: IS-95, cdma2000, W-CDMA, IEEE 802.11b
    - OFDM: IEEE 802.11a/g/n, WiMAX
  - Needs a terminal that can support multiple wireless protocols
- Easy infrastructure upgrade (service provider)
  - Wireless protocols evolve continuously
    - Example: W-CDMA + HSDPA
- Time to market (manufacturer)
  - Reduces hardware development time and cost
Where Can We Use SDR?
- Basestations
  - Weak constraints on power and area
  - Support several hundred subscribers
  - Will be commercialized first
- Wireless terminals
  - Tight constraints on power and area
  - Will be commercialized next
Why Is SDR Challenging?
- Analog front-end
  - Must be tunable across a range of carrier frequencies and bandwidths.
- Digital baseband
  - Supercomputer-level computation power: > 50 Gops per subscriber
  - Tight power budget: 200 to 300 mW (at the terminal)
  - High level of programmability: a combination of heterogeneous signal processing algorithms
Our Strategy
- Performance
  - Exploit the parallelism in signal processing and forward error correction (FEC) algorithms.
- Power
  - Limit the programmability to minimize power consumption.
  - Minimize both active and idle mode power consumption.
- There is a trade-off between power efficiency and programmability.
Categories of Wireless Networks
Categories of Wireless Networks
[Figure: categories of wireless networks. Source: Wireless communication technology landscape, DELL]
WWAN (Wireless Wide Area Network)
WLAN / WMAN
- WLAN: Wireless Local Area Network
  - High data rate
  - Poor mobility support
- WMAN: Wireless Metro Area Network
  - For the last-mile problem
  - 802.16d: Fixed WiMAX
  - 802.16e: Mobile WiMAX
WPAN (Wireless Personal Area Network)
- Interconnects personal devices
Core technologies of future networks
OFDM (Orthogonal Frequency Division Multiplexing)
- Transmits the signal over several sub-carriers.
- The frequency spectra of the sub-carriers overlap (high spectral efficiency).
- Highly susceptible to frequency error in the receiver.
Major Computation in OFDM System
- FFT / IFFT
  - N = 64: IEEE 802.11a
  - N = 256 to 2048: IEEE 802.16 WiMAX
  - Data precision: 12 to 16 bits
- Amount of computation for OFDM operation
  - ~10^8 complex multiplications / sec
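As a rough order-of-magnitude check on the figure above, the sketch below counts complex multiplications for one radix-2 FFT per OFDM symbol. The (N/2)·log2(N) count is the textbook radix-2 figure and the 4 microsecond symbol period is the IEEE 802.11a value; neither appears on the slide, so both are assumptions, and the function name is purely illustrative.

```python
import math

def fft_complex_mults_per_sec(n_points: int, symbol_period_s: float) -> float:
    """Complex multiplications per second for one radix-2 FFT per OFDM symbol."""
    # Assumption: a radix-2 FFT needs (N/2) * log2(N) complex multiplications.
    mults_per_fft = (n_points // 2) * int(math.log2(n_points))
    return mults_per_fft / symbol_period_s

# Assumption: IEEE 802.11a, N = 64, one FFT every 4 microseconds.
rate_11a = fft_complex_mults_per_sec(64, 4e-6)
print(f"{rate_11a:.2e} complex multiplications / sec")
```

Under these assumptions the FFT alone lands at a few times 10^7 complex multiplications per second, which is in line with the order of magnitude quoted on the slide.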
MIMO (Multiple Input Multiple Output)
- Uses multiple antennas for signal transmission and reception.
- In the ideal case, channel capacity increases linearly.
- Can effectively compensate for the multipath fading effect.
- Significantly increases receiver complexity.
- Single Input Single Output (SISO) channel capacity: $C = W \log_2(1 + \mathrm{SNR})$
- Multiple Input Multiple Output (MIMO) channel capacity: $C = \min(n, m) \cdot W \log_2(1 + \mathrm{SNR})$, where $n$ and $m$ are the numbers of transmit and receive antennas.
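The two capacity formulas can be compared numerically. The following is a minimal sketch; the 20 MHz bandwidth and the linear SNR of 15 are illustrative values chosen here, not taken from the slides.

```python
import math

def channel_capacity(bandwidth_hz: float, snr: float, n_tx: int = 1, n_rx: int = 1) -> float:
    """Ideal capacity in bit/s: min(n, m) * W * log2(1 + SNR), with SNR as a linear ratio."""
    return min(n_tx, n_rx) * bandwidth_hz * math.log2(1 + snr)

siso = channel_capacity(20e6, 15)          # single antenna at each end
mimo = channel_capacity(20e6, 15, 4, 4)    # 4 Tx / 4 Rx antennas
print(siso, mimo, mimo / siso)
```

In this idealized model the 4x4 configuration gives exactly four times the SISO capacity, which is the linear scaling the slide refers to.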
Computation in MIMO Receiver
- Amount of computation in a MIMO receiver [equation not recoverable from the extracted slide], with
  - M: number of Tx/Rx antennas
  - L_T: length of preamble
  - L_P: length of payload
- 4 Tx/Rx antennas, 100 Mbps, 64 QAM, 1/2 coding rate
  - ~6 x 10^8 computations / sec
Source: B. Hassibi, An Efficient Square-Root Algorithm for BLAST
LDPC Code
- Low Density Parity Check (LDPC) code
  - Coding gain comparable to turbo codes, with lower implementation cost.
- Encoding
  - Matrix multiplication, c = xG
  - G (generator matrix) is a large matrix (e.g. a 4K x 4K matrix).
- Decoding
  - Equivalent to finding the most probable vector x such that Hx mod 2 = 0.
  - H (parity check matrix) is a large sparse matrix.
- Implementation
  - There is a trade-off between coding gain and implementation complexity.
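The encoding and parity-check relations above can be tried on a tiny code. The sketch below uses a systematic Hamming(7,4) generator and parity check matrix purely as a stand-in: it is not an LDPC code (real LDPC matrices are far larger and sparse), and it only shows the modulo-2 arithmetic, not an iterative decoder. The matrices and function names are assumptions made for illustration.

```python
# Stand-in matrices: systematic Hamming(7,4), G = [I | P], H = [P^T | I].
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def encode(x, gen):
    """Code word c = x * G over GF(2)."""
    return [sum(x[i] * gen[i][j] for i in range(len(x))) % 2
            for j in range(len(gen[0]))]

def syndrome(c, check):
    """H * c mod 2; all zeros means every parity check is satisfied."""
    return [sum(h * b for h, b in zip(row, c)) % 2 for row in check]

codeword = encode([1, 0, 1, 1], G)
corrupted = list(codeword)
corrupted[2] ^= 1                      # flip one bit to model a channel error
print(codeword, syndrome(codeword, H), syndrome(corrupted, H))
```

A valid code word yields an all-zero syndrome, while the corrupted word does not; a decoder searches for the most probable word that restores the all-zero condition.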
Hybrid ARQ
- Reuses erroneous frames when decoding the retransmitted frame.
- Requires a huge buffer space.
Case Study : W-CDMA system
Major Algorithms
Physical Layer of W-CDMA
[Figure: block diagram of the W-CDMA physical layer, annotated with the purpose of its blocks:]
- Suppress signal components outside the pass band
- Error correction
- Overcome severe errors within a short time interval
- Assign a signal waveform optimal for data transmission
Channel Encoder/Decoder
- Encoder
  - Adds systematic redundancy to the source data.
- Decoder
  - Fixes errors in the received data using the systematic redundancy generated by the encoder.
- The W-CDMA system uses
  - Convolutional code (for short voice and control messages)
  - Turbo code (for video streams and high-speed packet data)
Channel Encoder
- Consists of flip-flops and exclusive-OR gates.
- Has negligible impact on workload.
[Figure: convolutional encoder of the W-CDMA system. The input feeds a chain of delay elements (D); Output 0 uses generator G0 = 561 (octal) and Output 1 uses generator G1 = 753 (octal).]
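A shift-register encoder with the two generators from the figure can be sketched in a few lines. The constraint length of 9 is inferred from the 9-bit generators, and mapping the most significant generator bit to the newest input bit is a convention assumed here; the function name is illustrative.

```python
G0 = 0o561   # generator for output 0 (from the slide)
G1 = 0o753   # generator for output 1 (from the slide)
K = 9        # assumed constraint length: both generators are 9 bits wide

def parity(value: int) -> int:
    """XOR of all bits in value."""
    return bin(value).count("1") % 2

def conv_encode(bits):
    """Rate-1/2 convolutional encoder: one (out0, out1) pair per input bit."""
    reg = 0
    out = []
    for b in bits:
        # Shift the register and place the newest bit in the MSB (assumed convention).
        reg = (reg >> 1) | (b << (K - 1))
        out.append((parity(reg & G0), parity(reg & G1)))
    return out

# Feeding a single 1 followed by zeros reads the generator taps out bit by bit.
impulse = conv_encode([1] + [0] * 8)
print([o0 for o0, _ in impulse])
print([o1 for _, o1 in impulse])
```

The whole encoder is a shift and a few XORs per bit, which is why its contribution to the workload is negligible compared with decoding.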
Channel Decoder
- Determines the most probable code sequence from the received sequence.
- Selects the code word C with the minimum distance to the received sequence r.
- One of the dominant workloads.
[Figure: received signal r and its distances d_1, d_2, ..., d_N to the code words C_1, C_2, ..., C_N of the code set {c_i}]
Channel Decoder: Viterbi Algorithm
- Most popular decoding algorithm for convolutional codes.
- Consists of three steps:
  - Branch metric calculation (BMC): abs(a-b), parallelizable
  - Add compare select (ACS): min(a+b, c+d), parallelizable
  - Trace back (TB): recursive pointer tracing, sequential
- Amount of operation in W-CDMA
  - 16 Kbps voice: ~2 Gops
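The three steps can be seen together in a small hard-decision decoder. This is a minimal sketch under stated assumptions: it uses a toy constraint-length-3 code with generators 7 and 5 (octal) instead of the W-CDMA code so that the trellis has only four states, and all names are illustrative.

```python
K = 3                     # toy constraint length (assumption, not the W-CDMA value)
GENS = (0o7, 0o5)         # toy generators (assumption)
N_STATES = 1 << (K - 1)

def parity(value):
    return bin(value).count("1") % 2

def step(state, bit):
    """One trellis transition: returns (next_state, output pair)."""
    reg = (bit << (K - 1)) | state            # newest bit in the MSB
    return reg >> 1, tuple(parity(reg & g) for g in GENS)

def encode(bits):
    state, out = 0, []
    for b in bits:
        state, pair = step(state, b)
        out.append(pair)
    return out

def viterbi_decode(received):
    inf = float("inf")
    metric = [0] + [inf] * (N_STATES - 1)     # encoder starts in state 0
    history = []
    for r in received:
        new_metric = [inf] * N_STATES
        prev = [None] * N_STATES
        for s in range(N_STATES):
            if metric[s] == inf:
                continue
            for b in (0, 1):
                ns, o = step(s, b)
                bm = abs(r[0] - o[0]) + abs(r[1] - o[1])   # BMC: abs(a-b)
                cand = metric[s] + bm                       # ACS: add
                if cand < new_metric[ns]:                   # ACS: compare, select
                    new_metric[ns] = cand
                    prev[ns] = (s, b)
        metric = new_metric
        history.append(prev)
    # TB: follow the stored pointers backwards from the best final state.
    state = min(range(N_STATES), key=lambda s: metric[s])
    bits = []
    for prev in reversed(history):
        state, b = prev[state]
        bits.append(b)
    return bits[::-1]

message = [1, 0, 1, 1, 0, 0]
coded = encode(message)
noisy = list(coded)
noisy[1] = (1 - noisy[1][0], noisy[1][1])     # inject a single bit error
print(viterbi_decode(noisy) == message)
```

The two inner loops over states and input bits are independent of each other within one time step, which is the parallelism the slide points to; only the final pointer chase is inherently sequential.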
Channel Decoder: Turbo Decoder
- Two algorithms are widely used:
  - SOVA (Soft Output Viterbi Algorithm)
    - Less computation intensive
    - Lower error correction performance
  - Max-Log-MAP algorithm
    - More computation required
    - Higher error correction performance
- Amount of operation in W-CDMA
  - For 128 Kbps streaming data: ~18 Gops
Turbo Decoder
- Based on multiple iterations of SOVA / Max-Log-MAP blocks.
- More iterations give better error correction performance.
[Figure: high-level block diagram of the turbo decoder]
Block Interleaver/Deinterleaver
- Overcomes severe signal attenuation within a short time interval, which frequently appears on wireless channels.
- Interleaver (at the transmitter):
  - Randomizes the sequence of the source data.
- Deinterleaver (at the receiver):
  - Recovers the original sequence by reordering.
- Amount of operation: < 10 Mops
- Example: 123456789, after interleaving 147258369, after deinterleaving 123456789
[Figure: example of signal strength variation]
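The 123456789 example corresponds to writing the data into a 3 x 3 block row by row and reading it out column by column. A minimal sketch of that reordering follows; the function names are illustrative, and the inter-column permutation used by the actual W-CDMA interleaver is left out here.

```python
def interleave(seq, rows, cols):
    """Write row by row, read column by column."""
    assert len(seq) == rows * cols
    return [seq[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(seq, rows, cols):
    """Write column by column, read row by row (inverse of interleave)."""
    assert len(seq) == rows * cols
    return [seq[c * rows + r] for r in range(rows) for c in range(cols)]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
mixed = interleave(data, 3, 3)
print(mixed)
print(deinterleave(mixed, 3, 3))
```

After interleaving, neighbouring source symbols are three positions apart, so a short burst of channel errors is spread across the block once the receiver restores the original order.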
Spreader/Despreader
- Allows several signals to be transmitted at the same time (x[n] and y[n] in the diagram).
- Based on the orthogonality between spreading codes.
[Figure: orthogonality between codes]
Spreader/Despreader
- The spreader / despreader also suppresses noise.
- Amount of operation: ~4 Gops
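How orthogonal codes let two signals share the channel can be shown with two length-4 codes. This is a minimal sketch: the codes, the symbol values, and the function names are illustrative assumptions, and the channel is modeled as a noise-free sum.

```python
CODE_X = [1, 1, 1, 1]
CODE_Y = [1, -1, 1, -1]    # orthogonal to CODE_X: their dot product is 0

def spread(symbols, code):
    """Replace every symbol by the symbol times each chip of the code."""
    return [s * c for s in symbols for c in code]

def despread(chips, code):
    """Correlate each code-length block with the code and normalize."""
    n = len(code)
    return [sum(chips[i + k] * code[k] for k in range(n)) // n
            for i in range(0, len(chips), n)]

x = [1, -1, 1]
y = [-1, -1, 1]
# Both spread signals are transmitted at the same time and add up on the channel.
channel = [a + b for a, b in zip(spread(x, CODE_X), spread(y, CODE_Y))]
print(despread(channel, CODE_X), despread(channel, CODE_Y))
```

Correlating with one code cancels the contribution of the other, so each receiver recovers only its own symbols from the shared signal.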
Scrambler/Descrambler
- Randomizes the output signal by multiplying it with a pseudo-random sequence, the scrambling code.
- Allows multiple terminals to communicate at the same time.
- Amount of operation: ~3 Gops
[Figure: Terminal 1 with scrambling code n and Terminal 2 with scrambling code m]
Low Pass Filter
- Suppresses the signal components in the stop band, i.e. outside the pass band.
[Figure: filtering of an impulse input signal. In the time domain, the impulse becomes a sinc function at the output; in the frequency domain, the band-unlimited input signal becomes a band-limited output signal.]
Low Pass Filter
- Uses a conventional FIR filter.
- Number of filter taps (N) = 32 to 64
- Amount of operation: ~12 Gops
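A direct-form FIR filter is a sum of products over the last N input samples. The sketch below is a plain reference implementation with illustrative names and toy coefficients; it is not the filter design used in W-CDMA.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:              # samples before the start are treated as 0
                acc += h * samples[n - k]
        out.append(acc)
    return out

# An impulse at the input reproduces the tap values at the output.
print(fir_filter([1, 0, 0, 0], [1, 2, 3]))
# A two-tap averaging filter smooths a constant input after the first sample.
print(fir_filter([4, 4, 4, 4], [0.5, 0.5]))
```

Every output sample costs one multiply-accumulate per tap, so with 32 to 64 taps at the chip-level sample rate the filter becomes one of the heavier kernels.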
Rake Receiver: Multipath Fading
- The rake receiver mitigates the multipath fading effect.
- Multipath fading is a major cause of the unreliable characteristics of wireless channels.
- Transmitted signal $x(t)$, received signal $y(t) = a_0 x(t) + a_1 x(t - d_1) + a_2 x(t - d_2)$
Rake Receiver: Functions
- Ideally, the function of the rake receiver is to aggregate the signal terms with proper delay compensation.
  - Input: $y(t) = a_0 x(t) + a_1 x(t - d_1) + a_2 x(t - d_2)$
  - Output: $r(t) = a_0 x(t - t_{\mathrm{delay}}) + a_1 x(t - d_1 - d_{\mathrm{est}1}) + a_2 x(t - d_2 - d_{\mathrm{est}2}) = (a_0 + a_1 + a_2)\, x(t - t_{\mathrm{delay}})$
- We need to know the delay spread of the received signal, which varies randomly.
Rake Receiver: Detect Delay Spread
- Scan the received signal in the frame buffer while computing its correlation with the scrambling code sequence.
[Figure: a correlation window slides over the received signal; the correlation result shows peaks a_0, a_1, and a_2 at delays 0, d_1, and d_2]
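The delay search can be sketched as a sliding correlation over a buffer. In this minimal example a 13-chip Barker sequence stands in for the scrambling code because its sharp autocorrelation keeps the numbers small; the path delays, gains, threshold, and function names are all illustrative assumptions.

```python
# Stand-in for the scrambling code (assumption): 13-chip Barker sequence.
BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]

def multipath(code, paths, length):
    """Sum delayed, scaled copies of code; paths is a list of (delay, gain)."""
    rx = [0] * length
    for delay, gain in paths:
        for k, chip in enumerate(code):
            rx[delay + k] += gain * chip
    return rx

def search_paths(rx, code, threshold):
    """Slide a correlation window over the buffer and keep the strong peaks."""
    found = []
    for offset in range(len(rx) - len(code) + 1):
        corr = sum(rx[offset + k] * code[k] for k in range(len(code)))
        if abs(corr) >= threshold:
            found.append((offset, corr))
    return found

# Three propagation paths with delays 0, 3, and 7 chips (illustrative values).
rx = multipath(BARKER13, [(0, 4), (3, 2), (7, 1)], length=20)
peaks = search_paths(rx, BARKER13, threshold=10)
print([offset for offset, _ in peaks])
```

Each window position costs one multiply-accumulate per chip of the window, and the scan is repeated over the whole buffer, which is why the searcher dominates the workload.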
Computation of Rake Receiver
- Correlation computation: L_W x L_B x F
  - L_W: correlation window = 320
  - L_B: frame buffer size = 5120
  - F: operation frequency = 50
  - 320 x 5120 x 50 = 81,920,000, i.e. ~80 mega multiplications / sec
  - Multiplications can be converted into subtractions.
- Amount of operation in W-CDMA: ~25 Gops
- Most dominant workload
Rake Receiver: Overall Architecture
[Figure: overall rake receiver architecture with three stages: detect the delay spread, compensate the propagation delay, and recombine the signal terms without delay]
Power Control
- The receiver controls the transmission power of the transmitter in order to minimize interference to other users.
- Required computation is negligible.
- When the strength of the pilot signal is below the reference level, the terminal sends an UP command (u).
- When the strength of the pilot signal is above the reference level, the terminal sends a DOWN command (d).
[Figure: pilot signal strength around the reference level between terminal and basestation, with the resulting power control command sequence u d u u d d u]
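The up/down decision is a single comparison per measurement, which is why its computational cost is negligible. A minimal sketch, with an illustrative function name and made-up pilot levels:

```python
def power_control_commands(pilot_levels, reference):
    """'u' (UP) when the measured pilot is below the reference, otherwise 'd' (DOWN)."""
    return ["u" if level < reference else "d" for level in pilot_levels]

# Illustrative pilot strength measurements around a reference level of 1.0.
print(power_control_commands([0.8, 0.9, 1.1, 1.2, 0.7], 1.0))
```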
H/W Operation States
- Idle
  - For long idle periods between sessions
  - Periodic wake-up for control message reception
  - Minimum workload, but dominates terminal standby time
- Control Hold
  - For short idle periods between packet bursts
  - Holds a narrow control channel for fast transition to Active
  - Intermediate workload
- Active
  - For packet burst transmission periods
  - Uses high-speed packet channels up to 2 Mbps
  - Most heavily loaded state
[Figure: radio resource control states defined in the W-CDMA specification, and operation states defined according to H/W activity]
Workload Characterization
Workload Profile
- One operation is equivalent to one RISC instruction.
- Searcher, turbo decoder, and LPF are the dominant workloads.
- The workload profile varies according to the operation state.
Processing Time Requirement
- Mixture of algorithms with various processing time requirements
- Classified into two categories:
  - Heavy workload with long processing time (turbo decoder, searcher)
  - Light workload with short processing time (scrambler, spreader, LPF, power control)
Parallelism
- Most heavy workload algorithms have significant vector parallelism.
- The data width of most operations is 8 bits.
Memory Access Pattern
- Huge memory is not required.
- Traffic between algorithms is not dominant.
- The access rate of the scratch pad memory is very high.
Instruction Breakdown
- ADD/SUB are the dominant instructions.
- Multiplication is not dominant in heavy workloads.
Frequent Computations
- Most multiplications are simplified into cheaper operations.
- Multiplication in LPF-Rx cannot be simplified because both operands are 16-bit integers.
Architectural Implications
Architectural Implications
- SIMD, because
  - We can exploit vector parallelism in W-CDMA algorithms.
  - High power efficiency can be achieved by sharing control logic between datapath elements.
- Chip multiprocessor, because
  - There is substantial algorithm-level parallelism.
  - There are many tiny sequential algorithms.
  - Multiple SIMD + scalar
[Figure: multiple SIMD units and a scalar unit connected by an interconnection network]
Architectural Implications
- Memory structure
  - Cache free: the memory access pattern exhibits very dense spatial locality.
  - Small data memory (<64 K)
  - Small instruction memory (<4 K)
- Simple interconnection network
  - Inter-processor communication can be kept low by mapping tasks onto the PEs at the algorithm level.
Architectural Implications
- Power management
  - Large workload variation according to changes in operation state and radio channel condition.
  - Various power management schemes can be applied: DVS, DFS, clock gating.
  - Idle mode power must be minimized because it dominates terminal standby time.
W-CDMA Benchmark Suite
- C-based implementation of the W-CDMA physical layer operations.
- Used for the workload characterization done in this paper.
- Available at www.eecs.umich.edu/~sdrg
Conclusion
- We discussed:
  - what SDR is and why it is a challenging topic for embedded systems.
  - the evolution of wireless protocols and the core technologies of emerging protocols.
- We analyzed:
  - the workload characteristics of the W-CDMA protocol and their architectural implications.
Backup Slides
Viterbi Algorithm: Trellis Diagram
- The Viterbi algorithm is based on the trellis diagram.
- The trellis diagram represents all possible state transitions of the encoder.
[Figure: example of a trellis diagram and the corresponding convolutional encoder]
Viterbi Algorithm: BMC
- The BMC (branch metric calculation) operation computes the difference between the received sequence r and the outputs of the trellis diagram:
  - $\mathrm{BMC}_{i,j} = \mathrm{distance}(r_{ij}, o_{ij}) = |r_{ij} - o_{ij}|$
  - $o_{ij}$: output of the state transition from i to j
  - $r_{ij}$: corresponding received sequence
  - Example: the distance between r = (01) and C_n = (10) is 1 + 1 = 2.
- All BMC operations in a trellis diagram can be done in parallel.
Viterbi Algorithm: ACS
- The ACS (add compare select) operation adds branch metrics to path metrics, compares the candidates, and selects the minimum, i.e. min(a+b, c+d). [The equation on the original slide is not recoverable from the extracted text.]
- This procedure is equivalent to finding a locally optimal code sequence.
- If C_1 has the smallest ACS value at node state i, then the ACS values of C_2 and C_3 are always greater than that of C_1.
Viterbi Algorithm: TB
- Traces back the code sequence that is closest to the received sequence.
- Sequential algorithm
Block Interleaver/Deinterleaver
- Interleaver
  - Write row by row sequentially.
  - Read column by column according to the predefined permutation pattern.
- Deinterleaver
  - Write column by column according to the predefined permutation pattern.
  - Read row by row sequentially.
[Figure: interleaving procedure]
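The write and read orders above, including the column permutation, can be sketched as follows. The permutation pattern, block size, and function names are hypothetical choices for illustration; the pattern actually specified for W-CDMA is not given on the slide.

```python
def permuted_interleave(seq, cols, perm):
    """Write row by row, then read the columns in the order given by perm."""
    rows = len(seq) // cols
    return [seq[r * cols + c] for c in perm for r in range(rows)]

def permuted_deinterleave(seq, cols, perm):
    """Write the columns back in the order given by perm, then read row by row."""
    rows = len(seq) // cols
    out = [None] * len(seq)
    stream = iter(seq)
    for c in perm:
        for r in range(rows):
            out[r * cols + c] = next(stream)
    return out

block = list(range(12))          # 3 rows x 4 columns
pattern = [0, 2, 1, 3]           # hypothetical inter-column permutation
shuffled = permuted_interleave(block, 4, pattern)
print(shuffled)
print(permuted_deinterleave(shuffled, 4, pattern))
```

Both directions only move data by index, so the cost is dominated by memory accesses rather than arithmetic.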