Implementing LDPC Decoding on Network-on-Chip
T. Theocharides, G. Link, N. Vijaykrishnan, M. J. Irwin
Penn State University
International Conference on VLSI Design 2005
Team LDPC, SoC Lab, Graduate Institute of CSIE, NTU

Outline
• Intro
• Message Passing
• Iterative Decoding
• Word length
• Processing Elements
• Virtual & Physical nodes
• Network on Chip
• Packets
• Message Decoding Behavior
• Bit node PE & Check node PE
• Power Optimization
• Conclusion & Comparison

Intro
• Problem addressed: existing LDPC decoders are either limited in the types of LDPC codes they support or constrained by hardware.
• The proposed decoder is reconfigurable for different block sizes and code rates.

Message Passing
• Decoding starts from the bit function units.
• Message passing iterations are performed by the two computation units.

Iterative Decoding
• Check node operation (equation not preserved in this transcript)
  – The function it uses is implemented with a ROM-based look-up table (LUT).
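
The check node equation did not survive in this transcript. As a hedged illustration only, the sketch below uses the textbook sum-product (SPA) check node update, with psi(x) = -ln(tanh(x/2)) read from a precomputed table in the spirit of the ROM LUT the slide mentions. The table size, step, and all function names are assumptions, not values from the slides.

```python
import math

# Assumed LUT parameters (not from the slides): 256 entries, step of 1/16.
LUT_SIZE = 256
STEP = 1.0 / 16.0


def psi(x):
    """Textbook psi(x) = -ln(tanh(x / 2)), defined for x > 0."""
    return -math.log(math.tanh(x / 2.0))


# ROM-style table: entry k holds psi at the midpoint of bin k, which avoids x = 0.
PSI_LUT = [psi((k + 0.5) * STEP) for k in range(LUT_SIZE)]


def psi_lut(x):
    """Look up psi for a non-negative x, saturating at the last table entry."""
    k = min(int(x / STEP), LUT_SIZE - 1)
    return PSI_LUT[k]


def check_node_update(llrs):
    """Return one outgoing message per edge, each excluding that edge's own input."""
    out = []
    for j in range(len(llrs)):
        sign = 1.0
        total = 0.0
        for i, value in enumerate(llrs):
            if i == j:
                continue
            if value < 0:
                sign = -sign
            total += psi_lut(abs(value))
        # psi is its own inverse, so the same table serves both directions.
        out.append(sign * psi_lut(total))
    return out
```

Because the table is coarse, the outputs are only approximations of the exact SPA values; a real design would size the LUT against its BER target.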

Iterative Decoding
• Bit node operation (equation not preserved in this transcript)
• stored_llr is the previously stored log-likelihood ratio (LLR) for the bit.
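
The bit node equation is also missing from the transcript. A minimal sketch of the textbook SPA bit node update is given below, assuming stored_llr plays the role of the bit's stored LLR and that each outgoing message excludes the incoming message on the same edge. The function name and return shape are hypothetical.

```python
def bit_node_update(stored_llr, check_msgs):
    """Textbook bit node update (assumed form, not taken from the slides).

    Returns the total LLR for the bit and one outgoing message per edge,
    where each outgoing message leaves out that edge's own incoming message.
    """
    total = stored_llr + sum(check_msgs)
    outgoing = [total - m for m in check_msgs]
    return total, outgoing
```

The sign of the total gives the current hard decision for the bit.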

Word length
• Word length is a critical parameter.
  – Performance
  – Power consumption
• A longer data word results in a lower BER even in noisy channels.
  – Sign-magnitude representation
  – 16-bit word length
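
To make the 16-bit sign-magnitude format concrete, here is a small sketch that quantizes an LLR into one sign bit plus a 15-bit magnitude. The split into 8 fractional bits is an assumption; the slides give only the word length and the representation.

```python
FRAC_BITS = 8                    # assumption: fixed-point scaling is not given in the slides
MAG_BITS = 15                    # 16-bit word = 1 sign bit + 15 magnitude bits
MAG_MAX = (1 << MAG_BITS) - 1


def to_sign_magnitude(llr):
    """Quantize an LLR to a 16-bit sign-magnitude word, saturating the magnitude."""
    mag = min(int(round(abs(llr) * (1 << FRAC_BITS))), MAG_MAX)
    sign = 1 if llr < 0 else 0
    return (sign << MAG_BITS) | mag


def from_sign_magnitude(word):
    """Convert a 16-bit sign-magnitude word back to a float."""
    value = (word & MAG_MAX) / (1 << FRAC_BITS)
    return -value if (word >> MAG_BITS) & 1 else value
```

Saturating at the maximum magnitude is one way to stand in for "infinity" in a finite word.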

Processing element (PE)
• Bit and check nodes act as PEs.
• PEs communicate via on-chip routers.
• Each PE has a dedicated memory to store configuration information.

Virtual & Physical nodes
• Virtual nodes cannot all be mapped onto a single chip at once.
[Figure: virtual nodes (VN) mapped onto physical nodes (PN); node labels a to f and A to I]
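
One simple way to picture the virtual-to-physical mapping is a contiguous block assignment. The sketch below assumes 64 virtual nodes per physical node (the slides quote 64 virtual nodes, 16 physical bit nodes, and N = 1024, which is consistent with that reading but does not confirm it). The mapping rule itself is hypothetical; the actual design may place nodes differently.

```python
VN_PER_PN = 64  # assumption: 64 virtual nodes hosted by each physical node


def map_virtual_node(global_index, vn_per_pn=VN_PER_PN):
    """Map a global node index to (physical_id, virtual_id) by contiguous blocks."""
    return divmod(global_index, vn_per_pn)
```

Under this assumption, bit 1023 of a 1024-bit block lands on physical node 15 as virtual node 63.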

Network on Chip (NoC)
• Inter-PE communication is handled by an on-chip network consisting of a number of small on-chip routers.
• A full packet of data moves one hop per clock cycle.
• Losing a single packet is catastrophic to the LDPC computation.
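
Since a packet moves one hop per clock cycle, the contention-free latency between two PEs equals the hop count. The sketch below assumes a 5 x 5 mesh with row-major PE numbering and minimal (for example XY) routing; none of these details is stated in the slides.

```python
def mesh_hops(src, dst, width=5):
    """Hop count between two PEs on a 2D mesh under minimal routing.

    Assumes row-major numbering of PEs on a mesh that is `width` nodes wide.
    """
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    return abs(sx - dx) + abs(sy - dy)
```

At one hop per cycle, corner-to-corner traffic on such a mesh needs at least 8 cycles, which is why node placement matters.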

Packets
• Physical destination address
• Virtual identification information
• Packet mark (1 bit)

Packet Size
• 48 bits for 25 PNs, 64 VNs, MAX = 16
  – Header: 16 bits
    • Physical address: 5 bits
    • Virtual address: 6 bits
    • Max: 4 bits
    • Reserved: 1 bit
  – Data: 16 bits × 2
    • Word length: 16 bits
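
The field widths above add up to a 16-bit header plus two 16-bit data words. The following sketch packs and unpacks such a 48-bit packet. The field order within the header and the function names are assumptions; the slides give only the widths.

```python
# Field widths from the slide; the ordering inside the header is an assumption.
PHYS_BITS, VIRT_BITS, MAX_BITS, RSVD_BITS, WORD_BITS = 5, 6, 4, 1, 16


def pack_packet(phys, virt, max_field, word0, word1, reserved=0):
    """Pack header fields and two data words into a 48-bit integer."""
    fields = ((phys, PHYS_BITS), (virt, VIRT_BITS), (max_field, MAX_BITS),
              (reserved, RSVD_BITS), (word0, WORD_BITS), (word1, WORD_BITS))
    packet = 0
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
        packet = (packet << bits) | value
    return packet


def unpack_packet(packet):
    """Return (phys, virt, max_field, reserved, word0, word1) from a 48-bit packet."""
    widths = (PHYS_BITS, VIRT_BITS, MAX_BITS, RSVD_BITS, WORD_BITS, WORD_BITS)
    values = []
    for bits in reversed(widths):
        values.append(packet & ((1 << bits) - 1))
        packet >>= bits
    return tuple(reversed(values))
```

Five address bits cover the 25 physical nodes and six cover 64 virtual nodes, which matches the sizing on the slide.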

Message Decoding Behavior
• The analog signal arrives and is converted to LLR values by the ADC; the LLR values are then grouped into message blocks.
• Two blocks are decoded in parallel (66% network traffic).

Bit Node PE
• The node has 48-bit input and output ports.
• A data concentrator directs values to the accumulator based on the virtual ID (VID).
• Once all input values for a given virtual bit node have been received, the computation proceeds to the execution unit.
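
The concentrator and accumulator behaviour described above can be sketched as a per-VID buffer that releases its values once the expected number of inputs has arrived. The class, its method names, and the idea of a per-VID degree table are illustrative assumptions, not the PE's actual microarchitecture.

```python
class BitNodeAccumulator:
    """Collects incoming values per virtual ID until a virtual node is complete."""

    def __init__(self, degrees):
        # degrees: mapping of VID -> number of inputs expected for that virtual node
        self.degrees = dict(degrees)
        self.pending = {}

    def receive(self, vid, value):
        """Store a value; return the full input list once the node is complete."""
        values = self.pending.setdefault(vid, [])
        values.append(value)
        if len(values) == self.degrees[vid]:
            return self.pending.pop(vid)  # ready for the execution unit
        return None
```

Returning None until the last input arrives mirrors the rule that execution starts only when all inputs for a virtual bit node are present.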

Check Node PE
• The node supports the simultaneous decoding of two independent message blocks.

Results
• 16 physical bit nodes
• 9 physical check nodes
• 64 virtual nodes
• 2D mesh topology

Power Consumption
• 750 Mbps @ 500 MHz: 34.8 W (N = 1024)
  – Interconnect: 43%
  – Check nodes: 23%
  – Bit nodes: 22%
  – Leakage: 12%
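
For a sense of scale, the percentages can be turned into watts. This is plain arithmetic on the slide's figures; the dictionary keys and function name are just illustrative.

```python
TOTAL_W = 34.8  # total power quoted on the slide
SHARES = {"interconnect": 43, "check nodes": 23, "bit nodes": 22, "leakage": 12}


def breakdown(total_w, shares):
    """Convert percentage shares of a total power figure into watts (2 decimals)."""
    return {name: round(total_w * pct / 100, 2) for name, pct in shares.items()}
```

The interconnect alone accounts for roughly 15 W, which is why the power optimizations that follow target data sent over the network.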

Power Consumption (detail)
• Between 25% and 40% of all data passed between nodes is either zero or infinity (high switching activity).

Encoding Values
• If the result is either zero or infinity, we set S1 and S2 to the corresponding value.

Result
• 750 Mbps @ 500 MHz (N = 1024)
  – 34.8 W → 30.36 W (-12.75%)

Early Termination

Result (+Early Termination)
• 750 Mbps @ 500 MHz (N = 1024)
  – 34.8 W → 30.36 W (-12.75%) → 24.32 W (-30%)
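
The quoted savings can be checked against the 34.8 W baseline with a one-line calculation. The function name is hypothetical; the inputs are the slide's own figures.

```python
def saving_percent(baseline_w, optimized_w):
    """Relative power saving versus the baseline, in percent."""
    return (baseline_w - optimized_w) / baseline_w * 100
```

This gives about 12.76% for 30.36 W and about 30.11% for 24.32 W, in line with the slide's figures, which appear to be truncated or rounded.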

Conclusion
• The network-on-chip interconnect is scalable.
• Multiple LDPC codes of varying types are supported.
• The design can be extended into reconfigurable low-power decoders.

My conclusion
• The distance between nodes must be considered.
• The parameters (number of physical nodes, node placement) are key.
• Two messages must have the same latency.
• A syndrome test can provide early termination.
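
A syndrome test stops the iterations as soon as the hard decisions satisfy every parity check, that is, when H·c = 0 over GF(2). The sketch below shows the check itself on a small (7,4) Hamming parity-check matrix used purely as a stand-in. The LLR sign convention (non-negative means bit 0) and the function names are assumptions.

```python
def hard_decision(llrs):
    """Assumed convention: a non-negative LLR decodes to bit 0, a negative one to bit 1."""
    return [0 if v >= 0 else 1 for v in llrs]


def syndrome_is_zero(H, bits):
    """True when every parity check (row of H) is satisfied modulo 2."""
    return all(sum(h & b for h, b in zip(row, bits)) % 2 == 0 for row in H)


# Stand-in parity-check matrix of the (7,4) Hamming code, not an LDPC code from the slides.
H_EXAMPLE = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
```

A decoder loop would call syndrome_is_zero(H, hard_decision(llrs)) after each iteration and stop early when it returns True, saving the remaining iterations.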

Comparison
                       This design               Our design
Number Representation  Sign-magnitude            2’s complement & sign-magnitude
Code type              Arbitrary                 QC-LDPC
Word Length            16 bit
Data Overhead          Packet header (48 bit)    None
Rate                   ¾                         ½
ROM/LUT                Required                  Not required
HUE                    N/A                       >97.56% (40 iter)
Memory Req.            Double size               Single size
Strategy               Dual code simultaneous    Single
Algorithm              SPA                       SMSA/SPA
Addressing             Router & HFT              Counter
Reconfigurable         Feasible