TECHNION ISRAEL INSTITUTE OF TECHNOLOGY Electrical Engineering Department

TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY
Electrical Engineering Department
Software Systems Lab

Multi-Threading LAME MP3 Encoder
Performed by: Gilad Riachshtian
Copyright, 2004 © Gilad Raichshtain.

Talk Layout
• What is the L.A.M.E. Project?
• Project Goal
• MP3 Encoding & Hyper-Threading Overview
• Multi-Threading strategies
• Results & Remarks
• Future Work

What is the L.A.M.E. Project?
• An Open Source project
• An Educational Tool used for learning about MP3 encoding
• Its goal is to improve
  – Psycho-acoustics quality
  – The speed of MP3 encoding
• LAME is the most popular state-of-the-art MP3 encoder/decoder, used by today's leading products.
For more info: http://lame.sourceforge.net www.lame.org/

Project Goal
• Speeding up the encoding of an audio stream
• Turning LAME into a Multi-Threaded (MT) engine
• Staying 1:1 bit compatible with the original version
• Optimizing specifically for SMT platforms (implementation on Intel's P4 with Hyper-Threading Technology)

Thread Level Parallelism
• Hyper-Threading provides thread level parallelism on each processor
• Resulting in
  – Increased use of processor execution resources
  – Higher processing throughput
• Achieved by duplicating the architectural state on each processor, while sharing one set of processor execution resources

MP3 Encoding Overview
• Break up the audio stream into frames (uniform chunks, typically ~1K)
  Audio Stream: Frame 1 | Frame 2 | Frame 3 | Frame 4
• Per-frame pipeline: Read Frame → Perceptual Model → Filterbank (MDCT) → Quantization → Bitstream Encode
• Specifically in LAME: the Perceptual Model is Psycho-Acoustic Analysis, and the Bitstream Encode is Huffman Encoding
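The framing step above can be sketched in a few lines. This is only an illustration, not LAME's code: the function name `split_into_frames`, the frame size of 1024 samples (standing in for the "~1K" on the slide), and the zero padding of the last frame are all assumptions.

```python
def split_into_frames(samples, frame_size=1024):
    """Cut a sample sequence into uniform frames.

    frame_size=1024 is an assumed stand-in for the slide's "~1K";
    zero-padding the final frame is likewise an assumption.
    """
    frames = []
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        # Pad the last, possibly short, frame so all frames are uniform.
        frame.extend([0] * (frame_size - len(frame)))
        frames.append(frame)
    return frames


if __name__ == "__main__":
    frames = split_into_frames(list(range(2500)))
    print(len(frames), [len(f) for f in frames])
```

Every stage that follows operates on one such frame at a time, which is what makes both decomposition strategies in the next slides possible to discuss.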

LAME MT – Intuitive approach
The intuitive approach: divide the frames (Frame 1 ... Frame 6) between Thread 1 and Thread 2.
• This is actually Data Decomposition
• An unbreakable dependence due to Huffman Encoding

LAME MT – Functional Decomposition
Every frame (Frame 1 ... Frame 6) passes through both threads:
• T1 (Floating Point Intensive): Read Frame → Psycho-Acoustic Analysis → Filterbank MDCT → Quantization
• T2 (Integer Intensive): Huffman Encoding
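A two-stage pipeline of this shape can be sketched as a producer and a consumer joined by a bounded FIFO queue. The sketch below is a structural illustration only, not the project's implementation: `analyze_and_quantize` and `entropy_encode` are hypothetical stand-ins for the T1 and T2 stages, the queue depth is arbitrary, and CPython threads will not show the reported speedups. What it does show is that a FIFO handoff keeps frames in order, so the pipelined output matches the sequential output byte for byte.

```python
import queue
import threading

END_OF_STREAM = None  # sentinel that tells stage 2 to stop


def analyze_and_quantize(frame):
    """Hypothetical stand-in for T1 (psycho-acoustics, MDCT, quantization)."""
    return [int(sample * 0.5) for sample in frame]


def entropy_encode(quantized):
    """Hypothetical stand-in for T2 (Huffman encoding)."""
    return bytes(value & 0xFF for value in quantized)


def encode_sequential(frames):
    """Single-threaded reference: both stages run back to back per frame."""
    return b"".join(entropy_encode(analyze_and_quantize(f)) for f in frames)


def encode_pipelined(frames, depth=4):
    """Run stage 1 and stage 2 on separate threads, joined by a FIFO queue."""
    handoff = queue.Queue(maxsize=depth)  # bounded so T1 cannot run far ahead
    encoded = []

    def stage1():
        for frame in frames:
            handoff.put(analyze_and_quantize(frame))
        handoff.put(END_OF_STREAM)

    def stage2():
        while True:
            item = handoff.get()
            if item is END_OF_STREAM:
                break
            encoded.append(entropy_encode(item))

    t1 = threading.Thread(target=stage1)
    t2 = threading.Thread(target=stage2)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return b"".join(encoded)


if __name__ == "__main__":
    frames = [[i * 7 + j for j in range(8)] for i in range(6)]
    print(encode_pipelined(frames) == encode_sequential(frames))
```

Because only one thread ever runs the encoding stage, any state that stage carries from frame to frame stays sequential, which is how this layout sidesteps the dependence that blocks the intuitive approach.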

Results

Results due to Multi-Threading

                              SMT Platform (CBR / VBR)   SMP Platform (CBR / VBR)
Using Microsoft's Compiler    22% / 32%                  38% / 62%
Using Intel's Compiler 8.1    20% / 29%                  44% / 59%

Results using Intel's Compiler 8.1

                              SMT Platform (CBR / VBR)   SMP Platform (CBR / VBR)
LAME Original Code 3.97a      21% / 19%                  22% / 17%
LAME MT Code                  19% / 17%                  28% / 15%

Overall Performance Results

                                            SMT Platform (CBR / VBR)   SMP Platform (CBR / VBR)
LAME MT code + Using Intel's Compiler 8.1   52% / 70%                  78% / 109%

Remarks
• Architectural Issues
  – Pitfall found in version 3.93:
    • Memory access to two different pages with the same offset
    • ~11% speedup achieved by fixing it
    • No longer relevant in later versions
  – No major arch issues found in versions 3.94-3.97a
• Implemented a PNI version of the FFT
  – No significant gain achieved
• Overall, ~40 blocks of code were changed and are under #ifdef
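The access pattern behind the 3.93 pitfall, two addresses on different pages with the same in-page offset, can be checked with simple address arithmetic. The sketch below is illustrative only: the 4096-byte page size, the function names, and the 64-byte padding remedy are assumptions, and the slides do not say how the issue was actually fixed in LAME.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def same_offset_different_page(addr_a, addr_b, page_size=PAGE_SIZE):
    """True when two addresses lie on different pages at the same offset."""
    page_a, offset_a = divmod(addr_a, page_size)
    page_b, offset_b = divmod(addr_b, page_size)
    return page_a != page_b and offset_a == offset_b


def pad_to_break_alias(addr_a, addr_b, pad=64, page_size=PAGE_SIZE):
    """Shift addr_b by `pad` bytes if it conflicts with addr_a.

    Padding one buffer is a common remedy, assumed here for illustration.
    """
    if same_offset_different_page(addr_a, addr_b, page_size):
        return addr_b + pad
    return addr_b


if __name__ == "__main__":
    a, b = 0x10000, 0x20000
    print(same_offset_different_page(a, b))
    print(hex(pad_to_break_alias(a, b)))
```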

Future work

Future Work
• Splitting the encoding process into more than two steps
• Reading frames in parallel