Efficient Performance Essential Performance Advanced Performance Distributed Performance
Developing Intel® AVX Optimized Microsoft* Real-Time Audio (MSRTA) Codec using Intel® IPP
Naveen Gv
Intel Corporation, Software & Services Group, Developer Products Division
Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Objectives
• Show how Intel® Integrated Performance Primitives (Intel® IPP) provides the building blocks to develop a Microsoft* Real-Time Audio (MSRTA) codec on the latest Intel microprocessor, codenamed ‘Sandy Bridge’.
• Describe how to use Intel IPP to build an Intel® Advanced Vector Extensions (Intel® AVX) optimized Microsoft* Real-Time Audio codec for VoIP applications, and present performance results for the Sandy Bridge microprocessor-based platform.
Agenda
§ Introduction
§ Intel® AVX Optimization
§ Unified Speech Component (USC)
§ USC MSRTA Codec
§ Demo – Intel IPP – USC MSRTA Codec
§ Performance of USC MSRTA codec
§ Summary
§ References
Intel® Parallel Studio XE 2011
Powerful tools to create fast, reliable and secure code

Phase | Productivity Tool | Feature | Benefit
----- | ----------------- | ------- | -------
Advanced Build & Debug | Intel® Composer XE | C/C++ and Fortran compilers, performance libraries, and parallel models | Application performance, scalability and quality for current multicore and future many-core systems.
Advanced Verify | Intel® Inspector XE | Memory & threading error checking tool for higher code reliability & quality | Increases productivity and lowers cost by catching memory and threading defects early.
Advanced Tune | Intel® VTune™ Amplifier XE | Performance profiler to optimize performance and scalability | Removes guesswork, saves time, makes it easier to find performance and scalability bottlenecks. Combines ease of use with deeper insights.

Today’s Focus: Intel® Integrated Performance Primitives, a component of Intel® Composer XE
Intel IPP - overview
Layers, from the application down to the hardware:
• Application Source Code, which makes API calls into Intel IPP
• Intel IPP Usage Code Samples: sample video/audio/speech codecs, image processing and JPEG, signal processing, data compression, .NET and Java integration
• Intel IPP Library C/C++ API: cryptography, image processing, image color conversion, JPEG / JPEG 2000, computer vision, video coding, audio coding, data compression, data integrity, signal processing, matrix mathematics, vector mathematics, string processing, speech coding
• Intel IPP Processor-Optimized Binaries, linked statically or dynamically, for: Intel® Core™ i7 Processors, Intel® Atom™ Processors, Intel® Core™ 2 Duo and Core™ Extreme Processors, Intel® Core™ Duo and Core™ Solo Processors, Intel® Pentium® D Dual-Core Processors, Intel® Xeon® 64-bit Dual-Core Processors, Intel® Xeon® DP and MP Processors
Benefits: Free Code Samples, Rapid Application Development, Cross-platform API, Compatibility and Code Re-Use, Processor-Optimized Implementation, Outstanding Performance
Intel IPP - Functions and Samples
1. Image Processing
   Functions:
   * Geometry transformations, such as resize/rotate
   * Linear and non-linear filtering operations on an image for edge detection, blurring, noise removal and other filter effects
   * Linear transforms for 2D FFTs, DCT
   * Image statistics and analysis
   Samples:
   * Tiled Image Processing / 2D Wavelet Transform / C++ Image Processing Classes / Image Processing functions Demo
2. Computer Vision
   Functions:
   * Background differencing, Feature Detection (Corner Detection, Canny Edge detection), Distance Transforms, Image Gradients, Flood fill, Motion analysis and Object Tracking, Pyramids, Pattern recognition, Camera Calibration
   Samples:
   * Face Detection
3. Color conversion
   Functions:
   * Converting image/video color space formats: RGB, HSV, YUV, YCbCr
   * Up/Down sampling
   * Brightness and contrast adjustments
4. JPEG Coding
   Functions:
   * High-level JPEG and JPEG 2000 compression and decompression functions
   * JPEG/JPEG 2000 support functions: DCT, Wavelet transforms, color conversion, downsampling
   Samples:
   * UIC (Unified Image Codec) / Integration with the Independent JPEG Group (IJG) library
5. Video Coding
   Functions:
   * VC-1, H.264, MPEG-2, MPEG-4, H.261, H.263 and DV codec support functions
   Samples:
   * Simple Media Player / Video Encoder / H.264 decoding
6. Audio Coding
   Functions:
   * Echo cancellation and audio transcoding, BlockFiltering, Spectral Data prequantization
   Samples:
   * Audio Codec Console application
7. Realistic Rendering
   Functions:
   * Acceleration Structures, Ray-Scene Intersection and Ray Tracing
   * Surface properties, shader support, tone mapping
   Samples:
   * Ray Tracing
Intel IPP - Functions and Samples (continued)
8. Speech Coding
   Functions:
   * Adaptive/Fixed Codebook functions, Autocorrelation, Convolution, Levinson-Durbin recursion, Linear Prediction Analysis & Quantization, Echo Cancellation, Companding
   Samples:
   * G.168, G.167, G.711, G.722.2, AMRWB, Extended AMRWB (AMRWB+), G.723.1, G.726, G.728, G.729, MSRTA, GSM AMR, GSM FR
9. Data Compression
   Functions:
   * Entropy-coding compression: Huffman, VLC
   * Dictionary-based compression: LZSS, LZ77
   * Burrows-Wheeler Transform, MoveToFront, RLE, Generalized Interval Transformation
   * Compatible feature support for zlib and bzip2
   Samples:
   * zlib, bzip2, gzip-compatible / General data compression examples
10. Cryptography
   Functions:
   * Big-Number Arithmetic / Rijndael, DES, TDES, SHA1, MD5, RSA, DSA, Montgomery, prime number generation and pseudo-random number generation (PRNG) functions
   Samples:
   * Intel IPP crypto usage in OpenSSL*
11. String Processing
   Functions:
   * Compare, Insert, change case, Trim, Find, Regexp, Hash
   Samples:
   * “ippgrep” – regular expression matching
12. Signal Processing
   Functions:
   * Transforms: DCT, DFT, MDCT, Wavelet (both Haar and user-defined filter banks), Hilbert
   * Convolution, Cross-Correlation, Auto-Correlation, Conjugate
   * Filtering: IIR/FIR/Median filtering, Single/Multi-Rate FIR LMS filters
   * Other: Windowing, Jaehne/Tone/Triangle signal generation, Thresholding
   Samples:
   * Signal Processing Function Demo
13. Vector Math
   Functions:
   * Logical, Shift, Conversion, Power, Root, Exponential, Logarithmic, Trigonometric, Hyperbolic, Erfc
14. Matrix Math
   Functions:
   * Addition, Multiplication, Decomposition, Eigenvalues, Cross-product, transposition
Other Common Functions
   Functions:
   * CPUTypes, Thread Number Control, Memory Allocation
   Samples:
   * Linkages / Different language support
Sandy Bridge Microarchitecture: First AVX Capable Processor
Intel AVX is a 256-bit instruction set extension to SSE, designed to provide even higher performance for floating-point-intensive applications. Intel AVX adds new functionality to the existing Intel SIMD instruction set (based on SSE) and includes a more compact SIMD encoding format.
The primary benefits of Intel AVX are:
§ Support for wider vector data (up to 256-bit).
§ Efficient instruction encoding scheme that supports 3- and 4-operand instruction syntaxes.
§ Flexible programming environment, ranging from branch handling to relaxed memory alignment requirements.
§ New data manipulation and arithmetic compute primitives, including broadcast, permute, fused-multiply-add, etc.
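To make the wider-vector and relaxed-alignment points concrete, here is a minimal C sketch of an element-wise float addition. It is an illustration only, not code from Intel IPP: the function name add_f32 is hypothetical, and the AVX path is compiled only when the compiler defines __AVX__ (for example with an AVX-enabling compiler option); otherwise the plain scalar loop does all the work.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Hypothetical helper (not an Intel IPP function): dst[i] = a[i] + b[i]. */
void add_f32(const float *a, const float *b, float *dst, size_t n)
{
    size_t i = 0;
    assert(a != NULL && b != NULL && dst != NULL);
#ifdef __AVX__
    /* One 256-bit register holds 8 single-precision floats. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  /* unaligned load: no 32-byte alignment needed */
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vsum = _mm256_add_ps(va, vb); /* result goes to a third register */
        _mm256_storeu_ps(dst + i, vsum);
    }
#endif
    /* Scalar tail; also the whole loop when AVX is not enabled at compile time. */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

With AVX enabled, each loop iteration handles eight floats where a 128-bit SSE register would hold four; the scalar tail covers lengths that are not a multiple of eight.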
Intel IPP: Intel AVX Optimizations
• AVX Hand-tuned Optimizations
  – signal processing functions (FFT, DFT, FIR, IIR)
  – vector math and random numbers
  – speech and audio coding
  – image processing and realistic rendering
  – data compression and cryptography
  – DirectX 3D math functions
• Intel Compiler Auto-Vector Optimizations
  – ~30% of IPP functions
  – simple arithmetic and logic functions
  – performance approximately equal to hand optimized
Complete list of hand optimized functions - http://software.intel.com/en-us/articles/intel-ipp-functions-optimized-for-intel-avx-intel-advanced-vector-extensions/
Intel IPP: Intel AVX Optimization
Intel IPP uses code optimized for various central processing units (CPUs). Dispatching refers to detecting your CPU and selecting the corresponding Intel IPP binary.

Platform | Architecture | SIMD Requirements | Processor / µarchitecture
-------- | ------------ | ----------------- | -------------------------
IA-32 | px | C-optimized for all IA-32 processors | i386+
IA-32 | p8 | SSE4.1, SSE4.2, AES-NI | Penryn, Nehalem, Westmere
IA-32 | g9 | AVX | Sandy Bridge µarchitecture
Intel® 64 (EM64T) | mx | C-optimized for all Intel® 64 platforms | P4
Intel® 64 (EM64T) | y8 | SSE4.1, SSE4.2, AES-NI | Penryn, Nehalem, Westmere
Intel® 64 (EM64T) | e9 | AVX | Sandy Bridge µarchitecture

Example dispatch flow:
§ The application includes ipp.h and calls ippsZeroMean_16s from main().
§ The call enters ippsc-7.0.dll, whose DllMain checks the CPU with ippGetCpuType() and loads the best matching DLL.
§ On the Sandy Bridge µarchitecture the AVX library ippscg9-7.0.dll is loaded; other processors get a different variant, such as ippscw7-7.0.dll.
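The selection step of dispatching can be sketched in a few lines of C. This is a simplified illustration under stated assumptions, not the Intel IPP dispatcher: the feature flags, the platform enum, and the function select_ipp_code are all hypothetical, and a real program would fill the flags from CPUID. Only the two-letter codes come from the list above.

```c
#include <assert.h>

/* Hypothetical feature flags; a real dispatcher would derive them from CPUID. */
enum cpu_feature {
    CPU_FEATURE_SSE41 = 1,
    CPU_FEATURE_AVX   = 2
};

/* Hypothetical platform selector for the two architectures listed above. */
enum platform { PLATFORM_IA32, PLATFORM_INTEL64 };

/* Returns the two-letter library code for the best supported variant.
   The most capable instruction set is checked first, then the fallbacks. */
const char *select_ipp_code(enum platform p, unsigned features)
{
    assert(p == PLATFORM_IA32 || p == PLATFORM_INTEL64);
    if (features & CPU_FEATURE_AVX) {
        return p == PLATFORM_IA32 ? "g9" : "e9";
    }
    if (features & CPU_FEATURE_SSE41) {
        return p == PLATFORM_IA32 ? "p8" : "y8";
    }
    return p == PLATFORM_IA32 ? "px" : "mx";  /* plain C-optimized code */
}
```

Checking the newest instruction set first means a Sandy Bridge processor, which also supports SSE4.1, still resolves to the AVX variant.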
Performance: Intel AVX optimized functions (chart)
Performance: Intel AVX optimized functions (chart, continued)
Intel AVX optimized Intel IPP RT audio functions

Function Base Name | Operation
------------------ | ---------
AdaptiveCodebookSearch_RTA | Searches for the adaptive codebook index and the lag, and computes the adaptive vector
FixedCodebookSearch_RTA, FixedCodebookSearchRandom_RTA | Searches for the fixed codebook vector
HighPassFilter_RTA | Performs high-pass filtering
LSPQuant_RTA | Performs quantization of LSP coefficients
LSPToLPC_RTA | Converts LSP coefficients to LP coefficients
QMFDecode_RTA | Performs QMF synthesis
PostFilter_RTA | Restores speech signal from the residual

*LSP – Line spectral pairs
*LP – Linear prediction
*QMF – Quadrature mirror filter
Unified Speech Component (USC)
The USC interface is designed for implementing speech codecs, echo cancellers and other algorithm modules in the C language using Intel IPP. The USC interface defines a global table of unified functions that are applicable to a USC algorithm. The table can be extended for future functionality. Each USC algorithm must implement the USC base functions and may implement algorithm-specific functions.
Currently the USC library implements the following types of algorithms:
– Speech codec
– Echo cancellation algorithm, also referred to as echo canceller
– Speech signal filter
– Tone detection and generation algorithm, also referred to as tone detector
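The idea of a table of unified functions can be sketched in C as a struct of function pointers. Everything named here is hypothetical and chosen for illustration (codec_fxns, toy_codec, codec_roundtrip); it is not the actual USC table or its signatures. The toy "codec" only packs 16-bit samples into bytes so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function table; names and signatures are illustrative, not the USC API. */
typedef struct {
    const char *name;
    size_t (*encode)(const short *pcm, size_t samples, unsigned char *out);
    size_t (*decode)(const unsigned char *in, size_t bytes, short *pcm);
} codec_fxns;

/* Toy "codec": stores each 16-bit sample as two little-endian bytes. */
static size_t toy_encode(const short *pcm, size_t samples, unsigned char *out)
{
    size_t i;
    for (i = 0; i < samples; ++i) {
        unsigned short u = (unsigned short)pcm[i];
        out[2 * i]     = (unsigned char)(u & 0xFFu);
        out[2 * i + 1] = (unsigned char)(u >> 8);
    }
    return 2 * samples;  /* bytes written */
}

static size_t toy_decode(const unsigned char *in, size_t bytes, short *pcm)
{
    size_t i, samples = bytes / 2;
    for (i = 0; i < samples; ++i) {
        int u = in[2 * i] | (in[2 * i + 1] << 8);
        pcm[i] = (short)(u >= 0x8000 ? u - 0x10000 : u);  /* restore the sign */
    }
    return samples;  /* samples produced */
}

const codec_fxns toy_codec = { "toy_pcm", toy_encode, toy_decode };

/* The caller sees only the table, so any algorithm that fills the
   same table can be swapped in without changing this function. */
size_t codec_roundtrip(const codec_fxns *c, const short *in, size_t samples,
                       unsigned char *scratch, short *out)
{
    size_t bytes;
    assert(c != NULL && c->encode != NULL && c->decode != NULL);
    bytes = c->encode(in, samples, scratch);
    return c->decode(scratch, bytes, out);
}
```

The design choice illustrated is that application code is written once against the table, while each algorithm supplies its own entries.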
USC MSRTA codec
Real-Time (RT) Audio is the preferred Microsoft® real-time audio codec and is used by Microsoft Office Communications Server (OCS) and other communication applications such as Microsoft Office Communicator and Microsoft Live Meeting Console.
The USC MSRTA codec compresses and decompresses 16-bit mono PCM (Pulse Code Modulation) signals with 20 ms frame lengths: narrowband (8000 Hz) at 8800 bps and wideband (16000 Hz) at 18000 bps.
For more information about the Microsoft Real-Time Audio codec, refer to "Overview of the Microsoft RTAudio Speech codec".
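The frame sizes implied by these figures follow from simple arithmetic, sketched below in C. The helper names are hypothetical, and the calculation assumes a constant bitrate with no packet or header overhead, which the slides do not state.

```c
#include <assert.h>

/* Number of PCM samples in one frame of frame_ms milliseconds. */
unsigned samples_per_frame(unsigned sample_rate_hz, unsigned frame_ms)
{
    return sample_rate_hz * frame_ms / 1000u;
}

/* Encoded bits in one frame, assuming constant bitrate and no overhead. */
unsigned bits_per_frame(unsigned bitrate_bps, unsigned frame_ms)
{
    return bitrate_bps * frame_ms / 1000u;
}

/* Bitrate of the uncompressed mono PCM input. */
unsigned pcm_bitrate_bps(unsigned sample_rate_hz, unsigned bits_per_sample)
{
    assert(bits_per_sample > 0u);
    return sample_rate_hz * bits_per_sample;
}
```

Under these assumptions a 20 ms narrowband frame is 160 samples encoded into 176 bits (22 bytes), and a wideband frame is 320 samples encoded into 360 bits (45 bytes). The uncompressed input runs at 128000 bps and 256000 bps respectively, so both modes compress the signal roughly 14 times.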
Demo: USC MSRTA Sample
How to build the source code
§ Set the system environment variable IPPROOT
§ Open the solution/project file in the related Microsoft* Visual Studio
§ Select the configuration/platform you need
§ Build all projects in Microsoft* Visual Studio
§ Run the codec: usc_speech_rtp_codec.exe [options] <infile> <outfile>

Codec name | Supported bitrate, in bps | Codec description
---------- | ------------------------- | -----------------
IPP_MSRTAnb_FP | 8800 | Narrowband 8000 Hz MSRTA codec, floating-point implementation
IPP_MSRTAwb_FP | 18000 | Wideband 16000 Hz MSRTA codec, floating-point implementation
Optimization Notice
Intel compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel Compiler User and Reference Guides” under “Compiler Options.” Many library routines that are part of Intel compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors.
Intel compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.
While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.
Notice revision #20110307
IPP USC MSRTA codec Performance – IA-32

IPP SC Codec | Voice Activity Detection | Bitrate (bps) | Audio file | Duration (sec) | NHM Encode (MHz) | NHM Decode (MHz) | SNB Encode (MHz) | SNB Decode (MHz) | Speedup NHM/SNB, Encode | Speedup NHM/SNB, Decode
------------ | ------------------------ | ------------- | ---------- | -------------- | ---------------- | ---------------- | ---------------- | ---------------- | ----------------------- | -----------------------
IPP_MSRTA_FP | - | 8800 | s_8000_16.wav | 1070 | 20.42 | 2.35 | 16.65 | 1.92 | 1.23 | 1.22
IPP_MSRTA_FP | VAD 1 | 8800 | s_8000_16.wav | 1070 | 19.97 | 2.15 | 16.45 | 1.78 | 1.21 |
IPP_MSRTA_FP | - | 18000 | s_16000_16.wav | 1090 | 42.87 | 5.29 | 38.35 | 4.18 | 1.12 | 1.27
IPP_MSRTA_FP | VAD 1 | 18000 | s_16000_16.wav | 1090 | 37.52 | 5.61 | 32.51 | 4.65 | 1.15 | 1.21

System Configuration | Nehalem (NHM) | Sandy Bridge (SNB)
-------------------- | ------------- | ------------------
CPU | Intel(R) Xeon(R) CPU X5570 @ 2.93 GHz | Genuine Intel(R) CPU 0 @ 3.00 GHz
Operating System | Microsoft* Windows 2003 | Microsoft* Windows x64 with SNB patch
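The speedup columns are consistent with dividing the Nehalem MHz figure by the Sandy Bridge MHz figure and rounding to two decimals. A minimal C sketch of that calculation follows; the function name speedup_x100 is hypothetical, and it returns the ratio in hundredths to keep comparisons exact.

```c
#include <assert.h>

/* Speedup of SNB over NHM as NHM MHz divided by SNB MHz, returned in
   hundredths (123 means 1.23) and rounded to the nearest hundredth. */
int speedup_x100(double nhm_mhz, double snb_mhz)
{
    assert(nhm_mhz > 0.0 && snb_mhz > 0.0);
    return (int)(nhm_mhz / snb_mhz * 100.0 + 0.5);
}
```

For example, the first 8800 bps row gives 20.42 / 16.65 for encode, which rounds to 1.23, matching the table. A lower MHz figure on Sandy Bridge therefore shows up as a speedup greater than 1.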
IPP USC MSRTA codec Performance – Intel 64
IPP SC codec library: ippscy8-7.0.dll+

IPP SC Codec | Voice Activity Detection | Bitrate (bps) | Audio file | Duration (sec) | NHM Encode (MHz) | NHM Decode (MHz) | SNB Encode (MHz) | SNB Decode (MHz) | Speedup NHM/SNB, Encode | Speedup NHM/SNB, Decode
------------ | ------------------------ | ------------- | ---------- | -------------- | ---------------- | ---------------- | ---------------- | ---------------- | ----------------------- | -----------------------
IPP_MSRTA_FP | - | 8800 | s_8000_16.wav | 1070 | 22.8 | 3.03 | 16.79 | 1.95 | 1.36 | 1.55
IPP_MSRTA_FP | VAD 1 | 8800 | s_8000_16.wav | 1070 | 21.77 | 2.78 | 16.19 | 1.75 | 1.34 | 1.59
IPP_MSRTA_FP | - | 18000 | s_16000_16.wav | 1090 | 46.26 | 6.87 | 34.51 | 4.36 | 1.34 | 1.58
IPP_MSRTA_FP | VAD 1 | 18000 | s_16000_16.wav | 1090 | 39.99 | 7.25 | 31.08 | 4.77 | 1.29 | 1.52

System Configuration | Nehalem (NHM) | Sandy Bridge (SNB)
-------------------- | ------------- | ------------------
CPU | Intel(R) Xeon(R) CPU X5570 @ 2.93 GHz | Genuine Intel(R) CPU 0 @ 3.00 GHz
Operating System | Microsoft* Windows 2003 | Microsoft* Windows x64 with SNB patch
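The slides do not define the MHz unit used in these tables. One common reading, assumed here and not confirmed by the source, is the CPU clock rate needed to process the audio in real time: processing time divided by audio duration, multiplied by the CPU frequency. The C sketch below expresses that reading; both function names are hypothetical.

```c
#include <assert.h>

/* Assumed definition: clock rate (MHz) needed to keep up with real time.
   processing_sec is the measured time to process audio_sec of audio
   on a CPU running at cpu_mhz. */
double mhz_for_realtime(double processing_sec, double audio_sec, double cpu_mhz)
{
    assert(processing_sec >= 0.0 && audio_sec > 0.0 && cpu_mhz > 0.0);
    return processing_sec * cpu_mhz / audio_sec;
}

/* Rough upper bound on simultaneous encode+decode channels on one core,
   ignoring all other work, memory effects and scheduling overhead. */
unsigned max_channels(double cpu_mhz, double encode_mhz, double decode_mhz)
{
    double per_channel = encode_mhz + decode_mhz;
    assert(cpu_mhz > 0.0 && per_channel > 0.0);
    return (unsigned)(cpu_mhz / per_channel);
}
```

Under this reading, a lower MHz figure means each channel leaves more of the processor free, which is why the per-channel cost matters for VoIP servers that handle many calls at once.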
Summary
§ Intel IPP is a highly optimized library for the latest Intel architectures, including Sandy Bridge.
§ Using Intel IPP functions, you can develop Intel AVX optimized audio and speech codecs, including the Microsoft* Real-Time Audio codec.
§ Intel IPP offers sample code that demonstrates the development and usage of the MSRTA codec.
References
§ Intel® Parallel Studio XE 2011 Home page - http://software.intel.com/en-us/articles/intel-parallel-studio-xe/
§ Intel® IPP Functions Optimized for Intel® Advanced Vector Extensions - http://software.intel.com/en-us/articles/intel-ipp-functions-optimized-for-intel-avx-intel-advanced-vector-extensions/
§ Intel® AVX: New Frontiers in Performance Improvements and Energy Efficiency - http://software.intel.com/en-us/articles/intel-avx-new-frontiers-in-performance-improvements-and-energy-efficiency/
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products.
BunnyPeople, Celeron Inside, Centrino Atom, Centrino Atom Inside, Centrino logo, Cilk, Core Inside, FlashFile, i960, InstantIP, Intel, the Intel logo, Intel386, Intel486, IntelDX2, IntelDX4, IntelSX2, Intel Atom Inside, Intel Core, Intel Inside logo, Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDCharm, Pentium Inside, skoool, Sound Mark, The Journey Inside, Viiv Inside, vPro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2011. Intel Corporation. http://intel.com/software/products