ICC Optimization for the Linux Kernel and Linux Operating System

ICC Optimization for the Linux Kernel and Linux Operating System
www.linuxdna.com/techincal_docs.tar.gz
C. Tyler McAdams, Project Founder

What are the possibilities?
  • Extreme performance potential for IA-32, x64 and IA-64 systems
  • Diversification, i.e. finding bugs that otherwise go unseen
  • More freedom of choice where there is none

How did it all start?
  • Gentoo Forums
  • Googling
  • Intel Forums
  • Ingo A. Kubblin, Pyrillion.org: kernels 2.6.4 ~ 2.6.9
  • 2.6.22 PoC (proof of concept)

Project Members and Contributors
  • Luyi Cheng: HoD & base patchset
  • Chi Hoang: 32-bit
  • Jeroen Moetwill: 64-bit kernel patch
  • Feilong Huang: Intel
  • Max Domeika: Intel

Is our kernel non-GPL? No!
  • Can work with or without ICC installed
  • The ICC compiler is free for your personal usage!
  • Our patches are 100% free and GPL 2.0 code

LinuxDNA Major Projects
  • Make building an ICC kernel fast & easy
  • Work with SGI to engineer an Altix supercomputer kernel
  • Work with Intel to provide an ICC Moblin Linux repo, e.g. ICC Firefox

LinuxDNA Vision: GPC to SPC
  • GPC: General Purpose Computing
  • SPC: Specific Purpose Computing
  • SPC is faster and greener: what you were after in the first place!
  • Big companies are investing in SPC R&D

SPC Examples
  • Hardware: a RISC CPU core for networking executes fewer instructions, since network packets are transmitted in big-endian byte order and a big-endian core needs no byte swapping
  • Software: Gentoo Linux, with the ability to compile the system for a specific platform and streamline the software for the specific purpose (e.g. a web appliance, JeOS)
  • Room for improvement! We're ready to take both of these examples and evolve them to the next level.
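
The byte-order point in concrete terms: multi-byte header fields such as IPv4 addresses travel in network (big-endian) byte order. A little-endian x86 host must convert them with htonl()/ntohl(), while on a big-endian core those calls are identity operations. Below is a minimal sketch using the POSIX <arpa/inet.h> functions. The helper names put_be32, get_be32 and host_is_little_endian are illustrative and do not come from any project.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value into a packet buffer in network (big-endian)
 * byte order. On a little-endian CPU htonl() performs a byte swap;
 * on a big-endian CPU it leaves the value unchanged. */
void put_be32(unsigned char *buf, uint32_t host_value)
{
    uint32_t wire = htonl(host_value);
    memcpy(buf, &wire, sizeof wire);
}

/* Read a big-endian 32-bit field back into host byte order. */
uint32_t get_be32(const unsigned char *buf)
{
    uint32_t wire;
    memcpy(&wire, buf, sizeof wire);
    return ntohl(wire);
}

/* Returns 1 on a little-endian CPU, i.e. where the conversions above
 * actually cost instructions. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

For example, put_be32(buf, 0xC0A80001u) stores the bytes C0 A8 00 01 (192.168.0.1) on any host. Only a little-endian CPU needs a swap to do so.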

SPC is Faster and Greener!
  • Optimized code executes faster, so it takes less time to finish a job on the same hardware, and therefore less energy (and money) to finish a given task.
  • Forgo investment in new hardware and speed up existing infrastructure!
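
The "greener" argument rests on energy = average power × time. The sketch below assumes the optimized build draws the same average power as the original. That is not guaranteed, since vectorized code can raise power draw. The 100 W / 60 s / 48 s figures used with it are hypothetical, not measurements.

```c
/* Energy in joules for a job drawing avg_watts for the given seconds. */
double energy_joules(double avg_watts, double seconds)
{
    return avg_watts * seconds;
}

/* Energy saved when the same job finishes faster, ASSUMING the
 * average power draw is unchanged by the optimization. */
double energy_saved_joules(double avg_watts, double seconds_before,
                           double seconds_after)
{
    return energy_joules(avg_watts, seconds_before)
         - energy_joules(avg_watts, seconds_after);
}

/* Convert joules to watt-hours (1 Wh = 3600 J). */
double joules_to_wh(double joules)
{
    return joules / 3600.0;
}
```

With those hypothetical numbers, a run that is 20% shorter at 100 W saves 1200 J per job, which is one third of a watt-hour.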

Why else move away from monolithic OSes?
  • Windows has over 50 million lines of code (not including applications like Office)
  • Windows has been estimated to need 20 billion CPU operations just to get to the login screen!

Torvalds to kernel: you're bloated!!!
  • Spork-like due to its generic nature
  • Always larger than the cache (cache misses)
  • Hardware abstracted
  • Mixed C & ASM

Why do binary distro kernels end up bloated?
  • They have to be ready to work with anything.
  • They have hardly any idea what CPU you're going to run them on (generic).
  • Everybody's fav: backwards compatibility

E.g. why recompile a kernel?
Imagine you have a typical web server that comes with a TOE (TCP Offload Engine) NIC. This NIC can offload the entire network stack, the firewall and encryption. If these features are also built into a normal distro's kernel (which they always are), you lose a great deal of processing power. The unused code takes up cache and memory space, and it also costs CPU cycles. The result is needless redundancy: the kernel's own network stack is present but never used.

So you're compiling for a "P4"...
  • Willamette: first P4 core, no HT, Socket 423, SSE2
  • Northwood: 2nd core, introduced HT, Socket 478, SSE2
  • Prescott: 3rd core, introduced SSE3, HT, Socket 478 and LGA 775 (code using Prescott's new instructions will not run on earlier cores!)
  • Later Prescott models added EM64T extensions and the eXecute Disable (XD) bit, and came in dual-core form (Pentium D)
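
These cores differ in instruction set. Code compiled for a later core, such as SSE3 code for Prescott, raises an illegal-instruction fault on an earlier one. The sketch below checks the relevant CPUID feature bits at run time. It assumes a GCC- or Clang-compatible compiler that provides <cpuid.h>; whether ICC finds that header depends on the installation. On non-x86 builds every flag reads false. The struct and function names are hypothetical.

```c
#include <stdbool.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper providing __get_cpuid() */
#endif

/* Feature flags that tell the P4 generations above apart. */
struct p4_features {
    bool sse2;     /* CPUID.1:EDX bit 26 (all P4 cores) */
    bool htt;      /* CPUID.1:EDX bit 28 (multi-threading flag) */
    bool sse3;     /* CPUID.1:ECX bit 0 (Prescott and later) */
    bool xd;       /* CPUID.80000001h:EDX bit 20 (eXecute Disable) */
    bool intel64;  /* CPUID.80000001h:EDX bit 29 (EM64T / Intel 64) */
};

struct p4_features detect_p4_features(void)
{
    struct p4_features f = {false, false, false, false, false};
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if the requested leaf is unsupported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.sse2 = (edx >> 26) & 1;
        f.htt  = (edx >> 28) & 1;
        f.sse3 = ecx & 1;
    }
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) {
        f.xd      = (edx >> 20) & 1;
        f.intel64 = (edx >> 29) & 1;
    }
#endif
    return f;
}
```

A program could call detect_p4_features() once at startup and pick an SSE3 or SSE2 code path accordingly.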

But what if you can't SPC?
ICC can help put "lipstick on a pig"! Using advanced optimizations like IPO and PGO, we can step around these barriers and increase performance.

Linux + ICC Makes This Possible!
  • Linux is open source, so we can shape it like an artist shapes clay
  • ICC employs powerful optimization techniques that can exploit this openness for speed

So what does ICC do so well?
  • IPO: Interprocedural Optimization
  • PGO: Profile-Guided Optimization
  • High-end vectorization
  • High-end math algorithms
  • Optimized threading

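IPO: Interprocedural Optimization
IPO is a heuristics-based optimization scheme that can be applied to entire programs or to single files. IPO analyzes code across function boundaries, and across file boundaries for whole programs. This lets it inline calls, propagate constants and remove dead code, eliminating wasteful use of CPU registers, SIMD units and more.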
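
Below is a hand-written illustration of the kind of transformation IPO enables; it is not actual ICC output. Imagine scale() lives in util.c and sum_scaled() in main.c; both file names are hypothetical. Compiled one file at a time, the loop contains an opaque call. Optimized together (e.g. icc -O3 -ipo util.c main.c), the call can be inlined and the constant factor propagated.

```c
/* util.c (hypothetical): a tiny helper in its own translation unit. */
int scale(int x, int factor)
{
    return x * factor;
}

/* main.c (hypothetical): every call passes factor == 1, but without
 * IPO the compiler cannot see scale()'s body from this file. */
long sum_scaled(const int *v, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += scale(v[i], 1);   /* one cross-file call per element */
    return total;
}

/* Roughly what the loop reduces to after inlining scale() and
 * propagating factor == 1: no call, no multiply, and a plain
 * reduction loop that the vectorizer can handle. */
long sum_scaled_after_ipo(const int *v, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

Both versions compute the same result; the difference lies only in what the compiler is able to see and remove.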

PGO: Profile-Guided Optimization
PGO uses multiple stages to create code that executes optimally for whatever purpose the system is being used:
  • Stage 1: build an instrumented binary, execute it and analyze the execution
  • Stage 2: rebuild, optimizing with the stage 1 data

PGO step by step
  • Phase 1: compile sources with the prof-gen option. This results in an instrumented executable. The amount of instrumentation data depends on the prof-gen keyword used to compile the source.
  • Phase 2: run the instrumented executable (one or more times). This results in dynamic profile information files: each time the instrumented executable is run, a new .dyn file is created, with a file name of the form 8_hex_digits.dyn.
  • Phase 3: compile with the prof-use option. This creates and uses a merged dynamic information summary file (default name: pgopti.dpi). The result is an application optimized using profile-guided code.
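
Below is a toy workload for walking through the three phases. Its branch is heavily biased, which is exactly the information the training run records and the prof-use build can exploit, for example by keeping the hot path as straight-line fall-through code. The file name pgo_demo.c and the PGO_DEMO_MAIN macro are hypothetical. The option spellings follow this slide.

```c
#include <stdio.h>

/* In the training data almost every value takes the first branch.
 * Without a profile the compiler must guess which path is hot. */
static int classify(unsigned v)
{
    if (v % 1000 != 0)            /* hot path: ~99.9% of calls */
        return 1;
    return (int)(v / 1000) % 7;   /* cold path: rarely executed */
}

/* Representative workload exercised during the training run. */
long run_workload(unsigned n)
{
    long acc = 0;
    for (unsigned v = 0; v < n; v++)
        acc += classify(v);
    return acc;
}

/* Assumed build sequence, with ICC installed and this file saved as
 * pgo_demo.c:
 *   Phase 1: icc -O2 -prof-gen -DPGO_DEMO_MAIN pgo_demo.c -o pgo_demo
 *   Phase 2: ./pgo_demo          (each run writes a new .dyn file)
 *   Phase 3: icc -O2 -prof-use -DPGO_DEMO_MAIN pgo_demo.c -o pgo_demo
 *            (merges the .dyn files into pgopti.dpi and uses it)    */
#ifdef PGO_DEMO_MAIN
int main(void)
{
    printf("%ld\n", run_workload(10000000u));
    return 0;
}
#endif
```

The training input should resemble real usage: a profile gathered on unrepresentative data can steer the optimizer toward the wrong hot paths.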

PGO Example: Firefox 3.5, Google V8 benchmark (Pentium M 1.7 GHz)
  • Normal Linux GCC binary: 167
  • V8 score after PGO: 209 (about 25% higher; higher is better)

Vectorization: MMX, SSE*, etc.
Vectorization was first used back in the '60s. The compiler analyzes the code to find loops and optimizes them so that one instruction operates on several data elements at once. Today's examples of its use are apps like Photoshop. When ICC vectorizes a loop, its output states: "LOOP WAS VECTORIZED"
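
Below is a loop of the shape ICC's auto-vectorizer handles well: unit stride, a countable trip count, and no aliasing between the arrays (promised with C99 restrict). It is a sketch. Whether a given ICC version vectorizes it, and the exact wording of the report, depend on the version and flags; -vec-report1 is assumed here to request the report.

```c
#include <stddef.h>

/* y = a*x + y ("saxpy"). With SSE2 enabled (e.g. icc -O3 -xW), the
 * vectorized loop can process four floats per instruction. Building
 * with icc -vec-report1 should print a remark such as
 * "LOOP WAS VECTORIZED" for this loop. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The restrict qualifiers matter. Without them the compiler must assume x and y might overlap, and it may either skip vectorization or add a runtime overlap check.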

Other Diversified ICC Existentials
  • Debugger
  • Threading Building Blocks
  • Integrated Performance Primitives
  • Math Kernel Library

ICC compile examples: ICC makes it simple! Absolutely no NOS or Whale Tales needed! (Gentoo joke)
  IPO build:                  icc -O3 -xW -ipo -gcc myapp.c
  PGO phase 1 (instrument):   icc -O3 -xW -prof_gen myapp.c
  PGO phase 3 (optimize):     icc -O3 -xW -ipo -prof_use myapp.c
(-xW targets SSE2, i.e. Pentium 4 class CPUs.)

Compiling the kernel: what does the wrapper do?
  • Translates GCC semantics for ICC
  • Sets the framework for specific optimizations
  • Filters out incompatible flags & C files
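
Below is a minimal sketch of the flag-filtering idea, written in C for illustration; the actual intelwrapper is not reproduced here. The two entries in dropped_flags are placeholders chosen for the example, not a claim about which GCC options ICC rejects. Falling back to GCC for individual C files is not shown. The names filter_args and is_dropped are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder list of GCC options the wrapper drops (hypothetical). */
static const char *const dropped_flags[] = {
    "-fno-unit-at-a-time",
    "-mpreferred-stack-boundary=2",
};

static int is_dropped(const char *arg)
{
    for (size_t i = 0; i < sizeof dropped_flags / sizeof dropped_flags[0]; i++)
        if (strcmp(arg, dropped_flags[i]) == 0)
            return 1;
    return 0;
}

/* Copy argv into out[], replacing the compiler name with "icc" and
 * skipping dropped flags. out must have room for argc + 1 pointers.
 * Returns the new argument count; out is NULL-terminated, so it could
 * be passed to execvp("icc", out). */
int filter_args(int argc, char **argv, char **out)
{
    int n = 0;
    out[n++] = (char *)"icc";
    for (int i = 1; i < argc; i++)
        if (!is_dropped(argv[i]))
            out[n++] = argv[i];
    out[n] = NULL;
    return n;
}
```

A table-driven list keeps the compatibility knowledge in one place, so supporting a new kernel release mostly means editing the table.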

Kernel Compile Commands
  • Old way: make HOSTCC=intelwrapper AR=xiar LD=xild
  • Or: make HOSTCC=intelwrapper
  • New way: make
(xiar and xild are ICC's IPO-aware replacements for ar and ld.)
WARNING: ICC is very "chatty"... just relax :) Its output looks different from GCC's. Refer to the LinuxDNA.com mirror for full instructions.

Debian Compile Example
After installing ICC on Debian (or Ubuntu or Fedora), make sure to source its environment script with the following command:
  source /opt/intel/compiler/11.0/083/bin/iccvars.sh ia32
Or, for Intel 64:
  source /opt/intel/compiler/11.0/083/bin/iccvars.sh intel64
For Itanium 64-bit:
  source /opt/intel/compiler/11.0/083/bin/iccvars.sh ia64

Debian Continued
Now build the kernel after you have configured it:
  MAKEFLAGS="HOSTCC=intelwrapper AR=xiar LD=xild" make-kpkg --revision=enter_a_description_with_a_number.00 kernel_image kernel_headers
This will build two .deb packages in the parent directory of the kernel source (/usr/src if the source is in /usr/src/linux): one for the kernel image and another for the kernel headers (to compile other modules later). Change directory to /usr/src and run:
  dpkg -i linux*.deb

Making a Red Hat ICC .rpm
Unpack the SRPM and edit the specfile. Here is an example using libnl's specfile. Find these lines:
  %build
  %configure
  make gendoc
Change %configure to:
  CC=icc CXX=icpc CFLAGS=... CXXFLAGS=... AR=xiar LD=xild ./configure
Then use rpmbuild to build the rpm package.

Gentoo /etc/make.conf
  CFLAGS="-march=pentium4 -O2 -pipe"
  CXXFLAGS="${CFLAGS}"
  CHOST="i686-pc-linux-gnu"
  MAKEOPTS="-j3"
  USE="sse2 dri kde ..."
  ICCCFLAGS="-O3 -xW -ipo -gcc"
  ICCCXXFLAGS="${ICCCFLAGS}"

When GCC Breaks...
  • The majority of problems come from mixed C/ASM blocks
  • 3 apps remain untamed: GCC, Binutils, & Glibc
  • Itanium kernel = EPIC fail, but could be a win!

Optimization Campaign
  • The latest vanilla kernel made ICC-compatible, with IPO and PGO patch sets
  • Fully ICC-optimized system: Xorg, Firefox, GTK+
  • Fully PGO-aware system (with both ICC & GCC PGO)
  • LiveCD system that uses PGO to install optimized packages to disk
  • MeeGo ICC repos
  • Distro repos for rpm- and deb-based systems
  • Diversify with other C compilers, like the PGI and Sun compilers

S.E.L. DNA
SEL: Self Evolving Linux
  • Works in idle time and/or in the background
  • Fine-grained customization: GUI and configuration
  • Distro agnostic: .rpm, .deb, etc.

SEL Example
  • Optimize the entire system, or just parts and/or applications
  • Optimize the kernel, or just certain parts of the kernel
  • Choose from preoptimized packages or PGO-aware packages

SEL Proves the Power of Open Source
  • Closed-source software cannot get this kind of optimization, because you do not have the code to recompile
  • No placebo: meaningful, holistic results

Conclusion
  • GPC does not cut it anymore
  • Simply setting up a GPC solution and assuming the system is capable is a bad business strategy
  • A SEL SPC solution is cheaper, greener and more capable!

Contributors to LinuxDNA
  • LuYi Cheng: head developer, responsible for the first bootable kernel we made, the LinuxDNA LiveCD and the PGO-enabled Firefox
  • Feilong Huang: Intel developer, kernel hooks for ICC

Contributors to LinuxDNA
  • SGI: providing Altix 4700 NUMA supercomputer access
  • Chi Hoang: original 2.6.22 patch and latest 32-bit patches
  • Many more team members @ LinuxDNA Google Group

What are the LinuxDNA costs?
Surprisingly little! Please donate @ LinuxDNA.com
  • A couple of in-house servers for the mirror and repo
  • VMware ESX VI for distro development