Shader generation and compilation for a programmable GPU
Student: Jordi Roca Monfort
Advisor: Agustín Fernández Jiménez
Co-advisor: Carlos González Rodríguez
Outline
• Introduction
• Background
• Goals
• Design and implementation
• Conclusions
Introduction
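ATTILA simulation framework
[Figure: an OpenGL Application runs on the Vendor OpenGL API and Vendor Driver while GLInterceptor records an OpenGL trace; GLPlayer replays the trace through the ATTILA OpenGL API and the ATTILA Driver on the ATTILA Simulator, which produces Statistics]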
My Work
[Figure: the same framework (OpenGL Application, GLInterceptor, OpenGL trace, Vendor OpenGL API and driver, GLPlayer, ATTILA OpenGL API, ATTILA Driver, ATTILA Simulator, Statistics)]
• Extend/complete the ATTILA OpenGL API so that it can execute recent/advanced 3D applications (Doom 3, Unreal Tournament, etc.).
• The ATTILA Simulator simulates the latest generation of 3D graphics boards (programmable GPUs).
Background
Rendering (I)
• What is rendering? Generating the pixels of a set of images (frames) that form an animated scene.
• Goal: compute each pixel's color as fast as possible; this determines the frame rate (FPS).
• Which computations are required?
  • Given the database of scene objects, compute the color of the projected objects within the screen pixel area.
  • Each pixel's color depends on the scene lighting and on the viewer (camera) position.
Rendering (II)
[Figure: rendering data (viewer info: position; lighting info: position, color; geometry info) projected onto the screen area]
Rendering approaches
• For each pixel (x, y), compute the physical interaction between the lights and the objects in the scene:
  • Ray Tracing, Radiosity, Photon Mapping.
  • Very expensive per-pixel computation.
  • Global lighting (shadows, indirect reflections among objects).
• Compute the interaction between objects and lights only at the vertices, and approximate the corresponding value for each pixel (x, y):
  • Direct Rendering (3D graphics boards, 3D game consoles, etc.).
  • Only direct illumination from light sources (each vertex color is computed independently).
Direct Rendering (I)
[Figure: viewer info (position), lighting info (position, color) and geometry info form the rendering data; the colors computed at the vertices are interpolated across the screen area]
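In direct rendering only the vertices are lit; each pixel's color is then interpolated from the three vertex colors. A minimal C++ sketch of that interpolation, assuming plain screen-space barycentric weights without perspective correction; `Vec2`, `Color` and `interpolateColor` are illustrative names, not part of ATTILA.

```cpp
#include <array>
#include <cmath>

struct Vec2 { float x, y; };
struct Color { float r, g, b; };

// Twice the signed area of triangle (a, b, c).
float edgeFunction(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Barycentric weights of p with respect to (v0, v1, v2): they sum to 1
// and are all >= 0 when p lies inside the triangle.
std::array<float, 3> barycentric(Vec2 v0, Vec2 v1, Vec2 v2, Vec2 p) {
    float area = edgeFunction(v0, v1, v2);
    return { edgeFunction(v1, v2, p) / area,
             edgeFunction(v2, v0, p) / area,
             edgeFunction(v0, v1, p) / area };
}

// Gouraud-style shading: blend the per-vertex colors with the weights.
Color interpolateColor(const std::array<Vec2, 3>& pos,
                       const std::array<Color, 3>& col, Vec2 p) {
    std::array<float, 3> w = barycentric(pos[0], pos[1], pos[2], p);
    return { w[0] * col[0].r + w[1] * col[1].r + w[2] * col[2].r,
             w[0] * col[0].g + w[1] * col[1].g + w[2] * col[2].g,
             w[0] * col[0].b + w[1] * col[1].b + w[2] * col[2].b };
}
```

At a vertex the result is exactly that vertex's color; at the centroid it is the average of the three.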
Direct Rendering (II)
• The higher the density of vertices, the more realistic the lighting.
• In addition, more vertices are required to increase the level of detail of surfaces.
• Thus: ▲realism → ▲vertices → ▲computation → ▼FPS
• Solution:
  • Specify surfaces using fewer vertices, and
  • specify surface details using textures.
Textures
[Figure: the same rendering data (viewer info: position; lighting info: position, color; geometry info) projected onto the screen area, now with textures]
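Texture mapping
[Figure: a triangle in the screen area whose vertices carry the texture coordinates (0.63, 0.86), (0.26, 0.37) and (0.79, 0.10), in a texture space ranging from 0 to 1 on both axes]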
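Texture mapping
[Figure: the coordinate interpolator computes the texture coordinate (0.40, 0.45) for a pixel inside the triangle with vertex coordinates (0.63, 0.86), (0.26, 0.37) and (0.79, 0.10); sampling the texture at that coordinate gives the pixel's texture value]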
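Once the interpolator has produced a coordinate such as (0.40, 0.45), the texture is sampled there. A minimal C++ sketch of the simplest filter, nearest-neighbour sampling, assuming a row-major RGBA texture and clamp-to-edge addressing; the `Texture` layout and `sampleNearest` are hypothetical, and real hardware also offers bilinear/trilinear filtering and other wrap modes.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical texture: width x height texels, packed 0xRRGGBBAA, row-major.
struct Texture {
    int width, height;
    std::vector<std::uint32_t> texels;
};

// Scales (u, v) in [0, 1] x [0, 1] to texel space and picks the texel
// containing that point; out-of-range coordinates are clamped to the edge.
std::uint32_t sampleNearest(const Texture& tex, float u, float v) {
    int x = static_cast<int>(std::floor(u * tex.width));
    int y = static_cast<int>(std::floor(v * tex.height));
    x = std::min(std::max(x, 0), tex.width - 1);   // u == 1.0 -> last column
    y = std::min(std::max(y, 0), tex.height - 1);
    return tex.texels[static_cast<std::size_t>(y) * tex.width + x];
}
```

For a 2 x 2 texture, (0.40, 0.45) falls in the first texel of the first row.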
3D Rendering Pipeline
• Inputs: 3D scene (vertex DB), lighting info, viewer info and textures.
• Vertex processing stage (VERTEX SHADING), a parallelizable process. Compute:
  • color
  • coordinates
  • vertex position on screen
• RASTERIZER: generates interpolated attributes (color, coordinates).
• Fragment processing stage (FRAGMENT SHADING), a parallelizable process: per-pixel texture mapping.
• Output: final screen.
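The "vertex position on screen" step ends with the standard OpenGL perspective division and viewport mapping. A small C++ sketch, assuming the vertex has already been transformed to clip space and survived clipping (w > 0), and using OpenGL's default depth range [0, 1]; the struct names are illustrative.

```cpp
struct Vec4 { float x, y, z, w; };
struct Viewport { float x0, y0, width, height; };
struct ScreenPos { float x, y, depth; };

// Clip space -> normalized device coordinates ([-1, 1] on each axis)
// -> window coordinates inside the viewport, with depth in [0, 1].
ScreenPos toScreen(Vec4 clip, Viewport vp) {
    float nx = clip.x / clip.w;   // perspective division
    float ny = clip.y / clip.w;
    float nz = clip.z / clip.w;
    return { (nx + 1.0f) * 0.5f * vp.width + vp.x0,
             (ny + 1.0f) * 0.5f * vp.height + vp.y0,
             (nz + 1.0f) * 0.5f };
}
```

Each vertex is mapped independently of the others, which is what makes the vertex stage parallelizable.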
3D rendering pipeline (RP) implementation
• Implementations:
  • Software:
    • Mesa 3D Graphics Library (OpenGL).
  • Software + hardware acceleration:
    • Vendor OpenGL, Direct3D, Xbox, PlayStation, etc.
    • Work is distributed between the CPU and the graphics board, transparently to the applications.
3D accelerators evolution
• 2D accelerators (pre-Voodoo), before 1996: scene DB → CPU (VS, rasterizer, FS) → VGA → final screen.
• 3D accelerators (3dfx Voodoo), 1996: the fragment stage (FS) moves to the 3D accelerator.
• Graphics Processing Units (GeForce), 1999: VS, rasterizer and FS run on the GPU.
• Programmable GPUs (GeForce 3), 2001: VS, rasterizer and FS run on a programmable GPU (PGPU).
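GPUs: applying 2 textures
[Figure: the rasterizer emits a fragment stream; each fragment F1(x, y) carries an interpolated color and texture coordinates 1 and 2. Fixed-function fragment unit 0 reads both textures from texture memory and combines them with the interpolated color through hard-wired multiply (*) and add (+) operations to produce the final color.]
Uses:
• Per-pixel lighting.
• Shadow implementation.
• Bump mapping.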
Programmable GPUs: 2 textures
[Figure: the rasterizer emits the fragment stream; each fragment F1(x, y) carries an interpolated color and texture coordinates. The shader processors (ALU, temporaries, texture memory access) run fragment shader 0 to compute the final color:]
    LDTEX t1, coord1, Text1
    LDTEX t2, coord2, Text2
    ADD t1, colorIn, t1
    MUL t1, t1, t2
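What fragment shader 0 computes for one fragment can be written as plain C++: the ADD followed by the MUL yields (colorIn + t1) · t2, component by component. A sketch that reads the final MUL as t1 = t1 · t2 and, to stay short, takes the two LDTEX results as already-sampled inputs; the function names are illustrative.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;  // r, g, b, a

// Component-wise ADD and MUL, as executed by the shader processor ALU.
Vec4 add(const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) r[i] = a[i] + b[i];
    return r;
}

Vec4 mul(const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) r[i] = a[i] * b[i];
    return r;
}

// Per-fragment equivalent of fragment shader 0; t1 and t2 stand for the
// texels fetched by the two LDTEX instructions.
Vec4 fragmentShader0(const Vec4& colorIn, const Vec4& t1, const Vec4& t2) {
    Vec4 tmp = add(colorIn, t1);  // ADD: t1 = colorIn + t1
    return mul(tmp, t2);          // MUL: t1 = t1 * t2, the final color
}
```

Changing the effect means changing these few instructions, not the hardware, which is the key difference from the fixed-function unit.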
Shader Processors
• Shader processors (SPs) execute small programs (shaders), made of vector and scalar instructions, that define the computation of the following stages:
  • Vertex processing: Vertex Shader
    • Lighting computation.
    • On-screen vertex projection.
    • Texture coordinate generation.
  • Fragment processing: Fragment Shader
    • Texture color fetch and blending.
    • FOG.
• It is like a GPU that supports "infinite" visual effects, which previous generations of graphics boards could not offer.
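Vector and scalar instructions differ in how their result reaches the four-component registers. A minimal C++ model of the ARB-style operand rules, offered as a sketch: source swizzles reorder or replicate components, a destination write mask selects which components are updated, and a scalar-result instruction such as DP4 replicates its single value to every lane. The helper names are hypothetical and only swizzles of length 1 or 4 are handled.

```cpp
#include <array>
#include <string>

using Vec4 = std::array<float, 4>;

// Component letter -> lane; anything other than x, y or z is treated as w.
int lane(char c) { return c == 'x' ? 0 : c == 'y' ? 1 : c == 'z' ? 2 : 3; }

// Source swizzle: "wzyx" reorders lanes; a single letter such as "y"
// replicates that component to all four lanes.
Vec4 swizzle(const Vec4& v, const std::string& sw) {
    if (sw.size() == 1) {
        float s = v[lane(sw[0])];
        return {s, s, s, s};
    }
    return {v[lane(sw[0])], v[lane(sw[1])], v[lane(sw[2])], v[lane(sw[3])]};
}

// Destination write mask: only the named lanes (e.g. "x", "yz") change.
void writeMasked(Vec4& dst, const Vec4& value, const std::string& mask) {
    for (char c : mask) dst[lane(c)] = value[lane(c)];
}

// Vector instruction (like MUL): four independent lane operations.
Vec4 mul(const Vec4& a, const Vec4& b) {
    return {a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]};
}

// Scalar-result instruction (like DP4): one value, replicated to all lanes.
Vec4 dp4(const Vec4& a, const Vec4& b) {
    float s = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
    return {s, s, s, s};
}
```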
Goals
Goals
• Implement all the necessary modules in the OpenGL API to:
  • Support new, real 3D applications that use shaders in our simulation framework.
  • Also support older applications that use the fixed function (FF) pipeline, and applications that combine shaders and FF.
• Idea: emulate the Fixed Function pipeline by generating equivalent shaders for the SPs.
Things to do
• Implement shader support in our OpenGL API:
  • Using the shader programming languages most used by 3D applications: ARB_vertex_program and ARB_fragment_program.
• Study how to express FF functions in terms of shaders (pre-study phase).
Design and implementation
Fixed Function emulation
FF Emulation
[Figure: scene DB → Vertex Shader → Rasterizer → Fragment Shader → final screen; the fixed-function behavior is emulated by the two shaders below]
Vertex shader:
    !!ARBvp1.0
    ATTRIB pos = vertex.position;
    PARAM mat[4] = { state.matrix.mvp };
    # Transform by concatenation of the
    # MODELVIEW and PROJECTION matrices.
    DP4 result.position.x, mat[0], pos;
    DP4 result.position.y, mat[1], pos;
    DP4 result.position.z, mat[2], pos;
    DP4 result.position.w, mat[3], pos;
    # Pass the primary color through
    # w/o lighting.
    MOV result.color, vertex.color;
    END
Fragment shader:
    !!ARBfp1.0
    # first set of texture coordinates
    ATTRIB tex = fragment.texcoord;
    # interpolated color
    ATTRIB col = fragment.color;
    OUTPUT outColor = result.color;
    TEMP tmp;
    # sample the texture
    TEX tmp, tex, texture, 2D;
    # perform the modulation
    MUL outColor, tmp, col;
    END
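The four DP4 instructions of the vertex shader are a 4 x 4 matrix-vector product written one row at a time: `state.matrix.mvp` binds the rows of the modelview-projection matrix to `mat[0..3]`, and each output component is the dot product of one row with the position. A C++ sketch of the same computation, with illustrative type names:

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // four rows, like PARAM mat[4]

// DP4: four-component dot product.
float dp4(const Vec4& a, const Vec4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// result.position.x/y/z/w = DP4 of rows 0..3 with the vertex position.
Vec4 transformPosition(const Mat4& mvp, const Vec4& pos) {
    return {dp4(mvp[0], pos), dp4(mvp[1], pos),
            dp4(mvp[2], pos), dp4(mvp[3], pos)};
}
```

For a pure translation matrix, the result is simply the translated point.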
FF emulation
• Implemented functions (according to the OpenGL 2.0 Specification):
  • Vertex Shading (85% of total):
    • Vertex transformation.
    • Per-vertex standard OpenGL lighting:
      • Point, directional and spot lights.
      • Attenuation.
      • Local and infinite viewer.
    • Automatic texture coordinate generation:
      • Object Plane and Eye Plane.
      • Normal Map, Reflection Map and Sphere Map.
    • FOG coordinate.
  • Fragment Shading (90% of total):
    • Multi-texturing and texture combine functions.
    • FOG application: Linear, Exponential and Second Order Exponential.
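The three fixed-function fog modes each turn the fog coordinate c into a factor f in [0, 1]. The formulas below follow the OpenGL specification (GL_LINEAR, GL_EXP and GL_EXP2); the C++ wrapper and its parameter struct are an illustrative sketch of what the generated shaders must reproduce.

```cpp
#include <algorithm>
#include <cmath>

enum class FogMode { Linear, Exp, Exp2 };

struct FogParams { float start, end, density; };

// OpenGL fog factor for fog coordinate c: f = 1 means no fog,
// f = 0 means the fragment takes the fog color entirely.
float fogFactor(FogMode mode, const FogParams& p, float c) {
    float f = 1.0f;
    switch (mode) {
        case FogMode::Linear: f = (p.end - c) / (p.end - p.start); break;
        case FogMode::Exp:    f = std::exp(-p.density * c); break;
        case FogMode::Exp2:   f = std::exp(-(p.density * c) * (p.density * c)); break;
    }
    return std::min(std::max(f, 0.0f), 1.0f);  // clamp to [0, 1]
}
```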
FF emulation example
• FOG application:
  • Algorithm: for each pixel, linearly interpolate between the original color and the fog color, according to the distance from the object to the viewer.
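FOG emulation
• FOG exponential mode:
  $f = e^{-\mathit{density} \cdot \mathit{fogcoord}}$
  $f = 2^{-(\mathit{density} \cdot \mathit{fogcoord}) / \ln 2}$ (since $e = 2^{1/\ln 2}$)
• $\text{Final color} = \text{pixel color} \cdot f + \text{fog color} \cdot (1 - f)$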
FOG emulation
    !!ARBfp1.0
    ATTRIB fogCoord = fragment.fogcoord;
    OUTPUT oColor = result.color;
    PARAM fogColor = state.fog.color;
    PARAM fogParams = program.local[0]; # fogParams.x : density/ln(2)
    TEMP fragmentColor, fogFactor;
    # Texture applications...
    # Fog factor computation...
    MUL fogFactor.x, fogParams.x, fogCoord.x; # fogFactor.x = density*fogcoord/ln(2)
    EX2_SAT fogFactor.x, -fogFactor.x; # fogFactor.x = 2^-(fogFactor.x)
    # Fog color interpolation
    LRP oColor, fogFactor.x, fragmentColor, fogColor;
    END
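The program relies on 2^(−density·c/ln 2) being equal to e^(−density·c), because the shader ISA offers a base-2 exponential (EX2). A C++ sketch that mirrors the three fog instructions, assuming the driver stores density/ln 2 in `program.local[0].x` as the comment in the program states; LRP computes f·a + (1 − f)·b per component.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using Vec4 = std::array<float, 4>;

Vec4 applyExpFog(const Vec4& fragmentColor, const Vec4& fogColor,
                 float density, float fogCoord) {
    // Value assumed to be precomputed by the driver (fogParams.x).
    float fogParamX = density / std::log(2.0f);
    float f = fogParamX * fogCoord;                      // MUL
    f = std::min(std::max(std::exp2(-f), 0.0f), 1.0f);   // EX2_SAT
    Vec4 out{};
    for (int i = 0; i < 4; ++i)                          // LRP
        out[i] = f * fragmentColor[i] + (1.0f - f) * fogColor[i];
    return out;
}
```

With a white fragment and black fog, the red channel of the output is the fog factor itself, which matches `std::exp(-density * fogCoord)` up to float rounding.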
ARB compilers
ARB compilers
[Figure: the vertex and fragment programs of the FF emulation example, used as input to the ARB compilers]
The compilers' common architecture
Input example:
    !!ARBvp1.0
    PARAM arr[5] = { program.env[0..4] };
    #ADDRESS addr;
    ATTRIB v1 = vertex.attrib[1];
    PARAM par1 = program.local[0];
    OUTPUT oPos = result.position;
    OUTPUT oCol = result.color.front.primary;
    OUTPUT oTex = result.texcoord[2];
    ARL addr.x, v1.x;
    MOV res, arr[addr.x - 1];
    END
• Lexical-syntactic analysis (Flex + Bison) → IR + symbol table.
• Semantic analysis.
• Code generation: generic → GPU-specific.
[Figure: the output is a hex dump of GPU binary code, one 16-byte instruction (bytes By0 to ByF) per line]
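Semantic analysis uses the symbol table to reject programs such as the example above, where `addr` is used although its `ADDRESS` declaration is commented out, and `res` is never declared. A minimal C++ sketch of such a table, assuming it only tracks user-declared identifiers (built-in bindings like `result.position` are resolved elsewhere) and treats a repeated declaration as an error; the class and its methods are hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

enum class SymbolKind { Attrib, Param, Temp, Output, Address };

class SymbolTable {
public:
    // Returns false if the name was already declared.
    bool declare(const std::string& name, SymbolKind kind) {
        return symbols_.emplace(name, kind).second;
    }
    bool isDeclared(const std::string& name) const {
        return symbols_.count(name) != 0;
    }
    // Identifiers referenced by instructions but never declared.
    std::vector<std::string> undeclared(const std::vector<std::string>& used) const {
        std::vector<std::string> missing;
        for (const std::string& name : used)
            if (!isDeclared(name)) missing.push_back(name);
        return missing;
    }
private:
    std::unordered_map<std::string, SymbolKind> symbols_;
};
```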
Intermediate Representation
Example:
    !!ARBvp1.0
    ATTRIB pos = vertex.position;
    PARAM mat[4] = { state.matrix.mvp };
    # Transform by concatenation of the
    # MODELVIEW and PROJECTION matrices.
    DP4 result.position.x, mat[0], pos;
    DP4 result.position.y, mat[1], pos;
    DP4 result.position.z, mat[2], pos;
    DP4 result.position.w, mat[3], pos;
    # Pass the primary color through
    # w/o lighting.
    MOV result.color, vertex.color;
    END
IR (partial):
• IRProgram: header "!!ARBvp1.0", program statements:
  • IRVP1ATTRIBStatement: name: pos, attrib: vertex.position
  • IRInstruction: opcode: DP4, destination, sources:
    • IRDstOperand: destination: result.position, writeMask: x, isResultRegister: true
    • IRSrcOperand: source: mat, swizzleMask: xyzw, isInputRegister: false
    • IRSrcOperand: source: pos, swizzleMask: xyzw, isInputRegister: false
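The IR nodes in the diagram map naturally onto plain data structures. A C++ sketch whose class and field names follow the diagram, while everything else (types, the `toString` helper that prints the instruction back as ARB assembly) is an assumption made for illustration.

```cpp
#include <string>
#include <vector>

struct IRDstOperand {
    std::string destination;   // e.g. "result.position"
    std::string writeMask;     // e.g. "x"
    bool isResultRegister;
};

struct IRSrcOperand {
    std::string source;        // e.g. "mat[0]" or "pos"
    std::string swizzleMask;   // e.g. "xyzw"
    bool isInputRegister;
};

struct IRInstruction {
    std::string opcode;        // e.g. "DP4"
    IRDstOperand destination;
    std::vector<IRSrcOperand> sources;

    // Prints the instruction in ARB syntax; the full mask "xyzw" is the
    // default and is therefore omitted.
    std::string toString() const {
        std::string s = opcode + " " + destination.destination;
        if (destination.writeMask != "xyzw") s += "." + destination.writeMask;
        for (const IRSrcOperand& src : sources) {
            s += ", " + src.source;
            if (src.swizzleMask != "xyzw") s += "." + src.swizzleMask;
        }
        return s + ";";
    }
};
```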
Semantic analysis and generic code generation
• Features:
  • Implemented using the visitor pattern.
  • Decouples the IR from the operations performed in each compiler phase.
  • Allows a common analyzer and a common code generator to be used for both program types (vertex and fragment).
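The visitor pattern keeps each compiler phase in its own class: IR nodes only implement `accept`, and a new phase is a new visitor. A minimal C++ sketch with two hypothetical node types and a toy phase that counts instructions; the actual node hierarchy in the compiler is larger.

```cpp
#include <string>
#include <utility>
#include <vector>

struct IRDeclarationNode;
struct IRInstructionNode;

// One visitor subclass per compiler phase.
struct IRVisitor {
    virtual ~IRVisitor() = default;
    virtual void visit(const IRDeclarationNode& node) = 0;
    virtual void visit(const IRInstructionNode& node) = 0;
};

struct IRNode {
    virtual ~IRNode() = default;
    virtual void accept(IRVisitor& v) const = 0;  // double dispatch
};

struct IRDeclarationNode : IRNode {
    std::string name;
    explicit IRDeclarationNode(std::string n) : name(std::move(n)) {}
    void accept(IRVisitor& v) const override { v.visit(*this); }
};

struct IRInstructionNode : IRNode {
    std::string opcode;
    explicit IRInstructionNode(std::string op) : opcode(std::move(op)) {}
    void accept(IRVisitor& v) const override { v.visit(*this); }
};

// Toy phase: counts instructions and ignores declarations.
struct InstructionCounter : IRVisitor {
    int count = 0;
    void visit(const IRDeclarationNode&) override {}
    void visit(const IRInstructionNode&) override { ++count; }
};

int countInstructions(const std::vector<const IRNode*>& program) {
    InstructionCounter counter;
    for (const IRNode* node : program) node->accept(counter);
    return counter.count;
}
```

Because neither node class depends on `InstructionCounter`, the same IR can be walked by a semantic checker or a code generator for vertex and fragment programs alike.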
Code generation
• Phase 1: generate architecture-independent generic code, assuming unbounded machine resources.
• Phase 2: translate it into specific code, taking into account the constraints of the concrete GPU architecture.
[Figure: GenericCode (a sequence of GenericInstructions) and a machine file descriptor are translated into specific code (a sequence of GPUInstructions)]
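One typical phase-2 task is fitting the unbounded temporaries of phase 1 into the GPU's finite register file. The slide does not say how this is done; the C++ sketch below swaps in a simple linear-scan-style allocation for straight-line code (ARB programs of this version have no branches), with the register count standing in for a value a machine descriptor could provide. All names are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <set>
#include <vector>

// Hypothetical generic instruction over virtual temporaries 0, 1, 2, ...
struct GenericInstruction {
    int dst;
    std::vector<int> srcs;
};

// Maps virtual temporaries onto numPhysical hardware temporaries, assuming
// every source was written by an earlier instruction. Returns nullopt when
// more values are live at once than the hardware provides.
std::optional<std::map<int, int>> allocateRegisters(
        const std::vector<GenericInstruction>& code, int numPhysical) {
    // Index of the last instruction that mentions each temporary.
    std::map<int, std::size_t> lastUse;
    for (std::size_t i = 0; i < code.size(); ++i) {
        lastUse[code[i].dst] = i;
        for (int s : code[i].srcs) lastUse[s] = i;
    }

    std::set<int> freeRegs;  // lowest-numbered free register is reused first
    for (int r = 0; r < numPhysical; ++r) freeRegs.insert(r);
    std::map<int, int> assignment;  // virtual -> physical

    for (std::size_t i = 0; i < code.size(); ++i) {
        const GenericInstruction& ins = code[i];
        // Sources are read before the destination is written, so a source
        // whose live range ends here can hand its register to the destination.
        for (int s : ins.srcs) {
            auto it = assignment.find(s);
            if (it != assignment.end() && lastUse[s] == i) freeRegs.insert(it->second);
        }
        if (assignment.count(ins.dst) == 0) {
            if (freeRegs.empty()) return std::nullopt;
            assignment[ins.dst] = *freeRegs.begin();
            freeRegs.erase(freeRegs.begin());
        }
        // A value that is never read again does not keep its register.
        if (lastUse[ins.dst] == i) freeRegs.insert(assignment[ins.dst]);
    }
    return assignment;
}
```

Usage: for `t2 = f(t0, t1); t3 = g(t2)` preceded by the definitions of t0 and t1, two hardware registers suffice, while one register is not enough because t0 and t1 are live at the same time.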
Conclusions
Conclusions
• Achieved goals. The OpenGL API implementation now supports:
  • Fixed Function emulation of almost the entire set of VS and FS stage functions (the most important ones).
  • Shader compilation for the ARB_vertex_program and ARB_fragment_program specifications:
    • Both compilers share most of their implementation.
    • Clear separation between the generic and the GPU-specific stages.
Future work
• Support other parts of the 3D rendering pipeline (e.g. interpolation) as programmable stages, to reduce hardware complexity and power consumption (embedded systems).
• Implement compilers for high-level shading languages (GLSlang, HLSL).
End of the presentation