ATI GPUs and Graphics APIs
Mark Segal, ATI


ATI Hardware
• X1K series
• 8 SIMD vertex engines, 16 SIMD fragment (pixel) engines
• 3-component vector + scalar ALUs
• X1900: 48 fragment ALU cores
• Dynamic flow control
• 256 MB, 512 MB configurations
• 650 MHz engine, 775 MHz memory (X1900 XTX)

X1K Fragment Processor Features
• Dynamic Flow Control
  • Branching (IF…ELSE), Looping, Subroutines
• 128-bit (4 x 32) Floating-Point Processing
  • For pixel and vertex shaders
• Longer Shaders (512 instructions)
• X1900 XT
  • 120 Gflops peak compute (fragment processors only)
  • 60 Gflops (measured) on dense matrix-matrix multiply
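
The peak and measured throughput figures on this slide can be related with a little arithmetic. The sketch below assumes the conventional count of 2n^3 floating-point operations for an n x n dense matrix multiply; the slide does not say how its figure was counted, and the function names are illustrative only.

```python
def matmul_flops(n: int) -> int:
    """Flops for an n x n dense multiply, counting each multiply and each add.

    Assumption: the usual 2 * n^3 convention; the slide does not state one.
    """
    return 2 * n ** 3


def achieved_gflops(n: int, seconds: float) -> float:
    """Throughput in Gflops for one n x n multiply that took `seconds`."""
    return matmul_flops(n) / seconds / 1e9


# Figures quoted on the slide for the X1900 XT fragment processors.
PEAK_GFLOPS = 120.0
MEASURED_GFLOPS = 60.0

# Fraction of peak reached by the measured matrix multiply.
efficiency = MEASURED_GFLOPS / PEAK_GFLOPS
```

Under these figures the dense multiply runs at half of the quoted peak, which is one way to read the later reminder not to forget details like memory.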

Threading
• When a fragment program hits a stall, switch to another fragment that’s ready to go
  • E.g. texture read takes many cycles
• Latency hiding
  • Many fragments in various stages of completion at any one time
  • Multiple calculations in flight
• Requires storage for stalled fragments’ data
  • Can use unused temporary registers if available
• Flow control
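
The latency-hiding idea on this slide can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of the actual X1K scheduler: each fragment alternates a fixed burst of ALU work with a texture fetch that stalls it, switching between fragments is free, and all cycle counts and register sizes used with it are hypothetical.

```python
def alu_utilization(fragments_in_flight: int, alu_cycles: int, fetch_latency: int) -> float:
    """Fraction of cycles the ALU is busy in the toy model.

    Each fragment does `alu_cycles` of work, then waits `fetch_latency`
    cycles for a texture read. While it waits, the ALU runs another
    ready fragment (assumed to cost nothing to switch to).
    """
    if fragments_in_flight < 1 or alu_cycles < 1 or fetch_latency < 0:
        raise ValueError("need at least one fragment and positive ALU work")
    busy = fragments_in_flight * alu_cycles
    return min(1.0, busy / (alu_cycles + fetch_latency))


def fragments_to_hide_latency(alu_cycles: int, fetch_latency: int) -> int:
    """Smallest number of in-flight fragments that keeps the ALU fully busy."""
    total = alu_cycles + fetch_latency
    return -(-total // alu_cycles)  # integer ceiling division


def max_fragments_in_flight(register_file_size: int, registers_per_fragment: int) -> int:
    """Storage limit: stalled fragments must keep their registers resident."""
    return register_file_size // registers_per_fragment
```

For example, with 4 cycles of ALU work per 96-cycle fetch (made-up numbers), a single fragment keeps the ALU only 4% busy, while 25 fragments in flight are enough to cover the fetch entirely. The last function shows why register usage matters: a program that uses fewer temporaries leaves room for more fragments in flight.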

Graphics Programming Interfaces
• Provide software interface to graphics hardware
• Lowest level:
  • Expose full functionality of hardware at full performance
  • Hide device-specific details
  • Limit interface changes generation to generation
• Higher levels:
  • Simplify application programming
  • E.g. for graphics: scene graph, shading languages
• Current interfaces aren’t so lowest-level anymore

New low-level interface
• Distinguish two characteristics
  • Data path, routing, memory
  • Parallel data processors
• Expose programmability; jettison fixed function
• Expose memory capabilities and routing
• Stripped-down interface
  • Machine language
  • Compiler for higher-level languages
  • Use libraries
[Figure: block diagram with host, Command Processor, Vertex Processor, Rasterizer, Fragment Processor, Per-Pixel Operations, and Graphics Memory; blocks are marked P: programmable or F: fixed]

Compare with OpenGL
• No Begin/End or immediate mode
• No vertex transform
• No texture environment
• OpenGL is an application layered on this
• Benefit: simplified driver
  • Much less state management
  • No software path
  • Better support, faster addition of new features
• Better match to GPGPU
• Benefit: greater control over memory usage
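
To make "No vertex transform" and "OpenGL is an application layered on this" concrete: once the interface has no built-in transform, the familiar fixed-function behavior becomes ordinary library code. The sketch below is a plain-Python stand-in for what would really be a vertex program; the slides do not show the proposed interface, so every name here is hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # 4 x 4, row-major


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 4 x 4 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def mat_vec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Product of a 4 x 4 matrix and a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def fixed_function_transform(projection: Matrix, modelview: Matrix,
                             position: Sequence[float]) -> List[float]:
    """Clip-space position = projection * modelview * object-space position.

    This is the classic OpenGL fixed-function vertex transform, written as
    library code layered on top of a programmable vertex stage.
    """
    return mat_vec(mat_mul(projection, modelview), position)
```

An application that wants a different transform simply supplies its own function, and the driver underneath does not need to track matrix-stack state at all.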

Conclusion
• Good image quality requires lots of computation
• Recent GPUs have lots of computational power
• Don’t forget details, like memory
• Starting to see use for effects in games
• Games still drive the market, and they always need more performance
• Current graphics APIs aren’t quite up to the task of presenting the hardware’s computational abilities outside of graphics

Games and Numerical Applications
• Game physics
  • Collision detection, rigid body dynamics, particle systems, fluid (water) simulation, cloth and hair, etc.
  • Collision detection + response “shaders”
• Game play physics vs. effects physics
  • Game play physics affects game outcome
  • Effects physics affects display only
    • E.g. water, trees, rubble, cloth and hair
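
A collision detection + response "shader" for a particle system can be pictured as a small function applied independently to every particle, which is what makes it a natural fit for a data-parallel processor. The following is a minimal one-dimensional sketch, assuming semi-implicit Euler integration, a ground plane at y = 0, and a made-up restitution factor; none of these choices come from the slides.

```python
from dataclasses import dataclass
from typing import List

GRAVITY = -9.8  # m/s^2, acting along y (assumed)


@dataclass
class Particle:
    y: float   # height above the ground plane
    vy: float  # vertical velocity


def step(p: Particle, dt: float, restitution: float = 0.5) -> Particle:
    """Advance one particle by dt; no particle depends on any other."""
    vy = p.vy + GRAVITY * dt  # update velocity first (semi-implicit Euler)
    y = p.y + vy * dt         # then position, using the new velocity
    if y < 0.0:               # collision detection against the plane y = 0
        y = 0.0               # response: clamp to the surface
        vy = -vy * restitution  # and reflect the velocity with energy loss
    return Particle(y, vy)


def step_all(particles: List[Particle], dt: float) -> List[Particle]:
    """The per-particle map a GPU would run in parallel."""
    return [step(p, dt) for p in particles]
```

If the result only feeds rendering (rubble, water spray), it is effects physics in the slide's terms; if game logic reads the positions back, it is game play physics.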