Making the Pieces Fit Together
Jonathan Blow
Game Developers Conference Reception
October 21, 2002
Seoul, Korea

3D Techniques I Will Discuss
• Level-of-Detail Management (LOD)
• Triangle Strip Generation
• Vertex Cache Optimization
• Normal Map Generation
• Ordered Rendering (sorted output geometry)

How I will discuss them
• You can read about these techniques on the internet: hardware vendor sites, programmer hobbyist sites. There is a lot of hype.
• Most of this stuff is not written by people actually making ambitious games (they’re busy!).
• Most of it is ill-advised.
• I want to provide a hype-free, skeptical review.

Lecture in Three Parts
• Part 1: A Sense of Perspective
  – What is 3D rendering for games, today?
• Part 2: The Techniques
  – Explained by a Skeptic…
• Part 3: Making Games
  – How to use 3D techniques without going out of business or building a horrible game.

Part 1: A Sense of Perspective

3D rendering for games is a complicated subject
• Partially because we have accomplished a lot
  – Recent demos, and a few games, are graphically very impressive
    • (but the demos look much better than the games. Why is that?)
• The games all draw worlds by projecting a bunch of triangles onto the screen.

Primary Rendering Paradigm
• Projecting triangles
  – but very fancy triangles
  – Texture maps, normal maps, complex lighting
• Alternative representations exist
  – NURBS, N-Patches, subdivision surfaces
  – These are used in preprocesses, translated into triangles for the realtime pipeline.

Why have triangles dominated? They are simple and robust.

Suppose we’re inventing realtime rendering from scratch
• Project every point of a solid object to the screen, use a depth buffer
  – We waste a lot of resources drawing everything inside the solid, which will inevitably be hidden!
• Cull out interior points (same result)
• Now we have a bunch of solid 2D shells to draw, but each still has a large number of points
• We want a more compressed way to represent 2D subsets of 3D

Introducing the Triangle
• The triangle is the simplest way to denote a closed region of 2D space.

Start with a point P
• We have a 0-dimensional space

Define one more point Q
• Suddenly we have a 1D space!
• That is a lot bigger than 0D.
[Figure: points P and Q, with the line P + t(Q-P)]

Add a third point R
• Now we have a 2D space!
• In a way, the concept of a triangle is the same as the concept of two dimensions.
[Figure: points P, Q and R, with the plane P + t(Q-P) + s(R-P)]
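
The parametrisation on this slide can be tried out directly. The sketch below is only an illustration: the function names are made up for this example, and the inside test (t >= 0, s >= 0, t + s <= 1) is the standard barycentric condition rather than something stated on the slide.

```python
def plane_point(P, Q, R, t, s):
    """Evaluate P + t(Q-P) + s(R-P) for 3D points given as tuples."""
    return tuple(p + t * (q - p) + s * (r - p) for p, q, r in zip(P, Q, R))


def inside_triangle(t, s):
    """True when the parameters (t, s) land inside triangle PQR (edges included)."""
    return t >= 0 and s >= 0 and t + s <= 1


# Hypothetical example triangle in the z = 0 plane.
P, Q, R = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
print(plane_point(P, Q, R, 0.25, 0.25), inside_triangle(0.25, 0.25))
```

Setting (t, s) to (0, 0), (1, 0) and (0, 1) recovers P, Q and R; every other point of the triangle is a linear blend of the three, which is why interpolation and clipping stay cheap.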

The linearity of the triangle is tremendously useful!
• Easy to:
  – Interpolate
  – Clip
  – Intersection test
  – Bounding volume
• Linear equations are the most basic and well-understood kind (see, for example, linearizing differential equations!)
• If you are doing something unconventional, the triangle probably won’t get in your way.

Higher-order surfaces cause more problems.
• Clipping a curved surface is annoying.
• Bounding volumes are also annoying.
• The offset of a Bezier surface is not a Bezier surface
  – So what happens if spline parameters are your base representation, and you need to offset?
[Figure: the green surface is a spline; the red one is not]

Among linear polygons, triangles are the simplest.
• Quads can be noncoplanar (vertex lighting will fail!)
• Pipeline must handle primitives with varying numbers of vertices
• Games had brief dalliances with quads / n-gons around 1996, but nobody uses them any more to represent general geometry.

In Summary
• The impressiveness of our current graphics techniques depends on us being able to draw a lot of triangles.

Part 2: The Techniques
Question: “So how do I draw a lot of triangles?”
(Answer: “very carefully.”)

Rendering Techniques that people like to hear about… but first:
• There are two basic kinds of 3D techniques
  – #1: We would think about them if we had infinitely fast hardware (e.g. projective transform, BRDF)
  – #2: The kind we only care about because hardware is slow
• Type #2 usually introduces complications, and we need to manage those complications

Drawing a lot of triangles: Reduce Data Size
• Fancy Triangles = big vertices (60 bytes each)
  – XYZ position (12 bytes)
  – Texture UV coordinates (8 bytes)
  – RGBA color (4 bytes)
  – Tangent frame (36 bytes; maybe smaller)
• 180 bytes per triangle if you just list vertices! (5000 triangles = 900 kbytes)
• This makes the hardware run slowly
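
The arithmetic on this slide is easy to reproduce. The following is a small sketch using the byte counts listed above; the dictionary keys and function names are invented for the example.

```python
# Per-vertex attribute sizes in bytes, as listed on the slide.
VERTEX_LAYOUT = {
    "position_xyz": 12,
    "texture_uv": 8,
    "color_rgba": 4,
    "tangent_frame": 36,
}


def vertex_size(layout):
    """Total bytes for one vertex."""
    return sum(layout.values())


def unindexed_mesh_bytes(num_triangles, layout):
    """Bytes needed when every triangle lists its three vertices in full."""
    return 3 * num_triangles * vertex_size(layout)


print(vertex_size(VERTEX_LAYOUT))                 # bytes per vertex
print(unindexed_mesh_bytes(5000, VERTEX_LAYOUT))  # bytes for 5000 triangles
```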

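Indexed Triangle List
• A mesh has a lot of shared vertices
• Put the vertices into an array
• The triangles are described by indices into this array
• Shrinks total amount of data
  – $F = 2V$; $S_0 = 3kF$; $S_1 = kV + 3iF$; savings $S_0 - S_1 = V(5k - 6i)$
  – (V vertices, F triangles, k bytes per vertex, i bytes per index; $S_0$ is the unindexed size, $S_1$ the indexed size)
• Bonus: Separates topology from position data
[Figure: a small mesh with vertices numbered 0 to 4 and its triangles written as index triples]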
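
As a rough sketch of the idea, the code below turns a "triangle soup" into a vertex array plus an index list and evaluates the two sizes from the slide. The function names are hypothetical, k = 60 comes from the earlier slide, and i = 2 (16-bit indices) is an assumption made for this example.

```python
def build_indexed_list(triangles):
    """Deduplicate vertices; return (vertex_array, flat_index_list)."""
    vertices, lookup, indices = [], {}, []
    for tri in triangles:
        for v in tri:
            if v not in lookup:          # first time we see this vertex
                lookup[v] = len(vertices)
                vertices.append(v)
            indices.append(lookup[v])
    return vertices, indices


def list_sizes(num_vertices, num_triangles, k=60, i=2):
    """Return (S0, S1): bytes unindexed vs. indexed, per the slide's formulas."""
    s0 = 3 * k * num_triangles
    s1 = k * num_vertices + 3 * i * num_triangles
    return s0, s1


# Two triangles forming a quad; they share the edge B-C.
A, B, C, D = (0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)
vertices, indices = build_indexed_list([(A, B, C), (B, D, C)])
print(vertices, indices)
print(list_sizes(1000, 2000))  # a mesh with F = 2V
```

With F = 2V the difference between the two sizes is V(5k - 6i), so the saving grows with vertex size and is barely dented by index size.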

Triangles in a mesh share not only vertices, but edges too
[Figure: the mesh pulled apart into triangles 0,1,2 / 1,2,3 / 3,1,4, showing the edges they share]

Triangle Strips
• We can compress a list of indexed triangles by forming “strips” that run along the shared edges.
  – List: 012, 123, 234, 345, 456
  – Strip: 012, 3, 4, 5, 6
[Figure: a strip of five triangles over vertices 0 to 6]
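
A minimal sketch of how a strip expands back into triangles: each new index forms a triangle with the two indices before it. The function name is made up, and the sketch ignores winding order (real hardware alternates the winding of every other triangle), so it reproduces the index triples exactly as written on the slide.

```python
def strip_to_triangles(strip):
    """Expand a triangle strip into index triples using a sliding window of 3."""
    return [tuple(strip[n:n + 3]) for n in range(len(strip) - 2)]


print(strip_to_triangles([0, 1, 2, 3, 4, 5, 6]))
```

Seven indices describe five triangles here, where the indexed list needed fifteen.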

Cost analysis of triangle strips is often somewhat wrong
• 3 indices for the 1st triangle, 1 for each thereafter
• Incomplete because there also needs to be a way to delimit strips
  – 3 strips: 01234, 567, 89241
  – Index buffer: 0123456789241
  – But where do they start and end?

Delimiting Triangle Strips
• Explicitly add numbers to describe strip length
  – 3 strips: 01234, 567, 89241
  – Index buffer: 5012343567589241
• DirectX 8-style separate API calls (impact on CPU usage, AND adds numbers behind the scenes)
  – Index buffer: 0123456789241
  – DrawIndexedPrimitive 0, 5
  – DrawIndexedPrimitive 5, 3
  – DrawIndexedPrimitive 8, 5
  – Output stream: 5012343567589241
• Strips start out worse than lists, and have to catch up… the longer the strip, the better you catch up
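
The length-prefixed layout from this slide can be sketched in a few lines, together with the cost comparison against a plain indexed list. All function names are invented for the example, and "cost" here simply counts numbers in the stream, which is a simplification.

```python
def encode_length_prefixed(strips):
    """Write each strip as its length followed by its indices."""
    out = []
    for strip in strips:
        out.append(len(strip))
        out.extend(strip)
    return out


def decode_length_prefixed(buffer):
    """Inverse of encode_length_prefixed."""
    strips, pos = [], 0
    while pos < len(buffer):
        n = buffer[pos]
        strips.append(buffer[pos + 1:pos + 1 + n])
        pos += 1 + n
    return strips


def strip_cost(strips):
    """Numbers in the stream: every index plus one length per strip."""
    return sum(len(s) + 1 for s in strips)


def list_cost(strips):
    """Numbers needed to draw the same triangles as an indexed list."""
    return sum(3 * (len(s) - 2) for s in strips)


strips = [[0, 1, 2, 3, 4], [5, 6, 7], [8, 9, 2, 4, 1]]
print(encode_length_prefixed(strips))
print(strip_cost(strips), list_cost(strips))
```

A one-triangle strip costs 4 numbers against 3 for the list, which is the "strips start out worse" point; the strip only wins once it is long enough to amortise its delimiter.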

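Because triangle strips are limited, we need to add swaps
• Without a swap: strip 012, 3, 4, 5, 6 draws 012, 123, 234, 345, 456
• With a swap: strip 012, 3, 2, 4, 5, 6 draws 012, 123, 232, 324, 245, 456
[Figure: the same vertices 0 to 6 stripped without and with a swap]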
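
A swap works by repeating an index, which produces a zero-area (degenerate) triangle such as 232. The sketch below flags those triangles when expanding a strip; the function name is hypothetical and winding order is ignored.

```python
def strip_triangles_with_flags(strip):
    """Expand a strip into (triangle, is_degenerate) pairs.

    A triangle is degenerate when it repeats an index, so it has no area.
    """
    result = []
    for n in range(len(strip) - 2):
        tri = tuple(strip[n:n + 3])
        result.append((tri, len(set(tri)) < 3))
    return result


swapped = strip_triangles_with_flags([0, 1, 2, 3, 2, 4, 5, 6])
for tri, degenerate in swapped:
    print(tri, "degenerate" if degenerate else "")
```

The degenerate triangle draws nothing, but it still costs an index and still passes through the pipeline.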

Triangle Strip Efficiency
• Depends on strip length, which depends on your data
• It takes a complicated algorithm to make good strips.
[Figure: two example strippings: 4 strips, 40 indices; 10 strips, 52 indices (no swaps yet)]

Triangle Strip Skepticism
• In a full game, performance numbers don’t necessarily validate triangle strips… we’ll see why
• Strips add implementation complications
• Even with perfect stripping, you only reduce index data (a minority of total data) from $6iV$ to $2iV + 2i$. You won’t have perfect stripping.
• Degenerate triangles can cost you.

If you want to make a strip algorithm…
• Most papers give you the basic idea, but are not very good in the end
  – Old SGI source code
  – STRIPE papers
• You really want a non-greedy algorithm
  – Heuristics based on strip length and cache
  – Tunneling operator

Vertex Cache and Vertex Shader
• We want to cache vertex memory for fast access…
• Vertex Shader is a small hardware program that runs for each vertex
  – Compute lighting, transform, skinning, etc
• Hardware caches the results of the evaluated vertex shader
  – A cache miss means running the shader again
    • (More expensive than traditional CPU cache miss!)
[Figure: memory, shader, vertex cache]

You want to order vertices by cache efficiency
• Mostly use vertices you just used recently
• But this conflicts with triangle strip efficiency!
  – Can’t even do the red path in one triangle strip without inserting a teleport (very expensive!)
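
One way to compare index orderings is to simulate the post-transform cache and count misses. The sketch below assumes a simple FIFO cache with 16 entries; both the policy and the size are assumptions for illustration, since real hardware varies, and the function name is invented.

```python
from collections import deque


def count_cache_misses(indices, cache_size=16):
    """Count vertex shader runs for an index stream under a FIFO cache model."""
    cache = deque(maxlen=cache_size)  # oldest entry is dropped when full
    misses = 0
    for idx in indices:
        if idx not in cache:
            misses += 1               # shader must run for this vertex
            cache.append(idx)
    return misses


quad = [0, 1, 2, 1, 3, 2]             # two triangles sharing an edge
print(count_cache_misses(quad))                 # big enough cache
print(count_cache_misses(quad, cache_size=1))   # cache too small to help
```

Dividing the miss count by the number of triangles gives an average miss rate that can be compared across candidate orderings of the same mesh.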

Vertex cache effects can be dominant
• Multi-pass rendering: you skin the guy multiple times, so shader is expensive!
• Or do you skin on the CPU?
• Now you begin to have a lot of optimization choices; these can determine who’s dominant
• The “right answer” depends on your game and target platform

How do we resolve the conflict between strips and cache?
• Maybe you write a triangle stripper that tries to deal with the vertex cache
  – Complicated to write, degraded results on both sides; Nvidia’s does this
• Maybe you ignore vertex caching
  – Might be okay if your shaders are cheap
• Maybe you ignore triangle strips, and just use triangle lists

Quirks of some architectures make strips better
• Nvidia triangle setup (Xbox, etc)
• Nvidia push buffer bottleneck also makes strips more effective.

Now… we need some kind of LOD
• Because even perfect triangle strips / cache hits still draw way too many triangles… we need to go from O(n) to O(log n).
• Several types of LOD available:
  – Dynamic (view-dependent): FORGET IT
  – Static mesh switching (simple)
  – Progressive mesh (best algorithm: VIPM)
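
Static mesh switching is simple enough to sketch in a few lines: keep an array of pre-built meshes and pick one by distance. The switch distances below are made-up numbers and the function name is hypothetical; a real game would tune the thresholds per object.

```python
import bisect


def select_lod(distance, switch_distances):
    """Return the mesh index to draw: 0 is the highest-detail mesh.

    switch_distances must be sorted ascending; crossing each one moves
    to the next, coarser mesh.
    """
    return bisect.bisect_right(switch_distances, distance)


SWITCH_DISTANCES = [10.0, 30.0, 90.0]   # assumed values, four meshes total
for d in (5.0, 10.0, 50.0, 200.0):
    print(d, select_lod(d, SWITCH_DISTANCES))
```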

View Independent Progressive Mesh
• Collapse vertices according to a base-plane error metric.
• Generate one sequence of collapses that takes us from high-res to low-res.
• Popping in VIPM is subtle, which is good.
• VIPM draws fewer triangles than static switching, since we usually push static switching away in Z to avoid popping.

Problem with VIPM
• VIPM slides a window across the index buffer, doing fix-ups.
• Need to sort vertices by LOD collapse order
• This conflicts with strip / cache sorting
• You can’t do all three at once (though you can do stripped VIPM or cache-sorted VIPM)
[Figure: index buffer and fix-up records]

Sorting Score Card (more items will be added here)
• Triangle strip efficiency order
• Vertex cache order
• LOD collapse order (if VIPM)

Normal Map Generation
• Approximate huge amounts of geometry by per-texel normals
• Generate the maps by crunching a high-res mesh down onto a low-res one…
• When rendering, transform texture normal by iterated tangent frame, and you get the normal of the high-res model (almost)
• Object or tangent space?
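
The render-time step can be sketched as a small matrix multiply: the normal read from the texture is expressed in the tangent frame, and the frame's three axes carry it into object space. The names below are invented, and the texel encoding (8-bit channels mapped to [-1, 1]) is a common convention assumed here, not something stated on the slide.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def decode_texel(rgb):
    """Map 8-bit RGB channels to a unit normal (assumed encoding)."""
    return normalize(tuple(c / 127.5 - 1.0 for c in rgb))


def tangent_to_object(n_ts, tangent, bitangent, normal):
    """Transform a tangent-space normal by the (interpolated) tangent frame."""
    return normalize(tuple(
        n_ts[0] * t + n_ts[1] * b + n_ts[2] * n
        for t, b, n in zip(tangent, bitangent, normal)
    ))


# A hypothetical frame whose surface normal points along +X in object space.
T, B, N = (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
print(tangent_to_object((0.0, 0.0, 1.0), T, B, N))
print(decode_texel((128, 128, 255)))
```

A "flat" texel, one that points straight along the tangent-space Z axis, comes out as the low-res surface normal; any other texel tilts it toward the detail captured from the high-res mesh.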

Normal Map Generation influences LOD choice
• With static switching, you just have an array of meshes
• With VIPM, you are forced to use object-space normal maps, which probably don’t compress as well as tangent-space maps.
• Normal mapping to a high-res model makes static mesh switching look better (much less popping… most popping was due to light)

More Sorting
• To render quickly, we want to sort by render state (multiple materials on the same object means we break that object into several passes, decreasing triangle strip and vertex cache effectiveness)
• To render quickly, we want to draw front-to-back (fast z-fail)
• To render transparent things correctly, we need to draw those back-to-front (break these into a separate pass, decrease stripping and cache effectiveness)
• We are robbing ourselves of the benefits we got earlier… so hopefully we didn’t pay very much for them (more on this later)
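
The competing sort orders can be made concrete with a toy draw list. This is one possible policy, not a recommendation from the talk: opaque items are grouped by shader and only then sorted front-to-back, and translucent items go last, back-to-front. The `DrawItem` type and its fields are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class DrawItem:
    name: str
    shader_id: int
    depth: float          # distance from the camera
    translucent: bool = False


def order_for_rendering(items):
    """Opaque: by shader, then near-to-far. Translucent: far-to-near, drawn last."""
    opaque = [d for d in items if not d.translucent]
    translucent = [d for d in items if d.translucent]
    opaque.sort(key=lambda d: (d.shader_id, d.depth))
    translucent.sort(key=lambda d: -d.depth)
    return opaque + translucent


scene = [
    DrawItem("wall", 1, 20.0),
    DrawItem("glass", 2, 5.0, translucent=True),
    DrawItem("crate", 0, 30.0),
    DrawItem("floor", 1, 10.0),
    DrawItem("smoke", 2, 15.0, translucent=True),
]
print([d.name for d in order_for_rendering(scene)])
```

Note that the crate, the farthest opaque object, is drawn first because its shader sorts first: grouping by state has already cost some of the front-to-back benefit, which is exactly the tension this slide describes.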

Sorting Score Card
• Triangle strip efficiency order
• Vertex cache order
• LOD collapse order (if PM)
• Sort by shader
• Front-to-back (opaque things)
• Back-to-front (translucent things)

How do you LOD a guy with multiple materials?
• Materials usually done by one pixel / vertex shader pair, per material
• Can only combine triangles so much (can’t cross material boundary)
• Can’t combine textures into one (lose lighting effects)
• Everybody just kind of punts… this is an important problem to solve for the future.

Part 3: Making Games

Trade-Offs
• As computer scientists and engineers we are accustomed to the idea of engineering trade-offs (time for space, etc)
• Must consider code complexity to be a FINITE RESOURCE that can be traded with time, space, etc.

Complexity as Resource
• Every extra line of code or ‘if’ statement must be maintained through the life of the project and must interact with new features
  – IMPORTANT: most new features are not orthogonal; they will FIGHT with your existing code.
• You only have so much complexity to spend over the course of your project; too much and your project will fail.

Cultural Problem
• At least in America, many programmers try to prove themselves by doing complicated, impressive-sounding things.
• Try to make a 3D engine that is the “next big cool thing”
• The successful paths of the past have been things that are NOT complicated (triangles are simple!)
• Successful paths of the future will probably also be the simpler ones. So…

A Thought
• If your engine / algorithms seem very complicated…
  – …they are unlikely to be on a path that history will make successful
  – They will NOT be the next big thing

Good Art and Levels are more important than a good engine
[Image: Max Payne]
• If you are adding engine features that make it more difficult to create levels / content (without making the content a lot richer), this is probably a mistake

Cost-Benefit Analysis
• Don’t forget to account for opportunity cost… every minute you spend working on A is a minute not working on B
• You need to be an economist, deciding how to get the most net worth out of the resources you have to spend.
• YOU need to do it, not just the managers
  – It is a multiscale (fractal) phenomenon

So how is a game different from a demo?
• A demo is free to isolate itself to a small group of effects that “fit together”
  – Examples? (stencil w/out transparencies, high-poly without dynamic shadows, etc)
• Games have gameplay ramifications
  – You MUST support the game design, make all the disparate pieces connect
  – “Is thing in shadow?”

My Personal Choices
• Triangle lists, vertex cache optimized
  – No triangle strips
• Static-switching LOD
  – No progressive mesh
I try to be as simple as possible in graphics, reserving complexity for things like AI or physics.

References
• VIPM, triangle strips, vertex cache:
  – www.cbloom.com/3d/techdocs/vipm_topics.txt

The End
Questions?