SIGGRAPH 2009 RenderAnts: Interactive REYES Rendering on GPUs

- Slides: 52

SIGGRAPH 2009
RenderAnts: Interactive REYES Rendering on GPUs
Kun Zhou, Minmin Gong, Qiming Hou, Zhong Ren, Xin Sun, Baining Guo
Jaehyun Cho

Outline
● REYES Rendering
● System Overview
● GPU REYES Rendering
● Dynamic Scheduling
● Multi-GPU Rendering
● Results
● Conclusion

REYES rendering
● "Renders Everything You Ever Saw"
● Developed in the 1980s by Carpenter and Cook
● Produces photo-realistic images
● Main idea
  ● Subdivide every primitive into micropolygons
● In use by Pixar
  ● PhotoRealistic RenderMan (PRMan)

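Basic REYES pipeline
[Figure, flattened in extraction; stages and data as labeled]
Modeling application → primitives → Bucketing → Bound → Too large? (Yes: Split, then bound again; No: Dice) → unshaded micropolygons → Shade → shaded micropolygons → Sample → visible points → Composite & Filter → pixels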

System overview
● Map all basic REYES stages to the GPU
● Add three dynamic scheduling stages
● Support multi-GPU rendering

RenderAnts system pipeline [figure]

Bound/Split and Dice
● Bound/Split
  ● All input primitives are stored in a queue
  ● Primitives in the queue are bounded and split in parallel
● Dice
  ● Primitives in the dicing region are subdivided into micropolygons in parallel
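The bound/split queue and the dicing step can be sketched on the CPU as follows. This is only an illustration under stated assumptions: the `Patch` class, the `max_size` threshold, and the fixed dicing `rate` are hypothetical stand-ins, primitives are reduced to screen-space rectangles, and the loop runs sequentially where the slides describe parallel GPU execution.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Patch:
    """Hypothetical primitive: an axis-aligned screen-space rectangle."""
    x0: float
    y0: float
    x1: float
    y1: float

    def size(self):
        # Bound step: the longer side of the screen-space bounding box.
        return max(self.x1 - self.x0, self.y1 - self.y0)

    def split(self):
        # Split step: cut the patch in half along its longer axis.
        if self.x1 - self.x0 >= self.y1 - self.y0:
            xm = (self.x0 + self.x1) / 2
            return (Patch(self.x0, self.y0, xm, self.y1),
                    Patch(xm, self.y0, self.x1, self.y1))
        ym = (self.y0 + self.y1) / 2
        return (Patch(self.x0, self.y0, self.x1, ym),
                Patch(self.x0, ym, self.x1, self.y1))


def bound_split(primitives, max_size):
    """Process the primitive queue until every primitive is small enough to dice."""
    queue = deque(primitives)
    diceable = []
    while queue:
        p = queue.popleft()
        if p.size() > max_size:   # "Too large?" Yes: split and re-queue
            queue.extend(p.split())
        else:                     # No: ready for dicing
            diceable.append(p)
    return diceable


def dice(patch, rate):
    """Subdivide one patch into a rate x rate grid of micropolygons."""
    dx = (patch.x1 - patch.x0) / rate
    dy = (patch.y1 - patch.y0) / rate
    return [Patch(patch.x0 + i * dx, patch.y0 + j * dy,
                  patch.x0 + (i + 1) * dx, patch.y0 + (j + 1) * dy)
            for j in range(rate) for i in range(rate)]
```

For example, `bound_split([Patch(0, 0, 8, 4)], max_size=2)` keeps splitting until every patch is at most 2 units on a side, and `dice(p, 4)` then yields 16 micropolygons per patch.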

Shade
● Main idea
  ● Translate RenderMan shader instructions into GPU shader instructions
  ● Use a shader compiler
  ● Each vertex of the micropolygons is shaded

Shade
● Out-of-core texture fetch
  ● Textures are too large to load into GPU memory all at once
  ● Use a CPU-side cache manager
  ● On a cache miss, the GPU is interrupted; the cache manager reads the texture from disk and copies it to the GPU
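A CPU-side cache manager of this kind might look like the sketch below. The slides do not state an eviction policy, so least-recently-used eviction is an assumption here, as are the `TextureCache` name, the tile granularity, and the `load_from_disk` callback; the copy to the GPU is not modeled.

```python
from collections import OrderedDict


class TextureCache:
    """Hypothetical CPU-side texture cache with LRU eviction (an assumption)."""

    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity              # max number of resident tiles
        self.load_from_disk = load_from_disk  # callback: tile_id -> tile data
        self.tiles = OrderedDict()            # ordered from oldest to newest use
        self.misses = 0

    def fetch(self, tile_id):
        if tile_id in self.tiles:
            # Cache hit: mark the tile as most recently used.
            self.tiles.move_to_end(tile_id)
            return self.tiles[tile_id]
        # Cache miss: this is where the GPU would be interrupted while the
        # tile is read from disk and then copied to GPU memory.
        self.misses += 1
        data = self.load_from_disk(tile_id)
        self.tiles[tile_id] = data
        if len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)    # evict the least recently used tile
        return data
```

With a capacity of two tiles, fetching tiles 1, 2, 1, 3 in that order causes three misses and evicts tile 2, since tile 1 was touched more recently.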

Sample
● Main idea
  ● Each pixel in the sampling region is divided into subpixels
  ● If a micropolygon covers the sample location of a subpixel, compute and store a sample point
[Figure labels: sample point of left micropolygon; sample point of right micropolygon]

Sample
● Compute sample point
  ● Interpolate the color, opacity, and depth values of the micropolygon at the sample location
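The coverage test and the interpolation can be illustrated with barycentric weights. This sketch assumes each micropolygon has been split into triangles, which the slides do not specify; the vertex dictionary layout and the function names are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p in triangle (a, b, c), or None if degenerate."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        return None
    w0 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    w1 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return w0, w1, 1.0 - w0 - w1


def sample_triangle(p, verts):
    """Return the interpolated sample point at p, or None if p is not covered.

    verts: three dicts with keys 'pos' (x, y), 'color' (r, g, b),
    'opacity', and 'depth' (a hypothetical layout).
    """
    w = barycentric(p, *(v["pos"] for v in verts))
    if w is None or min(w) < 0:
        return None  # sample location lies outside the triangle
    color = tuple(sum(wi * v["color"][k] for wi, v in zip(w, verts))
                  for k in range(3))
    opacity = sum(wi * v["opacity"] for wi, v in zip(w, verts))
    depth = sum(wi * v["depth"] for wi, v in zip(w, verts))
    return {"color": color, "opacity": opacity, "depth": depth}
```

A sample location inside the triangle receives a weighted blend of the three vertex values; a location outside it produces no sample point.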

Composite & Filter
● Composite
  ● Sort the sample points of each subpixel in depth order
  ● In parallel, composite the sample points of each subpixel in depth order until reaching the depth of the subpixel
● Filter
  ● For each pixel, blend the color and opacity of its subpixels in parallel
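A per-subpixel version of these two steps could be sketched as below. The front-to-back "over" blend, the `opaque_depth` cutoff used to model "the depth of the subpixel", and the plain box filter are all assumptions made for illustration; the slides only say that samples are composited in depth order and that subpixels are blended.

```python
def composite_subpixel(samples, opaque_depth=float("inf")):
    """Composite one subpixel's sample points front to back.

    samples: dicts with 'color' (r, g, b), 'opacity', and 'depth'.
    opaque_depth: samples farther than this are skipped (assumed meaning
    of "until reaching the depth of the subpixel").
    """
    color = [0.0, 0.0, 0.0]
    alpha = 0.0
    for s in sorted(samples, key=lambda s: s["depth"]):
        if s["depth"] > opaque_depth:
            break
        # The remaining transparency scales this sample's contribution.
        weight = (1.0 - alpha) * s["opacity"]
        for k in range(3):
            color[k] += weight * s["color"][k]
        alpha += weight
    return tuple(color), alpha


def box_filter(subpixels):
    """Average the (color, alpha) results of one pixel's subpixels (box filter assumed)."""
    n = len(subpixels)
    color = tuple(sum(c[k] for c, _ in subpixels) / n for k in range(3))
    alpha = sum(a for _, a in subpixels) / n
    return color, alpha
```

Each subpixel is independent of the others, which is what makes both steps straightforward to run in parallel.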

Advanced features
● Shadows
  ● Use shadow maps generated in a shadow pass
● Motion blur & depth of field
  ● Use an accumulation buffer
  ● Assign a unique sample time to each subpixel
  ● Sample the subpixels whose sample time equals the current rendering time
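The sample-time idea for motion blur can be illustrated with a small sketch. It assumes one discrete time step per subpixel and a random permutation as the assignment; neither detail is given in the slides, and the function names are hypothetical.

```python
import random


def assign_sample_times(subpixels_per_pixel, seed=0):
    """Give each subpixel of a pixel a unique time-step index.

    A shuffled permutation is an assumption; any one-to-one assignment
    would satisfy "a unique sample time for each subpixel".
    """
    rng = random.Random(seed)
    times = list(range(subpixels_per_pixel))
    rng.shuffle(times)
    return times  # times[i] is the time step of subpixel i


def active_subpixels(times, step):
    """Subpixels to sample while rendering the scene at time step `step`."""
    return [i for i, t in enumerate(times) if t == step]
```

Rendering the scene once per time step and sampling only the active subpixels accumulates a motion-blurred pixel after all steps have been processed.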

Dynamic scheduling
● Main idea
  ● Maximize parallelism at each stage
  ● Estimate memory requirements at each stage

Dicing scheduler
● Main factor in memory requirements
  ● Total micropolygon data
● Estimating memory requirements
  ● The total number of micropolygons is computed from the total number of primitives

Dicing scheduler
● Main idea
  ● Split the current bucket into dicing regions
  ● Continue until the number of primitives in the region being processed fits in available GPU memory
  ● Use binary space partitioning (BSP)

How to split a dicing region?
● Let the number of primitives that fits in GPU memory be 2
[Figure, three build steps: the bucket is split into subregions until each subregion holds few enough primitives]
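The recursive region split can be sketched as a simple BSP over screen-space bounding boxes. This is an illustration only: the primitive limit stands in for the GPU memory estimate, cutting along the longer axis at the midpoint is an assumed split rule, and `max_depth` is a hypothetical guard against primitives that no split can separate.

```python
def overlaps(box, region):
    """True if two axis-aligned rectangles (x0, y0, x1, y1) intersect."""
    bx0, by0, bx1, by1 = box
    rx0, ry0, rx1, ry1 = region
    return bx0 < rx1 and bx1 > rx0 and by0 < ry1 and by1 > ry0


def split_regions(region, boxes, max_prims, max_depth=16):
    """Split `region` until each leaf overlaps at most `max_prims` primitives.

    boxes: screen-space bounding boxes of the primitives.
    Returns a list of (region, overlapping_boxes) leaves.
    """
    inside = [b for b in boxes if overlaps(b, region)]
    if len(inside) <= max_prims or max_depth == 0:
        return [(region, inside)]
    x0, y0, x1, y1 = region
    if x1 - x0 >= y1 - y0:            # cut across the longer axis (assumption)
        xm = (x0 + x1) / 2
        halves = [(x0, y0, xm, y1), (xm, y0, x1, y1)]
    else:
        ym = (y0 + y1) / 2
        halves = [(x0, y0, x1, ym), (x0, ym, x1, y1)]
    leaves = []
    for half in halves:
        leaves.extend(split_regions(half, inside, max_prims, max_depth - 1))
    return leaves
```

With four primitives in the four quadrants of an 8 x 8 bucket and a limit of 2, one vertical cut is enough; a limit of 1 forces a second level of cuts.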

Shading scheduler
● Main factor in memory requirements
  ● Temporary data allocated during shader execution
● Estimating memory requirements
  ● Different shaders require different amounts of temporary data

Shading scheduler
● Main idea
  ● Split the micropolygon list into sublists
  ● Continue until the number of micropolygons for the current shader execution fits in available GPU memory
  ● Schedule per shader execution
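One way to picture the per-shader split is the sketch below. It is a simplification under stated assumptions: the sublist size is computed directly from a per-micropolygon temporary-data estimate rather than by repeated splitting, and the parameter names are hypothetical.

```python
def shading_batches(num_micropolygons, temp_bytes_per_micropolygon, available_bytes):
    """Split a micropolygon list into index ranges that fit in GPU memory.

    temp_bytes_per_micropolygon: estimated temporary data one shader needs
    per micropolygon (differs from shader to shader).
    Returns half-open (start, end) ranges, one per shader execution.
    """
    batch = available_bytes // temp_bytes_per_micropolygon
    if batch == 0:
        raise ValueError("not enough memory to shade a single micropolygon")
    return [(start, min(start + batch, num_micropolygons))
            for start in range(0, num_micropolygons, batch)]
```

Because the estimate depends on the shader, the same micropolygon list may run as a single batch for a cheap shader and as several batches for an expensive one.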

Sampling scheduler
● Main factor in memory requirements
  ● Total data for the subpixel framebuffer and the sample points
● Estimating memory requirements
  ● The framebuffer size equals the region size
  ● Use a line-scanning process to estimate the number of sample points

Sampling scheduler
● Main idea
  ● Split the current dicing region into sampling regions
  ● Continue until the number of sample points in the region being processed plus the region size fits in available GPU memory
  ● Use binary space partitioning (BSP)

Multi-GPU rendering
● Main idea
  ● Minimize inter-GPU communication
  ● Balance workloads among GPUs

How to minimize inter-GPU communication?
● Each GPU maintains a complete list of all primitives in a bucket
● Only region descriptions are transferred

How to minimize inter-GPU communication?
● Let A, B, and C denote the GPUs
[Figure, three build steps: the bucket starts on A, and subregions are then handed to B and C]

How to balance workloads among GPUs?
● Split a region only when both conditions hold
  ● The number of primitives exceeds a threshold
  ● An idle GPU exists

How to balance workloads among GPUs?
● Let threshold = 2
[Figure, two build steps: the bucket on A is split into subregions that are assigned to B and then C]
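The two split conditions can be illustrated with a static sketch of the hand-off. Everything here is an assumption made for illustration: GPUs are plain integer ids, the busiest GPU is chosen first, regions are cut in half along their longer axis, and no timing is simulated. Since every GPU is assumed to hold the full primitive list, only the region rectangle changes hands.

```python
def overlaps(box, region):
    """True if two axis-aligned rectangles (x0, y0, x1, y1) intersect."""
    bx0, by0, bx1, by1 = box
    rx0, ry0, rx1, ry1 = region
    return bx0 < rx1 and bx1 > rx0 and by0 < ry1 and by1 > ry0


def count_prims(region, boxes):
    return sum(1 for b in boxes if overlaps(b, region))


def halve(region):
    """Cut a region in half along its longer axis (assumed split rule)."""
    x0, y0, x1, y1 = region
    if x1 - x0 >= y1 - y0:
        xm = (x0 + x1) / 2
        return (x0, y0, xm, y1), (xm, y0, x1, y1)
    ym = (y0 + y1) / 2
    return (x0, y0, x1, ym), (x0, ym, x1, y1)


def balance(bucket, boxes, num_gpus, threshold):
    """Return {gpu_id: region}; GPU 0 starts with the whole bucket."""
    work = {0: bucket}
    idle = list(range(1, num_gpus))
    while idle:                                    # condition: an idle GPU exists
        heavy = [g for g, r in work.items()
                 if count_prims(r, boxes) > threshold]  # condition: too many primitives
        if not heavy:
            break
        g = max(heavy, key=lambda gpu: count_prims(work[gpu], boxes))
        keep, give = halve(work[g])
        work[g] = keep
        work[idle.pop(0)] = give                   # only the region description moves
    return work
```

With three GPUs, four primitives in the four quadrants, and a threshold of 2, a single split suffices and the third GPU stays idle; lowering the threshold to 1 triggers a second hand-off.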

Results

Rendering Performance

Rendering Time on GPU
● Breakdown of the rendering time on the GPU
● Initialization time (loading data from CPU to GPU) is relatively short

Scaled Performance on GPU

Conclusions
● Advantages
  ● Faster than CPU-based rendering
  ● Performance scalability
● Disadvantages
  ● Geometry scalability
  ● Motion/focal blur
    ● Improved in [Hou et al. 2010]

Questions & Answers

Thank you