SIGGRAPH 2009
RenderAnts: Interactive REYES Rendering on GPUs
Kun Zhou, Minmin Gong, Qiming Hou, Zhong Ren, Xin Sun, Baining Guo
Jaehyun Cho

Outline
● REYES Rendering
● System Overview
● GPU REYES Rendering
● Dynamic Scheduling
● Multi-GPU Rendering
● Results
● Conclusion

REYES rendering
● “Renders Everything You Ever Saw”
● Introduced in the 1980s by Carpenter and Cook
● Produces photorealistic images
● Main idea
  ● Subdivide every primitive into micropolygons
● Used by Pixar
  ● PhotoRealistic RenderMan (PRMan)

Basic REYES pipeline
● Modeling application → primitives → Bucketing → Bound
● Too large? Yes → Split, then Bound the resulting pieces again
● Too large? No → Dice → unshaded micropolygons → Shade → shaded micropolygons → Sample → visible points → Composite & Filter → pixels
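The Bound → Too large? → Split/Dice loop above can be sketched in a few lines of Python. This is a toy illustration, not RenderAnts or PRMan code: primitives are assumed to be screen-space rectangles, "too large" means the diced grid would exceed a hypothetical `MAX_GRID` micropolygons per side, and dicing targets roughly one micropolygon per pixel.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rect:
    """Toy primitive: an axis-aligned screen-space rectangle, in pixels."""
    x0: float
    y0: float
    x1: float
    y1: float


MAX_GRID = 16            # hypothetical: max micropolygons per side of one diced grid
MICROPOLYGON_SIZE = 1.0  # target micropolygon edge length, in pixels


def bound(p: Rect) -> Tuple[float, float]:
    """Screen-space extent (width, height) of the primitive's bound."""
    return p.x1 - p.x0, p.y1 - p.y0


def too_large(p: Rect) -> bool:
    """True if dicing now would produce a grid wider than MAX_GRID."""
    w, h = bound(p)
    return max(w, h) / MICROPOLYGON_SIZE > MAX_GRID


def split(p: Rect) -> List[Rect]:
    """Halve the primitive along its longer axis."""
    w, h = bound(p)
    if w >= h:
        xm = (p.x0 + p.x1) / 2
        return [Rect(p.x0, p.y0, xm, p.y1), Rect(xm, p.y0, p.x1, p.y1)]
    ym = (p.y0 + p.y1) / 2
    return [Rect(p.x0, p.y0, p.x1, ym), Rect(p.x0, ym, p.x1, p.y1)]


def dice(p: Rect) -> List[Rect]:
    """Subdivide the primitive into a grid of roughly pixel-sized micropolygons."""
    w, h = bound(p)
    nu = max(1, math.ceil(w / MICROPOLYGON_SIZE))
    nv = max(1, math.ceil(h / MICROPOLYGON_SIZE))
    du, dv = w / nu, h / nv
    return [Rect(p.x0 + i * du, p.y0 + j * dv, p.x0 + (i + 1) * du, p.y0 + (j + 1) * dv)
            for j in range(nv) for i in range(nu)]


def bound_split_dice(p: Rect) -> List[Rect]:
    """Classic REYES loop: split while too large, otherwise dice."""
    if too_large(p):
        return [mp for child in split(p) for mp in bound_split_dice(child)]
    return dice(p)
```

For example, `bound_split_dice(Rect(0, 0, 64, 8))` splits the 64×8 primitive twice into four 16×8 pieces and returns 512 unit-area micropolygons.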

System overview
● Map all basic REYES stages to the GPU
● Add three dynamic scheduling stages
● Support multi-GPU rendering

RenderAnts system pipeline [figure]

Bound/Split and Dice
● Bound/Split
  ● All input primitives are stored in a queue
  ● Primitives in the queue are bounded and split in parallel
● Dice
  ● Primitives in the dicing region are subdivided into micropolygons in parallel
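On the GPU the recursion becomes a loop over passes: every pass bounds and tests the whole queue at once, sends small-enough primitives on to dicing, and replaces large ones with their split halves. The sketch below assumes rectangle primitives and a hypothetical `MAX_GRID` limit; the list comprehensions stand in for parallel kernels and stream compaction and are not RenderAnts code.

```python
import math
from typing import List, Tuple

# Toy primitive: screen-space rectangle (x0, y0, x1, y1), in pixels.
Prim = Tuple[float, float, float, float]
MAX_GRID = 16  # hypothetical dicing limit: max micropolygons per grid side


def needs_split(p: Prim) -> bool:
    return max(p[2] - p[0], p[3] - p[1]) > MAX_GRID


def split(p: Prim) -> List[Prim]:
    """Halve the primitive along its longer axis."""
    x0, y0, x1, y1 = p
    if x1 - x0 >= y1 - y0:
        xm = (x0 + x1) / 2
        return [(x0, y0, xm, y1), (xm, y0, x1, y1)]
    ym = (y0 + y1) / 2
    return [(x0, y0, x1, ym), (x0, ym, x1, y1)]


def bound_split_passes(primitives: List[Prim]) -> Tuple[List[Prim], int]:
    """Breadth-first Bound/Split: each pass processes the entire queue.

    On a GPU each pass would be one kernel launch with one thread per
    queued primitive. Returns the primitives ready for dicing and the
    number of passes used.
    """
    queue, ready, passes = list(primitives), [], 0
    while queue:
        passes += 1
        flags = [needs_split(p) for p in queue]               # parallel bound + test
        ready += [p for p, f in zip(queue, flags) if not f]   # compact: go to Dice
        queue = [c for p, f in zip(queue, flags) if f for c in split(p)]  # next pass
    return ready, passes


def dice_count(p: Prim) -> int:
    """Micropolygons a primitive dices into at about one per pixel."""
    return max(1, math.ceil(p[2] - p[0])) * max(1, math.ceil(p[3] - p[1]))
```

Processing `[(0, 0, 64, 8), (0, 0, 8, 8)]` takes three passes and yields five primitives ready for dicing, 576 micropolygons in total.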

Shade
● Main idea
  ● Translate RenderMan shader instructions into GPU shader instructions
  ● Use a shader compiler
  ● Shade each vertex of the micropolygons
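The shader compiler itself is beyond a short example, but the execution pattern is simple: the same shader runs independently at every micropolygon vertex, which maps naturally to one GPU thread per vertex. The `matte_like` shader below is a hypothetical ambient-plus-Lambert model written only to illustrate that pattern.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def matte_like(cs: Vec3, n: Vec3, light_dir: Vec3, ka: float = 0.1, kd: float = 0.9) -> Vec3:
    """Hypothetical shader for one vertex: ambient term plus Lambert diffuse.

    `n` and `light_dir` are assumed to be unit vectors.
    """
    k = ka + kd * max(0.0, dot(n, light_dir))
    return (cs[0] * k, cs[1] * k, cs[2] * k)


def shade_grid(colors: List[Vec3], normals: List[Vec3], light_dir: Vec3) -> List[Vec3]:
    """Shade every micropolygon vertex independently.

    No vertex depends on another, so a GPU can run one thread per vertex;
    this list comprehension is the sequential stand-in for that launch.
    """
    return [matte_like(c, n, light_dir) for c, n in zip(colors, normals)]
```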

Shade
● Out-of-core texture fetch
  ● Textures are too large to load into GPU memory at once
  ● Use a CPU-side cache manager
  ● On a cache miss, the GPU is interrupted; the cache manager reads the texture data from disk and copies it to the GPU
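A minimal sketch of a CPU-side texture cache, assuming textures are split into tiles, capacity is counted in tiles, and eviction is least-recently-used. `TileCache` and `load_tile` are hypothetical names; in the described system a miss would also interrupt the GPU and copy the tile into GPU memory, which is only noted in comments here.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class TileCache:
    """Hypothetical CPU-side LRU cache of texture tiles."""

    def __init__(self, capacity: int, load_tile: Callable[[Hashable], bytes]):
        self.capacity = capacity            # max tiles kept resident
        self.load_tile = load_tile          # stands in for a disk read
        self.tiles: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.misses = 0

    def fetch(self, tile_id: Hashable) -> bytes:
        if tile_id in self.tiles:
            self.tiles.move_to_end(tile_id)     # mark as most recently used
            return self.tiles[tile_id]
        self.misses += 1                        # miss: the GPU would wait here
        data = self.load_tile(tile_id)          # read the tile from "disk"
        self.tiles[tile_id] = data              # (then copy it to GPU memory)
        if len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)      # evict least recently used tile
        return data
```

With a capacity of two tiles, the request sequence 0, 1, 0, 2, 1 produces four misses, and tiles 2 and 1 remain resident.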

Sample
● Main idea
  ● Each pixel in the sampling region is divided into subpixels
  ● If a micropolygon covers the sample location of a subpixel, compute and store a sample point
[Figure: sample points of a left and a right micropolygon]
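The coverage test can be sketched with edge functions. Assumptions: each micropolygon is a triangle (a REYES quad can be split into two), sample locations are subpixel centers rather than jittered positions, and samples exactly on an edge count as covered (a production rasterizer would use a tie-breaking rule so shared edges are not sampled twice).

```python
from typing import List, Tuple

Point = Tuple[float, float]
Tri = Tuple[Point, Point, Point]


def subpixel_samples(px: int, py: int, s: int) -> List[Point]:
    """Sample locations (subpixel centers) of pixel (px, py) split into s x s subpixels."""
    return [(px + (i + 0.5) / s, py + (j + 0.5) / s) for j in range(s) for i in range(s)]


def edge(a: Point, b: Point, p: Point) -> float:
    """Twice the signed area of triangle (a, b, p); the sign tells which side p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covers(tri: Tri, p: Point) -> bool:
    """True if p is inside or on the triangle, for either winding order."""
    a, b, c = tri
    e0, e1, e2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0)


def covered_subpixels(tri: Tri, px: int, py: int, s: int) -> List[int]:
    """Indices of the subpixels whose sample location the micropolygon covers."""
    return [k for k, p in enumerate(subpixel_samples(px, py, s)) if covers(tri, p)]
```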

Sample
● Computing a sample point
  ● Interpolate the color, opacity, and depth values of the micropolygon at the sample location
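Interpolating the attributes at a covered sample location can use barycentric weights. The sketch assumes triangular micropolygons and per-vertex `(r, g, b, opacity, depth)` tuples; `SamplePoint` is a hypothetical record type, not a RenderAnts structure.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class SamplePoint:
    color: Tuple[float, float, float]
    opacity: float
    depth: float


def barycentric(a: Point, b: Point, c: Point, p: Point) -> Tuple[float, float, float]:
    """Weights (wa, wb, wc) of p with respect to triangle abc; they sum to 1."""
    area = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    wa = ((b[0] - p[0]) * (c[1] - p[1]) - (b[1] - p[1]) * (c[0] - p[0])) / area
    wb = ((c[0] - p[0]) * (a[1] - p[1]) - (c[1] - p[1]) * (a[0] - p[0])) / area
    return wa, wb, 1.0 - wa - wb


def make_sample(tri: Tuple[Point, Point, Point],
                vertex_attrs: List[Tuple[float, ...]], p: Point) -> SamplePoint:
    """vertex_attrs[i] = (r, g, b, opacity, depth) at triangle vertex i."""
    wa, wb, wc = barycentric(*tri, p)
    # zip(*vertex_attrs) yields one (va, vb, vc) triple per attribute.
    r, g, b, o, z = (wa * va + wb * vb + wc * vc for va, vb, vc in zip(*vertex_attrs))
    return SamplePoint((r, g, b), o, z)
```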

Composite & Filter
● Composite
  ● Sort the sample points of each subpixel in depth order
  ● In parallel, composite the sample points of each subpixel in depth order until the depth of the subpixel is reached
● Filter
  ● In parallel, blend the color and opacity of each pixel's subpixels
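A sequential sketch of both steps for one pixel. Assumptions: sample points are `(depth, rgb, opacity)` tuples, colors are accumulated premultiplied with the front-to-back "over" operator, `max_depth` is a hypothetical stand-in for the subpixel's depth limit, and the filter is a plain box filter. On the GPU each subpixel (compositing) and each pixel (filtering) would be handled by its own thread.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
Sample = Tuple[float, RGB, float]  # (depth, rgb, opacity)


def composite_subpixel(samples: List[Sample], max_depth: float = float("inf")) -> Tuple[RGB, float]:
    """Front-to-back compositing of one subpixel's sample points.

    Stops once a sample lies beyond `max_depth` or the subpixel is opaque.
    Returns premultiplied color and accumulated opacity.
    """
    color, alpha = [0.0, 0.0, 0.0], 0.0
    for depth, rgb, a in sorted(samples, key=lambda s: s[0]):
        if depth > max_depth or alpha >= 1.0:
            break
        weight = (1.0 - alpha) * a          # visibility left times this opacity
        for i in range(3):
            color[i] += weight * rgb[i]
        alpha += weight
    return (color[0], color[1], color[2]), alpha


def box_filter(subpixels: List[Tuple[RGB, float]]) -> Tuple[RGB, float]:
    """Average the composited subpixels of one pixel."""
    n = len(subpixels)
    rgb = tuple(sum(c[i] for c, _ in subpixels) / n for i in range(3))
    return (rgb[0], rgb[1], rgb[2]), sum(a for _, a in subpixels) / n
```

A half-transparent red sample in front of an opaque blue one composites to `((0.5, 0.0, 0.5), 1.0)`; averaged with an empty subpixel, the pixel becomes `((0.25, 0.0, 0.25), 0.5)`.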

Advanced features
● Shadow
  ● Use shadow maps generated in a shadow pass
● Motion blur & depth-of-field
  ● Use an accumulation buffer
  ● Assign a unique sample time to each subpixel
  ● In each pass, sample only the subpixels whose sample time equals the current rendering time
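The time-per-subpixel idea can be sketched as follows. Assumptions: sample times are assigned round-robin (so with as many times as subpixels, each subpixel gets a unique time), rendering times are evenly spaced in [0, 1), and `shade_at` is a hypothetical callback returning what subpixel `k` sees when the scene is rendered at time `t`.

```python
from typing import Callable, List


def assign_sample_times(num_subpixels: int, num_times: int) -> List[int]:
    """Round-robin time index per subpixel (a real renderer would usually randomize)."""
    return [k % num_times for k in range(num_subpixels)]


def render_motion_blur(num_subpixels: int, num_times: int,
                       shade_at: Callable[[int, float], float]) -> List[float]:
    """One rendering pass per sample time; each pass writes only matching subpixels."""
    times = assign_sample_times(num_subpixels, num_times)
    accum = [0.0] * num_subpixels                # accumulation buffer
    for pass_idx in range(num_times):
        t = pass_idx / num_times                 # current rendering time
        for k in range(num_subpixels):
            if times[k] == pass_idx:             # sample only subpixels at this time
                accum[k] = shade_at(k, t)
    return accum
```

If an object covers the pixel only during the first half of the shutter, `render_motion_blur(4, 4, lambda k, t: 1.0 if t < 0.5 else 0.0)` returns `[1.0, 1.0, 0.0, 0.0]`, which box-filters to a blurred coverage of 0.5.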

Dynamic scheduling
● Main idea
  ● Maximize parallelism at each stage without exceeding available GPU memory
  ● Estimate the memory requirements of each stage

Dicing scheduler
● Main factor in memory requirements
  ● Total data of the micropolygons
● Estimating memory requirements
  ● The total number of micropolygons is computed from the total number of primitives

Dicing scheduler
● Main idea
  ● Split the current bucket into dicing regions
  ● Keep splitting until the number of primitives in the region being processed fits in available GPU memory
  ● Use binary space partitioning (BSP)

How to split a dicing region?
● Suppose the number of primitives that fits in GPU memory is 2
[Figure: the bucket is split into subregions, and subregions are split again until each holds at most 2 primitives]
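The BSP splitting walked through above can be sketched as a recursive function. Assumptions: regions and primitive bounds are axis-aligned screen-space boxes, a primitive straddling a split line is kept in both halves, and the memory budget has already been converted to a primitive count. `primitive_capacity` shows one hypothetical way to do that conversion from an assumed micropolygons-per-primitive figure.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in screen space


def overlaps(a: Box, b: Box) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def split_region(r: Box) -> Tuple[Box, Box]:
    """BSP step: halve the region along its longer axis."""
    x0, y0, x1, y1 = r
    if x1 - x0 >= y1 - y0:
        xm = (x0 + x1) / 2
        return (x0, y0, xm, y1), (xm, y0, x1, y1)
    ym = (y0 + y1) / 2
    return (x0, y0, x1, ym), (x0, ym, x1, y1)


def primitive_capacity(free_bytes: int, mps_per_prim: int, bytes_per_mp: int) -> int:
    """Hypothetical estimate of how many primitives' micropolygons fit in memory."""
    return free_bytes // (mps_per_prim * bytes_per_mp)


def dicing_regions(region: Box, prims: List[Box], capacity: int,
                   min_size: float = 1.0) -> List[Tuple[Box, List[Box]]]:
    """Split `region` recursively until each piece holds at most `capacity` primitives.

    `min_size` stops the recursion when a region becomes tiny, e.g. when
    many primitives overlap the same point.
    """
    inside = [p for p in prims if overlaps(p, region)]
    too_small = max(region[2] - region[0], region[3] - region[1]) <= min_size
    if len(inside) <= capacity or too_small:
        return [(region, inside)]
    a, b = split_region(region)
    return dicing_regions(a, inside, capacity, min_size) + dicing_regions(b, inside, capacity, min_size)
```

For example, `primitive_capacity(1024, 16, 32)` gives 2, matching the slide's example, and four primitives spread over the four quadrants of a 16×16 bucket end up in two regions of two primitives each.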

Shading scheduler
● Main factor in memory requirements
  ● Temporary data allocated during shader execution
● Estimating memory requirements
  ● Different shaders require different amounts of temporary data

Shading scheduler
● Main idea
  ● Split the micropolygon list into sublists
  ● Keep splitting until the number of micropolygons for the current shader execution fits in available GPU memory
  ● Schedule separately for each shader execution
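A sketch of the per-shader splitting, assuming the temporary storage each shader needs per micropolygon is known in advance (for instance, reported by the shader compiler). The dictionaries, shader names, and byte counts are hypothetical inputs.

```python
from typing import Dict, List, Tuple


def shading_batches(micropolygons_by_shader: Dict[str, List[int]],
                    temp_bytes_per_mp: Dict[str, int],
                    free_bytes: int) -> List[Tuple[str, List[int]]]:
    """Split each shader's micropolygon list into sublists that fit in memory.

    Scheduling happens per shader execution because each shader needs a
    different amount of temporary storage per micropolygon.
    """
    batches = []
    for shader, mps in micropolygons_by_shader.items():
        per_batch = free_bytes // temp_bytes_per_mp[shader]
        if per_batch == 0:
            raise MemoryError(f"shader {shader!r} cannot shade even one micropolygon")
        for start in range(0, len(mps), per_batch):
            batches.append((shader, mps[start:start + per_batch]))
    return batches
```

With 1000 free bytes, a shader needing 100 bytes per micropolygon shades 10 micropolygons per execution, while one needing 400 bytes shades only 2 at a time.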

Sampling scheduler
● Main factor in memory requirements
  ● Total data of the subpixel framebuffer and the sample points
● Estimating memory requirements
  ● The framebuffer size equals the region size
  ● Use a line-scanning process to estimate the number of sample points

Sampling scheduler
● Main idea
  ● Split the current dicing region into sampling regions
  ● Keep splitting until the memory for the region's sample points plus its framebuffer (the region size) fits in available GPU memory
  ● Use binary space partitioning (BSP)
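The estimate can be sketched as a small scanline pass. Assumptions: micropolygons are convex polygons, sample locations are subpixel centers on an `s`×`s` grid per pixel, and the byte sizes of a framebuffer subpixel and a sample point are hypothetical parameters. The scheduler would keep splitting the region (for example with the same BSP halving as the dicing scheduler) while `fits` returns False.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def samples_in_span(lo: float, hi: float, s: int) -> int:
    """Count sample coordinates (k + 0.5) / s lying in the half-open range [lo, hi)."""
    first = math.ceil(lo * s - 0.5)
    last = math.ceil(hi * s - 0.5) - 1
    return max(0, last - first + 1)


def row_span(poly: List[Point], y: float) -> Optional[Tuple[float, float]]:
    """x-interval where the scanline at height y crosses a convex polygon."""
    xs = []
    for i in range(len(poly)):
        (xa, ya), (xb, yb) = poly[i], poly[(i + 1) % len(poly)]
        if ya <= y < yb or yb <= y < ya:      # half-open rule skips horizontal edges
            xs.append(xa + (y - ya) * (xb - xa) / (yb - ya))
    return (min(xs), max(xs)) if len(xs) >= 2 else None


def estimate_samples(polys: List[List[Point]], s: int) -> int:
    """Scan each micropolygon row by row and count the sample points it covers."""
    total = 0
    for poly in polys:
        ys = [p[1] for p in poly]
        first = math.ceil(min(ys) * s - 0.5)
        last = math.ceil(max(ys) * s - 0.5) - 1
        for k in range(first, last + 1):
            span = row_span(poly, (k + 0.5) / s)
            if span is not None:
                total += samples_in_span(span[0], span[1], s)
    return total


def fits(region_pixels: int, s: int, num_samples: int,
         bytes_per_subpixel: int, bytes_per_sample: int, free_bytes: int) -> bool:
    """Framebuffer (one entry per subpixel in the region) plus sample points."""
    needed = region_pixels * s * s * bytes_per_subpixel + num_samples * bytes_per_sample
    return needed <= free_bytes
```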

Multi-GPU rendering
● Main idea
  ● Minimize inter-GPU communication
  ● Balance workloads among GPUs

How to minimize inter-GPU communication?
● Each GPU maintains a complete list of all primitives in the bucket
● Only region descriptions are transferred between GPUs

How to minimize inter-GPU communication?
● Let A, B, and C denote the GPUs
[Figure: the bucket on GPU A is split into subregions, which are handed to GPUs B and C]
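A minimal model of this scheme, assuming axis-aligned boxes for regions and primitive bounds: every `Gpu` object holds the full primitive list of the bucket, so handing it work means sending only four numbers. The `Gpu` class and its method are hypothetical, intended only to show what crosses between devices.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def overlaps(a: Box, b: Box) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


@dataclass
class Gpu:
    """Hypothetical model of one GPU holding every primitive of the bucket."""
    name: str
    bucket_prims: List[Box]                          # full copy, uploaded once per bucket
    received: List[Box] = field(default_factory=list)

    def receive_region(self, region: Box) -> List[Box]:
        """Only the region description arrives; the GPU selects the primitives
        it needs from its own complete list, so no primitive data moves."""
        self.received.append(region)
        return [p for p in self.bucket_prims if overlaps(p, region)]
```

If A hands the subregion `(8, 0, 16, 8)` to B, B selects `(9, 1, 11, 3)` from its own copy of the bucket without any primitive data being transferred.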

How to balance workloads among GPUs?
● Split a region only when both conditions hold
  ● The number of primitives exceeds a threshold
  ● An idle GPU exists

How to balance workloads among GPUs?
● Let threshold = 2
[Figure: a region on GPU A with more than 2 primitives is split, and its subregions are handed to idle GPUs B and C]
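A static, one-shot sketch of the two split conditions, assuming box-shaped regions and primitive bounds and longer-axis halving; GPUs 0, 1, and 2 play the roles of A, B, and C. `assign_regions` is hypothetical and ignores GPUs becoming idle again after finishing a region, which a real scheduler would need to handle.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def overlaps(a: Box, b: Box) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def split_region(r: Box) -> Tuple[Box, Box]:
    """Halve a region along its longer axis."""
    x0, y0, x1, y1 = r
    if x1 - x0 >= y1 - y0:
        xm = (x0 + x1) / 2
        return (x0, y0, xm, y1), (xm, y0, x1, y1)
    ym = (y0 + y1) / 2
    return (x0, y0, x1, ym), (x0, ym, x1, y1)


def assign_regions(bucket: Box, prims: List[Box], num_gpus: int,
                   threshold: int) -> List[Tuple[int, Box, List[Box]]]:
    """Hand out subregions while both conditions hold.

    A region is split only if it holds more than `threshold` primitives AND
    some GPU is still idle; the splitting GPU keeps one half and the idle GPU
    gets the other. Returns (gpu_index, region, primitives) triples.
    """
    idle = list(range(1, num_gpus))          # GPU 0 starts with the whole bucket
    work = [(0, bucket, list(prims))]
    result = []
    while work:
        gpu, region, ps = work.pop()
        if len(ps) > threshold and idle:
            keep, give = split_region(region)
            other = idle.pop(0)
            work.append((gpu, keep, [p for p in ps if overlaps(p, keep)]))
            work.append((other, give, [p for p in ps if overlaps(p, give)]))
        else:
            result.append((gpu, region, ps))
    return result
```

With five primitives, threshold 2, and three GPUs, the bucket is split twice and each GPU ends up with a region of at most two primitives.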

Results

Rendering Performance [figure]

Rendering Time on GPU
● Breakdown of the rendering time on the GPU
● Initialization time (loading data from the CPU to the GPU) is relatively short

Scaled Performance on GPU [figure]

Conclusions
● Advantages
  ● Faster than CPU-based rendering
  ● Performance scalability
● Disadvantages
  ● Limited geometry scalability
  ● Motion blur and focal (depth-of-field) blur
    ● Improved in [Hou et al. 2010]

Questions & Answers

Thank You