SIGGRAPH 2009 RenderAnts: Interactive REYES Rendering on GPUs
- Slides: 52
 
	SIGGRAPH 2009 ● RenderAnts: Interactive REYES Rendering on GPUs ● Kun Zhou, Minmin Gong, Qiming Hou, Zhong Ren, Xin Sun, Baining Guo ● JAEHYUN CHO
	Outline ● REYES Rendering ● System Overview ● GPU REYES Rendering ● Dynamic Scheduling ● Multi-GPU Rendering ● Results ● Conclusion 2
	REYES rendering ● “Renders Everything You Ever Saw” ● Developed in the 1980s by Carpenter and Cook ● Photo-realistic images ● Main idea ● Subdivide every primitive into micropolygons ● In use by Pixar ● PhotoRealistic RenderMan (PRMan) 4
	Basic REYES pipeline ● [Figure: pipeline diagram. Stages: Modeling Application, Bucketing, Bound, Split (if too large), Dice, Shade, Sample, Composite & Filter. Data passed between stages: primitives, unshaded micropolygons, shaded micropolygons, visible points, pixels] 5
	System overview ● Map all basic REYES stages to the GPU ● Add 3 dynamic scheduling stages ● Support multi-GPU rendering 7
	RenderAnts system pipeline 8
	Bound/Split and Dice 10
	Bound/Split and Dice ● Bound/Split ● All input primitives are stored in a queue ● Primitives in queue are bound and split in parallel ● Dice ● Primitives in dicing region are subdivided into micropolygons in parallel 11
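The bound/split queue and the dicing step described on this slide can be sketched as follows. This is a minimal sequential illustration, whereas the slides state that these steps run in parallel on the GPU. Primitives are hypothetical screen-space rectangles `(x0, y0, x1, y1)`, `MAX_SIZE` is an assumed dicing threshold, and the split rule (halve along the longer axis) is an assumption made for the example, not the system's actual rule.

```python
from collections import deque

MAX_SIZE = 4.0  # assumed threshold: largest screen extent allowed before dicing

def bound(prim):
    """Return (width, height) of a primitive given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = prim
    return x1 - x0, y1 - y0

def split(prim):
    """Halve a primitive along its longer axis (illustrative split rule)."""
    x0, y0, x1, y1 = prim
    w, h = bound(prim)
    if w >= h:
        xm = (x0 + x1) / 2
        return [(x0, y0, xm, y1), (xm, y0, x1, y1)]
    ym = (y0 + y1) / 2
    return [(x0, y0, x1, ym), (x0, ym, x1, y1)]

def bound_split(primitives):
    """Process the queue until every primitive is small enough to dice."""
    queue = deque(primitives)
    diceable = []
    while queue:
        prim = queue.popleft()
        w, h = bound(prim)
        if max(w, h) > MAX_SIZE:
            queue.extend(split(prim))  # too large: split and re-queue the halves
        else:
            diceable.append(prim)      # small enough: hand over to the dicer
    return diceable

def dice(prim, n=2):
    """Subdivide a primitive into an n x n grid of micropolygons."""
    x0, y0, x1, y1 = prim
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    return [(x0 + i * dx, y0 + j * dy, x0 + (i + 1) * dx, y0 + (j + 1) * dy)
            for j in range(n) for i in range(n)]
```

For example, `bound_split([(0, 0, 16, 4)])` splits the 16-wide primitive twice before every piece is within `MAX_SIZE`, and `dice` then turns each piece into a small grid.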
	Shade 12
	Shade ● Main idea ● Translate RenderMan shader instructions to GPU shader instructions ● Use shader compiler ● Each vertex of micropolygons is shaded 13
	Shade ● Out-of-core texture fetch ● Textures are too large to load into GPU memory at one time ● Use a CPU-side cache manager ● If a texture is not in the cache, interrupt the GPU; the cache manager reads it from disk and copies it to the GPU 14
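The cache-manager behaviour on this slide can be illustrated with a small CPU-side sketch. The class name `TextureCache`, the `load_from_disk` callback, and the least-recently-used eviction policy are all assumptions made for the example; the slides do not say how entries are evicted or how the GPU copy is issued.

```python
from collections import OrderedDict

class TextureCache:
    """Hypothetical CPU-side texture cache with LRU eviction (assumed policy)."""

    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity              # max number of cached textures
        self.load_from_disk = load_from_disk  # assumed callback: key -> texture data
        self.entries = OrderedDict()          # ordered oldest -> most recently used
        self.misses = 0

    def fetch(self, key):
        """Return texture data, loading it on a miss (the 'interrupt' case)."""
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as most recently used
            return self.entries[key]
        self.misses += 1
        data = self.load_from_disk(key)       # slow path: read from disk
        self.entries[key] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return data
```

In a real renderer the returned data would then be copied to GPU memory before shading resumes; that step is left out here.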
	Sample 15
	Sample ● Main idea ● Each pixel in sampling region is divided into subpixels ● If a micropolygon covers the sample location of a subpixel, compute and store a sample point ● [Figure labels: sample point of left micropolygon, sample point of right micropolygon] 16
	Sample ● Compute sample point ● Interpolate color, opacity and depth values of micropolygon at sample location 17
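A minimal sketch of the coverage test and interpolation, assuming each micropolygon is handled as a non-degenerate 2D triangle (a quad could be treated as two triangles) and that attributes are per-vertex scalars. Barycentric interpolation is a choice made for this example; the slides only say that color, opacity and depth are interpolated at the sample location.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of 2D point p in triangle (a, b, c).

    Assumes the triangle is non-degenerate (non-zero area).
    """
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    w0 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    w1 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return w0, w1, 1.0 - w0 - w1

def sample_point(p, verts, attrs):
    """Interpolate per-vertex attributes at p, or return None if p is not covered.

    attrs maps an attribute name (e.g. "depth") to its three per-vertex values.
    """
    w = barycentric(p, *verts)
    if min(w) < 0.0:
        return None  # sample location lies outside the micropolygon
    return {name: sum(wi * v for wi, v in zip(w, values))
            for name, values in attrs.items()}
```

A color would be passed as separate scalar channels in this sketch, for example `"r"`, `"g"` and `"b"` entries alongside `"opacity"` and `"depth"`.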
	Composite & Filter 18
	Composite & Filter ● Composite ● Sort the sample points of each subpixel in depth order ● Composite the sample points of each subpixel in depth order, in parallel, until the depth of the subpixel is reached ● Filter ● For each pixel, blend the color and opacity of its subpixels in parallel 19
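The two steps can be sketched for a single subpixel and a single pixel as below. This assumes a single grayscale color channel, front-to-back "over" compositing, an early exit once the subpixel is fully opaque (standing in for "until the depth of the subpixel is reached"), and a plain box filter for the blend. None of these specifics are stated in the slides.

```python
def composite(samples):
    """Front-to-back compositing of (depth, color, opacity) sample points.

    Returns the accumulated (color, opacity) of one subpixel.
    """
    color, transmittance = 0.0, 1.0
    for depth, c, alpha in sorted(samples):  # nearest sample first
        color += transmittance * alpha * c
        transmittance *= 1.0 - alpha
        if transmittance <= 0.0:
            break                            # fully opaque: farther samples are hidden
    return color, 1.0 - transmittance

def box_filter(subpixels):
    """Average the (color, opacity) pairs of a pixel's subpixels (assumed filter)."""
    n = len(subpixels)
    return (sum(c for c, _ in subpixels) / n,
            sum(a for _, a in subpixels) / n)
```

On the GPU each subpixel and each pixel would be processed by its own thread; the loops here are sequential only for readability.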
	Advanced features ● Shadows ● Use shadow maps generated in a shadow pass ● Motion blur & depth-of-field ● Use an accumulation buffer ● Assign a unique sample time to each subpixel ● Sample only the subpixels whose sample time equals the current rendering time 20
	Dynamic scheduling ● Main idea ● Maximize parallelism at each stage ● Estimate memory requirements at each stage 22
	Dicing scheduler 23
	Dicing scheduler ● Main factor of memory requirements ● Total data of micropolygons ● Estimate memory requirements ● Total # of micropolygons computed from total # of primitives 24
	Dicing scheduler ● Main idea ● Split current bucket into dicing regions ● Until # of primitives in processing region fits available GPU memory ● Use binary space partitioning ( BSP ) 25
	How to split dicing region? ● Let the # of primitives that fit in GPU memory = 2 ● [Figure, step 1: a bucket containing primitives; labels: bucket, primitive] 26
	How to split dicing region? ● Let the # of primitives that fit in GPU memory = 2 ● [Figure, step 2: the bucket divided into subregions; labels: subregion, bucket, primitive] 27
	How to split dicing region? ● Let the # of primitives that fit in GPU memory = 2 ● [Figure, step 3: a subregion divided again; labels: subregion, bucket, primitive] 28
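The splitting walked through on these slides can be sketched as a recursive binary partition of the bucket. Primitives are represented by hypothetical screen-space bounding boxes, the split plane is assumed to be the midpoint of the longer axis, and `min_size` is an assumed guard that stops splitting when many primitives overlap the same small area. The slides only state that BSP is used until the primitive count fits GPU memory.

```python
def overlaps(box, region):
    """True if two (x0, y0, x1, y1) rectangles share interior area."""
    return (box[0] < region[2] and box[2] > region[0] and
            box[1] < region[3] and box[3] > region[1])

def split_region(region):
    """Halve a region along its longer axis (assumed split-plane choice)."""
    x0, y0, x1, y1 = region
    if x1 - x0 >= y1 - y0:
        xm = (x0 + x1) / 2
        return [(x0, y0, xm, y1), (xm, y0, x1, y1)]
    ym = (y0 + y1) / 2
    return [(x0, y0, x1, ym), (x0, ym, x1, y1)]

def schedule_dicing(bucket, boxes, max_prims, min_size=1.0):
    """Split the bucket into dicing regions holding at most max_prims primitives."""
    regions, stack = [], [bucket]
    while stack:
        region = stack.pop()
        inside = [b for b in boxes if overlaps(b, region)]
        extent = max(region[2] - region[0], region[3] - region[1])
        if len(inside) <= max_prims or extent <= min_size:
            regions.append((region, inside))    # fits (or cannot shrink further)
        else:
            stack.extend(split_region(region))  # too many primitives: split again
    return regions
```

With `max_prims=2`, as in the slide example, a bucket holding four well-separated primitives ends up as two dicing regions of two primitives each.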
	Shading scheduler 29
	Shading scheduler ● Main factor of memory requirements ● Temporary data allocated during shader execution ● Estimate memory requirements ● Different shaders require different sizes of temporary data 30
	Shading scheduler ● Main idea ● Split the micropolygon list into sublists ● Until the # of micropolygons for the current shader execution fits available GPU memory ● Do scheduling per shader execution 31
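A minimal sketch of this scheduling step, assuming the temporary-data requirement of a shader can be summarized as a fixed number of bytes per micropolygon. The function name and both size parameters are hypothetical. The list is chunked directly into sublists of the largest size that fits, which yields the same kind of result as repeated splitting.

```python
def schedule_shading(micropolygons, bytes_per_micropolygon, available_bytes):
    """Split a micropolygon list into sublists that each fit in GPU memory.

    bytes_per_micropolygon is the assumed temporary-data cost of the shader
    about to run; it differs from shader to shader.
    """
    batch = available_bytes // bytes_per_micropolygon
    if batch < 1:
        raise ValueError("not enough memory to shade even one micropolygon")
    return [micropolygons[i:i + batch]
            for i in range(0, len(micropolygons), batch)]
```

Because the estimate depends on the shader, the function would be called once per shader execution: a heavier shader with a larger `bytes_per_micropolygon` simply produces more, smaller sublists.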
	Sampling scheduler 32
	Sampling scheduler ● Main factor of memory requirements ● Total data of subpixel framebuffer and sample points ● Estimate memory requirements ● Framebuffer size equals region size ● Use a line scanning process to estimate the # of sample points 33
	Sampling scheduler ● Main idea ● Split current dicing region into sampling regions ● Until # of sample points in processing region + region size fits available GPU memory ● Use binary space partitioning ( BSP ) 34
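The sampling scheduler can be sketched in the same style as the dicing scheduler, with the memory estimate changed to framebuffer size plus sample-point count. The sample-point estimate below uses the overlap area of each micropolygon's bounding box with the region; this is a simple stand-in for the line scanning process mentioned in the slides, not a reproduction of it. The cost unit (entries rather than bytes), the midpoint split, and `min_size` are likewise assumptions.

```python
def overlap_area(box, region):
    """Area shared by two (x0, y0, x1, y1) rectangles."""
    w = min(box[2], region[2]) - max(box[0], region[0])
    h = min(box[3], region[3]) - max(box[1], region[1])
    return max(w, 0) * max(h, 0)

def estimate_cost(region, boxes, spp):
    """Framebuffer entries plus estimated sample points for one region."""
    area = (region[2] - region[0]) * (region[3] - region[1])
    framebuffer = area * spp                     # one entry per subpixel
    samples = sum(overlap_area(b, region) for b in boxes) * spp
    return framebuffer + samples

def split_region(region):
    """Halve a region along its longer axis (assumed split-plane choice)."""
    x0, y0, x1, y1 = region
    if x1 - x0 >= y1 - y0:
        xm = (x0 + x1) / 2
        return [(x0, y0, xm, y1), (xm, y0, x1, y1)]
    ym = (y0 + y1) / 2
    return [(x0, y0, x1, ym), (x0, ym, x1, y1)]

def schedule_sampling(dicing_region, boxes, budget, spp=4, min_size=1.0):
    """Split a dicing region into sampling regions whose cost fits the budget."""
    regions, stack = [], [dicing_region]
    while stack:
        region = stack.pop()
        extent = max(region[2] - region[0], region[3] - region[1])
        if estimate_cost(region, boxes, spp) <= budget or extent <= min_size:
            regions.append(region)
        else:
            stack.extend(split_region(region))
    return regions
```

Unlike the dicing case, an empty region still has a cost here, because its subpixel framebuffer must be allocated regardless of how many sample points land in it.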
	Multi-GPU rendering ● Main idea ● Minimize inter-GPU communication ● Balance workloads among GPUs 36
	How to minimize inter-GPU communication? ● Each GPU maintains a complete list of all primitives in a bucket ● Only region descriptions are transferred between GPUs 37
	How to minimize inter-GPU communication? ● Let A, B, C denote the GPUs ● [Figure, step 1: GPU A holds the bucket] 38
	How to minimize inter-GPU communication? ● Let A, B, C denote the GPUs ● [Figure, step 2: the bucket divided into subregions assigned to A and B] 39
	How to minimize inter-GPU communication? ● Let A, B, C denote the GPUs ● [Figure, step 3: subregions assigned to A, B and C] 40
	How to balance workloads among GPUs? ● Split a region only when both conditions hold ● The # of primitives > threshold ● An idle GPU exists 41
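	How to balance workloads among GPUs? ● Let threshold = 2 ● [Figure, step 1: subregions of the bucket held by GPUs A and B; labels: subregion, bucket, primitive] 42
	How to balance workloads among GPUs? ● Let threshold = 2 ● [Figure, step 2: subregions held by GPUs A, B and C; labels: subregion, bucket, primitive] 43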
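The splitting rule from these slides can be sketched as a small sequential simulation. GPU names are plain strings, primitives are hypothetical bounding boxes, and the midpoint split is an assumption. Only the region tuple changes hands, which mirrors the earlier point that every GPU already holds the full primitive list and only region descriptions are transferred.

```python
def overlaps(box, region):
    """True if two (x0, y0, x1, y1) rectangles share interior area."""
    return (box[0] < region[2] and box[2] > region[0] and
            box[1] < region[3] and box[3] > region[1])

def split_region(region):
    """Halve a region along its longer axis (assumed split-plane choice)."""
    x0, y0, x1, y1 = region
    if x1 - x0 >= y1 - y0:
        xm = (x0 + x1) / 2
        return [(x0, y0, xm, y1), (xm, y0, x1, y1)]
    ym = (y0 + y1) / 2
    return [(x0, y0, x1, ym), (x0, ym, x1, y1)]

def balance(bucket, boxes, gpus, threshold):
    """Assign regions to GPUs, splitting only when both conditions hold."""
    idle = list(gpus)
    work = [(idle.pop(0), bucket)]      # the first GPU starts with the whole bucket
    assignments = {}
    while work:
        gpu, region = work.pop()
        inside = [b for b in boxes if overlaps(b, region)]
        if len(inside) > threshold and idle:
            keep, give = split_region(region)
            work.append((idle.pop(0), give))  # send only the region description
            work.append((gpu, keep))
        else:
            assignments[gpu] = (region, inside)
    return assignments
```

When no idle GPU remains, a GPU keeps its region even if the primitive count exceeds the threshold; in this sketch any further subdivision would then be left to that GPU's own schedulers.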
	Results 45
	Rendering Performance 46
	Rendering Time on GPU ● Breakdown of the rendering time on GPU ● Initialization time is relatively short ( Data loading from CPU to GPU ) 47
	Scaled Performance on GPU 48
	Conclusions ● Advantages ● Faster than CPU-based rendering ● Performance scalability ● Disadvantages ● Geometry scalability ● Motion/focal blur ● Improved in [Hou et al. 2010] 50
	Questions & Answers Q&A 51
	Finish! Thank You 52