Data Management Techniques Sung-Eui Yoon KAIST URL: http://jupiter.kaist.ac.kr/~sungeui/

Data Avalanche (or Data Explosion) There is too much data out there! www.cs.umd.edu/class/spring2001/cmsc838b/Project/Parija_Spacco/images/

Geometric Data Avalanche ● Massive geometric data ● Due to advances in modeling, simulation, and data capture techniques ● Time-varying data (4D data sets)

CAD Model: Double Eagle Oil Tanker ● 82 million triangles (4 GB)

CAD Model: Boeing 777 ● Ray tracing the Boeing 777, 470 million triangles ● Excerpted from the SIGGRAPH course notes on massive model rendering

Scanned Model: St. Matthew Model ● 372 million triangles (10 GB) ● www.cyberware.com

Possible Solutions? ● Will hardware improvement address the data avalanche? ● Moore’s law: the number of transistors roughly doubles every 18 months
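As a quick worked example of what the 18-month doubling implies over a decade, the growth multiple can be computed directly. The function name and the 10-year horizon are chosen here for illustration only.

```python
def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Growth multiple after `years` if a quantity doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


# Ten years contain 120 / 18 (about 6.7) doubling periods.
decade = growth_factor(10)
print(f"Growth over 10 years: about {decade:.0f}x")
```

A rate like this only helps with the data avalanche if data access speed keeps pace with it.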

Current Architecture Trends ● Data access time becomes the major computational bottleneck! [Chart: accumulated growth rate during 1999~2009 (log scale, 1 to 1000) for random disk access speed, sequential disk access speed, RAM access speed, CPU speed, and GPU speed; bar values 192X, 56X, 4.5X, 10X, 1.3X]

Four Orthogonal Approaches ● Cache-coherent layouts ● Random-accessible compressed meshes ● Cache-oblivious ray reordering ● Hybrid parallel continuous collision detection

Overview ● Cache-coherent layouts ● Random-accessible compressed meshes ● Cache-oblivious ray reordering ● Hybrid parallel continuous collision detection

Cache-Coherent Layouts of Meshes ● One-dimensional data layout of a mesh ● Reduce the number of cache misses [Figure: mesh vertices va, vb, vc, vd and a one-dimensional layout va, vb, vd, vc] ● Cache-aware or cache-oblivious layouts ● Minimize the number of cache misses for a specific or various cache parameters (e.g., cache block size) [Yoon et al. SIG 05, VIS 06, Euro 06]
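To make the effect of a one-dimensional layout concrete, the sketch below compares two layouts of a small grid mesh by counting edges whose endpoints land in different blocks, for several block sizes. This is an illustration only: the edge-crossing count is a simple proxy chosen here, not the metric used in the cited papers, and the Z-order (Morton) layout is a stand-in for a cache-oblivious layout.

```python
def grid_edges(n):
    """Edges of an n x n grid mesh; vertex (r, c) has id r * n + c."""
    edges = []
    for r in range(n):
        for c in range(n):
            v = r * n + c
            if c + 1 < n:
                edges.append((v, v + 1))
            if r + 1 < n:
                edges.append((v, v + n))
    return edges


def morton(r, c, bits=16):
    """Interleave the bits of r and c to get a Z-order index."""
    code = 0
    for i in range(bits):
        code |= ((c >> i) & 1) << (2 * i)
        code |= ((r >> i) & 1) << (2 * i + 1)
    return code


def layout_positions(n, key):
    """Map vertex id to its position in the 1D layout obtained by sorting on key."""
    order = sorted(range(n * n), key=lambda v: key(v // n, v % n))
    pos = [0] * (n * n)
    for p, v in enumerate(order):
        pos[v] = p
    return pos


def crossing_edges(edges, pos, block_size):
    """Count edges whose two endpoints are stored in different blocks."""
    return sum(pos[u] // block_size != pos[v] // block_size for u, v in edges)


n = 32
edges = grid_edges(n)
row_major = layout_positions(n, lambda r, c: r * n + c)
z_order = layout_positions(n, morton)
for block in (4, 16, 64):
    print(block, crossing_edges(edges, row_major, block),
          crossing_edges(edges, z_order, block))
```

The Z-order layout crosses fewer block boundaries at every block size tried, without being tuned to any one of them, which is the property a cache-oblivious layout aims for.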

Block-based I/O Model [Aggarwal and Vitter 88] [Diagram: CPU or GPU, fast memory or cache, slow memory, and disk, connected by block transfers; access times shown: 10^-6 sec, 10^-4 sec, 1 sec]
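In a block-based model, cost is counted in block transfers rather than individual accesses. The sketch below counts transfers for two traversal orders of the same row-major array. The LRU replacement policy, the block size, and the cache size are assumptions made for this example.

```python
from collections import OrderedDict


def count_block_transfers(accesses, block_size, cache_blocks):
    """Count block transfers for a sequence of element indices (LRU cache assumed)."""
    cache = OrderedDict()
    transfers = 0
    for index in accesses:
        block = index // block_size
        if block in cache:
            cache.move_to_end(block)       # mark block as most recently used
        else:
            transfers += 1                 # block must be fetched from slow memory
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return transfers


n = 64
row_wise = [r * n + c for r in range(n) for c in range(n)]
col_wise = [r * n + c for c in range(n) for r in range(n)]
print(count_block_transfers(row_wise, block_size=16, cache_blocks=8))  # 256
print(count_block_transfers(col_wise, block_size=16, cache_blocks=8))  # 4096
```

Both traversals touch the same 4096 elements, but the column-wise order triggers a transfer on every access because each column spans more blocks than the cache can hold.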

Applications ● View-dependent meshes ● View-dependent rendering ● Triangle meshes ● Isocontour extractions ● Hierarchies ● Ray tracing ● Collision detection

View-Dependent Rendering using LODs ● Improving GPU vertex cache utilization ● GeForce 6800 (January 2005)

Applications ● View-dependent meshes ● View-dependent rendering ● Triangle meshes ● Isocontour extractions ● Hierarchies ● Ray tracing ● Collision detection ● Achieve up to 20X improvement on isocontouring (Puget Sound, 134 M triangles; isocontour z(x, y) = 500 m)

Applications ● View-dependent meshes ● View-dependent rendering ● Triangle meshes ● Isocontour extractions ● Hierarchies ● Ray tracing ● Collision detection ● Achieve 30% ~ 300% performance improvement

Advantages ● General ● Works well for various applications ● Cache-oblivious ● Can benefit all levels of the memory hierarchy (e.g., CPU/GPU caches, memory, and disk) ● No modification of runtime applications ● Only layout computation ● Source code is available as a library called OpenCCL

Overview ● Cache-coherent layouts ● Random-accessible compressed meshes ● Cache-oblivious ray reordering ● Hybrid parallel continuous collision detection

Random-Accessible Compressed Data ● Compression methods for meshes and hierarchies ● Reduce the memory requirements ● Support random access on meshes and hierarchies ● Can be useful to many different applications [Kim et al., Tech. Report 09; Kim et al., TVCG 09; Yoon and Lindstrom, VIS 07]

Hierarchical-Culling oriented Compact Meshes (HCCMeshes) ● Consist of two parts: ● i-HCCMeshes (in-core representation) ● o-HCCMeshes (out-of-core representation)

Data Access Framework [Diagram: the user sends a request to the data pool in main memory and receives data]

Data Access Framework - Out-of-Core Technique [Diagram: the user sends a request to the data pool of cached data in main memory and receives data; on a miss, the data pool sends a cluster ID to the external drive, which stores clusters c0, c1, c2, …, cn, and receives the cluster]
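The request path in this framework can be sketched as a small cluster cache: a request is served from cached data when possible, and otherwise the cluster is fetched by ID from the external drive. The class name, the `load_cluster` callback, and the LRU eviction policy are all hypothetical choices for this sketch, not part of the original framework description.

```python
from collections import OrderedDict


class ClusterCache:
    """Keeps recently used clusters in main memory and loads the rest on demand."""

    def __init__(self, load_cluster, capacity):
        self.load_cluster = load_cluster  # hypothetical callback: cluster ID -> data
        self.capacity = capacity          # maximum number of clusters kept in memory
        self.cached = OrderedDict()
        self.loads = 0                    # number of fetches from the external drive

    def request(self, cluster_id):
        if cluster_id in self.cached:
            self.cached.move_to_end(cluster_id)  # cache hit
            return self.cached[cluster_id]
        data = self.load_cluster(cluster_id)     # cache miss: go to the drive
        self.loads += 1
        self.cached[cluster_id] = data
        if len(self.cached) > self.capacity:
            self.cached.popitem(last=False)      # evict least recently used cluster
        return data


# A dictionary stands in for the external drive in this example.
external_drive = {i: [i * 10 + k for k in range(4)] for i in range(6)}
cache = ClusterCache(external_drive.__getitem__, capacity=2)
for cid in (0, 1, 0, 2, 1):
    cache.request(cid)
print(cache.loads)  # 4
```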

HCCMeshes ● Support hierarchical random access! [Diagram: the user sends a request to the data pool in main memory, which holds cached compressed data (i-HCCMesh) and decompresses it into data for the user; on a miss, the data pool sends a cluster ID to the external drive, which stores compressed clusters c0, c1, c2, …, cm (o-HCCMesh), and receives a compressed cluster that is then decompressed]
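The key property here is that one cluster can be decompressed without touching the others. A minimal sketch of that idea follows, using `zlib` as a stand-in codec; the actual HCCMesh encoding is not described on the slide and is not what this code implements. The function names and cluster size are assumptions.

```python
import struct
import zlib


def compress_clusters(vertices, cluster_size):
    """Compress each cluster of (x, y, z) float triples independently."""
    clusters = []
    for start in range(0, len(vertices), cluster_size):
        chunk = vertices[start:start + cluster_size]
        raw = b"".join(struct.pack("<3f", *v) for v in chunk)
        clusters.append(zlib.compress(raw))
    return clusters


def fetch_vertex(clusters, cluster_size, index):
    """Decompress only the cluster that holds vertex `index` and return the vertex."""
    raw = zlib.decompress(clusters[index // cluster_size])
    offset = (index % cluster_size) * 12  # three 4-byte floats per vertex
    return struct.unpack_from("<3f", raw, offset)


vertices = [(float(i), float(i % 7), 0.5) for i in range(1000)]
clusters = compress_clusters(vertices, cluster_size=64)
print(fetch_vertex(clusters, 64, 777))
```

Compressing per cluster trades some compression ratio for random access, since a single stream over the whole mesh would have to be decoded from the start.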

Main Benefits ● Use less memory and a smaller working set ● o-HCCMeshes have 20:1 compression ratios ● i-HCCMeshes have 6:1 compression ratios ● Improve runtime performance

Applications ● Whitted-style ray tracing ● LOD-based ray tracing ● Collision detection ● Photon mapping ● Non-photorealistic rendering ● Source code is available as OpenRACM

Results

Overview ● Cache-coherent layouts ● Random-accessible compressed meshes ● Cache-oblivious ray reordering ● Hybrid parallel continuous collision detection

Challenges ● Secondary rays show low ray coherence ● Results in low cache utilization ● When ray tracing massive models, expensive cache misses occur (e.g., L1/L2, main memory) [Figures: Landscape (>1000 M triangles), St. Matthew (372 M triangles)]

Goal ● Design an efficient algorithm for reordering incoherent secondary rays into coherent ones ● Achieve high cache coherence for these rays ● Improve the performance of ray tracing

Ray Reordering Framework [Moon et al., under review] [Diagram: camera information and hit points with material information feed ray generation; rays pass through a ray buffer to ray reordering and then to ray processing, which reads scene information through the memory hierarchy: L1 caches, main memory, disk]
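One way to picture the reordering stage is to sort buffered rays by a space-filling-curve index so that rays touching nearby geometry are processed together. The sketch below sorts rays by the Z-order index of an approximate hit point. Both the sort key and the Z-order curve are assumptions made for illustration; the slide does not specify the ordering used, and the `hit` field is a hypothetical ray attribute.

```python
def morton3(x, y, z, bits=10):
    """Interleave the bits of three integer coordinates into a Z-order index."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def reorder_rays(rays, scene_min, scene_max, bits=10):
    """Sort rays by the Z-order index of their approximate hit points."""
    cells = (1 << bits) - 1

    def key(ray):
        quantized = []
        for p, lo, hi in zip(ray["hit"], scene_min, scene_max):
            t = (p - lo) / (hi - lo)  # normalize to [0, 1] inside the scene bounds
            quantized.append(min(cells, max(0, int(t * cells))))
        return morton3(*quantized, bits=bits)

    return sorted(rays, key=key)


rays = [
    {"id": 0, "hit": (0.90, 0.90, 0.90)},
    {"id": 1, "hit": (0.10, 0.10, 0.10)},
    {"id": 2, "hit": (0.88, 0.91, 0.90)},
    {"id": 3, "hit": (0.12, 0.10, 0.09)},
]
ordered = reorder_rays(rays, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
print([ray["id"] for ray in ordered])
```

After sorting, the two rays that hit near one corner of the scene are adjacent in the buffer, as are the two that hit near the opposite corner, so consecutive rays tend to reuse the same cached data.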

Applications ● Path tracing ● Photon mapping

Result – Path Tracing (Video) ● 104 M triangles (12.8 GB) ● 512 × 512 resolution ● 100 paths ● 8 area lights

Result – Photon Mapping ● 128 M triangles (15.7 GB) ● Cache 19% of all the data ● 4 area lights ● 13X speedup

Overview ● Cache-coherent layouts ● Random-accessible compressed meshes ● Cache-oblivious ray reordering ● Hybrid parallel continuous collision detection

Collision Detection ● Collision detection is used in various fields ● Games, movies, scientific simulation, and robotics [Figures from PIXAR, C. Lauterbach, and AION]

Discrete vs. Continuous ● Discrete collision detection (DCD) [Figure: objects at time step (i-1) and time step (i)]

Discrete vs. Continuous ● Continuous collision detection (CCD) [Figure: objects at time step (i-1) and time step (i)]

Discrete vs. Continuous ● Discrete collision detection (DCD)? [Figure: objects at time step (i-1) and time step (i)]

Discrete vs. Continuous
                   Continuous CD    Discrete CD
Accuracy           Accurate         May miss collisions
Computation time   Expensive        Very fast
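The accuracy difference in the comparison above can be shown with a one-dimensional toy case: a point that moves past a thin wall within a single time step. The function names and the linear-motion assumption between time steps are choices made for this sketch.

```python
def discrete_hit(x_prev, x_curr, wall_lo, wall_hi):
    """DCD: test only the position at the current time step."""
    return wall_lo <= x_curr <= wall_hi


def continuous_hit(x_prev, x_curr, wall_lo, wall_hi):
    """CCD: test the whole interval swept between the two time steps."""
    lo, hi = min(x_prev, x_curr), max(x_prev, x_curr)
    return lo <= wall_hi and wall_lo <= hi


# The point jumps from x = 0 to x = 10 in one step; the wall spans [4.9, 5.1].
print(discrete_hit(0.0, 10.0, 4.9, 5.1))    # False: the collision is missed
print(continuous_hit(0.0, 10.0, 4.9, 5.1))  # True: the swept interval crosses the wall
```

The discrete test is a single comparison per step, while the continuous test has to reason about the motion in between, which is where the extra cost comes from on real triangle meshes.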

Motivation ● Continuous collision detection ● Accurate, but slow for complex models ● Hardware trend ● CPUs and GPUs are increasing the number of cores ● Heterogeneous architectures ● Intel Larrabee architecture ● Previous approaches ● Utilize either multi-core CPUs or GPUs ● Not enough performance for interactive applications

Hybrid Parallel CCD [Kim et al. PG 09] ● Takes advantage of both: ● Multi-core CPU architectures ● GPU architectures ● Achieves interactive performance for various deforming models consisting of tens or hundreds of thousands of triangles [Diagram: CCD distributed across a multi-core CPU and multiple GPUs]
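A rough sketch of how work might be divided between the two kinds of processors follows. The slide does not say how the work is split, so the division assumed here (CPU threads cull candidate pairs, a batched routine standing in for a GPU kernel runs the exact tests) is an illustration only. No real GPU is used, the two phases run one after the other rather than overlapped, and all names are hypothetical.

```python
import queue
import threading


def overlaps(a, b):
    """1D bounding-interval overlap test used for culling."""
    return a[0] <= b[1] and b[0] <= a[1]


def cpu_culling(pairs, boxes, out):
    """CPU-side worker: cull pairs and forward the survivors."""
    for i, j in pairs:
        if overlaps(boxes[i], boxes[j]):
            out.put((i, j))


def gpu_elementary_tests(batch, exact):
    """Stand-in for a GPU kernel: run the exact test on a whole batch."""
    return [pair for pair in batch if exact(*pair)]


def hybrid_detect(boxes, exact, cpu_threads=4, batch_size=8):
    pairs = [(i, j) for i in range(len(boxes)) for j in range(i + 1, len(boxes))]
    candidates = queue.Queue()
    workers = [
        threading.Thread(target=cpu_culling,
                         args=(pairs[k::cpu_threads], boxes, candidates))
        for k in range(cpu_threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    pending = []
    while not candidates.empty():
        pending.append(candidates.get())
    hits = []
    for start in range(0, len(pending), batch_size):
        hits.extend(gpu_elementary_tests(pending[start:start + batch_size], exact))
    return sorted(hits)


boxes = [(0.0, 2.0), (1.0, 3.0), (5.0, 6.0), (2.5, 5.5)]


def deep_overlap(i, j):
    """Toy exact test: the intervals must overlap by more than 0.75."""
    return min(boxes[i][1], boxes[j][1]) - max(boxes[i][0], boxes[j][0]) > 0.75


print(hybrid_detect(boxes, deep_overlap))  # [(0, 1)]
```

Batching the surviving pairs reflects the general pattern that GPUs favor many uniform tests at once, while the irregular culling work stays on CPU threads.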

Results ● Performance of HPCCD utilizing both CPUs and GPUs ● Source code is available as a library called OpenCCD

Results

Conclusions ● Data explosion and lower growth rate of data access speed ● Discussed four different techniques as data management methods ● Cache-coherent layouts ● Random-accessible compressed data ● Cache-oblivious ray reordering ● Hybrid continuous collision detection ● Applied to rendering and collision detection ● Observed meaningful performance improvement

Acknowledgements ● Research collaborators ● TaeJoon Kim, DukSu Kim, Pio Claudio, BooChang Moon, YongYoung Byun, JaePil Heo, SeungYong Lee, YongJin Kim, JaeHyuk Heo, John Kim, Peter Lindstrom, Valerio Pascucci, Dinesh Manocha ● Funding sources ● Microsoft Research Asia ● KAIST seed grant ● Ministry of Knowledge Economy ● Samsung ● Korea Research Foundation