Data Management Techniques
Sung-Eui Yoon, KAIST
URL: http://jupiter.kaist.ac.kr/~sungeui/

Data Avalanche (or Data Explosion)
There is too much data out there!
[Image source: www.cs.umd.edu/class/spring2001/cmsc838b/Project/Parija_Spacco/images/]

Geometric Data Avalanche
● Massive geometric data
  ● Due to advances in modeling, simulation, and data capture techniques
● Time-varying data (4D data sets)

CAD Model: Double Eagle Oil Tanker
82 million triangles (4 GB)

CAD Model: Boeing 777
Ray tracing the Boeing 777, 470 million triangles
Excerpted from the SIGGRAPH course notes on massive model rendering

Scanned Model: St. Matthew Model
372 million triangles (10 GB)
[Image source: www.cyberware.com]

Possible Solutions?
● Will hardware improvements address the data avalanche?
● Moore's law: the number of transistors roughly doubles every 18 months
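As a rough worked example, assume the 18-month doubling period holds exactly over the ten-year window (1999~2009, i.e., 120 months) used in the architecture-trend chart. Transistor counts would then grow by about two orders of magnitude:

```latex
\text{growth}_{1999\text{--}2009} = 2^{120/18} = 2^{20/3} \approx 101.6\times
```

This figure describes transistor counts only; the point of the next slide is that data access speeds have grown far more slowly.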

Current Architecture Trends
Data access time becomes the major computational bottleneck!
[Figure: accumulated growth rate during 1999~2009 (log scale) for random disk access, sequential disk access, RAM access speed, CPU speed, and GPU speed; values shown: 192X, 56X, 10X, 4.5X, and 1.3X]

Four Orthogonal Approaches
● Cache-coherent layouts
● Random-accessible compressed meshes
● Cache-oblivious ray reordering
● Hybrid parallel continuous collision detection

Overview
● Cache-coherent layouts
● Random-accessible compressed meshes
● Cache-oblivious ray reordering
● Hybrid parallel continuous collision detection

Cache-Coherent Layouts of Meshes
● One-dimensional data layout of a mesh
● Reduce the number of cache misses
[Figure: a mesh with vertices va, vb, vc, vd and a one-dimensional layout of them: va vb vd vc]
● Cache-aware or cache-oblivious layouts
● Minimize the number of cache misses for a specific cache parameter or for various cache parameters (e.g., cache block size)
[Yoon et al. SIG 05, VIS 06, Euro 06]
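The layouts in the cited papers are computed by the authors' own methods, which the slide does not detail. As a much simpler, well-known stand-in, the sketch below orders 2D vertex positions along a Z-order (Morton) curve, so vertices that are close in space tend to be close in the one-dimensional layout. The function names and the `grid` quantization parameter are illustrative assumptions, not part of OpenCCL.

```python
def morton2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into a Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return key


def z_order_layout(points, grid: int = 1024) -> list[int]:
    """Return vertex indices sorted along a Z-order curve.

    Positions are quantized to a grid x grid lattice over their bounding box
    (grid must be at most 2**16 so the coordinates fit in 16 bits).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # avoid division by zero for flat data
    span_y = (max(ys) - min_y) or 1.0

    def key(i: int) -> int:
        x, y = points[i]
        qx = int((x - min_x) / span_x * (grid - 1))
        qy = int((y - min_y) / span_y * (grid - 1))
        return morton2d(qx, qy)

    return sorted(range(len(points)), key=key)


print(z_order_layout([(1, 1), (0, 0), (1, 0), (0, 1)], grid=2))  # [1, 2, 3, 0]
```

A space-filling curve gives a reasonable layout cheaply, but it only looks at vertex positions; it ignores mesh connectivity, which layout methods such as those cited above can take into account.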

Block-based I/O Model [Aggarwal and Vitter 88]
[Figure: a CPU or GPU with fast memory (or cache), connected through block transfers to slow memory (or disk); access-time labels shown: 10^-6 sec, 10^-4 sec, and 1 sec]
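In this model, cost is counted in block transfers between fast and slow memory rather than in individual accesses. The small simulator below counts those transfers for a sequence of element indices. The LRU replacement policy and the function name are assumptions made for illustration. It shows why layout matters: the same 16 elements cost 4 transfers when read in storage order, but 16 when read with a stride that lands in a different block each time.

```python
from collections import OrderedDict


def count_block_transfers(accesses, block_size: int, num_blocks: int) -> int:
    """Count block transfers from slow to fast memory for a sequence of accesses.

    Elements are stored in blocks of `block_size` consecutive indices; fast
    memory holds at most `num_blocks` blocks and evicts the least recently
    used block when it is full.
    """
    cache = OrderedDict()  # block id -> None, least recently used first
    transfers = 0
    for index in accesses:
        block = index // block_size
        if block in cache:
            cache.move_to_end(block)  # hit: mark block as most recently used
        else:
            transfers += 1  # miss: fetch the whole block
            cache[block] = None
            if len(cache) > num_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return transfers


sequential = list(range(16))
strided = [i + 4 * j for i in range(4) for j in range(4)]  # 0, 4, 8, 12, 1, 5, ...
print(count_block_transfers(sequential, block_size=4, num_blocks=2))  # 4
print(count_block_transfers(strided, block_size=4, num_blocks=2))     # 16
```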

Applications
● View-dependent meshes
  ● View-dependent rendering
● Triangle meshes
  ● Isocontour extractions
● Hierarchies
  ● Ray tracing
  ● Collision detection

View-Dependent Rendering using LODs: Improving GPU Vertex Cache Utilization (GeForce 6800, January 2005)
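A common way to quantify GPU vertex cache utilization is the average cache miss ratio (ACMR): the number of vertex cache misses divided by the number of triangles. The sketch below assumes a FIFO post-transform vertex cache of fixed size; real hardware cache sizes and policies vary, and the default of 16 entries is only an illustrative assumption.

```python
from collections import deque


def acmr(triangles, cache_size: int = 16) -> float:
    """Average cache miss ratio of a triangle order under a FIFO vertex cache."""
    fifo = deque()    # cached vertices in insertion order
    resident = set()  # fast membership test for the FIFO contents
    misses = 0
    for tri in triangles:
        for v in tri:
            if v not in resident:
                misses += 1
                fifo.append(v)
                resident.add(v)
                if len(fifo) > cache_size:
                    resident.discard(fifo.popleft())  # FIFO: evict the oldest entry
    return misses / len(triangles)


# Reusing vertices soon after they are loaded keeps them in the cache.
print(acmr([(0, 1, 2), (0, 1, 2), (3, 4, 5)], cache_size=3))  # 2.0
print(acmr([(0, 1, 2), (3, 4, 5), (0, 1, 2)], cache_size=3))  # 3.0
```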

Applications
● View-dependent meshes
  ● View-dependent rendering
● Triangle meshes
  ● Isocontour extractions
● Hierarchies
  ● Ray tracing
  ● Collision detection
Achieves up to 20X improvement in isocontour extraction
[Figure: Puget Sound, 134M triangles; isocontour z(x, y) = 500 m]

Applications
● View-dependent meshes
  ● View-dependent rendering
● Triangle meshes
  ● Isocontour extractions
● Hierarchies
  ● Ray tracing
  ● Collision detection
Achieves 30% ~ 300% performance improvement

Advantages
● General
  ● Works well for various applications
● Cache-oblivious
  ● Can benefit all levels of the memory hierarchy (e.g., CPU/GPU caches, main memory, and disk)
● No modification of runtime applications
  ● Only the layout computation is needed
Source code is available as a library called OpenCCL
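Because only the layout changes, adopting a computed order amounts to permuting the stored vertex array and remapping the triangle indices; code that accesses vertices by index keeps working unchanged. A minimal sketch, assuming the layout is given as a list `order` of old vertex indices (a hypothetical representation, not OpenCCL's interface):

```python
def apply_layout(vertices, triangles, order):
    """Store vertices in the given order and remap triangle indices.

    order[k] is the old index of the vertex that moves to position k.
    """
    new_index = [0] * len(order)
    for new_pos, old in enumerate(order):
        new_index[old] = new_pos  # inverse permutation: old -> new
    new_vertices = [vertices[old] for old in order]
    new_triangles = [tuple(new_index[v] for v in tri) for tri in triangles]
    return new_vertices, new_triangles


verts, tris = apply_layout(["a", "b", "c"], [(0, 1, 2)], order=[2, 0, 1])
print(verts, tris)  # ['c', 'a', 'b'] [(1, 2, 0)]
```

Each remapped triangle still refers to the same vertex data ('a', 'b', 'c'), only stored at new positions.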

Overview
● Cache-coherent layouts
● Random-accessible compressed meshes
● Cache-oblivious ray reordering
● Hybrid parallel continuous collision detection

Random-Accessible Compressed Data
● Compression methods for meshes and hierarchies
  ● Reduce the memory requirements
  ● Support random access on meshes and hierarchies
● Can be useful for many different applications
[Kim et al. Tech. Report 09; Kim et al., TVCG 09; Yoon and Lindstrom, VIS 07]

Hierarchical-Culling oriented Compact Meshes (HCCMeshes)
● Consists of two parts:
  ● i-HCCMeshes (in-core representation)
  ● o-HCCMeshes (out-of-core representation)

Data Access Framework
[Figure: a user sends a request to the data pool in main memory, which returns the requested data]

Data Access Framework - Out-of-Core Technique
[Figure: a user requests data from the cached data in main memory; clusters are fetched by cluster ID from the data pool on an external drive, which stores clusters c0, c1, …, cn]
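The out-of-core framework loads data at the granularity of clusters identified by a cluster ID and keeps recently used clusters in main memory. A minimal sketch of such a cluster cache, assuming LRU eviction and a caller-supplied `load_cluster` function that stands in for reading a cluster from the external drive (both are assumptions; the slides do not specify the replacement policy):

```python
from collections import OrderedDict


class ClusterCache:
    """Keep at most `capacity` clusters in main memory, loading misses on demand."""

    def __init__(self, load_cluster, capacity: int):
        self._load = load_cluster       # cluster_id -> cluster data (e.g., a disk read)
        self._capacity = capacity
        self._clusters = OrderedDict()  # cluster_id -> data, least recently used first
        self.loads = 0                  # number of fetches from the external drive

    def get(self, cluster_id):
        if cluster_id in self._clusters:
            self._clusters.move_to_end(cluster_id)  # hit: mark as most recently used
            return self._clusters[cluster_id]
        data = self._load(cluster_id)               # miss: fetch from the data pool
        self.loads += 1
        self._clusters[cluster_id] = data
        if len(self._clusters) > self._capacity:
            self._clusters.popitem(last=False)      # evict the least recently used cluster
        return data


cache = ClusterCache(lambda cid: f"data of cluster c{cid}", capacity=2)
for cid in [0, 1, 0, 2, 1]:
    cache.get(cid)
print(cache.loads)  # 4: cluster 1 was evicted when cluster 2 was loaded
```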

HCCMeshes Support Hierarchical Random Access!
[Figure: main memory holds the cached data as compressed data (i-HCCMesh), which is decompressed when the user requests it; missing clusters are fetched by cluster ID from the data pool on an external drive, which stores compressed clusters c0, c1, …, cm (o-HCCMesh), and are decompressed when loaded]
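HCCMeshes use their own compact encodings, which the slides do not describe. The general idea of random access into compressed data can still be sketched with a generic codec: compress each cluster independently and record byte offsets, so any single cluster can be decoded without touching the others. The sketch below uses Python's `zlib` and flat lists of floating-point values purely as stand-ins; the function names are hypothetical.

```python
import zlib
from array import array


def compress_clusters(clusters):
    """Compress each cluster (a list of floats) independently.

    Returns the concatenated compressed bytes and the start offset of every
    cluster, followed by a final end offset.
    """
    blob = bytearray()
    offsets = [0]
    for cluster in clusters:
        blob += zlib.compress(array("d", cluster).tobytes())
        offsets.append(len(blob))
    return bytes(blob), offsets


def read_cluster(blob, offsets, cluster_id):
    """Random access: decompress only the bytes of the requested cluster."""
    start, end = offsets[cluster_id], offsets[cluster_id + 1]
    values = array("d")
    values.frombytes(zlib.decompress(blob[start:end]))
    return values.tolist()


clusters = [[0.0, 1.0, 2.0], [3.5] * 100, [7.0]]
blob, offsets = compress_clusters(clusters)
print(read_cluster(blob, offsets, 2))  # [7.0]
```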

Main Benefits
● Use less memory space and a smaller working set
  ● o-HCCMeshes achieve 20:1 compression ratios
  ● i-HCCMeshes achieve 6:1 compression ratios
● Improve runtime performance

Applications
● Whitted-style ray tracing
● LOD-based ray tracing
● Collision detection
● Photon mapping
● Non-photorealistic rendering
Source code is available as OpenRACM

Results

Overview
● Cache-coherent layouts
● Random-accessible compressed meshes
● Cache-oblivious ray reordering
● Hybrid parallel continuous collision detection

Challenges
● Secondary rays show low ray coherence
  ● This results in low cache utilization
● When ray tracing massive models, expensive cache misses occur (e.g., L1/L2 caches, main memory)
[Figure: Landscape (>1000M triangles) and St. Matthew (372M triangles)]

Goal
● Design an efficient algorithm for converting incoherent secondary rays into coherent ones
  ● Achieve high cache coherence for these rays
  ● Improve the performance of ray tracing

Ray Reordering Framework [Moon et al., under review]
[Figure: camera information and hit points with material information feed ray generation, followed by ray reordering and ray processing; other elements shown: ray buffer, scene information, and the memory levels L1 caches, main memory, and disk]
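The slide does not state how rays are reordered. One simple, commonly used heuristic, shown here only as an assumption and not necessarily the method of [Moon et al.], groups rays by the signs of their direction components (octant) and then sorts them along a Z-order (Morton) curve of their quantized origins, so rays processed consecutively tend to touch nearby parts of the scene. The scene bounds `lo`/`hi` and the 10-bit quantization are illustrative parameters.

```python
def morton3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into a 3D Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key


def reorder_rays(rays, lo, hi, bits: int = 10) -> list[int]:
    """Return ray indices sorted by (direction octant, Morton key of origin).

    rays: list of (origin, direction) pairs of 3-tuples of floats.
    lo, hi: corners of the scene bounding box used to quantize origins.
    """
    cells = (1 << bits) - 1

    def quantize(value: float, axis: int) -> int:
        extent = (hi[axis] - lo[axis]) or 1.0
        q = int((value - lo[axis]) / extent * cells)
        return min(max(q, 0), cells)  # clamp origins outside the bounds

    def key(i: int):
        origin, direction = rays[i]
        octant = sum(1 << a for a in range(3) if direction[a] < 0)
        q = [quantize(origin[a], a) for a in range(3)]
        return (octant, morton3d(*q, bits=bits))

    return sorted(range(len(rays)), key=key)


rays = [
    ((1.0, 1.0, 1.0), (1.0, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)),
    ((1.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
]
print(reorder_rays(rays, lo=(0, 0, 0), hi=(1, 1, 1)))  # [1, 3, 0, 2]
```

The reordered index list would then drive the order in which rays from the ray buffer are processed.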

Applications
● Path tracing
● Photon mapping

Result – Path Tracing (Video)
● 104M triangles (12.8 GB)
● 512 × 512 resolution
● 100 paths
● 8 area lights

Result – Photon Mapping
● 128M triangles (15.7 GB)
● Cache holds 19% of all the data
● 4 area lights
● 13X speedup

Overview
● Cache-coherent layouts
● Random-accessible compressed meshes
● Cache-oblivious ray reordering
● Hybrid parallel continuous collision detection

Collision Detection
● Collision detection is used in various fields
  ● Games, movies, scientific simulation, and robotics
[Figures from PIXAR, C. Lauterbach, and AION]

Discrete vs. Continuous
Discrete collision detection (DCD)
[Figure: positions at time step (i-1) and time step (i)]

Discrete vs. Continuous
Continuous collision detection (CCD)
[Figure: motion between time step (i-1) and time step (i)]

Discrete vs. Continuous
Discrete collision detection (DCD)?
[Figure: positions at time step (i-1) and time step (i)]

Discrete vs. Continuous
                   Continuous CD   Discrete CD
Accuracy           Accurate        May miss collisions
Computation time   Expensive       Very fast
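To make the "may miss collisions" entry concrete, the sketch below compares the two approaches for two spheres moving linearly within one time step. This is a simplified stand-in: CCD for triangle meshes tests vertex-face and edge-edge pairs rather than spheres. The function names are illustrative.

```python
import math


def position(center, displacement, t):
    """Linearly interpolated position at normalized time t in [0, 1]."""
    return tuple(c + t * d for c, d in zip(center, displacement))


def discrete_cd(ca, da, cb, db, radius_sum):
    """DCD: test overlap only at the sampled time steps (t = 0 and t = 1)."""
    return any(
        math.dist(position(ca, da, t), position(cb, db, t)) <= radius_sum
        for t in (0.0, 1.0)
    )


def continuous_cd(ca, da, cb, db, radius_sum):
    """CCD: earliest t in [0, 1] at which the spheres touch, or None.

    With relative offset p = cb - ca and relative displacement v = db - da,
    contact occurs when |p + t v| = radius_sum, i.e.
    (v.v) t^2 + 2 (p.v) t + (p.p - radius_sum^2) = 0.
    """
    p = [b - a for a, b in zip(ca, cb)]
    v = [b - a for a, b in zip(da, db)]
    qa = sum(x * x for x in v)
    qb = 2.0 * sum(x * y for x, y in zip(p, v))
    qc = sum(x * x for x in p) - radius_sum ** 2
    if qc <= 0.0:
        return 0.0  # already touching at the start of the step
    disc = qb * qb - 4.0 * qa * qc
    if qa == 0.0 or disc < 0.0:
        return None  # no relative motion, or the paths never come close enough
    t = (-qb - math.sqrt(disc)) / (2.0 * qa)  # smaller root = first contact
    return t if 0.0 <= t <= 1.0 else None


# A fast sphere passes straight through a static one within a single step.
args = ((-2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), 0.5)
print(discrete_cd(*args))    # False: both sampled poses are separated
print(continuous_cd(*args))  # 0.375
```

DCD costs two distance checks here, while CCD solves a quadratic; for triangle meshes the continuous tests become considerably more expensive, which matches the trade-off in the table.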

Motivation
● Continuous collision detection
  ● Accurate, but slow for complex models
● Hardware trend
  ● CPUs and GPUs are increasing the number of cores
  ● Heterogeneous architectures, e.g., the Intel Larrabee architecture
● Previous approaches
  ● Utilize either multi-core CPUs or GPUs
  ● Do not provide enough performance for interactive applications

Hybrid Parallel CCD [Kim et al. PG 09]
● Takes advantage of both:
  ● Multi-core CPU architectures
  ● GPU architectures
● Achieves interactive performance for various deforming models consisting of tens or hundreds of thousands of triangles
[Figure: CCD work shared between a multi-core CPU and GPUs]
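The slide does not spell out how work is split between the processors. One plausible division of labor, sketched here as an assumption, lets CPU threads perform culling and stream batches of candidate pairs through a queue to a separate stage that runs elementary tests in bulk, the kind of data-parallel work GPUs handle well. In this Python sketch a single thread stands in for the GPU, the primitives are static spheres rather than moving triangles, and Python's global interpreter lock means it illustrates the pipeline structure rather than real speedup. All names are hypothetical.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def aabb(sphere):
    """Axis-aligned bounding box of a sphere given as ((x, y, z), r)."""
    center, r = sphere
    return [c - r for c in center], [c + r for c in center]


def boxes_overlap(a, b):
    (alo, ahi), (blo, bhi) = a, b
    return all(alo[k] <= bhi[k] and blo[k] <= ahi[k] for k in range(3))


def spheres_touch(s, t):
    (c1, r1), (c2, r2) = s, t
    return sum((p - q) ** 2 for p, q in zip(c1, c2)) <= (r1 + r2) ** 2


def hybrid_collide(spheres, num_cpu_workers=4, batch_size=64):
    boxes = [aabb(s) for s in spheres]
    batches = queue.Queue()  # candidate pairs flow from culling to exact tests
    done = object()          # sentinel marking the end of the stream
    hits = []

    def cull(i):
        # "CPU" stage: broad-phase culling of pairs (i, j), j > i, in batches.
        batch = []
        for j in range(i + 1, len(spheres)):
            if boxes_overlap(boxes[i], boxes[j]):
                batch.append((i, j))
                if len(batch) == batch_size:
                    batches.put(batch)
                    batch = []
        if batch:
            batches.put(batch)

    def exact_tests():
        # Stand-in for the "GPU" stage: elementary tests on whole batches.
        while (batch := batches.get()) is not done:
            hits.extend(p for p in batch if spheres_touch(spheres[p[0]], spheres[p[1]]))

    consumer = threading.Thread(target=exact_tests)
    consumer.start()
    try:
        with ThreadPoolExecutor(max_workers=num_cpu_workers) as pool:
            list(pool.map(cull, range(len(spheres))))  # waits for all culling tasks
    finally:
        batches.put(done)  # always release the consumer, even on errors
        consumer.join()
    return sorted(hits)


spheres = [((0, 0, 0), 1.0), ((1.5, 0, 0), 1.0), ((10, 0, 0), 1.0), ((0, 0, 1.9), 1.0)]
print(hybrid_collide(spheres))  # [(0, 1), (0, 3)]
```

Pair (1, 3) passes the bounding-box culling but fails the exact test, which is why the second stage is needed.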

Results
● Performance of HPCCD utilizing both CPUs and GPUs
Source code is available as a library called OpenCCD

Results

Conclusions
● Data explosion and the slower growth rate of data access time
● Discussed four different techniques as data management methods
  ● Cache-coherent layouts
  ● Random-accessible compressed data
  ● Cache-oblivious ray reordering
  ● Hybrid continuous collision detection
● Applied to rendering and collision detection
● Observed meaningful performance improvements

Acknowledgements
● Research collaborators
  ● TaeJoon Kim, DukSu Kim, Pio Claudio, BooChang Moon, YongYoung Byun, JaePil Heo, SeungYong Lee, YongJin Kim, JaeHyuk Heo, John Kim, Peter Lindstrom, Valerio Pascucci, Dinesh Manocha
● Funding sources
  ● Microsoft Research Asia
  ● KAIST seed grant
  ● Ministry of Knowledge Economy
  ● Samsung
  ● Korea Research Foundation