Ocean Wave Simulation in Real-time Using GPU

Ocean Wave Simulation in Real-time Using GPU Chin-Chih Wang Jia-Xiang Wu National Taiwan University Chao-En Yen Pangfeng Liu Chuen-Liang Chen

2 Outline • Goal • Contribution • Overview-Ocean Wave Simulation • System Flow • System Performance • Demo • Conclusion

3 Goal

4

5 Contribution

6 Contribution • We implemented ocean simulation with choppy effect and sprays • We improved simulation performance by 30% by reducing redundant memory transfers.

7 Overview

8 Ocean Wave • Basic Ocean Wave ▫ Use an inverse FFT to combine many sine and cosine waves into the height map of the wave surface. • Choppy Wave ▫ Makes the wave peaks sharp and the wave troughs flat.

9 Ocean Wave • Choppy Wave

10 Overlapping • As the choppy effect increases, the wave surface starts to overlap itself in the simulation. ▫ Generate new particles on the overlapping wave surface.

11 Spray • Produce spray (breaking waves) with a particle system

12 System Flow

13 System Flow • Generate FFT-Ocean Surface • Add Choppy Effect • Detect Region of Overlapping • Generate New Particles • Update Particle Information • Draw Sky Box • Cube-map Shading

14 Generate FFT-Ocean Surface • Use the Phillips spectrum to generate the initial wave height map in the frequency domain ▫ ξr and ξi are two Gaussian random variables with mean 0 and standard deviation 1. ▫ Phillips spectrum: a model for generating wind-driven waves. • The height map at time t is then computed from this initial height map.
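The definitions on this slide match the FFT ocean model as it is commonly written following Tessendorf, so the formulas can be sketched as below. This is an assumed reconstruction, not the slide's own notation: the amplitude constant $A$, wind speed $V$, wind direction $\hat{\mathbf{w}}$, gravity $g$, and the deep-water dispersion relation are symbols introduced here for illustration.

```latex
\begin{aligned}
P_h(\mathbf{k}) &= A\,\frac{\exp\!\left(-1/(kL)^2\right)}{k^4}\,
                   \lvert \hat{\mathbf{k}}\cdot\hat{\mathbf{w}} \rvert^2,
                   \qquad L = \frac{V^2}{g},\\
\tilde h_0(\mathbf{k}) &= \frac{1}{\sqrt{2}}\,(\xi_r + i\,\xi_i)\,\sqrt{P_h(\mathbf{k})},\\
\tilde h(\mathbf{k},t) &= \tilde h_0(\mathbf{k})\,e^{i\omega(k)t}
                        + \tilde h_0^{*}(-\mathbf{k})\,e^{-i\omega(k)t},
                        \qquad \omega^2(k) = g\,k,\\
h(\mathbf{x},t) &= \sum_{\mathbf{k}} \tilde h(\mathbf{k},t)\,e^{i\mathbf{k}\cdot\mathbf{x}}.
\end{aligned}
```

The last line is the sum that the inverse FFT evaluates for every grid position at once.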

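15 Generate FFT-Ocean Surface • In each frame, one CUDA thread per wave vector k computes the frequency-domain height at time t in parallel • Use the inverse-FFT library provided by CUDA to transform the frequency-domain heights into the height map of the surface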
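A minimal CPU reference sketch in C++ of the per-frame update, where each loop iteration stands in for the work of one CUDA thread. It assumes the commonly used time-evolution rule (initial amplitude at k plus the conjugate of the amplitude at -k) and a deep-water dispersion relation. The grid layout, `patchSize`, and all function names are hypothetical, and the inverse FFT step itself is left out.

```cpp
#include <cmath>
#include <complex>
#include <vector>

using cplx = std::complex<float>;

// Deep-water dispersion relation: omega(k) = sqrt(g * |k|).
inline float dispersion(float kx, float ky, float g = 9.81f) {
    return std::sqrt(g * std::hypot(kx, ky));
}

// Frequency-domain height at time t for one wave vector.
// h0      : initial amplitude at k
// h0Minus : initial amplitude at the mirrored wave vector -k
inline cplx heightAtTime(cplx h0, cplx h0Minus, float omega, float t) {
    const cplx phase = std::polar(1.0f, omega * t);   // e^{i omega t}
    return h0 * phase + std::conj(h0Minus) * std::conj(phase);
}

// CPU stand-in for the per-frame kernel: one iteration per wave vector.
// Layout assumption: n x n grid, index = j * n + i, and
// k = 2*pi*(i - n/2, j - n/2) / patchSize.
std::vector<cplx> updateSpectrum(const std::vector<cplx>& h0, int n,
                                 float patchSize, float t) {
    const float twoPi = 6.2831853f;
    std::vector<cplx> ht(h0.size());
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            const float kx = twoPi * (i - n / 2) / patchSize;
            const float ky = twoPi * (j - n / 2) / patchSize;
            // Index of -k; the modulo wraps the unpaired i = 0 and j = 0 lines.
            const int mi = (n - i) % n;
            const int mj = (n - j) % n;
            ht[j * n + i] = heightAtTime(h0[j * n + i], h0[mj * n + mi],
                                         dispersion(kx, ky), t);
        }
    }
    return ht;
}
```

Because the result at -k is the complex conjugate of the result at k, the inverse FFT of this spectrum is a real-valued height map.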

16 Add Choppy effect • To obtain sharper waves instead of sine waves, we move each surface point horizontally at time t ▫ λ is a coefficient that controls the choppy effect • The horizontal movement function is similar to the height function but shifts the phase forward
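The effect of the horizontal shift is easiest to see on a single wave. The C++ sketch below is a one-dimensional illustration, not the authors' kernel: the height uses a cosine, the horizontal shift uses the same wave a quarter period out of phase (a sine), and λ scales the shift. The sign is chosen so that points move toward the crests, as in a Gerstner wave. The function and parameter names are hypothetical.

```cpp
#include <cmath>
#include <vector>

struct SurfacePoint { float x; float height; };

// Single-wave illustration of the choppy effect.
// Height uses cos(k x); the horizontal shift uses the same wave shifted by
// a quarter period (sin), scaled by lambda. Points move toward the crests.
std::vector<SurfacePoint> choppyWave(int samples, float length, float amplitude,
                                     float waveNumber, float lambda) {
    std::vector<SurfacePoint> out(samples);
    for (int i = 0; i < samples; ++i) {
        const float x0 = length * i / samples;        // undisplaced position
        const float phase = waveNumber * x0;
        out[i].height = amplitude * std::cos(phase);
        out[i].x = x0 - lambda * amplitude * std::sin(phase);
    }
    return out;
}
```

With λ = 0 the points keep their grid positions. As λ grows, samples crowd together around the crests and spread out in the troughs, which gives the sharp peaks and flat bottoms described earlier.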

17 Detect Region of Overlapping • When λ increases, the choppy effect causes “overlapping” in the simulation ▫ We detect the overlapping positions to generate spray • Detect overlapping by using the Jacobian of the transformation from the original position to the displaced position ▫ The system uses one CUDA thread to calculate the Jacobian for each position x ▫ If the Jacobian at x is singular, overlapping occurs at x

18 Detect Region of Overlapping ▫ ▫ is a function of the coordinate on the horizontal plane

19 Generate spray • We generate new particles on the overlapping wave surface to simulate spray • and are the larger and smaller eigenvalues of the Jacobian. • The eigenvector corresponding to ▫ The direction of spray velocity is
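The eigenvalues and an eigenvector of a 2x2 Jacobian have a closed form, sketched below in C++. Two assumptions are made here that the slide does not confirm: the Jacobian is treated as symmetric (which holds when the displacement is the gradient of a scalar field), and the returned direction is the eigenvector of the smaller eigenvalue, the axis along which the surface is compressed most. How the authors turn this into a spray velocity is not recoverable from the slide. The struct and function names are hypothetical.

```cpp
#include <cmath>

struct Eigen2 {
    float larger;      // larger eigenvalue
    float smaller;     // smaller eigenvalue
    float dirX, dirY;  // unit eigenvector of the smaller eigenvalue
};

// Eigen-decomposition of a symmetric 2x2 matrix [[a, b], [b, d]].
Eigen2 symmetricEigen(float a, float b, float d) {
    const float mean = 0.5f * (a + d);
    const float half = 0.5f * (a - d);
    const float radius = std::sqrt(half * half + b * b);
    Eigen2 e;
    e.larger = mean + radius;
    e.smaller = mean - radius;
    // (b, smaller - a) solves (M - smaller * I) v = 0 whenever b != 0.
    float vx = b;
    float vy = e.smaller - a;
    if (std::fabs(b) < 1e-12f) {          // matrix is already diagonal
        vx = (a <= d) ? 1.0f : 0.0f;
        vy = (a <= d) ? 0.0f : 1.0f;
    }
    const float len = std::hypot(vx, vy);
    e.dirX = vx / len;
    e.dirY = vy / len;
    return e;
}
```

A negative `smaller` value means the surface has folded along `(dirX, dirY)`, which makes that axis a natural candidate for the horizontal launch direction of new particles.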

20 Update Particle Information • We implemented a particle system to maintain and update spray information. • The system uses CUDA threads to update the velocity, position, and age of all living particles ▫ Only the effect of gravity is considered • When a particle is dead ▫ Stop updating its information ▫ Make the particle invisible until the system recycles it as a new spray particle

21 Draw Sky Box & Cube-map Shading • Draw six pictures on the sky box ▫ The sky box surrounds the whole scene • Sample colors from the cube map ▫ Used as the sky reflection on the water surface • All rendering work is done by OpenGL shaders

22 System Performance

23 Bottleneck • Data transfer between device memory and host memory takes up a large share of the execution time. [Diagram: host memory, device memory]

24 Bottleneck Case 1 • First case: ▫ In our original simulation, the system had to transfer data from CUDA memory to OpenGL shader memory through host memory. [Diagram: GPU, CUDA memory, host memory, OpenGL shader memory]

25 Solution For Case 1 • CUDA provides a memory mapping mechanism ▫ It maps CUDA device memory to OpenGL shader memory ▫ OpenGL can directly access the data in GPU memory without a transfer through the host [Diagram: GPU, CUDA memory, host memory, OpenGL shader memory]

26 Bottleneck Case 2 • Second case: ▫ Generating new particles in CUDA would cause a race condition on the particle buffer. ▫ To avoid the race between CUDA threads, we originally generated new particles serially on the CPU, so a lot of data had to be transferred from device memory to host memory to generate new particles. [Diagram: GPU, host memory, CUDA memory; avoid competition]

27 Solution For Case 2 • We designed a lock mechanism that lets CUDA threads access the particle buffer without race conditions. ▫ The system uses atomicInc to protect the index of the particle buffer so that CUDA threads can access the buffer safely. ▫ atomicInc: a CUDA function that reads and modifies a memory location atomically, without interference from other threads. • The system can then generate new particles on the GPU and save the data transfer time
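The index-protection idea can be illustrated on the CPU with `std::atomic` in C++, which plays the role that `atomicInc` plays in a CUDA kernel. This is a sketch, not the authors' code: `fetch_add` does not wrap around the way `atomicInc` does when its limit is reached, so a bounds check is used instead, and the buffer type, payload, and thread counts are hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Shared spray buffer with an atomically incremented write index.
struct ParticleBuffer {
    std::vector<int> slots;             // payload: id of the spawning cell
    std::atomic<unsigned> next{0};      // next free slot

    explicit ParticleBuffer(unsigned capacity) : slots(capacity, -1) {}

    // Reserve one slot. fetch_add returns the previous value, so no two
    // callers ever receive the same index. Returns false when full.
    bool push(int cellId) {
        const unsigned idx = next.fetch_add(1, std::memory_order_relaxed);
        if (idx >= slots.size()) return false;
        slots[idx] = cellId;
        return true;
    }
};

// CPU stand-in for many GPU threads spawning particles at once.
void spawnConcurrently(ParticleBuffer& buffer, int threads, int perThread) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&buffer, t, perThread] {
            for (int i = 0; i < perThread; ++i) buffer.push(t * perThread + i);
        });
    }
    for (std::thread& th : pool) th.join();
}
```

Only the index is serialized. Once a thread owns a slot it writes the particle data without any further synchronization, which is why this scheme lets the generation step stay on the GPU.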

28 Experiment platform • Intel Duo core 2.0 GHz • 4 GB host memory • NVIDIA GeForce 9800 GT ▫ 112 CUDA cores ▫ 512 MB device memory

29 System Profile [Chart: execution time before and after optimization for each stage: memory copy between host and device, detect overlap, FFT ocean, choppy wave, particle system, calculate slope]

30 System Performance [Chart: FPS before and after optimization for ocean sizes 256*256, 512*512, 1024*1024, and 1536*1536]

31 Demo

32 Conclusion • We implemented ocean simulation with choppy effect and sprays • We improved simulation performance by reducing redundant memory transfers between host and device. • The simulation reaches 32 fps with a 512*512 ocean grid and 131072 particles

33 Q&A
