Multi-threading and Streaming
Computer Games Engineering - CO4302

OVERVIEW
◉ We continue with topics that will help build the engine in the labs
◉ Multi-threading is important in many areas of games development
◉ We introduce the new C++11 features that make threading easy(ish)
◉ We will use threads both to increase the performance of foreground tasks and to perform background tasks (e.g. streaming in data)
◉ We also make a brief aside into an optimisation topic, line sweeps

1. Threading in C++11: The CPU is bored.

MISSED OPPORTUNITIES
◉ Single-threaded programs under-utilise the CPU
◉ It is easy to focus on GPU performance and forget the CPU
◉ But ever more dynamic game environments need the flexibility of the CPU (GPGPU programming is hard).
◉ Many CPU-side tasks are trivially parallel:
◉ Long loops, outputs independent, input data read only
◉ So there are few data synchronisation issues
◉ DOD (data-oriented design) often uses loops that fit this description
◉ Main Thread: loop 1 -> 1000
◉ Thread 1: loop 1 -> 100, Thread 2: loop 101 -> 200 etc.

C++11 THREADING
◉ This kind of parallel loop is simple in C++11:

    #include <thread>

    void LoopSection(int s, int e)
    {
        for (int i = s; i < e; ++i)
        {
            DoThing(i);
        }
    }

    std::thread threads[10];
    for (int t = 0; t < 10; ++t) // Start 10 threads
        threads[t] = std::thread(LoopSection, t*100, (t+1)*100);
    for (int t = 0; t < 10; ++t)
        threads[t].join(); // Wait for each thread to end

C++11 THREADING
◉ std::thread is declared in <thread>
◉ The constructor takes the name of a function and the parameters to that function. The thread starts running that function.
◉ The main thread continues normally after the new thread starts.
◉ The new thread ends when it exits from the function.
◉ The method join waits for that thread to end.
◉ The related method detach disassociates that thread from the std::thread object, leaving it to run independently.
◉ Every running thread must be joined or detached before its std::thread object is destroyed (otherwise std::terminate is called), and so before the main thread ends.
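
A minimal compilable sketch of the same parallel loop, assuming a hypothetical per-element task (squaring a number) in place of DoThing. The names SquareSection and ParallelSquare are illustrative and not taken from the lab code. Each thread writes only to its own section of the output, so no locking is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for DoThing: square each input into its own output slot.
void SquareSection(const std::vector<int>& in, std::vector<long long>& out,
                   std::size_t start, std::size_t end)
{
    for (std::size_t i = start; i < end; ++i)
        out[i] = static_cast<long long>(in[i]) * in[i];
}

// Split [0, in.size()) into contiguous sections and run one thread per section.
std::vector<long long> ParallelSquare(const std::vector<int>& in, unsigned numThreads)
{
    if (numThreads == 0) numThreads = 1;
    std::vector<long long> out(in.size());
    std::vector<std::thread> threads;
    const std::size_t section = (in.size() + numThreads - 1) / numThreads; // round up
    for (unsigned t = 0; t < numThreads; ++t)
    {
        const std::size_t start = std::min<std::size_t>(in.size(), t * section);
        const std::size_t end   = std::min<std::size_t>(in.size(), start + section);
        // std::thread copies its arguments, so references need std::ref / std::cref
        threads.emplace_back(SquareSection, std::cref(in), std::ref(out), start, end);
    }
    for (auto& th : threads)
        th.join(); // Every thread must be joined before its std::thread is destroyed
    return out;
}
```

For example, ParallelSquare({1, 2, 3, 4, 5, 6, 7}, 3) gives the three threads sections of 3, 3 and 1 elements.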

TOO SLOW…
◉ This approach is often recommended for C++11 threading.
◉ Clever templated parallel for-loop replacements are possible
◉ However, it is ineffective for games.
◉ Creating threads is too slow when frame times are measured in milliseconds
◉ Time spent creating threads will exceed any benefits.
◉ Only appropriate for tasks that will take seconds or more.
◉ Solution: create a collection of threads (a thread pool) at setup time.
◉ Each thread waits until woken up and handed some work to do. When the work is complete it waits again.
◉ Requires some inter-thread communication…

MANAGING WORKER THREADS WITH CONDITION VARIABLES

    // Note this code is illustrative: it's not right yet
    const int NumWorkers = 10;
    std::condition_variable workReady[NumWorkers];

    // Main Thread
    for (int i = 0; i < NumWorkers; ++i)
    {
        PrepareWork(i);
        workReady[i].notify_one(); // Tell worker
    }
    ...
    for (int i = 0; i < NumWorkers; ++i)
    {
        workReady[i].wait(); // Wait for all work to finish
    }

    // Thread t (already created)
    while (workerRunning)
    {
        workReady[t].wait(); // Wait for work
        DoWork(t);
        workReady[t].notify_one(); // Tell main thread
    }

CONDITION VARIABLES
◉ std::condition_variable is declared in <condition_variable>
◉ It has wait and notify methods as shown on the last slide.
◉ Another variant is notify_all, which wakes every thread waiting on the condition variable, so several threads can receive the same signal
◉ However, the code on the last slide has problems:
1. While the main thread is waiting for one worker, another worker could finish and its signal would be missed
2. Condition variables can receive spurious wakeups
◉ A spurious wakeup is a false signal, which should be ignored. These are allowed so the standard library can be implemented more efficiently
◉ So as well as sending a signal we must maintain our own record that it was sent. That way we can avoid the above two issues.

HANDLING SPURIOUS WAKEUPS

    // Worker thread
    while (!haveWork[t])      // 1. If work has already been flagged, don't wait for a signal
    {
        workReady[t].wait();  // 2. Spurious signals ignored: haveWork won't be set
    }
    ...
    // Main thread
    haveWork[i] = true;
    workReady[i].notify_one();

◉ Wake-ups are verified against our own boolean variable
◉ Why not use a boolean only? Because sleeping in wait is better than continuously looping
◉ But this is still not correct: in this form the code has a race condition
◉ If the main thread's code runs after the worker tests the while condition but before the worker calls wait, the notification is missed and the worker sleeps even though work is ready

C++11 MUTEXES
◉ C++11 mutexes are declared in <mutex>
◉ A mutex allows us to restrict a section of code to one thread at a time
◉ We rarely lock and unlock a std::mutex directly; we use the lock types std::lock_guard and std::unique_lock, which hold the mutex for us

    std::mutex mutex;

    { // Thread 1
        std::lock_guard<std::mutex> lock(mutex); // Locks mutex until out of scope
        bankBalance -= 100;
    } // Note the curly brackets used only to define scope
    ...
    { // Thread 2
        std::lock_guard<std::mutex> lock(mutex); // Same mutex, code can't run at same time
        if (bankBalance > 50) AllowWithdrawal();
    }
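
As a self-contained illustration of std::lock_guard, here is a small sketch in which several threads increment one shared total. The function name GuardedCount and the counting task are assumptions made for this example only. Without the lock the increments would race and the total would usually come out too low.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Each of numThreads threads adds 1 to a shared total, perThread times.
int GuardedCount(int numThreads, int perThread)
{
    int total = 0;          // Shared data, guarded by totalMutex
    std::mutex totalMutex;
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
    {
        threads.emplace_back([&total, &totalMutex, perThread]
        {
            for (int i = 0; i < perThread; ++i)
            {
                // The lock object must be named, or it is destroyed immediately
                std::lock_guard<std::mutex> lock(totalMutex);
                ++total;
            } // lock released here on every iteration
        });
    }
    for (auto& th : threads)
        th.join();
    return total;
}
```

Locking on every increment is deliberately heavy-handed here; in real code each thread would accumulate locally and take the lock once.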

C++11 MUTEXES
◉ std::unique_lock works the same way as lock_guard except:
◉ It can be locked and unlocked at any time (lock_guard only unlocks on destruction). It doesn't have to be locked at first.
◉ It can transfer ownership (it is moveable).
◉ etc.
◉ When using std::condition_variable you must use unique_lock
◉ In fact the wait function requires you to pass a unique_lock as a parameter, and the lock must already be held.
◉ We now have enough to write a working thread pool:

    const int NumWorkers = 10;
    std::condition_variable workReady[NumWorkers];
    std::mutex mutex[NumWorkers];
    bool haveWork[NumWorkers] = {}; // haveWork[i] is guarded by mutex[i]

    // Main Thread
    for (int i = 0; i < NumWorkers; ++i)
    {
        PrepareWork(i);
        {   // Only use haveWork if other thread is not
            std::unique_lock<std::mutex> lock(mutex[i]);
            haveWork[i] = true;
        }
        workReady[i].notify_one(); // Tell worker
    }
    ... // Do something?
    for (int i = 0; i < NumWorkers; ++i)
    {
        std::unique_lock<std::mutex> lock(mutex[i]);
        while (haveWork[i]) // Wait until work is done
        {
            workReady[i].wait(lock);
        }
    }

    // Thread t (already created)
    while (workerRunning)
    {
        {   // Guard use of haveWork from other thread
            std::unique_lock<std::mutex> lock(mutex[t]);
            while (!haveWork[t]) // Wait for some work
            {
                workReady[t].wait(lock);
            }
        }
        DoWork(t);
        {
            std::unique_lock<std::mutex> lock(mutex[t]);
            haveWork[t] = false;
        }
        workReady[t].notify_one(); // Tell main thread
    }

DETAILS
◉ That is fully functional thread pool code
◉ Threads are not created per task; they sleep until work arrives. This is efficient enough for threading tasks within a single frame of a game.
◉ There is an alternative wait method that takes a function/lambda: workReady[t].wait(lock, [&]() { return haveWork[t]; });
◉ This replaces the whole while loop in the worker thread on the last slide
◉ Also note that wait releases the lock while it sleeps, and re-acquires it to test the condition (the predicate)
◉ Whenever we hold the lock, other threads are blocked from progressing past certain points, so release it whenever possible.
◉ Note the main thread could do something while the workers are busy
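
The handshake above can be sketched as a compilable example using the predicate form of wait. This is a simplified illustration under stated assumptions, not the lab's pool: the Worker struct, the quit flag used for shutdown, and the squaring task standing in for DoWork are all hypothetical. For brevity the workers are created and destroyed inside one function, whereas a real pool would create them once at setup time and keep them alive across frames.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

struct Worker // Hypothetical per-worker state
{
    std::thread             thread;
    std::mutex              mutex;
    std::condition_variable workReady;
    bool haveWork = false; // Guarded by mutex
    bool quit     = false; // Guarded by mutex
    int  input    = 0;
    int  output   = 0;
};

void WorkerLoop(Worker& w)
{
    for (;;)
    {
        {
            std::unique_lock<std::mutex> lock(w.mutex);
            // Predicate form: handles spurious wakeups and already-sent signals
            w.workReady.wait(lock, [&] { return w.haveWork || w.quit; });
            if (w.quit) return;
        }
        w.output = w.input * w.input; // Stand-in for DoWork, done without the lock
        {
            std::unique_lock<std::mutex> lock(w.mutex);
            w.haveWork = false;
        }
        w.workReady.notify_one(); // Tell main thread
    }
}

std::vector<int> SquareOnPool(const std::vector<int>& values)
{
    std::vector<Worker> workers(values.size());
    for (auto& w : workers)
        w.thread = std::thread(WorkerLoop, std::ref(w));

    for (std::size_t i = 0; i < workers.size(); ++i) // Hand out the work
    {
        {
            std::unique_lock<std::mutex> lock(workers[i].mutex);
            workers[i].input    = values[i];
            workers[i].haveWork = true;
        }
        workers[i].workReady.notify_one();
    }

    std::vector<int> results;
    for (auto& w : workers) // Wait until each worker has finished
    {
        std::unique_lock<std::mutex> lock(w.mutex);
        w.workReady.wait(lock, [&] { return !w.haveWork; });
        results.push_back(w.output);
    }

    for (auto& w : workers) // Shut the workers down
    {
        {
            std::unique_lock<std::mutex> lock(w.mutex);
            w.quit = true;
        }
        w.workReady.notify_one();
        w.thread.join();
    }
    return results;
}
```

The main thread and a worker never wait on the same condition variable at the same moment, because one waits for haveWork to become true and the other for it to become false. That is why a single condition variable per worker is enough here.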

2. An Aside: Line Sweep Algorithms

LINE SWEEP ALGORITHMS
In the lab you use a variation of a line sweep algorithm. This is a method of sorting data along one axis, then sweeping along that axis when performing a search or other algorithm. Because neighbouring elements in the sweep are near each other along that axis, we can often optimise the algorithm by stopping early. This example creates a Voronoi diagram using a line sweep in O(n log n). Voronoi diagrams can be used as a basis for fracturing geometry.
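
To make the idea concrete, here is a small sketch of a line sweep for a simpler problem than Voronoi diagrams: counting pairs of points that lie within a given radius of each other. The problem, the Point struct and the function name CountClosePairs are assumptions chosen for illustration and are not the variation used in the lab. Points are sorted along x, and the inner loop stops as soon as the x distance alone rules out any further match.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { float x, y; };

// Counts pairs of points closer together than radius, sweeping along the x axis.
std::size_t CountClosePairs(std::vector<Point> points, float radius)
{
    // Sort along the sweep axis
    std::sort(points.begin(), points.end(),
              [](const Point& a, const Point& b) { return a.x < b.x; });

    const float radiusSquared = radius * radius;
    std::size_t count = 0;
    for (std::size_t i = 0; i < points.size(); ++i)
    {
        for (std::size_t j = i + 1; j < points.size(); ++j)
        {
            const float dx = points[j].x - points[i].x;
            if (dx >= radius) break; // All later points are even further away in x
            const float dy = points[j].y - points[i].y;
            if (dx * dx + dy * dy < radiusSquared) ++count;
        }
    }
    return count;
}
```

When points are spread out along x the inner loop exits after a few steps, so the sort dominates the cost; the worst case (all points sharing the same x) is still quadratic.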

3. More C++11 Threading Features

std::async
◉ std::async runs a function (or lambda) on a new thread (pass std::launch::async as the first argument to guarantee this; the default policy may instead defer the call until the result is requested)
◉ This function can return a result (unlike a thread)
◉ std::async gives you a std::future object that can be used to collect the result of the function

    int do_stuff(float a, int b)
    {
        // Runs on another thread
        ...
        return result;
    }

    std::future<int> future = std::async(do_stuff, 2.5f, 10);
    // Do other things
    int result = future.get(); // Waits until result is ready

std::async
◉ You can also test if a result is ready or not: if (future.wait_for(std::chrono::seconds(0)) == std::future_status::ready)
◉ wait_for is intended for setting timeouts on the result, but with a zero timeout it is a very useful non-blocking test
◉ async is easier to work with than threads:
◉ In particular, it makes communication of the result much simpler
◉ However, it still has the penalty of thread creation
◉ So it is not useful for threading per-frame game tasks
◉ But it is the most convenient method for set-up tasks or long-running tasks
◉ Side note: the main thread uses a std::future to collect the result. The running thread uses a std::promise to communicate the result. We don't see std::promise in typical usage, but it is mentioned here for completeness.
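
A short sketch combining std::async with the zero-timeout wait_for test. The task SumRange and the wrapper SumWithPolling are hypothetical names for this example. The launch policy std::launch::async is passed explicitly: with the default policy the call may be deferred, in which case wait_for reports std::future_status::deferred and a polling loop like this one would never finish.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical long-running task: sum the integers first..last inclusive.
long long SumRange(int first, int last)
{
    long long sum = 0;
    for (int i = first; i <= last; ++i)
        sum += i;
    return sum;
}

// Starts the task on another thread and polls for the result without blocking.
// pollCount receives the number of times the result was found not to be ready.
long long SumWithPolling(int first, int last, int& pollCount)
{
    std::future<long long> future =
        std::async(std::launch::async, SumRange, first, last);

    pollCount = 0;
    while (future.wait_for(std::chrono::seconds(0)) != std::future_status::ready)
    {
        ++pollCount;               // A game would run a frame of other work here
        std::this_thread::yield();
    }
    return future.get(); // Result is ready, so this returns immediately
}
```

The value of pollCount depends on scheduling, so it may well be zero for a task this short.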

ADVANCED THREAD POOLS: std::packaged_task
◉ std::packaged_task is an object that encapsulates a function with arbitrary parameters and a return value
◉ A little like async, but it is not called straight away. It is an object that can be moved (not copied) around.
◉ Like async, it returns results using std::future, making it easy to synchronise.
◉ This makes it a good choice for a generic work object to pass to worker threads.
◉ The details are quite complex; here is a good example: http://roar11.com/2016/01/a-platform-independent-thread-pool-using-c14/
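
A minimal sketch of the std::packaged_task life cycle: wrap a function, take its future, move the task to another thread, then collect the result. The functions Multiply and RunPackaged are illustrative names only. A plain std::thread stands in for a pool worker here; a real pool would instead push the moved task onto a queue.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

int Multiply(int a, int b) { return a * b; }

int RunPackaged(int a, int b)
{
    std::packaged_task<int(int, int)> task(Multiply); // Nothing runs yet
    std::future<int> result = task.get_future();      // Take the future before moving the task

    std::thread worker(std::move(task), a, b); // The task is moved, not copied
    worker.join();

    return result.get(); // The result was delivered through the future
}
```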

4. Threading Case Study: Streaming

ASYNCHRONOUS I/O
◉ We often need to load game data while the game is running
◉ Normal file I/O will block and stall the game
◉ C++ does not have a standard asynchronous file I/O API
◉ We have a couple of options:
1. Platform-specific APIs
◉ E.g. Windows: CreateFile and ReadFile with the FILE_FLAG_OVERLAPPED flag set
◉ E.g. PS4: API called fios
2. Write ordinary synchronous I/O code and run it in a separate thread
◉ Platform-specific APIs may be more efficient with the hardware
◉ Our own threaded I/O can be more tightly integrated with the game
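
Option 2 can be sketched in a few lines using std::async: the blocking read runs on another thread and the game collects the bytes through a std::future. The function names LoadFile and LoadFileAsync are assumptions for this example, and a file that cannot be opened simply yields an empty buffer here, which real code would want to report properly.

```cpp
#include <cassert>
#include <fstream>
#include <future>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Ordinary synchronous read of a whole file. Returns an empty vector on failure.
std::vector<char> LoadFile(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}

// Runs the blocking read on another thread so the caller is not stalled.
std::future<std::vector<char>> LoadFileAsync(std::string path)
{
    return std::async(std::launch::async, LoadFile, std::move(path));
}
```

The caller can keep the returned future and poll it each frame with wait_for and a zero timeout, calling get only once the data is ready.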

LAB CASE STUDY - DYNAMICALLY LOADING A 2D MAP
A window moves through a large grid-based map. Blue squares are loaded, yellow squares are not. As the window approaches the edge of the loaded area, new squares must be loaded: use async I/O to load the green squares, reusing the memory occupied by the squares marked with X's, which are discarded. Loading must complete before the screen reaches the edge of the loaded area, so the blue grid must be large enough that there is time to load new sections. Also, when the window reverses direction, we do not want to immediately reload the squares marked with X's. Increasing the grid size will help with this too. Details in lab.
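
One possible way to decide which squares to load and which to discard is sketched below. Everything in it is an assumption for illustration (the Tile type, the margin parameter, and the functions DesiredTiles and PlanStreaming); the lab may organise this quite differently. The loaded area is taken to be the window plus a margin of squares on every side, and the plan is the set difference between the squares wanted now and the squares already resident.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

using Tile = std::pair<int, int>; // (column, row) of a map square

// Squares that should be resident: the window plus margin squares on every side,
// clamped to the map bounds.
std::set<Tile> DesiredTiles(int winX, int winY, int winW, int winH,
                            int margin, int mapW, int mapH)
{
    const int x0 = std::max(0, winX - margin);
    const int x1 = std::min(mapW, winX + winW + margin);
    const int y0 = std::max(0, winY - margin);
    const int y1 = std::min(mapH, winY + winH + margin);
    std::set<Tile> tiles;
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            tiles.insert({x, y});
    return tiles;
}

struct StreamPlan
{
    std::vector<Tile> toLoad;    // The "green" squares
    std::vector<Tile> toDiscard; // The squares marked with X's
};

StreamPlan PlanStreaming(const std::set<Tile>& loaded, const std::set<Tile>& desired)
{
    StreamPlan plan;
    std::set_difference(desired.begin(), desired.end(), loaded.begin(), loaded.end(),
                        std::back_inserter(plan.toLoad));
    std::set_difference(loaded.begin(), loaded.end(), desired.begin(), desired.end(),
                        std::back_inserter(plan.toDiscard));
    return plan;
}
```

This sketch discards a square as soon as it leaves the margin. To avoid reloading squares when the window reverses direction, one option is to use a larger margin for discarding than for loading.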