THQ/Gas Powered Games Supreme Commander and Supreme Commander: Forged Alliance Thread for Performance

Supreme Commander runs best on 4 cores - let’s see how!
• Threading in mid-project can be done!
• Decoupled threads give great performance
• Memory management extends the gains
• Lessons learned

Threading was a mid-stream change
• Code was initially single-threaded
– Game demanded more performance
– Changed mid-project (6-12 months into development)
– Separate render/sim threads to run at different rates
– Support multiple cores
• Limited architecture choices due to existing code
• Using Boost thread library
– Portable, open-source thread library

Render split is essential to speed
• Lots of “little” threads: sound, loading, etc.
• Sim thread: All simulation
• Render thread: Full speed, <=10x per sim tick
• Sync phase: Once frame is ready to render
– Sync render and sim
– Fully queued in and out of sim
– Fast
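The sim/render split above can be sketched roughly as follows. This is a minimal illustration, not the game's code: it assumes std::thread in place of the Boost thread library the slides mention, and the names Snapshot, SyncPoint, RunStats and run_decoupled are hypothetical. The sim thread publishes a finished tick at a short sync point, while the render thread keeps drawing from the last published state as fast as it can. The sketch does not model the observed limit of about 10 rendered frames per sim tick.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical render-side view of the simulation state.
struct Snapshot {
    int tick = 0;
    double unit_x = 0.0;
};

// Sync point: the lock is held only long enough to copy one snapshot.
class SyncPoint {
public:
    void publish(const Snapshot& s) {
        std::lock_guard<std::mutex> lock(m_);
        latest_ = s;
    }
    Snapshot read() const {
        std::lock_guard<std::mutex> lock(m_);
        return latest_;
    }
private:
    mutable std::mutex m_;
    Snapshot latest_;
};

struct RunStats {
    long frames = 0;          // frames "rendered" in total
    int last_tick_seen = 0;   // tick shown by the final frame
};

RunStats run_decoupled(int ticks) {
    assert(ticks >= 0);
    SyncPoint sync;
    std::atomic<bool> sim_done{false};
    RunStats stats;

    // Sim thread: advances the world one tick at a time.
    std::thread sim([&] {
        Snapshot s;
        for (int t = 1; t <= ticks; ++t) {
            s.tick = t;
            s.unit_x += 0.5;   // stand-in for the real simulation step
            sync.publish(s);   // sync phase: hand the finished tick to render
        }
        sim_done.store(true);
    });

    // Render thread: runs at full speed and may draw the same tick many times.
    std::thread render([&] {
        for (;;) {
            // Check the flag first so the last frame sees the final tick.
            const bool finished = sim_done.load();
            const Snapshot s = sync.read();
            ++stats.frames;    // stand-in for drawing a frame
            stats.last_tick_seen = s.tick;
            if (finished) break;
        }
    });

    sim.join();
    render.join();
    return stats;
}
```

Calling run_decoupled(1000) returns once both threads have joined. The frame count varies from run to run because the two threads are not rate-locked, which is the point of the split.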

Decoupled architecture is built for speed
[Diagram, built up over several slides. Labels: Issue; Sim Thread Interface; Simulation; Locks; Render … Render]
• Ready to start a frame and a simulation tick
• Run decoupled sim and render: fully buffered input to sim, called via the Sim Thread Interface (STI)
• Render can run repeatedly, up to 10x per sim tick, depending on sim duration
• Fully decoupled? No. A few low-level systems have locks. No major performance impact!
• Sync sim thread out to render thread, via the STI again
• Multiplayer: record everything going through the STI and send it over the network
• And so on…
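One way to picture the Sim Thread Interface is as a buffered command queue with a log attached. The sketch below is an assumption about the shape of such an interface, not the actual STI: Command, SimThreadInterface, issue, drain and recorded are all hypothetical names, and std::mutex stands in for whatever locking the engine uses. Commands are queued from the issuing side, taken by the sim in one batch at the sync point, and every drained command is also recorded so that a multiplayer layer (not shown) could send the same stream over the network.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical command issued to the simulation.
struct Command {
    int tick;           // tick on which the command should take effect
    std::string text;   // e.g. "move", "attack"
};

class SimThreadInterface {
public:
    // Called from the issuing side; never touches sim state directly.
    void issue(Command c) {
        assert(c.tick >= 0);
        std::lock_guard<std::mutex> lock(m_);
        pending_.push_back(std::move(c));
    }

    // Called by the sim thread at the sync point: take every queued command
    // in one short critical section, then record the batch outside the lock.
    std::vector<Command> drain() {
        std::vector<Command> batch;
        {
            std::lock_guard<std::mutex> lock(m_);
            batch.swap(pending_);
        }
        recorded_.insert(recorded_.end(), batch.begin(), batch.end());
        return batch;
    }

    // Everything that has passed through the interface so far.
    // Only the sim thread reads or writes this log in this sketch.
    const std::vector<Command>& recorded() const { return recorded_; }

private:
    std::mutex m_;
    std::vector<Command> pending_;   // guarded by m_
    std::vector<Command> recorded_;  // sim thread only
};
```

Because the sim sees input only through drain(), replaying the recorded log into another sim instance feeds it the same inputs in the same order, which is the property a record-and-send multiplayer scheme relies on.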

Thread model adapts to varying loads
• Architecture scales well with loads
– Render load will often dominate
– Re-render to keep frame rates up
– Sim-heavy map will try to be sim-dominated

Displaying frame times – cool!
• Thread stats in real time

Sometimes, there’s more to render
[Screenshot labels: Runs as fast as possible; Simulation; Sim/render sync]
• Both threads synced, fully queued in and out of sim

Other times, there’s more to simulate
• Sim runs across many rendered frames

A little sync doesn’t slow this code down
• Threads are busy most of the time!
[Timeline labels: Frame n+1; Sync; Waiting; Busy; Mostly waiting]

Memory manager gives an additional boost
• Memory: If you’re not careful in a threaded game…
– Memory use can thrash cache, but not a problem here!
– Memory alloc/free can be slow
• Suspected memory management was problem
– Doing lots of small allocations
– Built code to make it easy to switch mem managers
• Custom mem manager outperforms default malloc/free
– Can cause some debugging questions
– Purchased commercial one for Supreme Commander
– Wrote new one for Forged Alliance
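The idea of making memory managers easy to switch, and of serving many small allocations from a custom manager, might look something like the sketch below. It is illustrative only: MemoryManager, MallocManager and SmallBlockPool are hypothetical names, and the fixed-size free-list pool is a common textbook technique chosen for the example, not a description of the commercial manager or of the one written for Forged Alliance. The pool is not thread-safe as written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical interface: game code allocates through this, so the backing
// manager can be swapped without touching call sites.
class MemoryManager {
public:
    virtual ~MemoryManager() = default;
    virtual void* allocate(std::size_t bytes) = 0;
    virtual void deallocate(void* p) = 0;
};

// Default behaviour: forward to malloc/free.
class MallocManager : public MemoryManager {
public:
    void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
    void deallocate(void* p) override { std::free(p); }
};

// Fixed-size block pool: one up-front allocation, then O(1) allocate and
// deallocate through an intrusive free list.
class SmallBlockPool : public MemoryManager {
public:
    SmallBlockPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size)),
          storage_(block_size_ * block_count) {
        assert(block_count > 0);
        for (std::size_t i = 0; i < block_count; ++i) {
            push(storage_.data() + i * block_size_);
        }
    }

    // Returns nullptr if the request is too large or the pool is exhausted.
    void* allocate(std::size_t bytes) override {
        if (bytes > block_size_ || free_list_ == nullptr) return nullptr;
        Node* n = free_list_;
        free_list_ = n->next;
        return n;
    }

    void deallocate(void* p) override {
        if (p != nullptr) push(p);
    }

private:
    struct Node { Node* next; };

    // Blocks must be able to hold a free-list node and stay aligned.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;
    }

    // Put a block at the head of the free list.
    void push(void* p) {
        free_list_ = new (p) Node{free_list_};
    }

    std::size_t block_size_;
    std::vector<unsigned char> storage_;
    Node* free_list_ = nullptr;
};
```

Code that holds only a MemoryManager reference can be pointed at either implementation, which makes it cheap to measure one manager against another before committing to it.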

What are some current bottlenecks?
• Multiplayer: all sims run concurrently
– Limited by least-common-denominator machine
– That’s the RTS way
• Monolithic render thread
– Multiple monitors, typically different views
– Possibly split off top part of render for second monitor?
– Too expensive/complex for niche feature

This was a great learning experience!
• Good intermediate step
– Especially for threading mid-project
• Would do it differently if doing it from scratch
– Target more processor cores
– General worker threads w/dispatch system
– Templates to define an interface to common semantics
– Directed work graph/node graph (hard to express)
– Or …?
• The engine is so good, it’ll be back in Demigod!
– Demigod team using modified Supreme Commander engine
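The "general worker threads w/dispatch system" idea from the list above could take a form like this. It is a generic worker-pool sketch, not a design from the talk: WorkerPool and dispatch are hypothetical names, and the standard C++ thread facilities are assumed rather than Boost. Jobs are queued under a lock, workers sleep on a condition variable until work arrives, and the destructor lets queued jobs finish before joining.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class WorkerPool {
public:
    explicit WorkerPool(unsigned worker_count) {
        assert(worker_count > 0);
        for (unsigned i = 0; i < worker_count; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }

    // Finishes every job already queued, then joins the workers.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

    WorkerPool(const WorkerPool&) = delete;
    WorkerPool& operator=(const WorkerPool&) = delete;

    // Queue a job; any idle worker may pick it up.
    void dispatch(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;   // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();   // run outside the lock so workers do not serialize
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;   // guarded by m_
    bool stopping_ = false;                    // guarded by m_
    std::vector<std::thread> workers_;
};
```

With a pool like this, the number of workers can follow the number of cores instead of being fixed by a hand-made sim/render split, which is the scaling concern the slide raises.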

We learned some DOs and DON’Ts
• Do:
– Architect for threading from the start, if you can
– Thread single-threaded code, if you must
– Decouple threads where possible
• Don’t:
– Be afraid to thread single-threaded code

Supreme Commander runs best on 4 cores – that’s how!
• Threading in mid-project can be done!
• Decoupled threads give great performance
• Memory management extends the gains
• Lessons learned

So, what do you think?
• Have you tried something like this?
– Successes?
– Failures?
• Have you rejected trying something like this?
– Why?