I Shot You First! Gameplay Networking in Halo: Reach

Who am I?
• David Aldridge, Lead Networking Engineer at Bungie
• Spent three years working on Halo: Reach networking
• I’ve been making games for a while

What is Halo: Reach? • [video]

Talk Takeaways
• A proven architecture for scalable gameplay networking
• How to design solid networking for your game mechanics
• How to measure and optimize your networking

What is this talk NOT about?
• Halo’s Campaign or Firefight networking
• Sockets/low level networking
• High level networking
  – Matchmaking
  – Rating & ranking systems
  – Creating and curating an online ecosystem

BUNGIE’S GAMEPLAY NETWORKING ARCHITECTURE

What is gameplay networking?
• Communicating sufficient information to maintain a perceptually shared reality, while minimizing both bandwidth use and perceived violations of the integrity of the simulation (artifacts)
• OR: Technology to help multiple players sustain the belief that they are playing a fun game together

Common simplifying approaches
• 1. Lockstep (a.k.a. deterministic, input-passing)
  – Common for games with a strict split between input and simulation (e.g. RTS), so input latency issues can be bypassed
  – Also common for ports of classic games (avoids game alterations)
• 2. Reliable transport protocols (TCP or homegrown)
  – Requires high bandwidth or simple networked state
  – TCP requires high latency tolerance
• 3. Send all networked state as a single blob (atomically)
  – E.g. Quake 3 model
  – Works very well as long as the total networked state is not too large

Halo has to solve the hard problem
• Highly competitive multiplayer action game
• 16 players, vehicles, hundreds of replicated objects
• No dedicated servers
• Game is expected to work regardless of connection quality
• For N players, O(N²) data needs to be networked

We can’t network everything!
[Chart: bandwidth needed as a multiple of the 2-player case (y-axis, 0 to 120) against number of players (x-axis, 2 to 16)]
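
The quadratic growth can be made concrete with a short calculation. This is a sketch under an assumed model, not the formula behind the chart: if state for each of N players is sent to each of the other N − 1 peers, traffic scales as N(N − 1), and dividing by the 2-player value gives N(N − 1)/2. That reaches 120 at 16 players, which is consistent with the top of the chart's axis. The function name is hypothetical.

```python
def bandwidth_multiple(players: int) -> float:
    """Traffic relative to a 2-player game, assuming N * (N - 1) scaling."""
    if players < 2:
        raise ValueError("need at least two players")
    # Assumed model: each of N players' state goes to each of the N - 1 other peers.
    two_player_traffic = 2 * 1
    return players * (players - 1) / two_player_traffic


for n in (2, 4, 8, 16):
    print(n, bandwidth_multiple(n))
```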

TRIBES points the way
• “The TRIBES Engine Networking Model”, Frohnmayer and Gift, GDC 1999
• A host/client model, resilient to cheating
• Protocols for semi-reliable data delivery
• Supports persistent state and transient events
• Highly scalable to match available bandwidth

Three Key Terms

Term: Replication
• The communication of state or events to a remote peer
  – “Replicating an object” means causing it to be created and updated on a remote peer
  – A “replicated object” is one whose state is kept approximately in sync between peers
  – Our replication systems are the Application Layer of our network stack

Term: Authority
• Permission to update the persistent state of an object
  – E.g. in Reach, the game host peer is authoritative over dealing damage

Term: Prediction
• Extrapolating the current properties of an entity based on historical authoritative data and local guesses about the future
  – A predicted object is one which the local peer does not have full control over – this is the opposite of an authoritative object

Bungie’s Networking Stack
Layer: Purpose
• Game: Runs the game
• Game Interface: Extract and apply replicated data
• Prioritization: Rate the priority of all possible replication options
• Replication: Protocols with various reliability guarantees
• Channel Manager: Flow and congestion control
• Transport: Send & receive on sockets

Let’s talk about gameplay
Layer: Purpose
• Game: Runs the game
• Game Interface: Extract and apply replicated data
• Prioritization: Rate the priority of all possible replication options
• Replication: Protocols with various reliability guarantees
• Channel Manager: Flow and congestion control
• Transport: Send & receive on sockets

Replication Protocol: State Data
• Guaranteed eventual delivery of most current state, host→client only
  – Object position
  – Object health
  – Territory capture timer
  – ~150 more properties
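
One way to read “guaranteed eventual delivery of most current state” is that lost packets are never retransmitted verbatim. Instead a property stays dirty until a packet carrying its latest version is acknowledged. The sketch below illustrates that idea for a single client. Every name in it is hypothetical, and it is not a description of Bungie's actual implementation.

```python
class StateReplicator:
    """Per-client tracker that keeps sending the latest value until it is acked."""

    def __init__(self):
        self.values = {}     # property name -> current value
        self.versions = {}   # property name -> version of the current value
        self.acked = {}      # property name -> highest version the client confirmed
        self.in_flight = {}  # packet id -> {property name: version sent}

    def set(self, name, value):
        """Record a new authoritative value and bump its version."""
        self.values[name] = value
        self.versions[name] = self.versions.get(name, 0) + 1

    def dirty(self):
        """Properties whose newest version the client has not confirmed."""
        return [n for n, v in self.versions.items() if self.acked.get(n, 0) < v]

    def build_packet(self, packet_id, max_properties):
        """Write the current value of up to max_properties dirty properties."""
        names = self.dirty()[:max_properties]
        self.in_flight[packet_id] = {n: self.versions[n] for n in names}
        return {n: self.values[n] for n in names}

    def on_ack(self, packet_id):
        """Mark the versions carried by this packet as delivered."""
        for name, version in self.in_flight.pop(packet_id, {}).items():
            self.acked[name] = max(self.acked.get(name, 0), version)

    def on_loss(self, packet_id):
        """Forget the packet; its properties simply remain dirty."""
        self.in_flight.pop(packet_id, None)


rep = StateReplicator()
rep.set("position", (1, 2, 3))
first = rep.build_packet(1, max_properties=8)
rep.set("position", (4, 5, 6))   # state changes while packet 1 is in flight
rep.on_loss(1)                   # packet 1 never arrives
second = rep.build_packet(2, max_properties=8)
rep.on_ack(2)
```

After the loss, the second packet carries only the newest position. The stale value from the first packet is never resent.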

Replication Protocol: Events
• Unreliable notifications of transient occurrences, host→client and client→host
  – Please fire my weapon
  – This weapon was fired
  – Projectile detonated
  – ~50 more events

Replication Protocol: Control data
• High-frequency, best-effort transmission of rapidly-updated data extracted from player control inputs, host→client and client→host
  – Current analog stick values for all players (host→client)
  – Current position of client’s own biped (client→host)
  – ~15 more properties

Replication: The Big Picture
[Diagram: Client → Host]
• Control Data: “My biped is now at position x”
• Events: “I just fired my primary weapon”, “I’d like to get into this warthog”

Replication: The Big Picture
[Diagram: Host → Client]
• Control Data: “This biped is now trying to strafe left”
• State Data: “This object is now in position X”, “This warthog now has a broken windshield”, “All these broken warthog chunks now exist”
• Events: “This weapon just fired”, “This warthog just took damage at this point”

Replication is never fully reliable
• Unreliability enables aggressive prioritization, which lets us handle the richness of our simulation
• Flow control layer decides when to send a packet, and what size it should be
• Replication writes data into the packet until full
• There is always more data than will fit, so we write high-priority data first
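
The “write high-priority data first until the packet is full” rule can be sketched as a greedy fill. This is a minimal illustration with hypothetical names and sizes. In particular, skipping an update that does not fit and trying the next smaller one is an assumption of this sketch, not something the slides state.

```python
def fill_packet(candidates, budget_bytes):
    """Pick updates for one packet, highest priority first.

    candidates: list of (priority, size_bytes, label) tuples.
    Returns the labels written, in the order they were written.
    """
    written = []
    remaining = budget_bytes
    for priority, size, label in sorted(candidates, key=lambda c: c[0], reverse=True):
        if size <= remaining:  # assumed policy: skip what does not fit, keep going
            written.append(label)
            remaining -= size
    return written


candidates = [
    (0.22, 60, "warthog"),
    (0.50, 40, "enemy biped"),
    (0.19, 30, "idle grenade"),
]
print(fill_packet(candidates, budget_bytes=80))
```

Whatever is left out is not lost. It competes again, typically with a higher priority, when the flow control layer schedules the next packet.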

Prioritization
• Priority is based on client view and simulation state
• Priority is calculated separately per-object per-client
• Distance/direction is the core metric
• Size & speed affect priority
• Shooting & damage apply appropriate boosts
• Lots of special cases (e.g. thrown grenades)

Prioritization example
[Screenshots annotated with per-object values]
• Legend: Final priority / relevance / desired update period (ms)
• 0.22 / 0.97 / 127
• 0.50 / 1.00 / 0
• 0.19 / 0.73 / 339

DESIGNING FOR NETWORKING QUALITY

Throwing a grenade • [video]

Single-box grenade throw
[Sequence diagram: Controller → Single peer simulation]
• Player presses left trigger
• Grenade throw animation begins
• Throw animation delay
• Release frame is reached, grenade object is detached from hand, aimed, and launched

Client grenade throw – attempt #1
• Send grenade throw request to host
• Throw grenade locally when host confirms

Client grenade throw – attempt #1
[Sequence diagram]
• Client: Button press → Here’s the lag! → Throw animation starts → Throw animation delay → Release frame is reached
• Client → Host: “I’d like to throw a grenade” (one-way latency, client to host)
• Host: Grenade throw animation begins → Throw animation delay → Release frame is reached, throw grenade
• Host → Client: “Start your throw animation”, “Create grenade object”

Client grenade throw – attempt #2
• Throw a grenade locally.
• Ask host to also throw a grenade.

Client grenade throw – attempt #2
[Sequence diagram]
• Client: Button press, grenade throw animation begins → Throw animation delay → Release frame is reached, throw grenade
• Client → Host: “I’ve begun a grenade throw”
• Host: Grenade throw animation begins → Throw animation delay → Release frame is reached, throw grenade
• Where is the lag? There isn’t any!

Client grenade throw - actual
• Predict throw animation
• But do not predict grenade release – wait for host
• Grenades in flight are always real, and the host is authoritative over them
• Where is the lag?

Client grenade throw - actual
[Sequence diagram]
• Client: Button press, grenade throw animation begins → Throw animation delay → Release frame is reached, delete grenade, aim throw → Here’s the lag! → Grenade appears
• Client → Host: “I’ve begun a grenade throw”, “Please create a grenade aimed at X”
• Host: Grenade throw animation begins → Throw animation delay → Release frame is reached, delete grenade → Create grenade aimed at X, grenade appears
• Host → Client: “Create grenade object, pos/vel”
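
The three grenade designs can be compared by working out when the client sees each thing happen. The sketch below is an illustration, not measured data. It assumes symmetric one-way latency and ignores processing time and frame quantization, and the function and key names are hypothetical. All times are in milliseconds after the client's button press.

```python
def grenade_timelines(one_way_ms: float, throw_delay_ms: float) -> dict:
    """When the client sees its animation start and its grenade appear."""
    rtt = 2 * one_way_ms
    return {
        # Attempt #1: nothing happens until the host's confirmation returns.
        "attempt_1": {"animation_starts": rtt, "grenade_appears": rtt + throw_delay_ms},
        # Attempt #2: fully predicted; the client's grenade is not the host's grenade.
        "attempt_2": {"animation_starts": 0.0, "grenade_appears": throw_delay_ms},
        # Actual: animation is predicted, but the grenade waits for the host to create it.
        "actual": {"animation_starts": 0.0, "grenade_appears": throw_delay_ms + rtt},
    }


for name, times in grenade_timelines(one_way_ms=50, throw_delay_ms=300).items():
    print(name, times)
```

Under these assumptions the shipped design shows the grenade no sooner than attempt #1 does. The round trip has only moved from the button press, where it is felt immediately, to the moment of release.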

Results! • [video]

TRICKIER GAMEPLAY EXAMPLES

Armor Lock • [video]

Armor Lock as a sequence diagram
[Sequence diagram: Controller → Single peer simulation]
• Player presses equipment button
• Intro animation begins
• 3 frames
• Intro completes, invulnerability begins
• Player releases equipment button
• Invulnerability ends

Armor Lock networking, v1
• All animations & FX predicted by clients
• This feels very responsive, no visible lag
• But where is the lag?

V1 sequence diagram
• Client: Button press, intro animation begins → 3 frame delay → Intro animation completes, player appears invulnerable → “WTF I was armor locked!”
• Client → Host: “I’ve activated my armor lock”
• Host: Intro animation begins → Grenade explodes → Intro animation completes, player is invulnerable (3 frame delay after the intro begins)
• Host → Client: “Hey, this grenade just blew up, and you took damage”
• Where is the lag?

Armor Lock, v2
• Animation controlled by client…
• …but wait for host to tell you to show yourself as invincible
• Where did we move the lag to?

V2 sequence diagram
• Client: Button press, intro animation begins → 3 frame delay → Intro animation completes, no shield yet → Here’s the lag! → “WTF, why does my armor lock not work properly?”
• Client → Host: “I’ve activated my armor lock”
• Host: Intro animation begins → Grenade explodes → Intro animation completes, player is invulnerable (3 frame delay after the intro begins)
• Host → Client: “Grenade exploded, you’re damaged”, “You’re invulnerable now, turn on the shield fx”

Armor Lock, v3 – one last tweak
• Client: Button press, intro animation begins → 3 frame delay → Intro animation completes, no shield yet → :-)
• Client → Host: “I’ve activated my armor lock”
• Host: Intro animation begins → (3 − RTT) frame delay → Invulnerability begins → Intro animation ends → Grenade explodes
• Host → Client: “You’re invulnerable now, turn on the shield fx”, “Grenade exploded, but you’re fine”

What just happened? • Did we just cheat lag? Where did it go?

Armor Lock, v3
• Client: Button press, intro animation begins → 3 frame delay → Intro animation completes, no shield yet → :-)
• Client → Host: “I’ve activated my armor lock”
• Host: Intro animation begins → (3 − RTT) frame delay → Invulnerability begins → Intro animation ends → Grenade explodes
• Host → Client: “You’re invulnerable now, turn on the shield fx”, “Grenade exploded, but you’re fine”
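
A worked example may help show where the lag went. The sketch reads the diagram's “(3 − RTT) frame delay” as the host shortening its own intro delay by the round-trip time, so that its “turn on the shield fx” message reaches the client just as the client's 3-frame intro ends. That reading is an assumption. So are the symmetric latency, the 100 ms intro length (3 frames at an assumed 30 frames per second), and all function names.

```python
def host_intro_delay_ms(rtt_ms: float, intro_ms: float = 100.0) -> float:
    """Host-side delay before invulnerability begins, shortened by the RTT."""
    return max(0.0, intro_ms - rtt_ms)  # cannot go below zero on slow links


def shield_visible_on_client_ms(rtt_ms: float, intro_ms: float = 100.0) -> float:
    """Time after the client's button press at which the shield fx can turn on."""
    one_way = rtt_ms / 2
    # Activation reaches the host after one_way; the host then waits its reduced delay.
    host_invulnerable_at = one_way + host_intro_delay_ms(rtt_ms, intro_ms)
    # The host's notification takes another one_way to come back.
    return host_invulnerable_at + one_way


for rtt in (0, 60, 100, 160):
    print(rtt, host_intro_delay_ms(rtt), shield_visible_on_client_ms(rtt))
```

While the RTT is no longer than the intro, the client sees the shield exactly when its intro animation completes, and the host became invulnerable slightly earlier. Under this reading the lag is absorbed by the intro animation. Once the RTT exceeds the intro length, the remainder shows up as a late shield.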

Results! • [video]

Example #3: Assassinations • [video]

Assassinations
• 2 bipeds are happily running along
• Suddenly, we need to force them to perform a joint, synchronized animation

Assassinations, v1
• Local prediction of participant positions & orientations
• Worked great in in-house playtests & take-homes
• Failed in the wilds of the public beta

Assassinations, v 1 - issues • [videos]

Assassinations, v1 - issues
• Animation didn’t always fit in the predicted positions on client machines
• On completion, must resolve discrepancies for survivors

Assassinations, v2 - shipping
• All peers (including participants) obey host strictly
• No discrepancies on exit!
• Visual-only object state is interpolated on the way in to the animation

Results! • [video]

4 rules of gameplay networking
1. Which parts of your gameplay need to be adjudicated by a single authority?
2. Always ask: Where am I hiding the lag?
3. Don’t be afraid to change game mechanics to improve networking
4. Reserve time to iterate

MEASURING AND OPTIMIZING

Networking is a magnet for entropy
• Invisible system with ever-growing complexity
• Optimizations obscure original intent of systems
• May appear to work, but have lots of soft failures and inefficiencies
• Halo 3 games with 16 players were often laggy
• Let’s optimize!

Optimization is dangerous
• Easy to find an “obvious” architectural optimization, gain 1% efficiency, and introduce a week’s worth of bugs
• Just like CPU, don’t optimize without good data!
“The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.” - Michael A. Jackson

Inspection tools are the key!
• Deep inspection and analysis tools will help you identify the best optimizations
• Think about the kind of tools you use for CPU performance optimization

Tool: Profilers • We built profilers to track bandwidth use and priority calculation results

Profiler demo • [video]

Tool: Films
• Deterministic playback of gameplay sessions
• Extraordinarily useful for debugging gameplay…
• …but have never been very useful for network debugging
  – Network systems are idle during film playback

Leveraging Films
• Splice the network profiler data into the films
• For the first time, we could analyze network performance after the fact

Tool: Playtests
• Network perf playtests, once a month during production
• Simulate adverse network conditions with traffic shaping tools

Tool: Playtests
• How can we measure success in these playtests?
• Allow players to report lag with a controller button!
  – Afterwards, investigate perceived lag events
• Will also find confusing game mechanics!

Culmination! • [video]

Inspection of Halo 3 revealed…
• 50% positions/velocities/orientations
• 20% player control data
• 20% weapon firing, bullets, damage
• 10% other
• Woohoo, let’s optimize the heavy hitters!

This was a false start
• Hard to further optimize the encoding of positions, velocities, and orientations
• Like seeing your math functions in your CPU profiles
• Need to optimize at a higher level

GOOD OPTIMIZATIONS IN REACH

Reducing always-on bandwidth use
• Host→client control replication accounted for 22% of all host upstream on Halo 3
  – Removed data that was duplicated in object state data
  – Removed data that clients didn’t need to know
  – Optimized some encoding (details in slide notes)
• Reduced bandwidth use by 60% (14% overall)

Fixing a prioritization bug
• Problem: Idle grenades rolling around on the ground had incredibly high network priority
• The cause was traced back… to a bugfix at the end of Halo 3!
• “Equipment” was given a huge priority boost
• Fix: only apply priority boost to active equipment

Changing game mechanics
• Halo 3 used a constant artificial friction on items
• Problem: Very slow descent on hills
• Optimization: Fake friction!

Ragdoll networking
• Ragdolls are difficult and costly to network well
• Hey, why do we have to network ragdolls?

Shock

Skepticism

Consideration

Ragdoll networking
• Ragdolls are difficult and costly to network well
• Hey, why do we have to network ragdolls?
• 2 challenges
  – Ragdolls block bullets
  – Humping
• 2 fixes
  – Allow bullets and grenades to penetrate ragdolls freely
  – Sync initial state of ragdoll

Smoothing out bursts of bandwidth
• Problems with high ROF weapons: bullets were networked optimally, but not the damage they caused!
  – Fix: Allow client prediction of some damage effects
• Periodic update of game statistics data taking priority over gameplay traffic (on a protocol below replication)
  – Fix: Limit statistics data to <= 10% of each packet
• Low-priority objects getting updates in perfect sync
  – Fix: Limit objects that can take “panic” priority to N per packet
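
Two of these fixes are simple per-packet caps, which the following sketch illustrates. The 10% statistics limit comes from the slide. The value of N is not given, so it is a parameter here, and the function name, field names, and sizes are hypothetical.

```python
def plan_packet(packet_bytes, stats_pending_bytes, panic_objects, max_panic):
    """Split one packet's budget and cap how many objects may use panic priority."""
    # Statistics may use at most 10% of the packet, however much is queued.
    stats_bytes = min(stats_pending_bytes, packet_bytes // 10)
    return {
        "stats_bytes": stats_bytes,
        "gameplay_bytes": packet_bytes - stats_bytes,
        # Only the first max_panic overdue objects are boosted in this packet;
        # the rest wait, which spreads their updates over several packets.
        "panic_now": panic_objects[:max_panic],
        "panic_deferred": panic_objects[max_panic:],
    }


plan = plan_packet(
    packet_bytes=1200,
    stats_pending_bytes=500,
    panic_objects=["crate_a", "crate_b", "crate_c"],
    max_panic=2,
)
print(plan)
```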

3 rules of network optimization
1. Measure twice, cut once - use tools to guide your optimizations
2. Don’t focus on encoding & compression – look at the big picture
3. Make friends with your game mechanics designers and coders

TIDBITS AND THE FUTURE

Numbers from Reach
• 250 kbits/s: Minimum total upstream for the host of a solid 16 player game
• 675 kbits/s: Maximum total upstream bandwidth use from a single peer
• 45 kbits/s: Maximum bandwidth sent to one client from a host
• 1 kbit/s: Host upstream required to replicate one biped to one client at combat quality
• 10 Hz: Minimum packet rate for solid gameplay
• 100 ms/200 ms: Maximum latency for close-quarters gameplay for tournament/casual
• 133 ms/300 ms: Maximum latency for ranged gameplay for tournament/casual
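
A little arithmetic ties these numbers together. In a 16 player game the host serves 15 clients, and 15 × 45 kbits/s is exactly 675 kbits/s. That suggests, as an inference rather than something the slide states, that the single-peer maximum is a host sending at the per-client cap. The sketch also derives a rough per-packet budget at the minimum quality level, assuming 1 kbit = 1000 bits and an even split across clients.

```python
CLIENTS = 15  # a 16 player game has one host and 15 clients


def host_upstream_at_cap_kbps(per_client_kbps: float = 45) -> float:
    """Host upstream if every client is sent the per-client maximum."""
    return CLIENTS * per_client_kbps


def bytes_per_packet(total_kbps: float = 250, packet_rate_hz: float = 10) -> float:
    """Average packet payload per client at a given host upstream and packet rate."""
    per_client_bits_per_second = total_kbps * 1000 / CLIENTS
    return per_client_bits_per_second / 8 / packet_rate_hz


print(host_upstream_at_cap_kbps())   # kbits/s
print(round(bytes_per_packet(), 1))  # bytes per packet per client
```

At the 250 kbits/s minimum and 10 Hz, each client gets on the order of 200 bytes per packet, which is why prioritization has to decide what goes in.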

Related best practices
• Flow & congestion control
• Connection quality records & smart host selection
• Host migration - adding this late is hard
• A multiplayer beta or demo
• Regular internal playtests, with traffic shaping
• Full-time network testers, early and late

More Resources
• “Recreating The LAN Party Online”, Butcher & House, GDC 2005
• “The TRIBES Engine Networking Model”, Frohnmayer & Gift, GDC 1999
• Play Reach!

Acknowledgements
• Many people toiled to make Halo: Reach play as well as it does online, especially these guys

Kings Among Men
• Nick Gerrone, Lead Network Tester
• Paul Lewellen, Network Engineer

Additional Kings
• Jon Cable, Sandbox Engineer
• Luke Timmins, Lead of Networking and UI

What’s next for Bungie?
• Usability improvements to replication
  – Reducing boilerplate code
• Extension of replication protocols to support one-off, low-bandwidth, complex use cases
  – I just want to network a state machine, I don’t want to get a Ph.D. in replication

What’s really next for Bungie?

Questions?
daldridge@bungie.com
www.bungie.net/careers
we’re hiring!

BONUS SLIDES
The talk proper was already too long

Basics of encoding
• For rare things, and by default: write raw bits
• For common things: limit range as much as possible, write only necessary bits (bitstream)
• For floats: quantize to fixed point
• For positions and vectors: Do lots of work to compress these – limit domains, limit precision, think about temporal coherence, use google
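
The “limit range, write only necessary bits” and “quantize to fixed point” bullets can be illustrated with a small bitstream writer. This is a generic sketch of the technique with hypothetical names and example ranges, not Reach's actual encoder.

```python
def bits_needed(max_value: int) -> int:
    """Bits required to store any integer in 0..max_value."""
    return max(1, max_value.bit_length())


def quantize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a float in [lo, hi] to an unsigned fixed-point integer of `bits` bits."""
    steps = (1 << bits) - 1
    clamped = min(max(value, lo), hi)  # limit the domain before encoding
    return round((clamped - lo) / (hi - lo) * steps)


def dequantize(q: int, lo: float, hi: float, bits: int) -> float:
    """Inverse of quantize, accurate to half a quantization step."""
    steps = (1 << bits) - 1
    return lo + q / steps * (hi - lo)


class BitWriter:
    """Packs values most-significant-bit first, using only the bits requested."""

    def __init__(self):
        self.value = 0
        self.bit_count = 0

    def write(self, v: int, bits: int) -> None:
        if not 0 <= v < (1 << bits):
            raise ValueError("value does not fit in the requested bits")
        self.value = (self.value << bits) | v
        self.bit_count += bits

    def to_bytes(self) -> bytes:
        pad = (-self.bit_count) % 8  # zero-pad the final byte
        return (self.value << pad).to_bytes((self.bit_count + pad) // 8, "big")


writer = BitWriter()
writer.write(87, bits_needed(100))              # health in 0..100 fits in 7 bits
writer.write(quantize(0.3, -1.0, 1.0, 10), 10)  # one unit-vector component, 10 bits
print(writer.bit_count, len(writer.to_bytes()))
```

Here a health value and a vector component take 17 bits (3 bytes on the wire), against 8 bytes for two raw 32-bit values.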

Packet rate vs. size
• Maximize packet rate to minimize latency
• Maximize packet size to maximize throughput
• Goals in direct tension…
• Ideally, maximize packet rate by default, but lower it as needed when simulation becomes too rich

Problem: Networking new mechanics is hard with our replication systems
• This is somewhat intentional!
• Ease of use is dangerous
• Lots of safeguards ensure careful thought (but add implementation time)
• We still get quick-and-dirty prototype networking that needs to be rewritten late, but we try to minimize the amount of it

Example of a bad optimization
• “Let’s classify all our networked object indices into contiguous buckets by object type so we can use fewer bits to refer to an object if the type is known on both ends, which is common”
• Saved 1% of bandwidth - awesome
• Cost over 30 hours of debugging/support over the course of the project

What is “Lag”?
• Perceived delay or inconsistency
• Caused by latency
• Caused by bandwidth limitation
• Caused by packet loss
• Sometimes caused by game mechanics

Glitches
• Glitch: Colloquially, a series of events that break or appear to break the rules or perceived rules of the game
• There are 4 important classes of glitches
  – Perceived as wrong / real break of real rule
  – Perceived as wrong / real rule, but not a real break
  – Not perceived as wrong / real break of a real rule
  – Perceived breakage of a perceived rule

Melee “Glitches”
• Conceptually melee is very simple
• In practice it’s not; we had to make post-ship fixes to it in Halo 2/3
• Example: In Reach public beta, client melee strikes were sometimes (rarely) ignored by the host

THAT’S ALL THERE IS
There isn’t any more