5.1 PERFORMANCE
Code performance and optimisation

OPTIMISATION OVERVIEW
Overview of the optimisation process

Optimisation Overview

Optimisation is the process of developing your program so that it uses the minimal amount of some resource. Resources can include CPU time, memory, hard-disk space, network traffic, etc.

Optimisation approaches can focus on:
• Design: ensuring the design employs optimal algorithms, minimises redundancy, etc.
• Implementation: ensuring the code optimally maps onto CPU instructions, avoids inefficient data usage, etc.

Important: This section focuses on code (implementation) optimisation.

Optimisation Overview (Start and Stop)

Design Optimisation: Go!
In most applications, good performance comes from getting the architectural design right. Given the cost of refactoring, do consider performance when developing your design.

Code Optimisation: Stop!
Code optimisation can be ‘expensive’ in terms of time, added complexity, etc. It is generally recommended not to optimise code until there is an observable need to do so.

Quote from Donald Knuth (often attributed to C. A. R. Hoare): "We should forget about small efficiencies, say about 97% of the time: premature optimisation is the root of all evil."

Code Optimisation (the process)

• Whenever the UPS (updates per second) or FPS (frames per second) targets cannot be met, profile the game to determine which areas are consuming the most CPU/GPU time.
• Based on the profiling and the design, decide which areas can be optimised and how they will be optimised.
• Implement the optimisations, re-profile, and repeat until the desired UPS/FPS targets are met.
• If the targets cannot be reached, then the wider game design and/or game feature set must be reconsidered.

Profiler suggestions: for Java use the NetBeans Profiler; for XNA use NProf and PIX.
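Before reaching for a full profiler, a quick timing check can confirm whether an update step fits inside its time budget. The following is a minimal sketch only: the class name UpdateTimer, the 60 UPS target and the stand-in update workload are all assumptions made for illustration, and a real profiler will give a far more detailed picture.

```java
final class UpdateTimer {
    // Hypothetical target chosen for this example: 60 updates per second.
    static final double TARGET_UPS = 60.0;

    /** Runs the task the given number of times and returns the mean time per run in nanoseconds. */
    static double meanNanosPerRun(Runnable task, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / (double) runs;
    }

    /** True if an update taking meanNanos fits inside the per-update budget for targetUps. */
    static boolean meetsTarget(double meanNanos, double targetUps) {
        double budgetNanos = 1_000_000_000.0 / targetUps;
        return meanNanos <= budgetNanos;
    }

    public static void main(String[] args) {
        // Stand-in for a real game update (an assumption, not real game code).
        Runnable update = () -> {
            double x = 0;
            for (int i = 0; i < 100_000; i++) {
                x += Math.sqrt(i);
            }
            if (x < 0) {
                System.out.println(x); // uses the result so the work is not discarded
            }
        };
        meanNanosPerRun(update, 200); // warm-up runs, giving the JIT a chance to compile the code
        double mean = meanNanosPerRun(update, 200);
        System.out.printf("mean update: %.3f ms, meets %.0f UPS target: %b%n",
                mean / 1e6, TARGET_UPS, meetsTarget(mean, TARGET_UPS));
    }
}
```

Timings taken this way vary from run to run and from machine to machine, so treat them as a rough indicator of where to point the profiler.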

Code Optimisation (JIT: a little bit of help)

Both Java and C# use a JIT (just-in-time) compiler to compile each method from the intermediate language (bytecode in Java, IL in C#) into native code at runtime. Most JIT compilers will optimise code, typically by:
• Constant folding / copy propagation
• Method in-lining
• Extraction of loop invariants
• Loop unrolling (for small loops)

Aside: Unlike a traditional (e.g. C++) compiler, a JIT compiler runs while the program is executing, so compilation time counts against run-time performance; the JIT compiler does not (normally) have the luxury of performing an exhaustive optimisation pass.
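To make two of these transformations concrete, the sketch below writes them out by hand at source level. It is illustrative only: a JIT compiler works on the compiled code rather than the source, and whether it applies a given optimisation in a particular case is not guaranteed. The class and method names are made up for this example.

```java
final class JitStyleRewrites {
    // Constant folding: the expression has only constant operands,
    // so it can be evaluated once (to 86400) instead of at every use.
    static final int SECONDS_PER_DAY = 60 * 60 * 24;

    // As written: (scale * scale) is recomputed on every iteration
    // even though it never changes inside the loop.
    static double sumScaled(double[] values, double scale) {
        double total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i] * (scale * scale);
        }
        return total;
    }

    // After extracting the loop invariant: the product is computed once
    // before the loop and reused.
    static double sumScaledHoisted(double[] values, double scale) {
        double factor = scale * scale;
        double total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i] * factor;
        }
        return total;
    }
}
```

Both methods return the same result; the second simply makes explicit the work that an optimiser may remove from the loop.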

Warning! (The cost of code optimisation)

Optimising code for performance can be risky and normally has an associated cost. Examples of risks and costs include:
• Breaking code that works
• Limiting the reusability and/or extensibility of the code by introducing additional constraints
• Making the code harder to understand and maintain, thereby increasing the likelihood of future bugs

Remember: Unless there is a clear and explicit performance problem, it is best to err on the side of producing cleaner, simpler code.

Warning! (Optimisation shelf-life)

Development languages and execution environments change over time (compilers get smarter, garbage collection algorithms change, CPUs run faster, memory grows, etc.).

An optimisation technique that solved a limitation of a previous development environment may not be applicable to current environments. Before accepting performance advice, first ask whether the advice remains relevant to your current development environment.

GENERAL ADVICE: CREATING OBJECTS
General advice applicable to both Java and C# (and most other managed languages)

Creating objects…

Both Java and C# are object-oriented languages. Whenever an object is created there is a creation overhead:
• Memory is allocated for all the instance variables (including those from super-classes).
• All instance variables are initialised to starting values.
• The constructors (including those of any super-classes) are executed.

Given this cost, it is good practice to avoid needless and excessive object creation.

Creating objects (Hints and tips)

• Use primitive data types rather than wrapper classes, e.g. use an int instead of an Integer to store a number.

    Integer val = 4+5;   // wrapper object
    int val = 4+5;       // primitive (preferred)

• Use lazy object creation, i.e. avoid creating conditional objects unless the conditions are matched. This also applies to instance variables if only some instances will use them.

    // Before: the string is always built, even if it is never printed
    ...
    string message = "..." + name;
    if( printMessage )
        Console.WriteLine( message );
    ...

    // After: the string is only built when it is needed
    ...
    if( printMessage ) {
        string message = "..." + name;
        Console.WriteLine( message );
    }
    ...

• Use String literals instead of String objects where possible (as String literals are interned and reused).

    String str1 = "Hello";             // Literal
    String str2 = new String("Hello"); // Object
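The lazy-creation and string-literal tips can be sketched in Java as follows. The class LazyEnemy and its status-effect list are hypothetical names invented for this illustration; the point is only that the list is not allocated until the first time it is needed, and that two identical literals refer to one interned object whereas new String(...) always creates another.

```java
import java.util.ArrayList;
import java.util.List;

final class LazyEnemy {
    // Created lazily: in this made-up example most enemies never gain a status effect.
    private List<String> statusEffects;

    void addStatusEffect(String effect) {
        if (statusEffects == null) {
            statusEffects = new ArrayList<>(); // allocated only on first use
        }
        statusEffects.add(effect);
    }

    int statusEffectCount() {
        return statusEffects == null ? 0 : statusEffects.size();
    }

    boolean hasAllocatedEffects() {
        return statusEffects != null;
    }
}

final class ObjectCreationDemo {
    /** Identical string literals are interned, so both variables refer to the same object. */
    static boolean literalsShareOneObject() {
        String a = "Hello";
        String b = "Hello";
        return a == b;
    }

    /** new String(...) creates a distinct object with equal contents. */
    static boolean newStringIsSeparateObject() {
        String a = "Hello";
        String b = new String("Hello");
        return a != b && a.equals(b);
    }
}
```

Comparing with == is used here only to show object identity; string contents should still be compared with equals().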

GENERAL ADVICE: GARBAGE COLLECTION
General advice applicable to both Java and C# (and most other managed languages)

Garbage collection

Garbage collection (GC) is a form of automatic memory management. The GC automatically deletes objects and reclaims memory from objects ‘discarded’ by the program.

GC makes manual memory deallocation unnecessary, thereby freeing the programmer from having to release objects (which is an error-prone and often onerous process).

However, garbage collection can have performance implications that are difficult to manage.

Garbage collection (important factors)

Three important factors influence the GC process:
• Allocation rate: the rate at which new objects (including strings, arrays, etc.) are created.
• Retention: the amount of live heap data (i.e. the amount of allocated ‘stuff’), affecting the workload for allocation and de-allocation.
• Fragmentation: the number of unusable fragments (chunks of memory) between allocated objects, affecting space usage and search times. Most GCs try to avoid fragmentation (usually with an associated avoidance cost!).

Aside: There are different types of collector (generational, mark-sweep, reference counting, incremental, etc.) that influence the above three factors.

Garbage collection (important factors)

For game developers the key questions to ask are:
• How long does a GC take (in ms)?
• When and how often does a GC occur?

An acceptable balance between ‘how long’ and ‘how often’ is desirable, e.g. a GC once every several seconds that incurs a high cost can introduce a perceivable skip in the frame rate. On the other hand, lots of GCs per second will introduce a very high (and unnecessary) overhead.
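In Java, one rough way to answer ‘how long’ and ‘how often’ from inside the program is to read the counters exposed by the standard GarbageCollectorMXBean interface. The sketch below is an illustration only: the class name GcStats and the allocation loop in main are assumptions for this example, the reported time is an approximate accumulated figure, and GC logging or a profiler gives more detail.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcStats {
    /** Total collections so far across all collectors (undefined values, reported as -1, count as 0). */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds across all collectors. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalCollections();
        long millisBefore = totalCollectionMillis();

        // Stand-in workload: allocate many short-lived arrays.
        long checksum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            int[] scratch = new int[16];
            scratch[0] = i;
            checksum += scratch[0];
        }

        System.out.println("checksum: " + checksum);
        System.out.println("collections during workload: " + (totalCollections() - countBefore));
        System.out.println("approx. GC time during workload (ms): "
                + (totalCollectionMillis() - millisBefore));
    }
}
```

Sampling these two counters once per second during play gives a simple view of collection frequency and cost over time.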

Garbage collection (Hints and tips)

Select and fine-tune an appropriate GC algorithm (see the linked reading material).

Reduce the number of allocated / de-allocated objects (reducing GC allocation costs and heap compacting frequency). Tactics include:
• Use recyclable object pools (i.e. ‘released’ objects are stored and new object requests are taken from this pool).
• XNA: Use structs instead of classes. As structs are value types, they are not allocated as separate objects on the GC-managed heap (unless they are boxed).

Aside: The downside of object pools is that they involve writing and maintaining additional code, and they can introduce subtle errors by recycling an object from the pool that is still referred to and modified from another part of the program.
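A recyclable object pool can be sketched in a few lines of Java. This is a minimal, single-threaded illustration and not a production design: ObjectPool and PooledBullet are hypothetical names, and the pool does nothing to stop a caller from continuing to use an object after releasing it, which is exactly the subtle error mentioned in the aside.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Minimal object pool; not thread-safe. */
final class ObjectPool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    ObjectPool(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Reuses a released object if one is available, otherwise creates a new one. */
    T acquire() {
        T obj = free.pollFirst();
        return obj != null ? obj : factory.get();
    }

    /** Returns an object to the pool; the caller must not use it afterwards. */
    void release(T obj) {
        free.addFirst(obj);
    }

    int freeCount() {
        return free.size();
    }
}

/** Hypothetical pooled game object. */
final class PooledBullet {
    float x, y;
    boolean active;

    /** Clears state left over from the previous use. */
    void reset() {
        x = 0;
        y = 0;
        active = false;
    }
}
```

A typical usage would be to call acquire() when a bullet is fired, and reset() followed by release() when it leaves the screen, so that steady-state play allocates no new bullet objects.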

GENERAL ADVICE: METHODS
General advice applicable to both Java and C# (and most other managed languages)

Methods and Method calls

Consider the code shown. Where is the potential performance bottleneck?

    private long getAccLength(String stringObj) {
        long len = 0;
        for (int i = 0; i < stringObj.length(); i++)
            len += (i+1);
        return len;
    }

The length() method will be called on every iteration of the loop (incurring stack push/pop costs for method parameters and returns, unless the JIT compiler in-lines the call). For a large loop this can be a heavy additional expense.

Where the result of the method call is invariant over the duration of the loop, it is good practice to extract the method call and store the result before the loop, e.g.

    private long getAccLength(String stringObj) {
        long len = 0;
        int stringSize = stringObj.length();  // called once, before the loop
        for (int i = 0; i < stringSize; i++)
            len += (i+1);
        return len;
    }

Aside: Recognising that this quantity can also be calculated directly as n(n+1)/2, where n is the string length, gives the best performance (well, maybe short of having a lookup table).
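The closed form from the aside removes the loop altogether. A small sketch, with the class name AccLength chosen only for this example, shows the loop version and the formula side by side:

```java
final class AccLength {
    /** Loop version: sums 1 + 2 + ... + n, where n is the string length. */
    static long loopSum(String stringObj) {
        long len = 0;
        int stringSize = stringObj.length();
        for (int i = 0; i < stringSize; i++) {
            len += (i + 1);
        }
        return len;
    }

    /** Closed form: n(n+1)/2, computed in long arithmetic. */
    static long closedForm(String stringObj) {
        long n = stringObj.length();
        return n * (n + 1) / 2;
    }
}
```

For "Hello" (n = 5) both methods return 15; the closed form does so in constant time regardless of the string length.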

Methods and Method calls

Similar to the last suggestion, it is good to avoid excessive recalculation by computing the expression once and binding the result to a variable which is reused.

    // Before: elementAt(i) is evaluated twice
    if (enemies.elementAt(i).isAlive()) ...
    if (enemies.elementAt(i).isBoss()) ...

    // After: elementAt(i) is evaluated once
    Enemy enemy = enemies.elementAt(i);
    if (enemy.isAlive()) ...
    if (enemy.isBoss()) ...

Making chunky calls…

A chunky call is a function call that performs several related tasks (e.g. initialising fields). A chatty call only does one thing (with several chatty calls needed to get things done). Favour chunky calls for processes that run numerous times per second.
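The difference between chatty and chunky calls can be illustrated with a small Java class. DemoSprite and its methods are hypothetical names invented for this sketch; the chatty style needs three calls to position the sprite, while the chunky style does the same work in one.

```java
final class DemoSprite {
    private float x, y, rotation;

    // Chatty style: each call does one small thing.
    void setX(float x) { this.x = x; }
    void setY(float y) { this.y = y; }
    void setRotation(float rotation) { this.rotation = rotation; }

    // Chunky style: one call performs the related updates together.
    void setTransform(float x, float y, float rotation) {
        this.x = x;
        this.y = y;
        this.rotation = rotation;
    }

    float getX() { return x; }
    float getY() { return y; }
    float getRotation() { return rotation; }
}
```

Called once per sprite per frame, the chunky version replaces three method calls with one; as with the other tips, a JIT may in-line such small methods, so measure before relying on the difference.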

GENERAL ADVICE: BOXING AND UNBOXING
General advice applicable to both Java and C# (and most other managed languages)

Boxing/Unboxing (Overview)

Boxing is the creation of a reference wrapper for a value type (e.g. storing an int within an Integer). Unboxing is the conversion of the reference wrapper to the value type, e.g.:

    // Boxing
    int i = 123;
    object o = (object)i;

    // Unboxing
    o = 123;
    i = (int)o;

Boxing/Unboxing (The cost…)

Boxing and unboxing are computationally expensive processes. A new object must be created every time a value type is boxed. This can be more than an order of magnitude slower than a simple assignment.

Additionally, the casting process when unboxing takes longer than a simple assignment.

Boxing also creates objects that must be stored and eventually discarded by the GC.

Boxing/Unboxing (Hints and tips)

Avoid using non-generic collections (which always box value types). Instead, use generic collections where a defined value type can be specified (removing the need to box/unbox). For example, in C#:

    ArrayList list = new ArrayList();
    list.Add(56);        // This will cause boxing

    List<int> listInt = new List<int>();
    listInt.Add(56);     // This does not cause boxing
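The generic-collection tip above is specific to C#. In Java, generic collections cannot hold primitives, so a List<Integer> still boxes every element; the closest equivalent of the advice is to use a primitive array (or a primitive-specialised collection) on hot paths. The sketch below, with the made-up class name BoxingDemo, contrasts the two.

```java
import java.util.ArrayList;
import java.util.List;

final class BoxingDemo {
    /** Each add boxes the int into an Integer; each read in the loop unboxes it again. */
    static long sumBoxed(int count) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            values.add(i); // boxing (values outside a small cached range need a new Integer object)
        }
        long total = 0;
        for (Integer v : values) {
            total += v; // unboxing
        }
        return total;
    }

    /** The same computation with a primitive array: no wrapper objects are created. */
    static long sumPrimitive(int count) {
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = i;
        }
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }
}
```

Both methods return the same sum; the primitive version avoids the wrapper allocations and the extra work they create for the garbage collector.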

GENERAL ADVICE: MULTI-THREADING
General advice applicable to both Java and C# (and most other managed languages)

Multi-threading

Sequential programs are collections of functions executed in a defined sequence. Information is passed between functions via parameters, return values and shared data.

Parallel programs are collections of tasks that execute together with other tasks. Tasks communicate using messages.

CPUs with multiple cores are now the norm (e.g. the Xbox 360 has 3 cores (6 hardware threads)). Core counts will continue to increase, so concurrent programming will be needed to get maximum performance.

This has a major impact on program design and opens a wider range of issues (e.g. thread-safe data manipulation, lock performance, etc.).
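As one small illustration of thread-safe data manipulation, the Java sketch below has several threads increment a shared counter. It uses java.util.concurrent.atomic.AtomicLong so that no increments are lost; a plain long field updated with ++ from several threads would not be safe. The class name SharedCounterDemo and the thread and iteration counts are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SharedCounterDemo {
    /** Starts the given number of threads, each incrementing a shared counter, and returns the final count. */
    static long countWithThreads(int threads, int incrementsPerThread) {
        AtomicLong counter = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait for each worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 100_000)); // prints 400000
    }
}
```

The atomic counter is correct but still a point of contention between threads, which is one example of the lock and synchronisation costs that concurrent designs need to weigh.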

ASIDE: XNA ADVICE
Advice applicable to C#

Passing by reference and by value

It is faster to pass a large value type (e.g. Matrix) by reference and not by value.

By value: a new matrix value will be created in the method and populated using the source matrix.

    Matrix matrix;
    processMatrix(matrix);
    void processMatrix( Matrix matrix ) { ... }

By reference: only a pointer to the struct location need be passed.

    Matrix matrix;
    processMatrix(ref matrix);
    void processMatrix( ref Matrix matrix ) { ... }

Warning: As with any reference type, care must be taken when modifying values within the method.

SpriteBatch performance

SpriteBatch is optimised for batch drawing (it’s in the name!). You should try to:
• Draw lots of sprites inside a single Begin/End call.
• If possible, use SpriteSortMode.Immediate and draw in texture order or use a sprite sheet, or...
• ..., if this is not possible, use SpriteSortMode.Texture.

Timings by SpriteSortMode:
• 1000 batches, one sprite in each: 34 ms
• One batch, 1000 sprites, all using the same texture: Immediate 0.6 ms, Deferred 0.7 ms, Texture 1.8 ms
• One batch, 1000 sprites, alternating between two different textures: Immediate 11.5 ms, Deferred 11.6 ms, Texture 1.9 ms

Summary

Today we explored:
• Basic principles behind optimisation
• Advice on how to avoid common performance issues

To do:
• Read section to be completed in Project Development Report
• Think about what you hope to submit for the Week 6 hand-in
• Continue to develop exploratory code