Platinum Learn Enjoy Gold Put your phone on

Performance Tuning MDX Queries Chris Webb Crossjoin Consulting Limited / Solid Quality Mentors chris@crossjoin.

Agenda • • Performance tuning methodology Tuning the cube Tuning MDX in queries and

Performance Tuning Methodology • Before you start your work, ensure that: – Your test

Performance Tuning Methodology • Capture sample queries using Profiler • Look at the Query

Performance Tuning Methodology • Clear the cache using an XMLA Clear. Cache command before

Hardware • Throwing hardware at the problem is rarely the answer, unless: – You

Attribute Relationships • Make sure attribute relationships in your dimension are set up to

Tuning the Storage Engine • The percentage of time spent in the Storage Engine

Partitioning • You should partition your measure groups to reflect the slices that users

Partitioning • You should be partitioning so that a single partition contains between 2

Aggregations • Building aggregations improves query performance by reducing the amount of calculation done

Aggregation Design Methodology 1. Make your dimension design as ‘clean’ as possible – delete

Aggregation Design Methodology 4. Don’t bother with Usage-Based Optimisation: it may overwrite existing useful

The Formula Engine • If Storage Engine requests make up a negligible percentage of

Using Named Sets • Use named sets instead of repeatedly evaluating the same set

Non Empty Filtering • Removing empty values before a set operation, such as a

Non_Empty_Behavior • Setting the Non_Empty_Behavior property on calculated members can provide a massive performance

Non_Empty_Behavior • Non_Empty_Behavior has improves query performance in two ways: – When filtering out

Non_Empty_Behavior • In general, the rule for setting it is if you can say:

Appropriate Calculation Scope • Use scoped assignments to determine when and where calculations execute,

Aggregation-Aware Calculations • If you know where you have built your aggregations, it can

MDX to Avoid • ORDER – there’s a bug which makes it perform badly!

Warming the Cache • Analysis Services can cache Storage Engine data in most circumstances

How to Warm the Cache • The Create Cache MDX statement (available in AS

Resources • “Analysis Services 2005 Performance Guide” white paper • “Identifying and Resolving MDX

Platinum www. SQLBits. com Gold [Conference Web site] www. sqlbits. com www. SQLBlog. Casts.

Slides: 29

Download presentation

Platinum Learn & Enjoy Gold [Put your phone on Vibrate!] www. sqlbits. com Group BY: [Remember get your Tickets for Entry, Coach, Drink] Silver Feedback Forms: [Fill these out at the end of each session please] Lunch Time Sessions: [Once you eat, learn some more; Grok Talks in Chicago 1 and 2 Quest are in Menphis, Idera are in Everest] Ask The Experts [Sessions need to finish on time, take questions to the ATE area]

Performance Tuning MDX Queries Chris Webb Crossjoin Consulting Limited / Solid Quality Mentors chris@crossjoin. co. uk

Agenda • • Performance tuning methodology Tuning the cube Tuning MDX in queries and calculations Cheating: warming the cache

Performance Tuning Methodology • Before you start your work, ensure that: – Your test environment is comparable with your production environment – Nothing else is happening on the server – No-one will connect to the server and try to run queries – You know what exactly you are trying to tune!

Performance Tuning Methodology • Capture sample queries using Profiler • Look at the Query End events for the MDX used • Look at the Duration for how long the query took to run in ms • Note that for some clients (eg Excel before 2007) queries may be dependent on named sets declared earlier in the session

Performance Tuning Methodology • Clear the cache using an XMLA Clear. Cache command before running your query for the first time • Then run the query again immediately afterwards, so you have timings on a cold and warm cache • Disk caching may also have a noticeable impact if you are querying a newly-processed cube

Hardware • Throwing hardware at the problem is rarely the answer, unless: – You have problems with performance degredation with large numbers of concurrent users – You have a lot of data • Extra memory will mean you have more room for caching query results • Disk IO should be as fast as you can afford • Extra CPUs can help when you have queries that cross multiple partitions, or many concurrent users

Attribute Relationships • Make sure attribute relationships in your dimension are set up to reflect all 1: M relationships in your data • Failure to do so will mean that you will not get maximum benefit from aggregations and partitioning, and the engine itself will be less efficient

Query Execution

Tuning the Storage Engine • The percentage of time spent in the Storage Engine can be worked out by: – Summing the Duration of all Query Subcube events seen in Profiler – Dividing that by the Duration of the Query End event • If you see a Query Subcube event take more than 0. 5 seconds it should probably be tuned

Partitioning • You should partition your measure groups to reflect the slices that users are likely to use in their queries • Partitioning improves query performance in two ways: – It reduces overall IO because AS should only scan the partitions containing the data requested – It increases parallelism because AS can scan more than one partition simultaneously

Partitioning • You should be partitioning so that a single partition contains between 2 million rows and 15 million rows of data • Although AS should auto-detect the slice of a partition, it is good practice to set the Slice property manually, even on MOLAP partitions • In Profiler, Progress Report Begin/End events show which partitions are being read

Aggregations • Building aggregations improves query performance by reducing the amount of calculation done at query time • Aggregations start to show an effect on regular Sum measures where partition size is greater than a few million rows • Can have an effect on much smaller partitions where there are complex calculations

Aggregation Design Methodology 1. Make your dimension design as ‘clean’ as possible – delete or disable unnecessary attributes 2. Set the Aggregation. Usage property appropriately on each Cube dimension 3. Run the Aggregation Design Wizard 1. First using the ‘I Click Stop’ option 2. Then, if this creates too many aggregations, using ‘Performance Gain’ of 30%

Aggregation Design Methodology 4. Don’t bother with Usage-Based Optimisation: it may overwrite existing useful aggregations 5. If necessary, design aggregations manually using the version of Aggregation Manager built-in to BIDS Helper 6. Check aggregations are being used by looking for the Get Data From Aggregation event in Profiler

The Formula Engine • If Storage Engine requests make up a negligible percentage of the time taken by the query, then the Formula Engine is the problem • Another giveaway sign is that when a query runs on a multi-processor server, only one CPU shows activity – The Formula Engine only uses one thread per query • MDX Script Performance Analyser can help to work out which, if any, calculation is the problem

Using Named Sets • Use named sets instead of repeatedly evaluating the same set expression • Named sets can be defined – On the cube and in the session, but since they are static (until AS 2008) they are of limited usefulness – In the WITH clause, which is useful only if you have control over the MDX you’re executing – Inline, which can be very powerful but also a bit buggy

Non Empty Filtering • Removing empty values before a set operation, such as a Sum(), is often important • This relies on the fact that in many cases, AS can remove empty values very quickly • Do not use Non. Empty. Crossjoin() any more, use the Non. Empty() function instead: it is more reliable, predictable, and just as fast

Non_Empty_Behavior • Setting the Non_Empty_Behavior property on calculated members can provide a massive performance increase • However, knowing how to set it correctly can be very difficult: “I feel that it is the best not to mention NON_EMPTY_BEHAVIOR at all - unless you can explain what it does - and (no offence) I don't think you will be able to do it. ”, Mosha

Non_Empty_Behavior • Non_Empty_Behavior has improves query performance in two ways: – When filtering out empty tuples from a set using NON EMPTY or Non. Empty() – When working out how to request the data a calculation needs from the Storage Engine, even when there is no filtering taking place • Increasingly, the Formula Engine can ‘guess’ what it should be set to though

Non_Empty_Behavior • In general, the rule for setting it is if you can say: <subcube> = iif(Is. Empty(<empty expression>) , null, <else expression>); Then you can say: Non_Empty_Behavior(<subcube>)= <empty expression>;

Appropriate Calculation Scope • Use scoped assignments to determine when and where calculations execute, rather than conditional logic – For example, a year-to-date calculation will never be relevant at the year level – Do not use IIF with Hierarchy. Currentmember. Level – scope on the level itself

Aggregation-Aware Calculations • If you know where you have built your aggregations, it can make sense to write your calculations appropriately • For example, if you have an aggregation at the year level, the calculation: Q 1 + Q 2 + Q 3 might be better expressed as: Year – Q 4

MDX to Avoid • ORDER – there’s a bug which makes it perform badly! Use TOPCOUNT instead. • LOOKUPCUBE – redesign your cube instead • CASE – it isn’t optimised, use IIF instead • Non-deterministic functions like USERNAME, STRTOX – they stop AS being able to cache calculated values • Late-binding functions like LINKMEMBER – they stop AS being able to generate efficient query plans

Warming the Cache • Analysis Services can cache Storage Engine data in most circumstances • It can also cache calculation results in many cases too • As a result, most queries execute much faster (if not immediately) on a warm cache • So why not run a sample of common queries every night after cube processing?

How to Warm the Cache • The Create Cache MDX statement (available in AS 2 K and AS 2005 post SP 2) is useful for loading the Storage Engine cache • Several tools can be used to automate running of queries: – Ascmd, available in the SQL Server samples – SSIS, using the Execute SQL task

Resources • “Analysis Services 2005 Performance Guide” white paper • “Identifying and Resolving MDX Query Performance Bottlenecks” white paper • Mosha’s blog: http: //sqlblog. com/blogs/mosha • My blog: http: //cwebbbi. spaces. live. com/

Questions?

Platinum www. SQLBits. com Gold [Conference Web site] www. sqlbits. com www. SQLBlog. Casts. com [Becoming the premier Blogging site for SQL professionals] Silver www. SQLServer. FAQ. com [UK SQL Server Community Website] UK SQL Bloggers cwebbbi. spaces. live. com sqlblogcasts. com/blogs/simons sqlblogcasts. com/blogs/tonyrogerson Remember : : Feedback Forms!!