sqlbi com Manytomany Relationships in DAX When Many
sqlbi. com
Many-to-many Relationships in DAX When Many To Many Are Really Too Many Alberto Ferrari Senior Consultant SQLBI. COM sqlbi. com
Who’s Speaking? BI Expert and Consultant Problem Solving Complex Project Assistance Data. Warehouse Assesments and Development Courses, Trainings and Workshops SQLBI: Branch of a Microsoft BI Gold Partner Book Author, SSAS Maestro alberto. ferrari@sqlbi. com «Spaghetti English» Get prepared, I can’t help it
Agenda Many To Many Relationships Data Model DAX Formula pattern Classical Many To Many Cascading Many To Many Survey Data Model Basket Analysis
Prerequisites We will use, but not describe: Tabular basics DAX basics Evaluation contexts and interactions with relationships Data modeling with Many To Many After all… this is a 400 session
Current Accounts Model M 2 M
No support for M 2 M in Tabular Facts Multidimensional handles M 2 M relationships Tabular does not Thus, my fastest ever session ends here Or… we can dive into DAX and play with it
Demo – Classical M 2 M We will always start looking at the final result Then, we dive into the DAX code
The M 2 M DAX Pattern Leverages CALCULATE Row Contexts Filter Contexts Automatic transformation of Row Context into Filter Context using CALCULATE Next slide: the formula. Keep it in mind It will appear quite often from now on
The DAX Formula of M 2 M Pattern Amount. M 2 M : = CALCULATE ( SUM (Fact_Transaction[Amount]), FILTER ( Dim_Account, CALCULATE ( COUNTROWS (Bridge_Account. Customer) ) > 0 ) )
What the formula should perform Filter on Dim_Account Amount. M 2 M : = CALCULATE ( SUM (Fact_Transaction[Amount]), FILTER ( Dim_Account, CALCULATE ( COUNTROWS (Bridge_Account. Customer) ) > 0 ) ) Filter on Dim_Customer
How the formula works Amount. M 2 M : = CALCULATE ( SUM (Fact_Transaction[Amount]), FILTER ( Dim_Account, CALCULATE ( COUNTROWS (Bridge_Account. Customer) ) > 0 ) ) The filter is applied by the Tabular data model
How the formula works FILTER: For each row in Dim_Account CALCULATE : filters the bridge Amount. M 2 M : = CALCULATE ( SUM (Fact_Transaction[Amount]), FILTER ( Dim_Account, CALCULATE ( COUNTROWS (Bridge_Account. Customer) ) > 0 ) ) The filter is applied by the Tabular data model
How the formula works FILTER: For each row in Dim_Account CALCULATE : filters the bridge Only rows in Dim_Account where COUNTROWS () > 0 survive the FILTER Amount. M 2 M : = CALCULATE ( SUM (Fact_Transaction[Amount]), FILTER ( Dim_Account, CALCULATE ( COUNTROWS (Bridge_Account. Customer) ) > 0 ) ) The filter is applied by the Tabular data model
How the formula works FILTER: For each row in Dim_Account CALCULATE : filters the bridge Only rows in Dim_Account where COUNTROWS () > 0 survive the FILTER Amount. M 2 M : = CALCULATE ( SUM (Fact_Transaction[Amount]), FILTER ( Dim_Account, CALCULATE ( COUNTROWS (Bridge_Account. Customer) ) > 0 ) ) This filter is applied by the Tabular data model SUM is evaluated only for the accounts filtered from the customers
The Karma of CALCULATE Amount. M 2 M_Correct : = CALCULATE ( ) SUM (Fact_Transaction[Amount]), FILTER ( Dim_Account, CALCULATE ( COUNTROWS (Bridge_Account. Customer) ) > 0 ) Amount. M 2 M_Wrong : = CALCULATE ( SUM (Fact_Transaction[Amount]), FILTER ( Dim_Account, COUNTROWS (Bridge_Account. Customer) > 0 ) ) BI 123|many-to-many In DAX 16
Wrong formula in action Customer. Name Account Mark Paul Robert Luke Mark-Robert Mark-Paul Mark-Paul Amount. M 2 M : = CALCULATE ( SUM (Fact_Transaction[Amount]), FILTER ( Dim_Account, COUNTROWS (Bridge_Account. Customer) > 0 ) ) All the rows in the Account table survived the FILTER
Many-to-many in action Customer. Name Account Mark Paul Robert Luke Mark-Robert Mark-Paul Mark-Paul Amount. M 2 M : = CALCULATE ( SUM (Fact_Transaction[Amount]), FILTER ( Dim_Account, CALCULATE ( COUNTROWS (Bridge_Account. Customer) ) > 0 ) ) Two rows in the Account table survived the FILTER
The Karma of CALCULATE Amount. M 2 M_Correct : = CALCULATE ( ) SUM (Fact_Transaction[Amount]), FILTER ( Dim_Account, CALCULATE ( COUNTROWS (Bridge_Account. Customer) ) > 0 ) Amount. M 2 M_Wrong : = CALCULATE ( SUM (Fact_Transaction[Amount]), FILTER ( Dim_Account, COUNTROWS (Bridge_Account. Customer) > 0 ) ) BI 123|many-to-many In DAX 19
Many to Many - Conclusions Complexity Not in the data model Only in the formula Good understanding of CALCULATE and of relationships in Tabular needed This is just the beginning of our journey
Cascading Many To Many
Cascading Many To Many Filter on Dim_Account The pattern is the same, but this time we need to jump two steps to complete our task Filter on Dim_Category
Cascading M 2 M- Demo
Cascading Many To Many CALCULATE ( SUM (Fact_Transaction[Amount]), FILTER ( Dim_Account, CALCULATE ( COUNTROWS (Bridge_Account. Customer) ) > 0 ) ), FILTER ( Dim_Customer, CALCULATE ( COUNTROWS (Bridge_Customer. Category) ) > 0 ) )
Cascading Many To Many Generic Formula Works with any number of steps Be careful Always start with the farthest table And move one step at a time in the direction of the fact table One CALCULATE for each step Complexity: M x N (geometric…)
Cascading Alternative Need for some ETL here
Cascading Alternative The bridge now «feels» three different filters, but the formula becomes a classical many to many and runs faster
Flattened Cascading Many To Many Flattened Data Model Faster than the cascading one Simpler formula Needs ETL Worth a try in Multidimensional too…
Survey
Survey Data Model • • • Facts: One customer answers many questions One question is answered by many customers We want to use a Pivot. Table to query this
Survey Scenario Question: «What is the yearly income of consultants? » In other words… Take the customers who answered «Consultant» at the «Job» question Slice them using the answer to the question «Yearly Income»
Survey - Demo
Survey Analytical Data Model Question 1 = «Job» Answer 1 = «Consultant» Question 2 = «Yearly Income» Two «Filter» Dimensions One for «Job» = «Consultants» One for «Yearly Income»
Survey: The Final Result Value = Customer Count The fact table, this time, acts as the bridge
Survey Analytical Data Model No relationship between the fact table and the two filter dimensions Because here is only one value for ID_Answer
Survey Analytical Data Model • • The two filters Are applied on the Customers table Use separate instances of the Answers table
Survey: The DAX Formula IF ( COUNTROWS (VALUES (Filter 1[ID_Answer])) = 1 && COUNTROWS (VALUES (Filter 2[ID_Answer])) = 1, CALCULATE ( COUNTROWS (Customers), FILTER ( Customers, CALCULATE ( COUNTROWS (Answers), Answers[ID_Answer] = VALUES (Filter 2[ID_Answer])) > 0 ) Additional conditions to ), FILTER ( set the relationships with Customers, two tables in DAX only CALCULATE ( COUNTROWS (Answers), Answers[ID_Answer] = VALUES (Filter 1[ID_Answer])) > 0 ) ) )
Survey: The DAX Formula IF ( COUNTROWS (VALUES (Filter 1[ID_Answer])) = 1 && COUNTROWS (VALUES (Filter 2[ID_Answer])) = 1, CALCULATE ( COUNTROWS (Customers), FILTER ( Customers, CALCULATE ( COUNTROWS (Answers), Answers[ID_Answer] = VALUES (Filter 2[ID_Answer])) > 0 ) The two references to «Answers» ), FILTER ( work on separate instances of the Customers, same table CALCULATE ( COUNTROWS (Answers), Answers[ID_Answer] = VALUES (Filter 1[ID_Answer])) > 0 ) ) )
Survey - Conclusions Very powerful data model Very compact Handles any number of questions Duplicates only dimensions Different from the same pattern in Multidimensional We have shown two But it is not a limit Interesting topics Fact table as the bridge Relationships set in DAX Can be queried with a simple Pivot. Table
Basket Analysis
Basket Analysis: The Scenario Of all the customers who have bought a Mountain Bike, how many have never bought a mountain tire tube?
Basket Analysis in SQL Two iterations over the fact table needed SELECT COUNT (DISTINCT A. Customer. ID) FROM Fact. Sales A INNER JOIN Fact. Sales B ON A. Customer. ID = B. Customer. ID WHERE A. Product. Model = ‘MOUNTAIN TIRE TUBE' AND A. Year <= 2004 AND B. Product. Model = ‘MOUNTAIN-100' AND B. Year <= 2004
Look the query plan… This is the fact table… Do you really like to self-join it?
Basket Analysis: The Data Model Of all the customers who have bought a Mountain Bike, how many have never bought a mountain tire tube? We can filter «Mountain Tire Tube» with this table but… where do we filter «Mountain Bike» ?
Basket Analysis: The Data Model 2° filter: «Mountain Bike» No relationships 1° filter: «Mountain Tire Tube»
Basket Analysis- Demo
The Final Result 2° filter: «Mountain Tire Tube» Having. Product = Bought Both Not. Having. Product = Bought Bike, No Tire 1° filter: «Mountain Bike»
Having. Product : = CALCULATE ( COUNTROWS (DISTINCT (Fact. Sales[Customer. Key])), FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ), FILTER ( Dim. Customer, SUMX ( Product. Filter, CALCULATE ( COUNTROWS (Fact. Sales), ALL (Fact. Sales), Fact. Sales[Customer. Key] = EARLIER (Dim. Customer[Customer. Key]), Fact. Sales[Product. Key] = EARLIER (Product. Filter[Product. Key]), FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ) > 0 ) )
Having. Product : = CALCULATE ( COUNTROWS (DISTINCT (Fact. Sales[Customer. Key])), Count the number of customers… FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ), FILTER ( Dim. Customer, SUMX ( Product. Filter, CALCULATE ( COUNTROWS (Fact. Sales), ALL (Fact. Sales), Fact. Sales[Customer. Key] = EARLIER (Dim. Customer[Customer. Key]), Fact. Sales[Product. Key] = EARLIER (Product. Filter[Product. Key]), FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ) > 0 ) )
Having. Product : = CALCULATE ( COUNTROWS (DISTINCT (Fact. Sales[Customer. Key])), FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ), In the period of time before the end of the currently selected filter FILTER ( Dim. Customer, SUMX ( Product. Filter, CALCULATE ( COUNTROWS (Fact. Sales), ALL (Fact. Sales), Fact. Sales[Customer. Key] = EARLIER (Dim. Customer[Customer. Key]), Fact. Sales[Product. Key] = EARLIER (Product. Filter[Product. Key]), FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ) > 0 ) )
Having. Product : = CALCULATE ( COUNTROWS (DISTINCT (Fact. Sales[Customer. Key])), FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ), This filter needs a bit of attention… FILTER ( Dim. Customer, SUMX ( Product. Filter, CALCULATE ( COUNTROWS (Fact. Sales), ALL (Fact. Sales), Fact. Sales[Customer. Key] = EARLIER (Dim. Customer[Customer. Key]), Fact. Sales[Product. Key] = EARLIER (Product. Filter[Product. Key]), FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ) > 0 ) )
FILTER ( Dim. Customer, Only the customers SUMX ( Product. Filter, CALCULATE ( COUNTROWS (Fact. Sales), ALL (Fact. Sales), Fact. Sales[Customer. Key] = EARLIER (Dim. Customer[Customer. Key]), Fact. Sales[Product. Key] = EARLIER (Product. Filter[Product. Key]), FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ) > 0 ) )
FILTER ( Dim. Customer, Only the customers SUMX ( Product. Filter, Where, for each filtered product CALCULATE ( COUNTROWS (Fact. Sales), ALL (Fact. Sales), Fact. Sales[Customer. Key] = EARLIER (Dim. Customer[Customer. Key]), Fact. Sales[Product. Key] = EARLIER (Product. Filter[Product. Key]), FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ) > 0 ) )
FILTER ( Dim. Customer, Only the customers SUMX ( Product. Filter, Where, for each filtered product CALCULATE ( COUNTROWS (Fact. Sales), The number of rows in the fact table, for that product ALL (Fact. Sales), Fact. Sales[Customer. Key] = EARLIER (Dim. Customer[Customer. Key]), Fact. Sales[Product. Key] = EARLIER (Product. Filter[Product. Key]), FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ) > 0 ) )
FILTER ( Dim. Customer, Only the customers SUMX ( Product. Filter, Where, for each filtered product CALCULATE ( COUNTROWS (Fact. Sales), The number of rows in the fact table, for that product ALL (Fact. Sales), Fact. Sales[Customer. Key] = EARLIER (Dim. Customer[Customer. Key]), Fact. Sales[Product. Key] = EARLIER (Product. Filter[Product. Key]), FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ) > 0 ) ) Is greater than zero i. e. there exists at least one sale
FILTER ( Dim. Customer, Only the customers SUMX ( Product. Filter, Where, for each filtered product CALCULATE ( COUNTROWS (Fact. Sales), The number of rows in the fact table, for that product ALL (Fact. Sales), Fact. Sales[Customer. Key] = EARLIER (Dim. Customer[Customer. Key]), Fact. Sales[Product. Key] = EARLIER (Product. Filter[Product. Key]), These conditions are needed because there aren’t relationships between Filter and the fact table. Thus, we add them manually in DAX FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ) > 0 ) ) Is greater than zero i. e. there exists at least one sale
Ok, but «Not. Having. Product» ? Not. Having. Product : = CALCULATE ( COUNTROWS ( Instead of: FILTER ( Dim. Customer, COUNTROWS ( CALCULATE (COUNTROWS (Fact. Sales)) = 0 DISTINCT (Fact. Sales[Customer. Key]) ) ) ), FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ), FILTER ( Dim. Customer, SUMX ( Product. Filter, CALCULATE ( COUNTROWS (Fact. Sales), ALL (Fact. Sales), Fact. Sales[Customer. Key] = EARLIER (Dim. Customer[Customer. Key]), Fact. Sales[Product. Key] = EARLIER (Product. Filter[Product. Key]), FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ) > 0 ) )
Nice, but where are the M 2 M? Not. Having. Product : = CALCULATE ( COUNTROWS ( FILTER ( Dim. Customer, CALCULATE (COUNTROWS (Fact. Sales)) = 0 ) ), FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ), FILTER ( Dim. Customer, SUMX ( Product. Filter, CALCULATE ( COUNTROWS (Fact. Sales), ALL (Fact. Sales), Fact. Sales[Customer. Key] = EARLIER (Dim. Customer[Customer. Key]), Fact. Sales[Product. Key] = EARLIER (Product. Filter[Product. Key]), FILTER ( ALL (Dim. Time), Dim. Time[Time. Key] <= MAX (Dim. Time[Time. Key]) ) > 0 ) ) 1° M 2 M Pattern 2° M 2 M Pattern
Denali new features Multiple relationships between tables Active / Inactive relationships USERELATIONSHIP Role Dimensions Role Keys New filter function Selects a model relationship to use in calculations These additions make the data model much easier
Inactive Relationships in Denali Inactive Relationships
The Formula with Denali COUNTROWS ( FILTER ( Dim. Customer, CALCULATE ( COUNTROWS (Fact. Sales), USERELATIONSHIP ( Factsales[Product. Key], Product. Filter [Product. Key]) ) > 0 && CALCULATE ( COUNTROWS (Fact. Sales), USERELATIONSHIP ( Factsales[Product. Key], Dim. Product [Product. Key]) ) = 0 ) )
The formula with Denali It is not It is Much clearer Shows the real intentions of the author Key Point Shorter Faster (but it is too early to evaluate) It can be written by Excel Users It is straightforward, no complex and geeky logic In this example, Denali moves toward the users
Basket Analysis: Conclusions Modeled with the many to many pattern Fact table as a bridge one Relationships set in DAX Very Powerful Search for missing values «Easy» as searching for existing ones
No support for M 2 M in Tabular Facts Multidimensional handles M 2 M relationships. Tabular does not (apparently) DAX is the key to handle them Many interesting models can be created leveraging DAX and the many to many patterns
Conclusions Many To Many Relationships Very Powerful Can be handled in Tabular Even if not very easily Cascading Many To Many Survey Data Model Basket Analysis Are they worth some hours of study?
Questions and Answers sqlbi. com
Links SQLBI Website www. sqlbi. com Power. Pivot Workshop www. powerpivotworkshop. com Alberto Ferrari blog www. sqlblog. com/blogs/alberto_ferrari Marco Russo blog www. sqlblog. com/blogs/marco_russo For any question contact us at info@sqlbi. com
Coming up… Speaker Quest SQL Server Community SQLSentry Attunity Fusion-io Title Room Trivia Quiz: Test Your SQL Server and IT Knowledge and Win Aintree Prizes SQL Server Community presents : The A to Z of SQL Nuggets Lancaster Real Time and Historical Performance Troubleshooting with Pearce SQL Sentry Data Replication Redefined – best practices for replicating Empire data to SQL Server Is it a bird? Is it a plane? #SQLBITS Derby
- Slides: 68