MAXIMIZING RESOURCES DATAFLOWS DATASETS AND THE ENHANCED COMPUTE

  • Slides: 33
Download presentation
MAXIMIZING RESOURCES DATAFLOWS, DATASETS, AND THE ENHANCED COMPUTE ENGINE ANDREW SITZ

MAXIMIZING RESOURCES DATAFLOWS, DATASETS, AND THE ENHANCED COMPUTE ENGINE ANDREW SITZ

AGENDA • Welcome & Introduction • Presentation • Time for questions and interaction

AGENDA • Welcome & Introduction • Presentation • Time for questions and interaction

The modern business intelligence challenges Fragmented, incomplete data Requires a team of specialists Pulling

The modern business intelligence challenges Fragmented, incomplete data Requires a team of specialists Pulling together data from traditional and cloud data sources and figuring out how to enrich it is extremely difficult. Creating E 2 E BI solutions requires multiple BI tools. This requires specific knowledge of each of the tools and complex integration to build and maintain an E 2 E BI solution. Data preparation – the most timeconsuming task in analytics BI. Complex system design and architecture Traditional BI solutions require planning and designing in advance. Items like table structure and refresh order are complex to design and implement. Business data has no structural or semantic consistency Different applications, departments, and analysts define data in different ways, which makes data exploration, and reuse of data and apps extremely challenging. Complex system integration Traditional BI solutions span multiple applications and services. Sharing data across systems requires each system to understand the location, structure and meaning of the data

Traditional BI platform Data visualization BI modeling Gateways and connectors

Traditional BI platform Data visualization BI modeling Gateways and connectors

BI platform is evolving Data visualization BI modeling Data prep Gateways and connectors

BI platform is evolving Data visualization BI modeling Data prep Gateways and connectors

Power BI introduces dataflows Visualizations BI models Data prep Data (Azure Data Lake)

Power BI introduces dataflows Visualizations BI models Data prep Data (Azure Data Lake)

Dataflows Datasets Implementation CDM folder Analysis Services tabular model Storage CSV files Vertipaq (columnar)

Dataflows Datasets Implementation CDM folder Analysis Services tabular model Storage CSV files Vertipaq (columnar) Metadata Common Data Model – model. json BISM (part of. pbit file) Development Power Query Online Power Query in Power BI Desktop Primary purpose Data reuse Data analysis Reuse Acts as data source in multiple datasets Shared datasets across workspaces Scope of reuse Entity level reuse Dataset level reuse Mashup with other data sources Yes No Used for reporting Not directly Yes Reuse outside Power BI Yes, through ADLSg 2 Yes, through XMLA Data access methods Import, Direct Query Connection methods Import Live Connection Row-level security No Yes Certification and promotion Not yet Yes Query Reduction Incremental Refresh, query folding

DATA CHALLENGES IN POWER BI ISSUE CURRENT SOLUTION • Operations (join/sort/group by/distinct) expensive, difficult

DATA CHALLENGES IN POWER BI ISSUE CURRENT SOLUTION • Operations (join/sort/group by/distinct) expensive, difficult and slow • Issues resolved in PBI memory or in model (aggregations, composite modelling, M 2 M) • Long Refresh Times • Incremental refresh and/or cuts are made • Transformations on large data sources are slow • Pushed upstream (views), removed, or tack on to memory • Combining hundreds of GB of data in PBI Desktop not realistic • Analysts are not data engineers • Control and Change Management • 10 GB dataset limit won’t hold multi-billion row fact tables Composite Model • Dependency bottleneck • Higher involvement slower

ENHANCED COMPUTE ENGINE PERFORMANCE

ENHANCED COMPUTE ENGINE PERFORMANCE

DATA CHALLENGES IN POWER BI ISSUE NEW SOLUTION • Operations (join/sort/group by/distinct) difficult and

DATA CHALLENGES IN POWER BI ISSUE NEW SOLUTION • Operations (join/sort/group by/distinct) difficult and slow • Leverage dataflow cache highperformance SQL cache (Premium) • Long Refresh Times • Transformations on large data sources are slow • Combining hundreds of GB of data in PBI Desktop not realistic • Analysts are not data engineers • Control and Change Management

DATA CHALLENGES IN POWER BI ISSUE NEW SOLUTION • Operations (join/sort/group by/distinct) difficult and

DATA CHALLENGES IN POWER BI ISSUE NEW SOLUTION • Operations (join/sort/group by/distinct) difficult and slow • Leverage dataflow cache highperformance SQL cache (Premium) • Long Refresh Times • Refresh will be longer + Direct Query & less calc memory due to being able to leverage SQL cache is a better performance tradeoff • Transformations on large data sources are slow • Combining hundreds of GB of data in PBI Desktop not realistic • Analysts are not data engineers • Control and Change Management

DATA CHALLENGES IN POWER BI ISSUE NEW SOLUTION • Operations (join/sort/group by/distinct) difficult and

DATA CHALLENGES IN POWER BI ISSUE NEW SOLUTION • Operations (join/sort/group by/distinct) difficult and slow • Leverage dataflow cache highperformance SQL cache (Premium) • Long Refresh Times • Refresh will be longer + Direct Query & less calc memory due to being able to leverage SQL cache is a better performance tradeoff • Transformations on large data sources are slow • Combining hundreds of GB of data in PBI Desktop not realistic • Analysts are not data engineers • Control and Change Management • Composite models from curated datasets or aggregates of curated datasets

CHALLENGES ISSUE NEW SOLUTION • Operations (join/sort/group by/distinct) difficult and slow • Leverage PBI

CHALLENGES ISSUE NEW SOLUTION • Operations (join/sort/group by/distinct) difficult and slow • Leverage PBI allocated memory highperformance SQL cache (Premium) • Long Refresh Times • Refresh will be longer + Direct Query & less calc memory due to being able to leverage SQL cache is a better performance tradeoff • Transformations on large data sources are slow • Combining hundreds of GB of data in PBI Desktop not realistic • Analysts are not data engineers • Control and Change Management • Composite models from curated datasets or aggregates of curated datasets • Dataflows = Enterprise Grade + same look

CHALLENGES ISSUE NEW SOLUTION • Operations (join/sort/group by/distinct) difficult and slow • Leverage Dataflow

CHALLENGES ISSUE NEW SOLUTION • Operations (join/sort/group by/distinct) difficult and slow • Leverage Dataflow cache high-performance SQL cache (Premium) • Long Refresh Times • Refresh will be longer + Direct Query & less calc memory due to being able to leverage SQL cache is a better performance tradeoff • Transformations on large data sources are slow • Combining hundreds of GB of data in PBI Desktop not realistic • Analysts are not data engineers • Control and Change Management • Composite models from curated datasets or aggregates of curated datasets • Dataflows = Enterprise Grade + same look • CDM, entity logging, central change mgmt.

Dataflows Datasets Implementation CDM folder Analysis Services tabular model Storage CSV files, Azure SQL

Dataflows Datasets Implementation CDM folder Analysis Services tabular model Storage CSV files, Azure SQL cache Vertipaq (columnar) Metadata Common Data Model – model. json BISM (part of. pbit file) Development Power Query Online Power Query in Power BI Desktop Primary purpose Data reuse Data analysis Reuse Acts as data source in multiple datasets Shared datasets across workspaces Scope of reuse Entity level reuse Dataset level reuse Mashup with other data sources Yes No Used for reporting Not directly Yes Reuse outside Power BI Yes, through ADLSg 2 Yes, through XMLA Data access methods Import, Direct. Query (Feb 2020) Import, Direct Query Connection methods Import, Direct. Query (Feb 2020) Live Connection Row-level security No Yes Certification and promotion Not yet Yes Query Reduction Incremental Refresh, query folding

TIPS • Reduce redundancy of data Use Dataflows q Recycling sources can be optimized

TIPS • Reduce redundancy of data Use Dataflows q Recycling sources can be optimized with Dataflows and the Enhanced Compute Engine q Requires less resources to implement changes • Split ingestion and transformation to use the enhanced compute engine q Enhanced compute engine (Premium) only works against linked or computed entities q If you don’t have Premium, it can help with debugging • Use computed entities to leverage the enhanced compute engine q Make sure operations are foldable q Perform joins and filters first, as they are the most expensive operations • Avoid scheduling refresh for linked entities inside the same workspace q Locking provides transactional accuracy q Separate workspaces • Separate resources in a realistic, possible manner q Lock down ingestion area q Separate on-premise and cloud sources and have a separate dataflow per source q Can dataflows reside in another capacity? (note: Region must match) • Use Incremental Refresh (Premium) q Can be done on both the Dataflow and Dataset q Works with date keys

TIPS II • Control creation of centralized datasets from Dataflows • Monitor any Premium

TIPS II • Control creation of centralized datasets from Dataflows • Monitor any Premium resources in the Premium Capacity Metrics app q Average CPU and high utilization count. q Average Memory and high utilization count, and memory usage for specific datasets, dataflows, and paginated reports. q Active datasets loaded in memory. q Average and maximum query durations. q Average query wait times. q Average dataset and dataflow refresh times • Don’t use personal workspaces • Separate business critical and Self-Service BI into different capacities q Pro: separate workspaces q Semi-Pro: some on Premium, some on Pro q Premium: separate Premium capacities (most control) • Verify ability of queries to fold to source • Limit the number of visible visuals in a report page • Plan use of Direct. Query between teams

DATAFLOWS ARE NOT PREMIUM-ONLY Dataflow capability Scheduled refresh Power BI Pro 8 per day

DATAFLOWS ARE NOT PREMIUM-ONLY Dataflow capability Scheduled refresh Power BI Pro 8 per day Power BI Premium 48 Total Storage 10 GB/user 100 TB/node Dataflow Authoring with Power Query Online Yes Dataflow Management within Power BI Yes Dataflows Data Connector in the Power BI Desktop Yes Integration with Azure Yes Computed Entities (in-storage transformations via M) New connectors Yes Yes Dataflow incremental refresh Yes Running on Power BI Premium capacity / Parallel execution of transforms Yes Dataflow linked entities Yes Standardized Schema / Built-In Support for the Common Data Model Yes