Video: Advanced Analytics Features In Power BI

Following on from my last post, another SQLBits session of mine I wanted to highlight was “Advanced Analytics Features In Power BI”. The subject is a bit outside my normal area of expertise but it’s also one that I don’t think gets enough attention: it’s all the features available in Power BI reports that can help you explain why something happened rather than just what happened. Things I talk about include:

  • Adding forecasts to line charts
  • Symmetry shading, ratio lines and clustering on scatter charts
  • The “Explain the increase” and “Find where this distribution is different” features
  • The Key Influencers and Decomposition Tree visuals
  • Custom visuals such as SandDance
  • Natural language querying with Q&A

Video: Performance Tuning Power BI Dataset Refresh

The team at SQLBits have been publishing all the session recordings from their last (online) conference on their YouTube channel. There’s a lot of great content there to check out and this post is to highlight one of my sessions, “Performance tuning Power BI dataset refresh”.

In this session I look at all of the factors that can influence how long it takes to import data into Power BI and what you can do to make it faster. Topics covered include:

  • Choosing a dataset storage mode
  • The importance of good data modelling
  • How the type of data source you use affects how quickly data can load
  • Ways to measure refresh performance, such as using SQL Server Profiler and Power Query Query Diagnostics
  • Power Query options that can influence refresh times such as disabling data previews
  • Query folding in the Power Query engine
  • Vertipaq engine features that affect refresh, such as calculated columns and calculated tables
  • How dataflows can help refresh performance

Power Query And Power BI Connectivity Announcements At The Microsoft Business Applications Summit

There were a lot of exciting announcements at the Microsoft Business Applications Summit this week but if you only watched the keynotes or read the recap on the Power BI blog you will have missed all the Power Query-related news in the “Data Prep in Power BI, Power Platform and Excel using Power Query” session:

https://mymbas.microsoft.com/sessions/1332f59f-a051-4a06-ae50-8f3185501a88

It covers all the new things that have happened in Power Query over the last few months such as Diagram View and, more importantly, talks about what’s going to happen in the next few months. It’s relatively short but for those of you with no time or patience, here’s a summary of the roadmap announcements:

[BTW “Power Query Online” is the browser-based version of Power Query that is used in Power BI dataflows]

My highlights are:

  • The ability to create a dataflow quickly by uploading a file to Power Query Online without needing to use a gateway to connect to a file on-premises, useful for one-time import scenarios.
  • Multi-value M parameter support – useful for dynamic M parameters and other things I can’t talk about yet 😉
  • The things that Miguel talks about regarding “easier design experiences” for Synapse are kept intentionally vague, but it’s worth listening carefully to what he says here!
  • Native SQL support for Snowflake, BigQuery and Redshift – this is really useful for anyone who wants to use DirectQuery with these databases because it will allow you to write your own SQL query and use it as the source of a table, rather than having to use a table or a view (see the sketch after this list).
  • AAD based Single Sign-On support for Redshift and BigQuery (similar to what we have today for Snowflake) will also be very important for DirectQuery, because it means that the identity of the user running the report can be passed back to the database.
  • A dataflows connector for Excel Power Query – which means, at last, you’ll be able to get data from a dataflow direct into Excel. This will make a lot of Excel users very happy, I think: a lot of the time all users want is a table of data dumped to Excel, and dataflows will be a great way to provide them with that.
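Regarding the native SQL support mentioned above, here’s a rough sketch of the kind of M this enables, based on the Value.NativeQuery pattern that Power Query already supports for other connectors. The account, warehouse, database and query here are all made up, and the final syntax may well differ when the feature ships:

let
  // Hypothetical Snowflake account and warehouse
  Source = Snowflake.Databases(
    "myaccount.snowflakecomputing.com",
    "MYWAREHOUSE"
  ),
  // Use a hand-written SQL query as the source of the table;
  // EnableFolding allows later transformations to fold back to Snowflake
  Result = Value.NativeQuery(
    Source{[Name = "MYDATABASE"]}[Data],
    "SELECT CUSTOMER, SUM(SALES) AS TOTALSALES FROM SALESTABLE GROUP BY CUSTOMER",
    null,
    [EnableFolding = true]
  )
in
  Result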

Last of all, the session showcases the great new home for all things Power Query – http://www.powerquery.com/ – which has lots of resources, newly-updated documentation and a blog. Make sure you check it out!

Power BI, Excel Organisation Data Types And Images

Excel Organisation data types were released last year (see here for details), but did you know that you can now use them to bring images as well as text and numbers into Excel? Here’s a super-simple example that shows you how to do this.

Here’s a table called ‘Fruit With Image’ in a dataset that I have published to the Power BI Service:

Notice that the Data Category property on the Image column, which contains the URL of a picture of each type of fruit listed, is set to “Image URL” (for more details on what this does see here). If I use this table in a Power BI report, I see the name of each fruit and a picture:

So far no surprises. I can also set this table up as a Featured Table (for more details see here) so it can be used as the source for an Organisation Data Type in Excel:

The cool thing is that when I type these fruit names into Excel and mark them as the “Fruit With Image” data type (see here for more details), I can then access the Image field and it will show the image that the URL points to inside a cell:

Measuring DirectQuery Performance In Power BI

If you have a slow DirectQuery report in Power BI one of the first questions you need to ask is how long the SQL queries that Power BI generates take to run. This is a more complicated question to answer than you might think, though, and in this post I’ll explain why.

I happen to have access to some of the famous New York taxi data in a Snowflake database, and in there is a table with trip data that has 173 million rows that I have built a Power BI dataset from. The data and the database used are not really important here though – what is important is that it’s DirectQuery and a large-ish amount of data. Here’s a report page with a single table visual on it, showing passenger count aggregated by the hack license field:

It’s slow, but how slow? Here’s what Performance Analyzer shows when I refresh the table:

The DAX query takes 5.4 seconds but the Direct Query time is only 3.3 seconds – and the numbers don’t seem to add up. Here’s what Profiler captures for the same refresh shown in Performance Analyzer:

This shows there’s a gap of 2 seconds between the DirectQuery End event and the Query End event. What if I paste the DAX query into DAX Studio? Here’s what the Server Timings tab shows:

This is a different query execution from the two examples above (which both show data for the same execution), which explains why the numbers are slightly different here – but again there seems to be an extra second of work happening, and DAX Studio suggests that it’s in the Formula Engine.

So what is going on? The answer lies in understanding what the DirectQuery End Profiler event actually measures: it’s the amount of time between the Analysis Services engine handing a query over to the Power Query engine and the Analysis Services engine receiving the first row in the resultset back, including the time taken for the Power Query engine to fold the query.

Therefore if it takes a long time to get all the rows in the resultset then that could explain what’s going on here. Unfortunately there’s no way of knowing from Profiler events how long this takes – but there is another way. Going back to Performance Analyzer, if you export the data from it to JSON (by clicking the Export button) and load it into Power Query, you can see more detail about a DirectQuery query execution. Here’s the data from the first execution above:

[There’s a very good paper documenting what’s in the Performance Analyzer JSON file here]

Looking at the record in the metrics column for the Execute Direct Query event you can see the same 3.2 second duration shown above in Profiler. Notice that there are two other metrics here as well: RowsRead, which is the total number of rows returned in the resultset; and DataReadDuration, which is the time taken to read these rows after the first row has been received, plus the time for some other Analysis Services engine operations such as encoding of column values, joining with unpushed semijoins, projecting aggregations such as Average and saving the resultset to the in-memory cache. In this case the SQL query returned 43191 rows and this took 1.95 seconds – which explains the gap between the end of the Execute Direct Query event and the end of the query.
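If you want to explore this data yourself, here’s a minimal sketch of the kind of M query you could use on the exported file. The file path is hypothetical, and the structure assumed here (an events list where each event has a name and a metrics record) is the one described in the paper linked above, so check it against your own export:

let
  // Hypothetical path to the file exported from Performance Analyzer
  Source = Json.Document(File.Contents("C:\PowerBIPerformanceData.json")),
  // Assumes the export holds a list of event records in an "events" field
  Events = Table.FromRecords(Source[events], null, MissingField.UseNull),
  // Keep just the Execute Direct Query events
  DirectQueryEvents = Table.SelectRows(
    Events,
    each [name] <> null and Text.Contains([name], "Direct Query")
  ),
  // Expand the metrics discussed in this post; adjust the field names
  // to whatever your own export actually contains
  ExpandedMetrics = Table.ExpandRecordColumn(
    DirectQueryEvents,
    "metrics",
    {"RowsRead", "DataReadDuration"}
  )
in
  ExpandedMetrics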

One last question: why is this SQL query returning so many rows when the DAX query is only asking for the top 502 rows?

The reason is that, at the time of writing at least, the Analysis Services engine can only push a top(n) operation down to a DirectQuery SQL query in very simple scenarios where there are no measures and no aggregation involved – and in this case we’re summing up values. As a result, if you’re using DirectQuery mode and have a visual like this that can potentially display a large number of rows and includes a measure or aggregated values, you may end up with slow performance.

[Thanks to Jeffrey Wang for providing the information in this post]

Can I Build A Power BI DirectQuery Dataset On Top Of A REST API?

One of the most common questions I get asked is “How can I use Power BI in DirectQuery mode on top of a REST API?”. This seems like a reasonable thing to do but almost everyone who tries it will fail, and in this post I will explain why.

To answer this question we first of all have to review the two main ways of working with data in Power BI: Import mode and DirectQuery mode. In Import mode data is cached in Power BI’s own internal database and all the DAX queries that are generated by your reports are answered from there. In DirectQuery mode no data is stored inside Power BI; instead, when a report is run and DAX queries are fired off against your dataset, Power BI in turn generates queries against the data source to get the data needed. Most of the data sources that can be used with DirectQuery mode in Power BI are relational databases, which means Power BI will generate SQL queries to get data from them, but Power BI can generate queries in other languages too.

Let’s look at a simple example of how DirectQuery works. Consider the following table called FruitSales in a SQL Server database:

If I build a DirectQuery dataset containing only this table I can create the following Power BI report with a matrix and a slicer:

The SQL query generated by Power BI to get the data for the matrix is this:

SELECT 
TOP (1000001) [t0].[Colour],SUM(
CAST([t0].[Sales] as BIGINT)
)
 AS [a0]
FROM 
(
(
select [$Table].[Fruit] as [Fruit],
    [$Table].[Colour] as [Colour],
    [$Table].[Sales] as [Sales]
from [dbo].[FruitSales] as [$Table]
)
)
 AS [t0]
WHERE 
(
([t0].[Colour] IN (N'Red',N'Green'))
)

GROUP BY [t0].[Colour] 

A few things are immediately obvious from this: the query has a WHERE clause to filter the data down to the two colours selected in the slicer and there’s a GROUP BY to get the sum of Sales by Colour. You can see how the results of the SQL query are going to match what is displayed in the report. In more complex cases – especially when non-trivial DAX calculations are used – a single visual may generate multiple, more sophisticated SQL queries and some calculation work may also be done inside Power BI.

This example illustrates the kind of operations that Power BI needs to push down to the data source in DirectQuery mode. The question is, then, does your REST API support filtering and aggregating data in the way Power BI wants? If not, then you won’t be able to build a DirectQuery dataset on top of it.

“But wait”, I hear some of you say, “why can’t Power BI get a table of data returned by my REST API and do the filtering and aggregation itself?”. I agree it would be lovely if it could but it can’t. And if it did, that would be something more like Import mode and not DirectQuery. “Can’t I just somehow refresh an Import mode dataset on demand then?”, you may then say, “I know it’s possible to refresh a dataset using Power Automate and we can now trigger a Power Automate flow from a report using the new Power Automate visual. What’s more, with Power BI Premium there’s no limit on the number of refreshes you can trigger via the API that Power Automate uses”. That’s certainly true, but there are a few problems with this approach:

  1. Refreshing an Import mode dataset can be relatively slow and expensive in terms of resources, and if you have hundreds of users refreshing a dataset every time they view a report you may end up with the dataset permanently refreshing, which again impacts performance.
  2. Commercial APIs often have limits on the number of times you can call them within a given time period and it’s likely you’d hit these rate limits if you let your users refresh a dataset any time they wanted.
  3. In Import mode there’s no easy way to take a selection made by an end user and pass it back to the API as a parameter (if you don’t need to pass parameters back to the API why not just create a regular Import dataset and refresh it on a schedule?). Dynamic M parameters only work in DirectQuery mode. I suppose you could capture a selection and, using the Power Automate visual again, pass it to a Flow that used a Power Automate custom connector to call the Power BI REST API and change an M parameter’s value in your dataset but that would be very tricky to set up. However…
  4. A dataset is something that is shared between all the users of your report. What happens if User A changes a parameter, refreshes the dataset, starts interacting with the report and then User B also starts viewing the report and changes the parameter to something different and refreshes again? In that case User A would see the numbers for User B’s selection in the report and it would be extremely confusing.

Let’s say your API does support all the rich operations Power BI needs in DirectQuery mode though. You can certainly build a Power BI custom connector that supports DirectQuery mode yourself. One way of doing this is to build your own ODBC provider on top of your API and then wrap it in a simple Power BI custom connector – which, as you can probably guess, is no easy task. There’s a sample of how to create a custom connector that supports DirectQuery on an ODBC provider here. Alternatively you can try not going down the ODBC route and putting all the logic in the custom connector but there’s no documentation on how to do this and it would be extremely difficult to do, so you’d need a lot of support from someone at Microsoft who is much better at coding in M than I am.

The good news is that some third-party vendors, for example CData and Progress (there may be others too) sell Power BI custom connectors that allow you to create DirectQuery datasets on top of REST APIs. These connectors are thin wrappers for ODBC providers and work by caching the data returned from the API within the provider and providing a SQL query interface on top of that. I haven’t tested these products so I can’t comment on how well they work or what their performance is like. You’d need to pay extra to use them, of course, and since they are custom connectors you’d need to have an on-premises data gateway to use them.

To conclude, the purpose of this post is not to explain why building a report on top of a REST API is a bad idea but to explain why it’s hard to do this in Power BI. This is, partly, the result of the way Power BI is designed: as a model-based tool rather than a report-based tool (Marco Russo wrote a great post on this subject) it needs access to all the data for a report via a richer interface than most APIs can provide. Indeed, my advice to anyone wanting to build a DirectQuery dataset on an API is that it is better to bypass the API and go to the data source that the API uses (which is often a relational database) instead, if you can, or to extract all the data that the API can provide and stage it locally. It might go against someone in your organisation’s grand plan for a service-oriented architecture but you’ll be a lot more successful with your reporting.

Power BI Dataflow Performance, Premium Per User And The Enhanced Compute Engine

Over the years I have written a lot about Power BI/Power Query performance but it has always been in the context of loading data direct into datasets, not dataflows. A lot of cool things have been happening in dataflows recently, though, and now that Premium Per User has made Premium features available to a much wider audience I thought it would be interesting to look at an example of how PPU can help dataflow performance and specifically how and when the Enhanced Compute Engine can make dataflow refresh faster.

Using the same CSV file that I used in my posts from last year on optimising the performance of merges in Power Query, a file with one million rows and seven numeric columns named A, B, C, D, E, F and G, I created the following dataflow in a shared capacity (ie non-Premium) workspace:

The queries called First and Second are identical and just load all the data from the (same) source CSV file; they also have their load disabled. The query called Merge does an inner join between these two queries on the column called A:

The Merge query has its load enabled so it’s the only output of the dataflow; after it has joined the data it expands the nested columns returned and sets the data types on all the output columns.
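For reference, here’s roughly what the M for the Merge query looks like – a sketch rather than the exact code, with the inner join on column A, the nested table column expanded and the data types set at the end:

let
  // Inner join First and Second on column A
  Source = Table.NestedJoin(First, {"A"}, Second, {"A"}, "Second", JoinKind.Inner),
  // Expand the nested table column returned by the join
  Expanded = Table.ExpandTableColumn(
    Source,
    "Second",
    {"A", "B", "C", "D", "E", "F", "G"},
    {"Second.A", "Second.B", "Second.C", "Second.D", "Second.E", "Second.F", "Second.G"}
  ),
  // Set data types on all the output columns (they are all numeric)
  ChangedType = Table.TransformColumnTypes(
    Expanded,
    List.Transform(Table.ColumnNames(Expanded), each {_, type number})
  )
in
  ChangedType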

Refreshing this dataflow in shared capacity took on average 150 seconds.

I then moved the workspace to Premium Per User capacity and without making any changes, I refreshed again.

Refreshing the same dataflow in PPU took on average 73 seconds.

So the first finding is that moving the dataflow to PPU more than halved the refresh time, which is a pretty good result.

However, at this point the Enhanced Compute Engine is not being used – so, to bring it into play, I enabled loading for the First and Second queries, which in turn made the Merge query a Computed Table (what used to be called a Computed Entity before the recent terminology changes, indicated by the lightning bolt icon):

For a full explanation of when the Enhanced Compute Engine can and can’t be used see Matthew Roche’s blog post here; basically it loads data into a SQL-based cache which Computed Tables can then leverage, meaning that data access is faster and the Power Query engine can push transformations back to it via query folding. The only other change I made was to set data types on the columns in the output of First and Second.

Refreshing this new version of the dataflow in PPU took on average 90 seconds.

So performance was worse – but why? Enabling loading on First and Second means that more work is done at refresh time because their output needs to be ingested twice (once into ADLSgen2 and once into the SQL cache used by the Enhanced Compute Engine) before the Enhanced Compute Engine can access it. In this case the extra work needed to load First and Second outweighs the performance gains from using the Enhanced Compute Engine. The new metrics available from the dataflow’s Refresh History provide some insight into this (I strongly recommend you read the docs on these metrics here); here’s some of the data from one of the refresh history CSV files loaded into Excel:

In this particular case the overall refresh time of the dataflow was 88 seconds. First and Second refreshed in parallel – First taking 48 seconds and Second taking 51 seconds – and once they had both finished, Merge could refresh and only took 36 seconds to join the output of First and Second. So in this case Merge is indeed faster (36 seconds compared to 73 seconds before) as a result of using the Enhanced Compute Engine but that improvement isn’t enough to cancel out the additional time needed to load the data returned by First and Second into it.

What about a scenario where the Enhanced Compute Engine does make a positive difference? Take a look at the following dataflow, a slight variation on the dataflow above:

There are now three new tables: Output Table 1, Output Table 2 and Output Table 3. Each of these tables gets the maximum value from a different column in the table returned by Merge. Note that there are no Computed Tables in this dataflow so the Enhanced Compute Engine is not used, and that First, Second and Merge have load disabled.
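Each of these output tables is just a small query referencing Merge. Here’s a sketch of what Output Table 1 might look like, assuming it takes the maximum of column B (the column choice and output column name are hypothetical):

let
  // Reference the Merge query and take the maximum of one of its columns
  Source = Merge,
  MaxOfB = List.Max(Source[B]),
  // Return the result as a single-row, single-column table
  Output = #table(type table [MaxB = number], {{MaxOfB}})
in
  Output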

Refreshing this dataflow on PPU took on average 95 seconds.

Here are the refresh metrics for one of the refreshes:

As you can see, the three tables were refreshed in parallel and took between 84 and 93 seconds. It’s important to remember that for each of these tables the source data was loaded and the Merge query evaluated independently, which explains why they each take so long. The fact that Merge is evaluated three times when this dataflow refreshes is counter-intuitive but really important here – for more details see this post on how queries are evaluated in Power Query.

Now consider this version of the same dataflow where First, Second and Merge have their load enabled, making Merge, Output Table 1, Output Table 2 and Output Table 3 all Computed Tables.

Refreshing this dataflow on PPU took on average 88 seconds.

Not a massive improvement, but an improvement. Now look at how different the refresh metrics are:

In this case Output Table 1, Output Table 2 and Output Table 3 only take 1 second to evaluate, but that’s because they are working from data cached in the Enhanced Compute Engine – the table returned by Merge – and the transformations in them fold. The Merge table also uses data cached in the Enhanced Compute Engine: the tables returned by First and Second. What’s more, because Merge is a Computed Table it is only evaluated once in this dataflow. Loading the data for First and Second takes 52 seconds and 50 seconds respectively and Merge takes 35 seconds. In this case the hit of loading the data into the Enhanced Compute Engine is worth taking.

In conclusion, there are two things that these tests have shown:

  • Moving your dataflow to PPU can make a big difference to refresh performance.
  • The Enhanced Compute Engine can make dataflow refresh faster but not in all cases: you need to understand how it works, and in some cases the overhead of loading the data into it outweighs the performance advantages it gives you for any transformations later on. Use the information in Refresh History to work out what’s happening for your dataflow.

Importing Data From ADLSgen2 Into Power BI: Summary Of Findings

Over the last few months I’ve written a series of posts looking at different aspects of one question: what is the best way to import data from ADLSgen2 storage into a Power BI dataset? For example, is Parquet really better than CSV? Should you use Azure Synapse Serverless? In this post I’m going to summarise my findings and offer some recommendations – although, as always, I need to stress that these are the conclusions I can draw from my test results and not the absolute, incontrovertible “Microsoft-says” truth so please do your own testing too.

Partitioning makes import faster

Whatever other choices you make about importing data into Power BI, creating partitioned tables in your dataset is the key to getting the best refresh performance (see here for more details, and here for more thoughts/findings on the subject). However, creating partitioned tables manually adds a lot of complexity since you need to use external tools like Tabular Editor to do so; it also adds cost since you need to have Premium or PPU to get access to XMLA Endpoints for Tabular Editor to connect to. The bigger the Premium capacity SKU you use, the more partitions you’ll be able to refresh in parallel and the faster your refresh will be.

Since incremental refresh also creates partitions in the background, and incremental refresh does not require Premium or PPU, you may want to consider using it instead of creating partitions manually but it’s nowhere near as flexible and if you’re connecting direct to ADLSgen2 then you’d have to use the technique that Miguel Escobar describes here to ensure that query folding takes place.

Do you need to filter?

The most important question you need to ask in this whole process is this:

  1. Are you loading only some of the data from one or more files (for example by filtering on the values in one or more columns), or
  2. Are you loading all the data (ie all the rows) from one or more files?

Scenario #1 is, I guess, a classic data lake scenario where you have multiple Power BI datasets created by different users, each of which is importing a different subset of the data from the files in the lake. Scenario #2, on the other hand, will be the case when you’re creating a set of files in ADLSgen2 that contain exactly the data you want to load into a single Power BI dataset for a specific project.

If your answer is scenario #1 and you’re filtering data before you load, then you’ll get the best import performance if you store your data in Parquet files and query it through Azure Synapse Serverless (see here and here). Although Power BI can take advantage of the Parquet format to a certain extent and will give you better performance if you are only importing some of the columns from a file (see here), it doesn’t do predicate pushdown. There is an additional cost associated with using Synapse Serverless, of course, but it’s very reasonably priced (see here for more details on how the pricing works).
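As a very rough illustration of scenario #1, here’s a hypothetical M sketch that connects to a Synapse Serverless SQL endpoint and filters a view defined over the Parquet files. Because the Serverless endpoint looks like a SQL Server data source to Power Query the filter can fold, so the predicate is evaluated by Serverless rather than by Power BI after the data has been loaded; the server, database, view and column names are all made up:

let
  // Hypothetical Serverless SQL endpoint and database
  Source = Sql.Database(
    "myworkspace-ondemand.sql.azuresynapse.net",
    "SalesLake"
  ),
  // A view created in Serverless over the Parquet files
  DailySales = Source{[Schema = "dbo", Item = "DailySales"]}[Data],
  // This filter folds back to Serverless, so only the matching rows are imported
  Filtered = Table.SelectRows(DailySales, each [TransDate] >= #date(2021, 1, 1))
in
  Filtered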

If your answer is scenario #2 and you’re not filtering data, then you’ll get better performance (and lower costs) by connecting to your files in ADLSgen2 direct from Power BI. Using Azure Synapse Serverless isn’t necessarily a bad option but it does come with an overhead.

Combining data from multiple Parquet files can be optimised

If you are connecting direct to files in ADLSgen2 (and aren’t using Azure Synapse Serverless) and aren’t creating one partition per file then you’ll be combining data from multiple files in your Power Query M code. The code that Power Query generates automatically when you do this performs faster for CSV files than Parquet files (see here) but as I show here, with some simple changes you can create a much faster query to combine data from multiple Parquet files – although this technique does not work with CSV files.

Always use HierarchicalNavigation=true with AzureStorage.DataLake()

If you are not using Azure Synapse Serverless and are reading the data direct from ADLSgen2 using the AzureStorage.DataLake() M function then you should always set the HierarchicalNavigation=true option. As I showed in this post you can get some significant performance benefits from using this option. There are other options that you can set on AzureStorage.DataLake() but I couldn’t find any benefits from using them (see here) – probably because requesting data from ADLSgen2 is relatively fast and the act of loading the data returned into a table in your dataset is much slower (see here).
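Here’s what that looks like in practice – a minimal sketch with a made-up storage account and container:

let
  // Hypothetical ADLSgen2 account and container URL
  Source = AzureStorage.DataLake(
    "https://myaccount.dfs.core.windows.net/mycontainer",
    [HierarchicalNavigation = true]
  )
in
  Source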

Attaching Common Data Model folders as Dataflows can be a user-friendly option

While connecting to tables in Azure Synapse Serverless is reasonably straightforward, connecting direct to files and folders in ADLSgen2 can be quite intimidating for inexperienced Power BI users. As a result exposing ADLSgen2 data stored in Common Data Model format by attaching it as a Dataflow may be worth considering. There’s a performance difference between doing this and connecting direct to multiple CSV files (see here) but it’s certainly a lot more user-friendly. It’s also worth noting that support for the newer CDM manifest format in Dataflows has just been announced, although Dataflows don’t support CDM folders with data stored in Parquet format yet.

Partitioned Tables, Power BI And Parquet Files In ADLSgen2

Earlier in this series on importing data from ADLSgen2 into Power BI I showed how partitioning a table in your dataset can improve refresh performance. In that post I used CSV files in ADLSgen2 as my source and created one partition per CSV file, but after my recent discovery that importing data from multiple Parquet files can be tuned to be a lot faster than importing data from CSV files, I decided to try creating partitions linked to Parquet files instead.

As a reminder, here’s what Phil Seamark’s Power BI refresh report showed for the version of my dataset with one partition per CSV file:

As you can see, refresh in the Power BI Service took 40 seconds and throughput was just under 200,000 rows per second.

Here’s the same report for a partitioned dataset based on five Parquet files (with one partition per Parquet file) containing exactly the same data:

Using Parquet files as a source, refresh took only 33 seconds and throughput was almost 250,000 rows per second.

But what does this actually tell me? It shows that it is possible to improve on the previous 40-second refresh time quite substantially. It certainly looks like reading data from individual Parquet files is faster than reading data from individual CSV files too; however I’m reluctant to draw too many other conclusions. This is because:

  • I only had five Parquet files in this case, compared to nine CSV files. I suspect that some extra parallelism might be possible if I had six Parquet files instead of just five, because there are six refresh slots available in the PPU workspace I’m using.
  • This then leads onto the question of whether the number and size of the files affects refresh performance in the same way for CSV files as for Parquet files (any difference could be something to do with the format itself or the Power BI connectors used to read the data from the files).
  • This also leads onto the question of partitioning strategies – after all, you probably also want to only refresh partitions where data has changed, which will also have a big impact on refresh times – and the distribution of data over partitions. That’s out of scope for this series, but if you have a copy of Marco Russo and Alberto Ferrari’s book “Tabular Modeling in Microsoft SQL Server Analysis Services” then check out chapter 11 and the section in chapter 12 on “Segmentation and partitioning”: what’s good for Analysis Services Tabular is mostly good for Power BI, although bear in mind that Power BI uses segment sizes of 1 million rows.

Optimising The Performance Of Combining Data From Multiple Parquet Files In Power Query/Power BI

In all the testing I’ve done recently with importing data from Parquet files into Power BI I noticed something strange: loading data from a folder containing multiple Parquet files seemed a lot slower than I would expect, based on the time taken to load data from a single file. So I wondered – is there something that can be optimised? It turns out there is and in this blog post I’ll show you what I did.

If you import data from a folder containing Parquet files – whether it’s a local folder or a folder in ADLSgen2 storage – you’ll see a series of queries created for you in the Power Query Editor window that looks like this:

The query called Query1 shown in the screenshot iterates over all the files in the folder you’ve chosen and calls a function that reads the data from each Parquet file. It returns a table that contains a column with the name of the original source file in (which isn’t all that interesting for Parquet files) and all the columns from the Parquet files you’re combining.

Using the Parquet files from my series of posts on importing data from ADLSgen2 as a source, here’s the M code Power Query generates for this query, which I have modified to remove the column with the source file name in:

let
  Source = Folder.Files("C:\MyFolder"),
  #"Filtered Hidden Files1" = Table.SelectRows(
    Source,
    each [Attributes]?[Hidden]? <> true
  ),
  #"Invoke Custom Function1" = Table.AddColumn(
    #"Filtered Hidden Files1",
    "Transform File (3)",
    each #"Transform File (3)"([Content])
  ),
  #"Renamed Columns1" = Table.RenameColumns(
    #"Invoke Custom Function1",
    {"Name", "Source.Name"}
  ),
  #"Removed Other Columns1" = Table.SelectColumns(
    #"Renamed Columns1",
    {"Transform File (3)"}
  ),
  #"Expanded Table Column1" = Table.ExpandTableColumn(
    #"Removed Other Columns1",
    "Transform File (3)",
    Table.ColumnNames(#"Transform File (3)"(#"Sample File (3)"))
  ),
  #"Changed Type" = Table.TransformColumnTypes(
    #"Expanded Table Column1",
    {
      {"TransDate", type date},
      {"GuestId", type text},
      {"ProductId", type text},
      {"NetAmount", type number}
    }
  )
in
  #"Changed Type"

Here’s the output:

On my PC this query took an average of 102 seconds to refresh.

Apart from this query being slower than I expected, I also noticed that there is a “Changed Type” step at the end – which I thought was unnecessary because unlike CSV files, Parquet has typed columns. If you connect to a single Parquet file in Power Query then it recognises the column types, so why not here? Well, it’s because of the way it’s combining files by expanding table columns, and there is a way to work around this that I blogged about here:
https://blog.crossjoin.co.uk/2017/09/25/setting-data-types-on-nested-tables-in-m/

Setting a type on the table column before expanding it did indeed improve performance, but this led me to another optimisation.

I know that using the Table.Combine M function can perform differently to the Table.ExpandTableColumn function used in the original version of the query (although it does not always perform better). Therefore I made the following change to the query above: using Table.Combine to return a single table with all the data in (note that setting a type on the table column is not necessary for this optimisation). Here’s the new version:

let
  Source = Folder.Files("C:\Myfolder"),
  #"Filtered Hidden Files1" = Table.SelectRows(
    Source,
    each [Attributes]?[Hidden]? <> true
  ),
  #"Invoke Custom Function1" = Table.AddColumn(
    #"Filtered Hidden Files1",
    "Transform File",
    each #"Transform File"([Content])
  ),
  #"Renamed Columns1" = Table.RenameColumns(
    #"Invoke Custom Function1",
    {"Name", "Source.Name"}
  ),
  #"Removed Other Columns1" = Table.SelectColumns(
    #"Renamed Columns1",
    {"Source.Name", "Transform File"}
  ),
  Combine = Table.Combine(#"Removed Other Columns1"[Transform File])
in
  Combine

This version of the query took, on average, 43 seconds to refresh – a massive improvement.

If you’ve been following my series on ADLSgen2 refresh you may remember that I blogged about importing from a folder of Parquet files there too: in this post I noted that it took on average 72 seconds to load the same data from an ADLSgen2 folder in the Power BI Service using the original code; that was with the Source File column in and removing that column made no different to performance. This new version of the query took on average 49 seconds.

The conclusion is obvious: if you need to load data from a folder of Parquet files then you should use this new approach because the performance benefits are substantial. I know what you’re thinking: does this technique work for other file types apart from Parquet, like CSV? Unfortunately the answer is no, because these file types don’t have typed columns in the way Parquet does.
