To be honest I’m slightly ashamed of this fact because, as I say in the post, the solution I describe is a bit of a hack – but at the same time, the post is popular because a lot of people have the problem of needing to add new data to the data that’s already there in their Power BI dataset and there’s no obvious way of doing that. As I also say in that post, the best solution is to stage the data in a relational database or some other store outside Power BI so you have a copy of all the data if you ever need to do a full refresh of your Power BI dataset.
Why revisit this subject? Well, with Fabric it’s now much easier for you as a Power BI developer to build that place to store a full copy of your data outside your Power BI dataset and solve this problem properly. For a start, you now have a choice of where to store your data: either in a Lakehouse or a Warehouse, depending on whether you feel more comfortable using Spark and notebooks or relational databases and SQL to manage your data. What’s more, with Dataflows Gen2, when you load data to a destination you now have the option to append new data to existing data as well as to replace it:
If you need more complex logic to make sure you only load new records and not ones that you’ve loaded before, there’s a published pattern for that.
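If you do need that kind of logic, here's a very rough sketch of the idea in Power Query M. Everything in it is hypothetical: #"Existing Sales" is assumed to be a query that reads the destination table in the Lakehouse, #"Source Sales" a query that reads the source system, and OrderDate the column used to detect new rows; the query's destination would then be set to use the Append option:
let
//Find the most recent date already loaded to the destination table
LastLoadedDate = List.Max(#"Existing Sales"[OrderDate]),
//Keep only the source rows that arrived after that date
NewRows = Table.SelectRows(#"Source Sales", each [OrderDate] > LastLoadedDate)
in
NewRows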
“But I’m a Power BI developer, not a Fabric developer!” I hear you cry. Perhaps the most important point to make about Fabric is that Power BI is Fabric. If you have Power BI today, you will have Fabric soon if you don’t have the preview already – they are the same thing. One way of thinking about Fabric is that it’s just Power BI with a lot more stuff in it: databases, notebooks, Spark and pipelines as well as reports, datasets and dataflows. There are new skills to learn but solving this problem with the full range of Fabric workloads is a lot less complex than the pure Power BI approach I originally described.
“But won’t this be expensive? Won’t it need a capacity?” you say. It’s true that to do all this you will need to buy a Fabric capacity. But Fabric capacities start at a much cheaper price than Power BI Premium capacities: an F2 capacity costs $0.36 USD per hour or $262.80 USD per month, and OneLake storage costs $0.023 per GB per month (for more details see this blog post and the docs), so the entry point is a lot more affordable than Power BI Premium.
So, with Fabric, there’s no need for complex and hacky workarounds to solve this problem. Just spin up a Fabric capacity, create a Warehouse or Lakehouse to store your data, use Dataflows Gen2 to append new data to any existing data, then build your Power BI dataset on that.
A few weeks ago an important new feature for managing connections to data sources in the Power BI Service was released: Shareable Cloud Connections. You can read the blog post announcing them here. I won’t describe their functionality because the post already does that perfectly well; I want to focus on one thing in particular that is important for anyone using Power BI with Snowflake (and, I believe, BigQuery and probably several other non-Microsoft sources): Shareable Cloud Connections allow you to have multiple connections to the same data source in the Power BI Service, each using different credentials.
Some of you are going to read that last sentence and get very excited. Many of you will probably be surprised that Power BI didn’t already support this. To understand what’s going on here you first have to understand what Power BI considers a “data source”. The answer can be found on this page of the Power Query SDK docs:
The M engine identifies a data source using a combination of its Kind and Path […]
The Path value is derived from the required parameters of your data source function. Optional parameters aren’t factored into the data source path identifier.
In the case of the Snowflake connector, the “Kind” of the connector is Snowflake and the “Path” is determined by the two required parameters of the Snowflake connector, namely the Server and the Warehouse:
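As a rough illustration (the account URL, warehouse name and role shown here are all made up, and the Role option is just one example of an optional parameter), both of the following expressions resolve to the same data source path, because only the two required parameters are used to build it:
let
SameSource1 = Snowflake.Databases("myaccount.snowflakecomputing.com", "MYWAREHOUSE"),
//Optional parameters don't change the path, so this is still the same data source
SameSource2 = Snowflake.Databases("myaccount.snowflakecomputing.com", "MYWAREHOUSE", [Role = "ANALYST"])
in
SameSource2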
Before Shareable Cloud Connections, unless you used a gateway, you could only use one connection with one set of credentials for each data source used in the Power BI Service. For Snowflake this meant you could only use one set of credentials for all datasets that connected to the same Server and Warehouse, which led to a variety of problems: problems like this one, where different credentials were needed for different Snowflake databases, or like this one, where one user would publish a dataset and enter credentials that worked for them, and then a second user would publish another dataset, enter different credentials for the same Server/Warehouse combination and break refresh for the first dataset. With most other popular connectors these issues were rarer because their Paths are more specific and better aligned to how you’d want to use different credentials.
As I said, Shareable Cloud Connections solve all this by allowing the creation of multiple named connections to the same source, each of which can use different credentials. As a result I strongly recommend that everyone using Snowflake with Power BI creates new Shareable Cloud Connections and uses them in the Power BI Service.
While I was at the Data Scotland conference in Edinburgh on Friday (great event by the way) I stopped by the Tabular Editor stand and got the nice people there to give me a demo of their new tool, DAX Optimizer. It’s currently in private beta but if you’re curious to learn more, Nikola Ilic has already blogged about it in detail here.
Rather than blog about the tool itself – there’s no point repeating Nikola’s post – I thought it would be good to answer a question someone asked me later that day about Tabular Editor and which I’m definitely going to be asked about DAX Optimizer, namely:
This looks great, but it’s expensive and it’s hard for me to get sign-off to use third-party tools like this. Why doesn’t Microsoft give me something like this for free?
Before I carry on, let me make a few things clear:
I work for Microsoft but these are my personal opinions.
I have known many of the people involved in Tabular Editor and DAX Optimizer, including Marco and Alberto, for many years and have had business relationships with them in the past before working for Microsoft.
I don’t endorse any non-Microsoft Power BI-related commercial tools here on my blog but I do use many of them and mention them regularly, leaving readers to draw their own conclusions. This post is not an endorsement of Tabular Editor or DAX Optimizer.
With that out of the way let me address some of the different aspects of this question.
There’s a post on the Power BI blog from 2021 here, co-written by Marco Russo and Amir Netz, which covers Microsoft’s official position on community and third-party Power BI development tools and which is still relevant. There’s also a companion article by Marco here that’s worth reading. In summary, Microsoft’s long-term goal is to provide great tools for all Power BI developers, including enterprise developers, but in the meantime our priority is to build a solid platform that other people can build these tools on. I know many of you won’t believe me, but here at Microsoft we have finite development resources and we need to make difficult decisions about what we invest in all the time. We can’t build every feature that everyone wants immediately, and everyone wants different features.
As a result there will always be space for free and commercial third-party tools to innovate in the Power BI ecosystem. In the same way Tabular Editor serves the enterprise tools market, the vendors in the Power BI custom visuals marketplace extend Power BI with custom visuals. There are literally hundreds of other examples I could give in different areas such as planning and budgeting and admin and governance. Why doesn’t Microsoft buy some or all of these tools? We do buy tools vendors sometimes, but I feel these tools and companies tend to fare better outside Microsoft where they can compete with each other and move quickly, and when there’s a vibrant partner ecosystem around a product then the customer is better off too.
DAX Optimizer is slightly different to Tabular Editor and these other tools, though. While it is very sophisticated, the tool itself is not the whole point; it’s like a much, much more sophisticated version of Tabular Editor’s Best Practices Analyzer, a feature available in both the free and paid versions of Tabular Editor. The real value lies in the IP inside DAX Optimizer: these aren’t just any rules, these are Marco and Alberto’s rules for optimising DAX. Anyone could build the tool, but only Marco and Alberto could write these particular rules. I guess that’s why the Tabular Editor team had these stickers on their stand on Friday:
Doesn’t Microsoft have people who are this good at DAX and who could write the same rules? We do have people who know more about DAX than Marco and Alberto (namely the people who create it, for example Jeffrey Wang) and we do have people who are extremely good at performance tuning DAX (for example my colleagues Michael Kovalsky and Phil Seamark). Indeed, back in 2021 Michael Kovalsky published a free set of rules here which you can use with Best Practices Analyzer in Tabular Editor and which represent the Power BI CAT team’s best practice recommendations on DAX and modelling, so you could argue that Microsoft already offers a free solution to the problem that DAX Optimizer is trying to solve.
Marco and Alberto are Marco and Alberto though. They have a very strong brand. Consultancy is a famously hard business to scale and this is a very clever way for them to scale the business of DAX performance tuning. If you want their help in whatever form then you’ll need to pay for it. Couldn’t Microsoft just hire Marco and Alberto? I doubt they’d say yes if we asked, and in any case the situation is the same as with buying the tools I mentioned above: I think they add more value to the Power BI ecosystem outside Microsoft than they ever could inside it.
I’ve been lucky enough to get an invitation code to test DAX Optimizer and will be doing so this week, but I deliberately wrote this post before giving it a try. It’s important for me to stay up-to-date with everything happening in the world of Power BI because the customers I work with ask for my opinion. I wish the team behind it well in the same way I wish anyone who tries to build a business on top of Power BI well; the more successful they are, the more successful Power BI and Fabric are.
If you read this post that was published on the Fabric blog back in July, you’ll know that each Power Query query in a Fabric Gen2 dataflow has a property that determines whether its output is staged or not – where “staged” means that the output is written to the (soon-to-be hidden) Lakehouse linked to the dataflow, regardless of whether you have set a destination for the query output to be written to. Turning this on or off can have a big impact on your refresh times, making them a lot faster or a lot slower. You can find this property by right-clicking on the query name in the Queries pane:
At the moment this property is on by default for every query although this may change in the future. But should you turn it on for the queries in your Gen2 dataflows? It depends, and you should test to see what gives you the best performance.
Let’s see a simple example. I uploaded a CSV file with about a million rows from my favourite data source, the Land Registry price paid data, to the Files section of a Lakehouse, then created a query that did a group by on one of the columns to find the number of property transactions in each county in England and Wales. The query was set to load its output to a table in a Warehouse.
Here’s the diagram view for this query:
I then made sure that staging was turned off for this query:
This means that the Power Query engine itself performed the group by as it read the data from the file.
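For reference, the group by in a query like this boils down to a single Table.Group step, something like the following sketch (the step and column names are assumptions about the price paid data):
#"Grouped Rows" = Table.Group(#"Raw Data", {"County"}, {{"Transactions", each Table.RowCount(_), Int64.Type}})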
Looking at the refresh history for this dataflow:
…showed that the query took between 18 and 24 seconds to run. Clicking on an individual refresh to see the details:
…showed a single activity to load the output to the Warehouse. Clicking on this activity to see more details:
…showed how long it took – 15 seconds – plus how many rows were loaded to the destination Warehouse and how much data.
I then created a second dataflow to see the effect of staging. It’s important to understand that copying the previous dataflow and enabling staging on its only query does not do what I wanted here. Instead I had to create two queries: one with staging enabled and no destination set (called PP here), which stages all the raw data from the CSV file, and a second one (called Counties here) with staging disabled and its destination set to the Warehouse used in the previous dataflow, which references the first query and does the group by.
Here’s the diagram view for these two queries:
Note the blue outline on the PP query which indicates that it’s staged and the grey outline on the Counties query that indicates that it is not staged.
Looking at the Refresh History for this dataflow showed that it took around 40 seconds to run on average:
Looking at the first level of detail for the last refresh showed the extra activity for staging the data:
Clicking on the details for this staging activity for the PP table showed that it took 17 seconds to load all the raw data:
The activity to write the data to the Warehouse took about the same as with the first dataflow:
In summary, the first dataflow clearly performs better than the second dataflow. In this case, therefore, it looks like the overhead of staging the data made the performance worse.
Don’t take this simple example as proof of a general rule: every dataflow will be different and there are a lot of performance optimisations planned for Dataflows Gen2 over the next few months, so you should test the impact of staging for yourself. I can imagine that for other data sources (a Lakehouse source is likely to perform very well, even for files) and other transformations staging will have a positive impact. On the other hand, if you’re struggling with Dataflows Gen2 performance, especially at the time of writing this post, turning off staging could lead to a performance improvement.
Sometimes when you’re importing data from files using Power Query in either Power BI or Excel you may encounter the following error:
DataFormat.Error: External table is not in the expected format
What causes it? TL;DR: it’s because you’re trying to load data from one type of file, probably Excel (I don’t think you can get this error with any other source, but I’m not sure), and actually connecting to a different type of file.
Let’s see a simple example. Say you have a folder with two files: one is an Excel file called Date.xlsx and one is a CSV file called Date.csv.
Here’s the M code for a Power Query query that connects to the Excel file and reads the data from a table in it:
let
Source = Excel.Workbook(File.Contents("C:\MyFolder\Date.xlsx"), null, true),
Date_Table = Source{[Item = "Date", Kind = "Table"]}[Data]
in
Date_Table
Now, if you change the file path in this query – and only the file path – to point at the CSV file instead like so:
let
Source = Excel.Workbook(File.Contents("C:\MyFolder\Date.csv"), null, true),
Date_Table = Source{[Item = "Date", Kind = "Table"]}[Data]
in
Date_Table
…you will get the “external table is not in the expected format” error shown above. This is because your code is using the Excel.Workbook M function, which is used to import data from Excel workbooks, to connect to a file that is a CSV file and not an Excel workbook. The way to fix it is to use the appropriate function, in this case Csv.Document, to access the file like so:
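The exact options depend on the file, so treat the delimiter, encoding and header-promotion steps below as assumptions about this particular CSV:
let
Source = Csv.Document(File.Contents("C:\MyFolder\Date.csv"), [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
#"Promoted Headers"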
To be honest, if making this change is beyond your Power Query skills and you’re sure you’re trying to connect to the right file, you’re better off creating a completely new query rather than editing the query you already have.
Another common scenario where you might encounter this error is when you’re importing data from all the files in a folder and one of the files isn’t in the correct format. For example, let’s say you have a folder with three Excel files in it and you use the Folder data source to import all the data from all three files:
Since all three files are Excel files, the Folder option will work:
However, if you take a CSV file and drop it into the folder like so:
Then you’ll get the same error in Power Query:
Apart from deleting the CSV file you have another option to solve this problem in this case: filtering the folder so you only try to get data from the .xlsx files and no other file type. To do this, click on the step that is called “Source”. When you do this you’ll see that the step returns a table containing all the files in the folder you’re pointing at:
You’ll see that the table in this step contains a column called Extension which contains the file extension for each file. If you filter this table by clicking on the down arrow in the Extension column, deselecting the (Select All) option and selecting “.xlsx” so the table only contains .xlsx files (this will insert a new step at this point in the query, which is fine), then you can avoid this problem:
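In M terms, the folder source and the new filter step end up looking something like this sketch (the folder path is hypothetical):
let
Source = Folder.Files("C:\MyFolder"),
//Keep only the Excel files in the folder
#"Filtered Rows" = Table.SelectRows(Source, each [Extension] = ".xlsx")
in
#"Filtered Rows"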
If, as in this example, the rogue file happens to be the first file in the folder and you’ve selected that first file to be your “sample” file when setting up the import, then you’ll also need to go to the query called Sample File in the Queries pane and make exactly the same change there (ie click on the Source step and filter to remove any non-.xlsx files).
A common requirement from Power BI customers in highly-regulated industries is the need to log users out of Power BI if they have been inactive for a certain amount of time. If your Power BI reports contain extremely sensitive data you don’t want someone to open a report, leave their desk for lunch, forget to lock their PC and let everyone in the office see what’s on their screen, for obvious reasons. This has actually been possible for some time now with Power BI and is now supported for Fabric, so I thought I’d write a blog post to raise awareness.
The feature that makes this possible is Microsoft 365’s Idle Session Timeout, which you can read about here:
To turn it on, a Microsoft 365 admin has to go to the M365 admin centre, then Org Settings/Security & Privacy, and select Idle Session Timeout. There you can set the amount of time to wait before users are logged out:
Once that is set, anyone who has Power BI open in their browser but doesn’t interact with it will see the following message after the specified period of time:
Your session is about to expire
Your organization’s policy enforces automatic sign out after a period of inactivity on Microsoft 365 web applications.
Do you want to stay signed in?
There are a few things to point out about how this works (read this for the full details):
You can’t turn it on for just Power BI, you have to turn it on for all supported Microsoft 365 web apps. This includes Outlook and the other Office web apps
You can’t turn it on for specific users – it has to be for the whole organisation
Users won’t get signed out if they get single sign-on into the web app from the device-joined account, or select “Stay signed in” when they log in (an option that can be hidden), or if they’re on a managed device and using a supported browser like Edge or Chrome
You’ll need to be on friendly terms with your M365 admin if you want to use this, clearly, but if you need this functionality it makes sense to enforce activity-based timeout rules for more apps than just Power BI.
One frequently asked question I see on Power BI forums is whether it’s possible to run Power BI Desktop on a Mac, or indeed on anything other than a Windows PC. There are already a lot of detailed blog posts and videos out there on this subject, such as this one from Guy In A Cube: the answer is no, you can’t run Power BI Desktop natively on a Mac or any other OS apart from Windows, and there are no plans to port it over, so you need to either install Windows somehow (for example with Boot Camp) or use tools like Parallels or Turbo.Net to run Power BI Desktop. You can also spin up a Windows VM, for example in Azure, and run Power BI Desktop on that; Power BI Desktop is now fully supported on Azure Virtual Desktop too, although not on other virtual environments like Citrix.
Turning the question around, however, leads you to some aspects of the question that haven’t been fully explored. Instead of asking “Can I run Power BI Desktop on my Mac?”, you can instead ask “Can I do all of my Power BI development using only a browser?”. At Microsoft our long-term goal is to make all Power BI development web-based, but how close are we to that goal?
The first point to make is that it has always been possible to build Power BI reports (as opposed to datasets) in the browser without needing Power BI Desktop. You can now build basic paginated reports in the browser too. Historically I’ve never been a fan of encouraging users to do this because developing in Power BI Desktop gives you the chance to roll back to a previous version of the report if you need to – assuming you have saved those previous versions of your .pbix file. What’s more, if two or more people try to edit the same report at the same time then the last person to save wins and overwrites the other person’s changes, which can be dangerous. Fabric’s Git integration, which does work for Power BI reports, has changed my attitude somewhat though. As Rui Romano discusses here, you can now safely make changes to reports in the Power BI Service, save them to source control and then roll back if you need to; this assumes your users are comfortable using Git, however, and it doesn’t solve the simultaneous development problem.
What about dataset development? Web editing for datasets has been in preview for a few months now and is getting better and better, although there are still several limitations and the focus up to now has been on modelling; connecting to data sources is on the public roadmap though. As a result Power BI Desktop is still needed for dataset development, at least for now.
Do datamarts change anything? Or Direct Lake mode in Fabric? Datamarts do solve the problem of being able to connect to and load data using just your browser and are available (if not GA yet) today. If you’re only using datamarts to avoid the need for a Windows PC to develop on, though, you’re paying a price: for a start, you’ll either be loading the data twice if you want to use Import mode for your dataset (once to load data into the datamart, once to load the same data into the dataset) or taking the query performance hit of using DirectQuery mode. There are also some other limitations to watch out for. Fabric Direct Lake mode datasets, for me, offer all the benefits of datamarts without so many of the limitations – Direct Lake mode means you only load the data once and still get near-Import mode performance, for example – and will be the obvious choice when Fabric GAs and once features like OneSecurity are available. With Fabric it will be possible for most Power BI developers to do all their work using only a browser, although for more complex projects (and to be clear, this is only a small minority of projects) it will still be necessary to use other tools such as Tabular Editor, DAX Studio, SQL Server Management Studio and SQL Server Profiler, which can only run on a Windows PC. I can imagine some of this more advanced developer functionality coming to the browser too in time, though.
In summary while Power BI Desktop and therefore Windows is still needed for Power BI development today, the day when you can do most and maybe all of your development in the browser is in sight. All you Mac owners need to be patient just a little while longer!
A few months ago a new option was added to the Sql.Database and Sql.Databases functions in Power Query in Power BI and Excel which allows Power Query queries that combine data from different SQL Server databases to fold. Here’s a simple example showing how to use it.
On my local PC I have SQL Server installed and the Adventure Works DW 2017 and Contoso Retail DW sample databases:
Both of these databases have date dimension tables called DimDate. Let’s say you want to create a Power Query query that merges these two tables.
Here’s the M code for a Power Query query called DimDate AW to get just the DateKey and CalendarYear columns from the DimDate table of the Adventure Works DW 2017 database:
let
Source = Sql.Database("localhost", "AdventureWorksDW2017"),
dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
#"Removed Other Columns" = Table.SelectColumns(dbo_DimDate,{"DateKey", "CalendarYear"})
in
#"Removed Other Columns"
Here’s the M code for a Power Query query called DimDate Contoso to get just the Datekey and CalendarYear columns from the DimDate table in the ContosoRetailDW database:
let
Source = Sql.Database("localhost", "ContosoRetailDW"),
dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
#"Removed Other Columns" = Table.SelectColumns(dbo_DimDate,{"Datekey", "CalendarYear"})
in
#"Removed Other Columns"
Both of these Power Query queries fold. However if you create a third query to merge these two queries (ie do the equivalent of a SQL join between them) on the CalendarYear columns like so:
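For reference, here’s a sketch of what that third query might look like – I’ve used an inner join here, but the code the Merge dialog generates for you could differ slightly:
let
Source = Table.NestedJoin(#"DimDate AW", {"CalendarYear"}, #"DimDate Contoso", {"CalendarYear"}, "DimDate Contoso", JoinKind.Inner),
#"Expanded DimDate Contoso" = Table.ExpandTableColumn(Source, "DimDate Contoso", {"Datekey"}, {"DimDate Contoso.Datekey"})
in
#"Expanded DimDate Contoso"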
…this query does not fold, because it combines data from two different SQL Server databases.
However if you edit the Sql.Database function in the Source step of both of the first two queries above to set the new EnableCrossDatabaseFolding option to true, like so:
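For example, here’s a sketch of the first query with the option added to the optional options record of Sql.Database:
let
Source = Sql.Database("localhost", "AdventureWorksDW2017", [EnableCrossDatabaseFolding = true]),
dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
#"Removed Other Columns" = Table.SelectColumns(dbo_DimDate,{"DateKey", "CalendarYear"})
in
#"Removed Other Columns"
With the same change made to the DimDate Contoso query, the merge query above is then able to fold.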
In the second post in this series I discussed a KQL query that can be used to analyse Power BI refresh throughput at the partition level. However, if you remember back to the first post in this series, it’s actually possible to get much more detailed information on throughput by looking at the ProgressReportCurrent event, which fires once for every 10000 rows read during partition refresh.
Here’s yet another mammoth KQL query that you can use to analyse the ProgressReportCurrent event data:
It filters the Log Analytics data down to get events from the last day and just the ProgressReportCurrent events, as well as the ProgressReportBegin/End events which are fired before and after the ProgressReportCurrent events.
It then splits the data into groups of rows (‘partitions’ in KQL, but of course not the partitions that are being refreshed) by a combination of XmlaRequestId (ie the refresh operation) and XmlaObjectPath (ie the partition that is being refreshed)
For each group of rows it will then:
Find the ProgressReportBegin event and from this get the time when data started to be read from the source
Get all subsequent ProgressReportCurrent events and calculate the amount of time elapsed since the previous event (which might be the ProgressReportBegin event or a previous ProgressReportCurrent event) and the number of rows read
When the ProgressReportEnd event is encountered, calculate the amount of time elapsed since the previous ProgressReportCurrent event and the number of rows (which will be less than 10000) read since then
Filter out the ProgressReportBegin events because we don’t need them any more
Finally, add columns that split out the table name and the partition name, plus a column that calculates the number of rows read per second for each row by dividing the number of rows read for each event by the amount of time elapsed since the previous event
What can this query tell us about throughput?
First of all, something interesting but not necessarily useful. At least for the data source I’m using for my tests, when I plotted a column chart with the number of rows read on the x axis and the amount of time elapsed since the last event on the y axis (ie the amount of time it took to read 10000 rows, for all but the last column), I noticed that every 200000 rows something happens to slow down the read:
I have no idea what this is – maybe it’s a quirk of this particular source or connector – but it’s a great example of the kind of pattern that becomes obvious when you visualise data rather than look at a table of numbers.
Plotting time on the x axis of a line chart and the cumulative total of rows read on the y axis gives you something more useful. Here’s the chart for one of the refreshes mentioned in my last post where four partitions of the same table are refreshed in parallel:
In this case throughput is fine up until the end of the refresh, at which point something happens that slows down the February, March and April partitions (but not the January partition) for about 30 seconds, after which throughput goes back to what it was before. Here’s the same chart zoomed in a bit:
Here’s the same problem shown in the first graph above, where the number of rows read is on the x axis, showing how for example with the April partition there’s a sudden spike where it takes 14 seconds to read 10000 rows rather than around 0.3 seconds:
What is this, and why isn’t the January partition affected? Maybe it was a network issue or caused by something happening in the source database? Looking at another refresh that also refreshes the same four partitions in parallel, it doesn’t seem like the same thing happens – although if you look closely at the middle of the refresh there might be a less pronounced flattening off:
Again, the point of all this is not the mysterious blips I’ve found in my data but the fact that if you take the same query and look at your refreshes, you may find something different, something more significant and something you can explain and do something about.
In the first post in this series I described the events in Log Analytics that can be used to understand throughput – the speed at which Power BI can read data from your data source when importing it into your dataset – during refresh. While the individual events are easy to understand when you look at a simple example, they don’t make it easy to analyse the data in the real world, so here’s a KQL query that takes all the data from all these events and gives you one row per partition per refresh:
//Headline stats for partition refresh with one row for each partition and refresh
//Get all the data needed for this query and buffer it in memory
let RowsForStats =
materialize(
PowerBIDatasetsWorkspace
| where TimeGenerated > ago(1d)
| where OperationName == "ProgressReportEnd"
| where OperationDetailName == "ExecuteSql" or OperationDetailName == "ReadData"
or (OperationDetailName == "TabularRefresh" and (EventText contains "partition"))
);
//Get just the events for the initial SQL execution phase
let ExecuteSql =
RowsForStats
| where OperationDetailName == "ExecuteSql"
| project XmlaRequestId, XmlaObjectPath,
ExecuteSqlStartTime = format_datetime(TimeGenerated - (DurationMs * 1ms),'yyyy-MM-dd HH:mm:ss.fff' ),
ExecuteSqlEndTime = format_datetime(TimeGenerated,'yyyy-MM-dd HH:mm:ss.fff' ),
ExecuteSqlDurationMs = DurationMs, ExecuteSqlCpuTimeMs = CpuTimeMs;
//Get just the events for the data read and calculate rows read per second
let ReadData =
RowsForStats
| where OperationDetailName == "ReadData"
| project XmlaRequestId, XmlaObjectPath,
ReadDataStartTime = format_datetime(TimeGenerated - (DurationMs * 1ms),'yyyy-MM-dd HH:mm:ss.fff' ),
ReadDataEndTime = format_datetime(TimeGenerated,'yyyy-MM-dd HH:mm:ss.fff' ),
ReadDataDurationMs = DurationMs, ReadDataCpuTime = CpuTimeMs,
TotalRowsRead = ProgressCounter, RowsPerSecond = ProgressCounter /(toreal(DurationMs)/1000);
//Get the events for the overall partition refresh
let TabularRefresh =
RowsForStats
| where OperationDetailName == "TabularRefresh"
| parse EventText with * '[MashupCPUTime: ' MashupCPUTimeMs:long ' ms, MashupPeakMemory: ' MashupPeakMemoryKB:long ' KB]'
| project XmlaRequestId, XmlaObjectPath,
TabularRefreshStartTime = format_datetime(TimeGenerated - (DurationMs * 1ms),'yyyy-MM-dd HH:mm:ss.fff' ),
TabularRefreshEndTime = format_datetime(TimeGenerated,'yyyy-MM-dd HH:mm:ss.fff' ),
TabularRefreshDurationMs = DurationMs, TabularRefreshCpuTime = CpuTimeMs,
MashupCPUTimeMs, MashupPeakMemoryKB;
//Do an inner join on the three tables so there is one row per partition per refresh
ExecuteSql
| join kind=inner ReadData on XmlaRequestId, XmlaObjectPath
| join kind=inner TabularRefresh on XmlaRequestId, XmlaObjectPath
| project-away XmlaRequestId1, XmlaRequestId2, XmlaObjectPath1, XmlaObjectPath2
| extend Table = tostring(split(XmlaObjectPath,".", 2)[0]), Partition = tostring(split(XmlaObjectPath,".", 3)[0])
| project-reorder XmlaRequestId, Table, Partition
| order by XmlaRequestId, ExecuteSqlStartTime desc
It’s a bit of a monster query but what it does is quite simple:
First it gets all the events relating to partition refresh in the past 1 day (which of course you can change) and materialises the results.
Then it filters this materialised result and gets three sets of tables:
All the ExecuteSql events, which tell you how long the data source took to start returning data and how much CPU time was used.
All the ReadData events, which tell you how long Power BI took to read all the rows from the source after the data started to be returned, how much CPU time was used, and how many rows were read. Dividing the number of rows read by the duration lets you calculate the number of rows read per second during this phase.
All the TabularRefresh events, which give you overall data on how long the partition refresh took, how much CPU time was used, plus information on Power Query peak memory usage and CPU usage.
What can this tell us about refresh throughput though? Let’s use it to answer some questions we might have about throughput.
What is the impact of parallelism on throughput? I created a dataset on top of the NYC taxi data Trip table with a single table, and in that table created four partitions containing data for January, February, March and April 2013, each of which contained 13-15 million rows. I won’t mention the type of data source I used because I think it distracts from what I want to talk about here, which is the methodology rather than the performance characteristics of a particular source.
I then ran two refreshes of these four partitions: one which refreshed them all in parallel and one which refreshed them sequentially, using custom TMSL refresh commands and the maxParallelism property as described here. I did a refresh of type dataOnly, rather than a full refresh, in the hope that it would reduce the number of things happening in the Vertipaq engine during refresh that might skew my results. Next, I used the query above as the source for a table in Power BI (for details on how to use Log Analytics as a source for Power BI see this post; I found it more convenient to import the data rather than use DirectQuery mode though) to visualise the results.
Comparing the amount of time taken for the SQL query used to start to return data (the ExecuteSqlDurationMs column from the query above) for the four partitions for the two refreshes showed the following:
The times for the four partitions vary a lot for the sequential refresh but are very similar for the parallel refresh; the January partition, which was refreshed first, is slower in both cases. The behaviour I described here regarding the first partition refreshed in a batch could be relevant.
Moving on to the Read Data phase, looking at the number of rows read per second (the RowsPerSecond column from the query above) shows a similar pattern:
There’s a lot more variation in the sequential refresh. Also, as you would expect, the number of rows read per second is much higher when partitions are refreshed sequentially compared to when they are refreshed in parallel.
Looking at the third main metric, the overall amount of time taken to refresh each partition (the TabularRefreshDurationMs column from the query above) again shows no surprises:
Each individual partition refreshes a lot faster in the sequential refresh – almost twice as fast – compared to the parallel refresh. Since four partitions are being refreshed in parallel during the second refresh, though, any loss of throughput for an individual partition as a result of refreshing in parallel is more than compensated for by the parallelism, making the parallel refresh faster overall. This can be shown by plotting the TabularRefreshStartTime and TabularRefreshEndTime columns from the query above on a timeline chart (in this case the Craydec Timelines custom visual) for each refresh and each partition:
On the left of the timeline you can see the first refresh where the partitions are refreshed sequentially, and how the overall duration is just over 20 minutes; on the right you can see the second refresh where the partitions are refreshed in parallel, which takes just under 10 minutes. Remember also that this is just looking at the partition refresh times, not the overall time taken for the refresh operation for all partitions, and it’s only a refresh of type dataOnly rather than a full refresh.
So does this mean more parallelism is better? That’s not what I’ve been trying to say here: more parallelism is better for overall throughput in this test but if you keep on increasing the amount of parallelism you’re likely to reach a point where it makes throughput and performance worse. The message is that you need to test to see what the optimal level of parallelism – or any other factor you can control – is for achieving maximum throughput during refresh.
These tests only show throughput at the level of the ReadData event for a single partition, but as mentioned in my previous post there is even more detailed data available with the ProgressReportCurrent event. In my next post I’ll take a closer look at that data.
[Thanks to Akshai Mirchandani for providing some of the information in this post, and hat-tip to my colleague Phil Seamark who has already done some amazing work in this area]