Diagnosing Power BI DirectQuery Connection Limit Performance Problems With Execution Metrics

A few months ago I blogged about the new limits available for the Maximum Connections Per Data Source property in Premium and why the number of connections that a DirectQuery semantic model can open to a data source is so important for report performance. At that time, however, there was no way for you to know whether the performance of your reports was being affected by a lack of available connections. The good news is that, with the announcement this week of the new Execution Metrics event in Log Analytics and Profiler, you can now see when this is happening.

To illustrate this I created a Power BI semantic model with three tables in DirectQuery mode connected to SQL Server. Each table consisted of a single row and column and was bound to a SQL query that took 10 seconds to run (using the TSQL custom function I blogged about here).

I then built a report connected to this model containing three cards, each of which displayed the single value returned by each of these three tables. As you would expect, the DAX queries associated with each of these card visuals took 10 seconds to run when run in isolation.

I left the Max Connections Per Data Source property set to the default value of 10 and published the model and the report to the Power BI Service:

I then opened the report and let it render to make sure there were some connections open in Power BI’s connection pool, then re-ran the report while running a Profiler trace that recorded the Query Begin/End, DirectQuery Begin/End and the new Execution Metrics event.

As you might expect, the three DAX queries for the three cards were run in parallel and each query took around 10 seconds to run. Each DAX query generated a single SQL query and each SQL query also took around 10 seconds to run. Here’s what the Execution Metrics event returned for the first DAX query (the Execution Metrics for the other two DAX queries were more or less identical):

{
	"timeStart": "2024-05-17T10:39:53.833Z",
	"timeEnd": "2024-05-17T10:40:04.381Z",

	"durationMs": 10548,
	"datasourceConnectionThrottleTimeMs": 0,
	"directQueryConnectionTimeMs": 4,
	"directQueryExecutionTimeMs": 10105,
	"directQueryIterationTimeMs": 126,
	"directQueryTotalTimeMs": 10109,
	"queryProcessingCpuTimeMs": 0,
	"totalCpuTimeMs": 16,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 0,

	"directQueryTimeoutMs": 225000,
	"tabularConnectionTimeoutMs": 225000,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 1,
	"directQueryRequestCount": 1,
	"directQueryTotalRows": 1
}

The datasourceConnectionThrottleTimeMs metric is 0, which indicates that no time was spent waiting for a connection before the model could run the sole SQL query linked to this DAX query.

I then created a copy of the semantic model, changed the Maximum Connections Per Data Source property to 1, and published this new model and an identical report and ran a Profiler trace while running the report. Changing the Maximum Connections Per Data Source property to 1 meant that this new model could only have one connection open to SQL Server at any one time and therefore could only run one SQL query at a time.

Here’s what the Profiler trace looked like:

The first thing to note is that even though the three DAX queries were still run in parallel they now took 10, 21 and 33 seconds to run, as seen from the Duration column for each of the three Query End events. The SQL queries generated by each DAX query, however, still only took 10 seconds each, as shown from the Duration column for the three DirectQuery End events.

Here’s what the Execution Metrics event for the last DAX query to finish (the one that took 33 seconds) returned:

{
	"timeStart": "2024-05-17T10:12:19.436Z",
	"timeEnd": "2024-05-17T10:12:52.341Z",

	"durationMs": 32906,
	"datasourceConnectionThrottleTimeMs": 21896,
	"directQueryConnectionTimeMs": 21901,
	"directQueryExecutionTimeMs": 10531,
	"directQueryIterationTimeMs": 129,
	"directQueryTotalTimeMs": 32434,
	"queryProcessingCpuTimeMs": 0,
	"totalCpuTimeMs": 0,
	"executionDelayMs": 9,

	"approximatePeakMemConsumptionKB": 0,

	"directQueryTimeoutMs": 204000,
	"tabularConnectionTimeoutMs": 225000,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 1,
	"directQueryRequestCount": 1,
	"directQueryTotalRows": 1
}

This clearly shows that out of the total query duration of 33 seconds, 22 seconds of that was spent waiting for a connection to become available, as indicated by the datasourceConnectionThrottleTimeMs value. This was because this DAX query had to wait for the SQL queries generated by the two other DAX queries to complete before it could get a connection back to the source. As for the other two DAX queries generated for this report, the query that took 10 seconds still had a datasourceConnectionThrottleTimeMs of 0 because it was the first query to run and there was a connection available for it to use, while the second query to run that took 22 seconds had a datasourceConnectionThrottleTimeMs of 11 seconds because it only had to wait for the first DAX query’s SQL query to run before it could use the connection.

What should you do if you have performance problems and can see that datasourceConnectionThrottleTimeMs is consistently high? Before anything else you need to tune your model and your data source (see here and here for some tips); the more SQL queries your model generates when a report is run, and the slower those queries are, the more likely you are to run out of connections. If that doesn’t work and you are using Pro/Shared or a capacity that is lower than an F64, then you will need buy an F64 capacity or greater (see the table here for the maximum number of DirectQuery connections supported by each SKU) all to allow for more than 10 connections back to your source. Once you have done that you will also need to set the Max Connections Per Data Source property in your model to something larger than 10. However, you also need to be aware that increasing the maximum number of connections can make performance worse if your data source cannot handle the number of SQL queries that Power BI is trying to run on it.

New HostProperties Data In The ApplicationContext Column In Power BI Log Analytics

If you’re a fan of using Log Analytics for monitoring Power BI activity, then you may have noticed there’s some new data in the ApplicationContext column of the PowerBIDatasetsWorkspace table. Up until recently the ApplicationContext column only contained IDs that identify the report and the visual that generated a DAX query (something I blogged about here); it now contains additional information on the type of Power BI report that generated the query and an ID for the user session.

Here’s an example of the payload with the new data in bold:

[
  {
    "ReportId": "2beeb311-56c8-471a-83a3-6d7523d40dc7",
    "VisualId": "477b5dc44249fe897411",
    "HostProperties": {
      "ConsumptionMethod": "Power BI Web App",
      "UserSession": "a3d941bd-c374-4e0e-b911-5086310cb345"
    }
  }
]

Here’s an example KQL query that returns the new ConsumptionMethod and UserSession data from HostProperties:

PowerBIDatasetsWorkspace
| where TimeGenerated > ago(1d)
| where OperationName == 'QueryEnd' 
| where ApplicationContext!=""
| extend hp = parse_json(ApplicationContext)
| extend ConsumptionMethod = hp.Sources[0].HostProperties.ConsumptionMethod, 
UserSession = hp.Sources[0].HostProperties.UserSession
|project TimeGenerated, EventText, ApplicationName, ConsumptionMethod, UserSession
| order by TimeGenerated desc

ConsumptionMethod is less useful than it might first appear: at the time of writing it only returns data for Power BI reports (and not other types of report such as Paginated reports), although it will allow you to differentiate between different methods of viewing a Power BI report such as viewing via a browser or viewing via Teams. It should be used in combination with the ApplicationName column to get a fuller picture of the way reports are being consumed.

UserSession is something that I need to explore in more detail. Grouping user activity into sessions is something I blogged about here, but this is the user session ID used internally and therefore a lot more reliable. I don’t know the rules which govern how activity is grouped into sessions though, so I will only blog about this when I find out more.

[Thanks to my colleague Andreea Sandu for this information]

Getting Report Visual IDs With Power BI Desktop Developer Mode

Back in 2021 I wrote a post showing how you can link a DAX query generated by a Power BI report in Log Analytics to a visual in a Power BI report. In that post I showed how you can get the unique IDs for each visual in a report using Performance Analyzer, an approach which works but has drawbacks. Then, in 2022 I wrote another post showing how you can get the same IDs using the Power BI Embedded Analytics Playground, which again works but isn’t perfect. After that, this August, my colleague Rui Romano pinged me on Teams to point out that the new Power BI Desktop Developer Mode and the .pbip file format provides the best and easiest way to get these IDs.

If you do save a Power BI report in .pbip format, look in the report folder and open the report.json file (where the report definition is stored) with a tool like Visual Studio Code then you can find the IDs for visuals quite easily. The sections array lists all the pages in the report; the visualContainers array within that lists all the visuals on a page; and the visual’s ID can be found in the config property as its name:

As a result, if you’re looking in Log Analytics and you need to understand which visual a query was generated by, you can just copy the ID from the ApplicationContext column of a QueryEnd event and search for that ID in the report.json file. Even when you’ve found that ID, though, it can be hard to know which visual it relates to. This is where a tool I blogged about a few weeks ago, PBI Inspector, comes in handy: as part of its output it draws wireframes of each report page showing each visual, its type and its ID. Here’s the report page from the example above, with two visuals on it, and the associated PBI Inspector wireframe:

Yet another way in which Power BI Developer Mode makes our lives better!

Exploring Power BI Import Mode Throughput With Log Analytics, Part 3: Throughput Variations During A Refresh

In the second post in this series I discussed a KQL query that can be used to analyse Power BI refresh throughput at the partition level. However, if you remember back to the first post in this series, it’s actually possible to get much more detailed information on throughput by looking at the ProgressReportCurrent event, which fires once for every 10000 rows read during partition refresh.

Here’s yet another mammoth KQL query that you can use to analyse the ProgressReportCurrent event data:

PowerBIDatasetsWorkspace
| where TimeGenerated > ago(1d)
| where OperationName == "ProgressReportCurrent" or OperationName=="ProgressReportBegin" or OperationName == "ProgressReportEnd"
| where XmlaObjectPath!=""
| where OperationDetailName=="ReadData"
| project OperationName, XmlaObjectPath, EventText, TimeGenerated, ProgressCounter, OperationDetailName, XmlaRequestId, ObjectPathRequestId = strcat(XmlaObjectPath ,XmlaRequestId)
| partition by ObjectPathRequestId
(
    sort by TimeGenerated asc
    | scan declare( PrevEventTimeGenerated:datetime, TimeSincePrevEvent:timespan=time(0s), TimeSinceRefreshBegin:timespan=time(0s), NumberOfRowsRead:int=0) with 
    (
        step RefreshStarts: OperationName=="ProgressReportBegin";
        step RefreshInProgress: OperationName == "ProgressReportCurrent" =>
            PrevEventTimeGenerated = iff(ProgressCounter==10000, RefreshStarts.TimeGenerated, RefreshInProgress.TimeGenerated),
            TimeSincePrevEvent = iff(ProgressCounter==10000, TimeGenerated - RefreshStarts.TimeGenerated, TimeGenerated-RefreshInProgress.TimeGenerated),
            TimeSinceRefreshBegin = TimeGenerated - RefreshStarts.TimeGenerated,
            NumberOfRowsRead = iff(ProgressCounter==10000, 10000, ProgressCounter-RefreshInProgress.ProgressCounter);
        step RefreshEnds: OperationName == "ProgressReportEnd" => 
            PrevEventTimeGenerated = RefreshInProgress.TimeGenerated,
            TimeSincePrevEvent = TimeGenerated - RefreshInProgress.TimeGenerated,
            TimeSinceRefreshBegin = TimeGenerated - RefreshStarts.TimeGenerated,
            NumberOfRowsRead = ProgressCounter - RefreshInProgress.ProgressCounter;
    )
    | where OperationName == "ProgressReportCurrent" or OperationName == "ProgressReportEnd"
)
| project XmlaObjectPath, Table = split(XmlaObjectPath,".", 2)[0], Partition = split(XmlaObjectPath,".", 3)[0], OperationName, 
StartTime = format_datetime(PrevEventTimeGenerated,'yyyy-MM-dd HH:mm:ss.fff' ), EndTime = format_datetime(TimeGenerated,'yyyy-MM-dd HH:mm:ss.fff' ), 
ProgressCounter, SecondsSincePreviousEvent = TimeSincePrevEvent/1s, SecondsSinceRefreshBegin=TimeSinceRefreshBegin/1s,NumberOfRowsRead, 
NumberOfRowsReadPerSecond=NumberOfRowsRead/(TimeSincePrevEvent/1s), XmlaRequestId

Here’s a brief explanation of what it does:

It filters the Log Analytics data down to get events from the last day and just the ProgressReportCurrent events, as well as the ProgressReportBegin/End events which are fired before and after the ProgressReportCurrent events.
It then splits the data into groups of rows (‘partitions’ in KQL, but of course not the partitions that are being refreshed) by a combination of XmlaRequestId (ie the refresh operation) and XmlaObjectPath (ie the partition that is being refreshed)
For each group of rows it will then:
- Find the ProgressReportBegin event and from this get the time when data started to be read from the source
- Get all subsequent ProgressReportCurrent events and calculate the amount of time elapsed since the previous event (which might be the ProgressReportBegin event or a previous ProgressReportCurrent event) and the number of rows read
- When the ProgressReportEnd event is encountered, calculate the amount of time elapsed since the previous ProgressReportCurrent event and the number of rows (which will be less than 10000) read since then
Filter out the ProgressReportBegin events because we don’t need them any more
Finally, add columns that splits out the table name and partition name and calculates the number of rows read per second for each row by dividing the number of rows read for each event by the amount of time elapsed since the previous event

What can this query tell us about throughput?

First of all, something interesting but not necessarily useful. At least for the data source I’m using for my tests, when I plot a column chart with the number of rows read on the x axis and the amount of time elapsed since the last event on the y axis (ie the amount of time it takes to read 10000 rows for all but the last column) then I noticed that every 200000 rows something happens to slow down the read:

I have no idea what this is, whether it’s a quirk of this particular source or connector, but it’s a great example of the kind of patterns that become obvious when you visualise data rather than look at a table of numbers.

Plotting time on the x axis of a line chart and the cumulative total of rows read on the y axis gives you something more useful. Here’s the chart for one of the refreshes mentioned in my last post where four partitions of the same table are refreshed in parallel:

In this case throughput is fine up until the end of the refresh at which point something happens to the February, March and April partitions but not the January partition to slow them down for about 30 seconds, after which throughput goes back to what it was before. Here’s the same chart zoomed in a bit:

Here’s the same problem shown in the first graph above, where the number of rows read is on the x axis, showing how for example with the April partition there’s a sudden spike where it takes 14 seconds to read 10000 rows rather than around 0.3 seconds:

What is this, and why isn’t the January partition affected? Maybe it was a network issue or caused by something happening in the source database? Looking at another refresh that also refreshes the same four partitions in parallel, it doesn’t seem like the same thing happens – although if you look closely at the middle of the refresh there might be a less pronounced flattening off:

Again, the point of all this is not the mysterious blips I’ve found in my data but the fact that if you take the same query and look at your refreshes, you may find something different, something more significant and something you can explain and do something about.

Exploring Power BI Import Mode Throughput With Log Analytics, Part 2: High-Level Statistics

In the first post in this series I described the events in Log Analytics that can be used to understand throughput – the speed that Power BI can read from your dataset when importing data from it – during refresh. While the individual events are easy to understand when you look at a simple example they don’t make it easy to analyse the data in the real world, so here’s a KQL query that takes all the data from all these events and gives you one row per partition per refresh:

//Headline stats for partition refresh with one row for each partition and refresh
//Get all the data needed for this query and buffer it in memory
let RowsForStats = 
materialize(
  PowerBIDatasetsWorkspace
| where TimeGenerated > ago(1d)
| where OperationName == "ProgressReportEnd"
| where OperationDetailName == "ExecuteSql" or OperationDetailName == "ReadData" 
or (OperationDetailName == "TabularRefresh" and (EventText contains "partition"))  
);
//Get just the events for the initial SQL execution phase
let ExecuteSql = 
RowsForStats
| where OperationDetailName == "ExecuteSql"
| project XmlaRequestId, XmlaObjectPath, 
ExecuteSqlStartTime = format_datetime(TimeGenerated - (DurationMs * 1ms),'yyyy-MM-dd HH:mm:ss.fff' ), 
ExecuteSqlEndTime = format_datetime(TimeGenerated,'yyyy-MM-dd HH:mm:ss.fff' ), 
ExecuteSqlDurationMs = DurationMs, ExecuteSqlCpuTimeMs = CpuTimeMs;
//Get just the events for the data read and calculate rows read per second
let ReadData =
RowsForStats
| where OperationDetailName == "ReadData"
| project XmlaRequestId, XmlaObjectPath, 
ReadDataStartTime = format_datetime(TimeGenerated - (DurationMs * 1ms),'yyyy-MM-dd HH:mm:ss.fff' ), 
ReadDataEndTime = format_datetime(TimeGenerated,'yyyy-MM-dd HH:mm:ss.fff' ), 
ReadDataDurationMs = DurationMs, ReadDataCpuTime = CpuTimeMs, 
TotalRowsRead = ProgressCounter, RowsPerSecond = ProgressCounter /(toreal(DurationMs)/1000);
//Get the events for the overall partition refresh
let TabularRefresh =
RowsForStats
| where OperationDetailName == "TabularRefresh"
| parse EventText with * '[MashupCPUTime: ' MashupCPUTimeMs:long ' ms, MashupPeakMemory: '  MashupPeakMemoryKB:long ' KB]'
| project XmlaRequestId, XmlaObjectPath, 
TabularRefreshStartTime = format_datetime(TimeGenerated - (DurationMs * 1ms),'yyyy-MM-dd HH:mm:ss.fff' ), 
TabularRefreshEndTime = format_datetime(TimeGenerated,'yyyy-MM-dd HH:mm:ss.fff' ), 
TabularRefreshDurationMs = DurationMs, TabularRefreshCpuTime = CpuTimeMs, 
MashupCPUTimeMs, MashupPeakMemoryKB;
//Do an inner join on the three tables so there is one row per partition per refresh
ExecuteSql
| join kind=inner ReadData on XmlaRequestId, XmlaObjectPath
| join kind=inner TabularRefresh on XmlaRequestId, XmlaObjectPath
| project-away XmlaRequestId1, XmlaRequestId2, XmlaObjectPath1, XmlaObjectPath2
| extend Table = tostring(split(XmlaObjectPath,".", 2)[0]), Partition = tostring(split(XmlaObjectPath,".", 3)[0])
| project-reorder XmlaRequestId, Table, Partition
| order by XmlaRequestId, ExecuteSqlStartTime desc

It’s a bit of a monster query but what it does is quite simple:

First it gets all the events relating to partition refresh in the past 1 day (which of course you can change) and materialises the results.
Then it filters this materialised result and gets three sets of tables:
- All the ExecuteSql events, which tell you how long the data source took to start returning data and how much CPU time was used.
- All the ReadData events, which tell you how long Power BI took to read all the rows from the source after the data started to be returned, how much CPU time was used, and how many rows were read. Dividing duration by rows read lets you calculate the number of rows read per second during this phase.
- All the TabularRefresh events, which give you overall data on how long the partition refresh took, how much CPU time was used, plus information on Power Query peak memory usage and CPU usage.

What can this tell us about refresh throughput though? Let’s use it to answer some questions we might have about throughput.

What is the impact of parallelism on throughput? I created a dataset on top of the NYC taxi data Trip table with a single table, and in that table created four partitions containing data for January, February, March and April 2013, each of which contained 13-15 million rows. I won’t mention the type of data source I used because I think it distracts from what I want to talk about here, which is the methodology rather than the performance characteristics of a particular source.

I then ran two refreshes of these four partitions: one which refreshed them all in parallel and one which refreshed them sequentially, using custom TSML refresh commands and the maxParallelism property as described here. I did a refresh of type dataOnly, rather than a full refresh, in the hope that it would reduce the number of things happening in the Vertipaq engine during refresh that might skew my results. Next, I used the query above as the source for a table in Power BI (for details on how to use Log Analytics as a source for Power BI see this post; I found it more convenient to import data rather than use DirectQuery mode though) to visualise the results.

Comparing the amount of time taken for the SQL query used to start to return data (the ExecuteSqlDurationMs column from the query above) for the four partitions for the two refreshes showed the following:

The times for the four partitions vary a lot for the sequential refresh but are very similar for the parallel refresh; the January partition, which was refreshed first, is slower in both cases. The behaviour I described here regarding the first partition refreshed in a batch could be relevant.

Moving on to the Read Data phase, looking at the number of rows read per second (the RowsPerSecond column from the query above) shows a similar pattern:

There’s a lot more variation in the sequential refresh. Also, as you would expect, the number of rows read per second is much higher when partitions are refreshed sequentially compared to when they are refreshed in parallel.

Looking at the third main metric, the overall amount of time taken to refresh each partition (the TabularRefreshDurationMs column from the query above) again shows no surprises:

Each individual partition refreshes a lot faster in the sequential refresh – almost twice as fast – compared to the parallel refresh. Since four partitions are being refreshed in parallel during the second refresh, though, this means that any loss of throughput for an individual partition as a result of refreshing in parallel is more than compensated for by the parallelism, making the parallel refresh faster overall. This can be shown using by plotting the TabularRefreshStartTime and TabularRefreshEndTime columns from the query above on a timeline chart (in this case the Craydec Timelines custom visual) for each refresh and each partition:

On the left of the timeline you can see the first refresh where the partitions are refreshed sequentially, and how the overall duration is just over 20 minutes; on the right you can see the second refresh where the partitions are refreshed in parallel, which takes just under 10 minutes. Remember also that this is just looking at the partition refresh times, not the overall time taken for the refresh operation for all partitions, and it’s only a refresh of type dataOnly rather than a full refresh.

So does this mean more parallelism is better? That’s not what I’ve been trying to say here: more parallelism is better for overall throughput in this test but if you keep on increasing the amount of parallelism you’re likely to reach a point where it makes throughput and performance worse. The message is that you need to test to see what the optimal level of parallelism – or any other factor you can control – is for achieving maximum throughput during refresh.

These tests only show throughput at the level of the ReadData event for a single partition, but as mentioned in my previous post there is even more detailed data available with the ProgressReportCurrent event. In my next post I’ll take a closer look at that data.

[Thanks to Akshai Mirchandani for providing some of the information in this post, and hat-tip to my colleague Phil Seamark who has already done some amazing work in this area]

Exploring Power BI Import Mode Throughput With Log Analytics, Part 1: The Events

If you’re tuning refresh for a Power BI Import mode dataset one of the areas you’ll be most interested in is throughput, that is to say how quickly Power BI can read data from the data source. It can be affected by a number of different factors: how quickly the data source can return data; network latency; the efficiency of the connector you’re using; any transformations in Power Query; the number of columns in the data and their data types; the amount of other objects in the same Power BI dataset being refreshed in parallel; and so on. How do you know if any or all of these factors is a problem for you? It’s a subject that has always interested me and now that Log Analytics for Power BI datasets is GA we have a powerful tool to analyse the data, so I thought I’d do some testing and write up my findings in a series of blog posts.

The first thing to understand is what events there are in Log Analytics that provide data on throughput. The events to look at are ProgressReportBegin, ProgressReportCurrent and ProgressReportEnd (found in the OperationName column), specifically those with OperationDetailName ExecuteSql, ReadData and Tabular Refresh. Consider the following KQL query that looks for at this data for a single refresh and for a single partition:

let RefreshId = "e5edc0de-f223-4c78-8e2d-01f24b13ccdc";
let PartitionObjectPath = "28fc7514-d202-4969-922a-ec86f98a7ea2.Model.TIME.TIME-ffca8cb8-1570-4f62-8f04-993c1d1d17cb";
PowerBIDatasetsWorkspace
| where TimeGenerated > ago(3d)
| where XmlaRequestId == RefreshId
| where XmlaObjectPath == PartitionObjectPath
| where OperationDetailName == "ExecuteSql" or OperationDetailName == "ReadData" or OperationDetailName == "TabularRefresh"
| project XmlaObjectPath, Table = split(XmlaObjectPath,".", 2)[0], Partition = split(XmlaObjectPath,".", 3)[0], 
TimeGenerated, OperationName, OperationDetailName, EventText, DurationMs, CpuTimeMs, ProgressCounter
| order by XmlaObjectPath, TimeGenerated asc;

Some notes on what this query does:

All the data comes from the PowerBIDatasetsWorkspace table in Log Analytics
I’ve put an arbitrary filter on the query to only look for data in the last three days, which I know contains the data for the specific refresh I’m interested in
An individual refresh operation can be identified by the value in the XmlaRequestId column and I’m filtering by that
An individual partition being refreshed can be identified by the value in the XmlaObjectPath column and I’m filtering by that too
The value in the XmlaObjectPath column can be parsed to obtain both the table name and partition name
The query filters the events down to those mentioned above

Here are some of the columns from the output of this query, showing data for the refresh of a single table called TIME with a single partition:

Some more notes on what we can see here:

The TimeGenerated column gives the time of the event. Although the time format shown here only shows seconds, it actually contains the time of the event going down to the millisecond level – unlike in a Profiler trace, where time values are rounded to the nearest second and are therefore lot less useful.
The first event returned by this query is a ProgressReportBegin event of type TabularRefresh which marks the beginning of the partition refresh.
As I blogged here, after that are a pair of ProgressReportBegin/End events of type ExecuteSql. The value in the DurationMs column tells you how long it takes for the data source (which includes time taken by the actual query generated against the data source and any Power Query transformations) to start to return data – which was, in this case, 6016 milliseconds or 6 seconds.
Next there is a ProgressReportBegin event which indicates the beginning of data being read from the source.
After that there are a series of ProgressReportCurrent events which mark the reading of chunks of 10000 rows from the source. The ProgressCounter column shows the cumulative number of rows read.
Next there is a ProgressReportEnd event that marks the end of the data read. The ProgressCounter shows the total number of rows read (which must be less than 10000 rows greater than the value in the ProgressCounter column for the previous ProgressReportCurrent event); the DurationMs column shows the total time taken to read the data. In this case 86430 rows of data were read in 1.4 seconds.
Finally there is a ProgressReportEnd event of type TabularRefresh, the pair of the first event shown returned. It not only shows the total amount of time taken and the CPU time used to refresh the partition (which includes other operations that are nothing to do with throughput such as compressing data and building dictionaries) in the DurationMs column, as I blogged here it also includes data on the CPU and peak memory used by Power Query during the refresh. In this case the total refresh time was 7.6 seconds, so not much more than the time taken to read the data.

Clearly there’s a lot of useful, interesting data here, but how can we analyse it or visualise it? Stay tuned for my next post…

Monitoring Power BI Dataset Refresh Memory And CPU Usage With Log Analytics

Maybe the fourth- or fifth-most exciting Power BI-related announcement last month (admittedly it was an exciting month) was that Log Analytics for Power BI datasets is now GA and you can now link multiple Power BI workspaces to a single Log Analytics workspace. This, for me, means that enabling Log Analytics has gone from being useful to essential for anyone interested in monitoring Analysis Services engine activity in an enterprise Power BI/Fabric deployment. It also works with Direct Lake datasets too!

Anyway, following on from my recent posts on measuring memory and CPU usage during dataset refresh, both for the refresh as a whole and for individual partitions, here are some KQL queries that allow you to extract that information from Log Analytics.

Here’s the KQL query to get the memory and CPU usage for a refresh (see this post for details):

PowerBIDatasetsWorkspace
| where OperationName == "CommandEnd"
| parse-where EventText with * '[PeakMemory: ' PeakMemoryKB:long ' KB, MashupCPUTime: ' MashupCPUTimeMs:long ' ms, MashupPeakMemory: '  MashupPeakMemoryKB:long ' KB]'
| extend DatasetName=ArtifactName
| project TimeGenerated, CpuTimeMs, DurationMs, MashupCPUTimeMs, MashupPeakMemoryKB, PeakMemoryKB, XmlaRequestId, DatasetName, PowerBIWorkspaceName

And here’s the KQL query to get the memory and CPU usage for individual partitions/Power Query queries (see this post for details):

PowerBIDatasetsWorkspace
| where OperationName == "ProgressReportEnd"
| parse-where EventText with * '[MashupCPUTime: ' MashupCPUTimeMs:long ' ms, MashupPeakMemory: '  MashupPeakMemoryKB:long ' KB]'
| extend Table=split(XmlaObjectPath,'.').[2], Partition=split(XmlaObjectPath,'.').[3]
| extend DatasetName=ArtifactName
| project TimeGenerated, Table, Partition, CpuTimeMs, DurationMs, MashupCPUTimeMs, MashupPeakMemoryKB, XmlaRequestId, DatasetName, PowerBIWorkspaceName

You could use these queries in a Power BI DirectQuery dataset to report on all your refreshes, similar to what I did in this series of posts last year. The XmlaRequestId columns contains the unique identifier for the refresh operation so you could build a one-to-many relationship between the table with the refresh-level data and the table with the partition-level data. Maybe I’ll get round to building that report sometime…

[Thanks to Jast Lu, who wrote the original version of these queries]

Understanding WaitTime In Power BI

Another new metric has appeared in Profiler/Log Analytics recently, added to the end of the query text shown in the TextData column for the Query End event. It’s called WaitTime:

What does WaitTime represent? Here’s the technical explanation: it’s the wait time on the query thread pool in the Analysis Services engine before the query starts to run. But what does this mean for you as someone trying to tune DAX queries in Power BI?

As I explained recently in my post on load testing in Power BI, when you publish your dataset to the Power BI Service it runs on an instance of a version of the Analysis Services engine on a node somewhere inside the infrastructure that we at Microsoft manage for you. You don’t see any of this, but there’s still a machine with a limited amount of CPU and memory available and it can get overloaded if there are too many expensive queries running, or too many datasets being refreshed at the same time, or even one large dataset being refreshed; part of the magic of the Power BI Service is how we move datasets around to ensure this happens as infrequently as possible. If, however, you’re very unlucky you may run a query on a dataset that’s running on an overloaded machine and the performance of that query may be affected because the CPU is too busy. In that case you may see a value for WaitTime that is greater than zero and this means that your query was slower than it might otherwise have been.

This can happen for datasets in both Import mode and DirectQuery mode. It can happen because of something that’s happening on the dataset you’re querying – such as a load test or a full refresh – or it can happen because someone else is doing something similar with a dataset that happens to be on the same node as yours and the Power BI Service hasn’t managed to move your dataset to a quieter node yet. It is not the same thing as throttling in Premium (or “interactive request delays” as it’s officially called) but if you have a capacity that’s being throttled because of something you’re doing with a specific dataset then queries on that dataset may also have a WaitTime greater than zero.

I haven’t seen a non-zero value for WaitTime yet, but I only found out about it yesterday so I’m looking forward to seeing whether I see one next time I’m doing some performance tuning. If you see one let me know in the comments.

[Thanks once again to Akshai Mirchandani for this information]

Measuring Memory And CPU Usage In Power BI During Dataset Refresh

This post is a follow-up to my recent post on identifying CPU and memory-intensive Power Query queries in Power BI. In that post I pointed out that Profiler and Log Analytics now gives you information on the CPU and memory used by an individual Power Query query when importing data into Power BI. What I didn’t notice when I wrote that post is that there is also now information available in Profiler and Log Analytics that tells you about peak memory and CPU usage across all Power Query queries for a single refresh in the Power BI Service, as well as memory usage for the refresh as a whole.

Using the same dataset from my previous post, I ran a Profiler trace on the workspace and captured the Command Begin and Command End events while I refreshed the dataset. Here’s what Profiler shows for the Command End event:

In the upper pane, the Duration column tells you how long the refresh took in milliseconds and the CPUTime column tells you how much CPU was used by both the Analysis Services engine and the Power Query engine during the refresh. This is not new, and I wrote about the CPUTime column last year here.

In the lower pane where the TMSL for the refresh operation is shown – this is the text from the TextData column – underneath the TMSL there are three new values shown:

PeakMemory shows the maximum amount of memory used during the refresh operation. This includes memory used by the Analysis Services engine and the Power Query engine.
MashupCPUTime shows the total amount of CPU used by the Power Query engine for all Power Query queries used to import data into the dataset. This value will always be less than the value shown in the CPUTime column in the upper pane.
MashupPeakMemory shows the maximum amount of memory used during the refresh operation by just the Power Query engine across all query evaluations. It’s important to note that this value may not be entirely accurate since the memory usage values, and therefore the peaks, are captured asynchronously so there could be higher peaks that are not detected.

This new information should be extremely useful for troubleshooting refresh failures that are caused by excessive memory consumption.

[Thanks to Akshai Mirchandani for this information]

Identifying CPU And Memory-Intensive Power Query Queries During Refresh In The Power BI Service

Last year I wrote a post about a change in the Power BI Service that meant the CPU Time associated with Power Query transformations was added to the total shown for a dataset refresh operation in Profiler traces and Log Analytics:

https://blog.crossjoin.co.uk/2022/07/03/measuring-power-query-cpu-usage-during-power-bi-dataset-refresh/

This was useful, but it didn’t tell you directly how much CPU Time was used by Power Query and it didn’t tell you which tables or partitions in a refresh were using the most CPU. It also didn’t tell you anything about Power Query memory usage. The good news that recently there has been another change that solves these problems.

Let’s say you have a Power BI dataset that consists of a single table whose source is the following Power Query query:

let
  Source = #table(type table [MyNumber = number], List.Transform({1 .. 1000000}, each {_})), 
  #"Added Custom" = Table.AddColumn(
    Source, 
    "ARandomNumber", 
    each Number.RandomBetween(0, 10000), 
    type number
  ), 
  #"Sorted Rows" = Table.Sort(#"Added Custom", {{"ARandomNumber", Order.Ascending}})
in
  #"Sorted Rows"

This query creates a table with a million rows, adds a column with random numbers in and then sorts on that column – which is, as you’d expect, a very CPU and memory-hungry operation.

If you refresh this dataset in the Power BI Service and run a Profiler trace on it, looking at the Command Begin/End and Progress Report Begin/End events, this is what you’ll see:

The final Command End event shows the toal duration of the refresh as well as the amount of CPU used by both the Analysis Services engine and Power Query – in this case 24094ms.

If you look at the Progress Report End event associated with the finish of the refresh for the only partition of the only table in the dataset (highlighted in the screenshot above), there’s some extra information:

It shows the amount of CPU Time and the maximum amount of memory used by the Power Query engine while refreshing this partition. In this case the Power Query engine used 19468ms of CPU and reached a peak of 581848KB of memory. I can tell this is going to be really useful for troubleshooting refresh performance issues and out-of-memory errors.

[Thanks to Akshai Mirchandani, Xiaodong Zhang, Ting-Wei Chang and Jast Lu for this information]

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this: