In the first post in this series I described the events in Log Analytics that can be used to understand throughput during refresh, that is, the speed at which Power BI can read from your data source when importing data from it. While the individual events are easy to understand when you look at a simple example, they don’t make it easy to analyse the data in the real world, so here’s a KQL query that takes the data from all these events and gives you one row per partition per refresh:
    //Headline stats for partition refresh with one row for each partition and refresh
    //Get all the data needed for this query and buffer it in memory
    let RowsForStats =
    materialize(
    PowerBIDatasetsWorkspace
    | where TimeGenerated > ago(1d)
    | where OperationName == "ProgressReportEnd"
    | where OperationDetailName == "ExecuteSql" or OperationDetailName == "ReadData"
    or (OperationDetailName == "TabularRefresh" and (EventText contains "partition"))
    );
    //Get just the events for the initial SQL execution phase
    let ExecuteSql =
    RowsForStats
    | where OperationDetailName == "ExecuteSql"
    | project XmlaRequestId, XmlaObjectPath,
    ExecuteSqlStartTime = format_datetime(TimeGenerated - (DurationMs * 1ms),'yyyy-MM-dd HH:mm:ss.fff' ),
    ExecuteSqlEndTime = format_datetime(TimeGenerated,'yyyy-MM-dd HH:mm:ss.fff' ),
    ExecuteSqlDurationMs = DurationMs,
    ExecuteSqlCpuTimeMs = CpuTimeMs;
    //Get just the events for the data read and calculate rows read per second
    let ReadData =
    RowsForStats
    | where OperationDetailName == "ReadData"
    | project XmlaRequestId, XmlaObjectPath,
    ReadDataStartTime = format_datetime(TimeGenerated - (DurationMs * 1ms),'yyyy-MM-dd HH:mm:ss.fff' ),
    ReadDataEndTime = format_datetime(TimeGenerated,'yyyy-MM-dd HH:mm:ss.fff' ),
    ReadDataDurationMs = DurationMs,
    ReadDataCpuTime = CpuTimeMs,
    TotalRowsRead = ProgressCounter,
    RowsPerSecond = ProgressCounter /(toreal(DurationMs)/1000);
    //Get the events for the overall partition refresh
    let TabularRefresh =
    RowsForStats
    | where OperationDetailName == "TabularRefresh"
    | parse EventText with * '[MashupCPUTime: ' MashupCPUTimeMs:long ' ms, MashupPeakMemory: ' MashupPeakMemoryKB:long ' KB]'
    | project XmlaRequestId, XmlaObjectPath,
    TabularRefreshStartTime = format_datetime(TimeGenerated - (DurationMs * 1ms),'yyyy-MM-dd HH:mm:ss.fff' ),
    TabularRefreshEndTime = format_datetime(TimeGenerated,'yyyy-MM-dd HH:mm:ss.fff' ),
    TabularRefreshDurationMs = DurationMs,
    TabularRefreshCpuTime = CpuTimeMs,
    MashupCPUTimeMs, MashupPeakMemoryKB;
    //Do an inner join on the three tables so there is one row per partition per refresh
    ExecuteSql
    | join kind=inner ReadData on XmlaRequestId, XmlaObjectPath
    | join kind=inner TabularRefresh on XmlaRequestId, XmlaObjectPath
    | project-away XmlaRequestId1, XmlaRequestId2, XmlaObjectPath1, XmlaObjectPath2
    | extend Table = tostring(split(XmlaObjectPath,".", 2)), Partition = tostring(split(XmlaObjectPath,".", 3))
    | project-reorder XmlaRequestId, Table, Partition
    | order by XmlaRequestId, ExecuteSqlStartTime desc
It’s a bit of a monster query but what it does is quite simple:
- First it gets all the events relating to partition refresh in the past 1 day (which of course you can change) and materialises the results.
- Then it filters this materialised result to get three tables:
  - All the ExecuteSql events, which tell you how long the data source took to start returning data and how much CPU time was used.
  - All the ReadData events, which tell you how long Power BI took to read all the rows from the source after the data started to be returned, how much CPU time was used, and how many rows were read. Dividing the number of rows read by the duration in seconds gives you the number of rows read per second during this phase.
  - All the TabularRefresh events, which give you overall data on how long the partition refresh took, how much CPU time was used, plus information on Power Query peak memory usage and CPU usage.
- Finally, it joins these three tables on XmlaRequestId and XmlaObjectPath so that there is one row per partition per refresh, and extracts the table and partition names from the object path.
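To make the rows-per-second arithmetic concrete, here is a minimal standalone sketch that applies the same expression used for the ReadData events to a made-up datatable instead of the real PowerBIDatasetsWorkspace table. The partition names and numbers are invented purely for illustration.

```kusto
// Hypothetical stand-in for ReadData events: DurationMs is the length of the
// read phase in milliseconds and ProgressCounter is the number of rows read.
// All values here are invented for illustration.
datatable(Partition:string, DurationMs:long, ProgressCounter:long)
[
    "Jan2013", 120000, 14000000,
    "Feb2013", 100000, 13000000
]
// Convert the duration to seconds as a real before dividing,
// so the result is not truncated by integer division
| extend DurationSeconds = toreal(DurationMs) / 1000
| extend RowsPerSecond = ProgressCounter / DurationSeconds
```

With these sample values, 14,000,000 rows in 120 seconds works out at roughly 116,667 rows per second, and 13,000,000 rows in 100 seconds at 130,000 rows per second.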
What can this tell us about refresh throughput though? Let’s use it to answer some questions we might have about throughput.
What is the impact of parallelism on throughput?
I created a dataset with a single table on top of the Trip table from the NYC taxi data, and in that table created four partitions containing data for January, February, March and April 2013, each of which contained 13-15 million rows. I won’t mention the type of data source I used because I think it distracts from what I want to talk about here, which is the methodology rather than the performance characteristics of a particular source.
I then ran two refreshes of these four partitions: one which refreshed them all in parallel and one which refreshed them sequentially, using custom TMSL refresh commands and the maxParallelism property as described here. I did a refresh of type dataOnly, rather than a full refresh, in the hope that it would reduce the number of things happening in the Vertipaq engine during refresh that might skew my results. Next, I used the query above as the source for a table in Power BI to visualise the results (for details on how to use Log Analytics as a source for Power BI see this post; I found it more convenient to import data rather than use DirectQuery mode though).
Comparing the amount of time taken for the SQL query used to start to return data (the ExecuteSqlDurationMs column from the query above) for the four partitions for the two refreshes showed the following:
The times for the four partitions vary a lot for the sequential refresh but are very similar for the parallel refresh; the January partition, which was refreshed first, is slower in both cases. The behaviour I described here regarding the first partition refreshed in a batch could be relevant.
Moving on to the Read Data phase, looking at the number of rows read per second (the RowsPerSecond column from the query above) shows a similar pattern:
There’s a lot more variation in the sequential refresh. Also, as you would expect, the number of rows read per second is much higher when partitions are refreshed sequentially compared to when they are refreshed in parallel.
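If you would rather quantify this variation than judge it from a chart, one option is to summarise the RowsPerSecond column per refresh. The following is only a sketch: PartitionStats is a hypothetical datatable standing in for the output of the query above, and the request IDs and numbers in it are invented, not my test results.

```kusto
// Hypothetical stand-in for the output of the headline query.
// The request IDs, partition names and values are invented for illustration.
let PartitionStats = datatable(XmlaRequestId:string, Partition:string, RowsPerSecond:real)
[
    "refresh-sequential", "Jan2013", 90000.0,
    "refresh-sequential", "Feb2013", 140000.0,
    "refresh-sequential", "Mar2013", 120000.0,
    "refresh-sequential", "Apr2013", 150000.0,
    "refresh-parallel",   "Jan2013", 60000.0,
    "refresh-parallel",   "Feb2013", 65000.0,
    "refresh-parallel",   "Mar2013", 64000.0,
    "refresh-parallel",   "Apr2013", 66000.0
];
PartitionStats
// One row per refresh with the range, average and standard deviation
// of per-partition read throughput
| summarize
    Partitions = count(),
    MinRowsPerSecond = min(RowsPerSecond),
    MaxRowsPerSecond = max(RowsPerSecond),
    AvgRowsPerSecond = avg(RowsPerSecond),
    StdevRowsPerSecond = stdev(RowsPerSecond)
    by XmlaRequestId
// Express the min-to-max spread as a percentage of the average
| extend SpreadPct = round(100.0 * (MaxRowsPerSecond - MinRowsPerSecond) / AvgRowsPerSecond, 1)
```

To run this against real data you would replace the datatable with the headline query itself, for example by wrapping that query in a let statement.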
Looking at the third main metric, the overall amount of time taken to refresh each partition (the TabularRefreshDurationMs column from the query above) again shows no surprises:
Each individual partition refreshes a lot faster (almost twice as fast) in the sequential refresh compared to the parallel refresh. However, since four partitions are being refreshed at the same time during the second refresh, any loss of throughput for an individual partition as a result of refreshing in parallel is more than compensated for by the parallelism, making the parallel refresh faster overall. This can be shown by plotting the TabularRefreshStartTime and TabularRefreshEndTime columns from the query above on a timeline chart (in this case the Craydec Timelines custom visual) for each refresh and each partition:
On the left of the timeline you can see the first refresh where the partitions are refreshed sequentially, and how the overall duration is just over 20 minutes; on the right you can see the second refresh where the partitions are refreshed in parallel, which takes just under 10 minutes. Remember also that this is just looking at the partition refresh times, not the overall time taken for the refresh operation for all partitions, and it’s only a refresh of type dataOnly rather than a full refresh.
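The same comparison can also be made in KQL rather than on a timeline, by taking the earliest partition start time and the latest partition end time for each refresh. This is a sketch under some assumptions: PartitionStats is a hypothetical datatable standing in for the output of the query above, its values are invented, and the start and end times are stored as strings because the query above returns them from format_datetime.

```kusto
// Hypothetical stand-in for the output of the headline query.
// The request IDs, times and row counts are invented for illustration.
let PartitionStats = datatable(XmlaRequestId:string, Partition:string,
    TabularRefreshStartTime:string, TabularRefreshEndTime:string, TotalRowsRead:long)
[
    "refresh-sequential", "Jan2013", "2023-01-01 10:00:00.000", "2023-01-01 10:05:00.000", 14000000,
    "refresh-sequential", "Feb2013", "2023-01-01 10:05:00.000", "2023-01-01 10:10:00.000", 13000000,
    "refresh-parallel",   "Jan2013", "2023-01-01 11:00:00.000", "2023-01-01 11:08:00.000", 14000000,
    "refresh-parallel",   "Feb2013", "2023-01-01 11:00:00.000", "2023-01-01 11:09:00.000", 13000000
];
PartitionStats
// The headline query formats its times as strings, so convert them back to datetime
| summarize
    Partitions = count(),
    FirstStart = min(todatetime(TabularRefreshStartTime)),
    LastEnd = max(todatetime(TabularRefreshEndTime)),
    TotalRows = sum(TotalRowsRead)
    by XmlaRequestId
// Dividing a timespan by 1s gives the elapsed time in seconds as a real
| extend WallClockSeconds = (LastEnd - FirstStart) / 1s
// Rows read across all partitions per second of elapsed time
| extend OverallRowsPerSecond = TotalRows / WallClockSeconds
| order by WallClockSeconds asc
```

Like the timeline, this only covers the period from the first partition refresh starting to the last one finishing, not the whole refresh operation.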
So does this mean more parallelism is better? That’s not what I’ve been trying to say here: more parallelism is better for overall throughput in this test but if you keep on increasing the amount of parallelism you’re likely to reach a point where it makes throughput and performance worse. The message is that you need to test to see what the optimal level of parallelism – or any other factor you can control – is for achieving maximum throughput during refresh.
These tests only show throughput at the level of the ReadData event for a single partition, but as mentioned in my previous post there is even more detailed data available with the ProgressReportCurrent event. In my next post I’ll take a closer look at that data.
[Thanks to Akshai Mirchandani for providing some of the information in this post, and hat-tip to my colleague Phil Seamark who has already done some amazing work in this area]