Measuring DirectQuery Performance In Power BI

If you have a slow DirectQuery report in Power BI one of the first questions you need to ask is how long the SQL queries that Power BI generates take to run. This is a more complicated question to answer than you might think, though, and in this post I’ll explain why.

I happen to have access to some of the famous New York taxi data in a Snowflake database, and in there is a table with trip data that has 173 million rows that I have a built a Power BI dataset from. The data and the database used are not really important here though – what is important is that it’s DirectQuery and a large-ish amount of data. Here’s a report page with a single table visual on it, showing passenger count aggregated by the hack license field:

It’s slow, but how slow? Here’s what Performance Analyzer shows when I refresh the table:

The DAX query takes 5.4 seconds but the Direct Query time is only 3.3 seconds – and the numbers don’t seem to add up. Here’s what Profiler captures for the same refresh shown in Performance Analyzer:

This shows there’s a gap of 2 seconds between the DirectQuery End event and the Query End event. What if I paste the DAX query into DAX Studio? Here’s what the Server Timings tab shows:

This is a different query execution to the two examples above, both of which show data for the same execution, which explains why the numbers are slightly different here – but again there seems to be an extra second of stuff happening and DAX Studio suggests that it’s in the Formula Engine.

So what is going on? The answer lies in understanding what the DirectQuery End Profiler event actually measures: it’s the amount of time between the Analysis Services engine handing a query over to the Power Query engine and the Analysis Services engine receiving the first row in the resultset back, including the time taken for the Power Query engine to fold the query.

Therefore if it takes a long time to get all the rows in the resultset then that could explain what’s going on here. Unfortunately there’s no way of knowing from Profiler events how long this takes – but there is another way. Going back to Performance Analyzer, if you export the data from it to JSON (by clicking the Export button) and load it into Power Query, you can see more detail about a DirectQuery query execution. Here’s the data from the first execution above:

[There’s a very good paper documenting what’s in the Performance Analyzer JSON file here]

Looking at the record in the metrics column for the Execute Direct Query event you can see the same 3.2 second duration shown above in Profiler. Notice that there are two other metrics here as well: RowsRead, which is the total number of rows returned by the resultset; and DataReadDuration, which is the amount of time to read these rows after the first row has been received plus some other Analysis Services Engine operations such as encoding of column values, joining with unpushed semijoins, projections of aggregations such as Average and saving the resultset to the in-memory cache. In this case the SQL query has returned 43191 rows and this takes 1.95 seconds – which explains the gap between the end of the Execute Direct Query event and the end of the query.

One last question: why this SQL query is returning so many rows when the DAX query is only asking for the top 502 rows?

The reason is that, at the time of writing at least, the Analysis Services engine can only push a top(n) operation down to a DirectQuery SQL query in very simple scenarios where there are no measures and no aggregation involved – and in this case we’re summing up values. As a result, if you’re using DirectQuery mode and have a visual like this that can potentially display a large number of rows and includes a measure or aggregated values, you may end up with slow performance.

[Thanks to Jeffrey Wang for providing the information in this post]

Capturing SQL Queries Generated By A Power BI DirectQuery Dataset

If you’re using DirectQuery mode for one or more tables in your Power BI dataset, the chances are that you will want to see the SQL (or whatever query language your DirectQuery data source uses) that is generated by Power BI when your report is run. If you can view the queries that are run in the tooling for the data source itself, for example, using Extended Events or SQL Server Profiler for SQL Server, then great – but you may not have permissions to do this. The good news is that you can capture the SQL queries in Power BI Desktop too, even though it’s not always obvious how to do so.

For some data sources like SQL Server then Performance Analyzer will give you the SQL queries generated. All you need to do is go to the View tab in the main Power BI Desktop window, click on the Performance Analyzer button to display the Performance Analyzer pane, click on Start Recording and then Refresh Visuals, find the event corresponding to the visual whose queries you want to view, expand it and then click on the “Copy query” link:

This will copy the DAX query generated by your visual to the clipboard; in the case of SQL Server DirectQuery sources you’ll also get the SQL query generated for that DAX query.

However this method does not work for all DirectQuery data sources; for them you’ll need to use the Query Diagnostics functionality in the Power Query Editor. You need to open the Power Query Editor window, go to the Tools tab on the ribbon, click on the Start Diagnostics button, go back to the main Power BI window, refresh your visuals (you can use the Refresh visuals button in Performance Analyzer for this again) and then go back to the Power Query Editor and click the Stop Diagnostics button. When you do this several new Power Query queries will appear which contain diagnostics data. Go to the one that has a name that starts with “Diagnostics_Detailed” and somewhere in there – where exactly depends on the data source – you’ll find the query generated. For example, for a Snowflake data source you’ll see the SQL generated somewhere in the Data Source Query column:

For an Azure Data Explorer DirectQuery data source the KQL query will be in one of the Record values in the Additional Info column:

One thing to watch out for is that you may also see what look like SQL Server TSQL queries, even when you’re not using a data source that can be queried with TSQL. Here’s an example from the Azure Data Explorer example above:

You can ignore these queries: they’re not useful, although they do give you an interesting insight into how DirectQuery mode works behind the scenes.

%d bloggers like this: