If you have a slow DirectQuery report in Power BI one of the first questions you need to ask is how long the SQL queries that Power BI generates take to run. This is a more complicated question to answer than you might think, though, and in this post I’ll explain why.
I happen to have access to some of the famous New York taxi data in a Snowflake database, and in there is a table with trip data that has 173 million rows that I have a built a Power BI dataset from. The data and the database used are not really important here though – what is important is that it’s DirectQuery and a large-ish amount of data. Here’s a report page with a single table visual on it, showing passenger count aggregated by the hack license field:
It’s slow, but how slow? Here’s what Performance Analyzer shows when I refresh the table:
The DAX query takes 5.4 seconds but the Direct Query time is only 3.3 seconds – and the numbers don’t seem to add up. Here’s what Profiler captures for the same refresh shown in Performance Analyzer:
This shows there’s a gap of 2 seconds between the DirectQuery End event and the Query End event. What if I paste the DAX query into DAX Studio? Here’s what the Server Timings tab shows:
This is a different query execution to the two examples above, both of which show data for the same execution, which explains why the numbers are slightly different here – but again there seems to be an extra second of stuff happening and DAX Studio suggests that it’s in the Formula Engine.
So what is going on? The answer lies in understanding what the DirectQuery End Profiler event actually measures: it’s the amount of time between the Analysis Services engine handing a query over to the Power Query engine and the Analysis Services engine receiving the first row in the resultset back, including the time taken for the Power Query engine to fold the query.
Therefore if it takes a long time to get all the rows in the resultset then that could explain what’s going on here. Unfortunately there’s no way of knowing from Profiler events how long this takes – but there is another way. Going back to Performance Analyzer, if you export the data from it to JSON (by clicking the Export button) and load it into Power Query, you can see more detail about a DirectQuery query execution. Here’s the data from the first execution above:
[There’s a very good paper documenting what’s in the Performance Analyzer JSON file here]
Looking at the record in the metrics column for the Execute Direct Query event you can see the same 3.2 second duration shown above in Profiler. Notice that there are two other metrics here as well: RowsRead, which is the total number of rows returned by the resultset; and DataReadDuration, which is the amount of time to read these rows after the first row has been received plus some other Analysis Services Engine operations such as encoding of column values, joining with unpushed semijoins, projections of aggregations such as Average and saving the resultset to the in-memory cache. In this case the SQL query has returned 43191 rows and this takes 1.95 seconds – which explains the gap between the end of the Execute Direct Query event and the end of the query.
One last question: why this SQL query is returning so many rows when the DAX query is only asking for the top 502 rows?
The reason is that, at the time of writing at least, the Analysis Services engine can only push a top(n) operation down to a DirectQuery SQL query in very simple scenarios where there are no measures and no aggregation involved – and in this case we’re summing up values. As a result, if you’re using DirectQuery mode and have a visual like this that can potentially display a large number of rows and includes a measure or aggregated values, you may end up with slow performance.
[Thanks to Jeffrey Wang for providing the information in this post]