Working out how long a Power Query M query takes to run – the first step to troubleshooting refresh performance problems in Power BI or Excel – is something I have blogged about several times (see here, here and here for example). However, there are problems with all of these methods such as the need to alter the M code of your query. Luckily there is another method that I have found, using a combination of DAX Studio and SQL Server Profiler.
How can you get hold of these tools? You can download DAX Studio for free here:
Any serious Power BI developer should have it installed already, but if you have an older version you may need to update it to get the features I talk about in this post.
SQL Server Profiler is part of SQL Server Management Studio, and you can download it for free from here:
You can generate SQL Server Profiler trace files for diagnostic purposes from Power BI Desktop, as Kasper shows here, but these traces don’t show any information about refreshes as far as I can see. You can also hook Profiler up to Power BI Desktop direct, as Sam Lester shows here, but that’s a bit fiddly to do. Luckily DAX Studio now has a preview feature which makes connecting Profiler to Power BI Desktop much easier.
First of all, open up your report in Power BI Desktop, then open up DAX Studio and connect it to the instance of Power BI Desktop you just opened; if you want to use this technique with Excel Power Query queries then you will need to launch DAX Studio from inside Excel. Then, in DAX Studio, open the Options dialog and enable the “Show External Tools” preview feature:
Once you have done this, on the Advanced tab on DAX Studio’s ribbon you’ll see an option to launch SQL Profiler and connect it to the Power BI report or Excel file that DAX Studio is currently connected to:
Clicking the button opens up Profiler with a trace running. You’ll see that a lot of trace events are selected – probably too many to make sense of – so I suggest that you stop the trace, click the Properties button on the toolbar and then, on the Events Selection tab of the Trace Properties dialog select just the following four trace events:
- Command Begin
- Command End
- Progress Report Begin
- Progress Report End
Restart the trace and then refresh your dataset in Power BI Desktop. You’ll still see a lot of events generated in Profiler, but look for the following:
- A Command Begin event followed by a lot of Progress Report Begin/End events, which is the command that marks the start of the refresh.
- After that there will be a series of Progress Report Begin events that have an EventSubclass of “25 – ExecuteSQL” – one for each of the tables that are being loaded into your dataset and therefore one for each of the Power Query M queries associated with those tables. Don’t be confused by the name of the EventSubclass, these events will appear whatever data source you are using: it looks like the Vertipaq engine requests data from the Power Query engine using a basic form of SQL. For example, here’s a screenshot showing the three events I get when refreshing a dataset with three tables in it called “Property Type”, “Date” and “Price Paid 2017”:
- Finally, below that there will be a series of Progress Report End events (one for each Progress Report Begin event) with the EventSubclass of “25 – ExecuteSQL”, and the Duration column here will tell you how long it took to get data from the Power Query M query associated with the table. For example, this screenshot shows that it took 5460ms (about 5.5 seconds) to get the data from the “Price Paid 2017” Power Query query:
And there you have it, exact timings for each of the Power Query M queries associated with each of the tables in your dataset. Remember that the time taken by each Power Query M query will include the time taken by any other queries that it references, and it does not seem to be possible to find out the amount of time taken by any individual referenced query in Profiler.
There is a lot more interesting information that can be found in this way: for example, dataset refresh performance is not just related to the performance of the Power Query M queries that are used to load data; time is also needed to build all of the structures inside the dataset by the Vertipaq engine once the data has been returned, and Profiler gives you a lot of information on these operations too. Discussion of that will have to wait until a future blog post though…
UPDATE April 2019:
Having used this technique quite a lot in the last few months, I’ve seen that the Progress Report End/25 – Execute SQL event doesn’t always give an accurate value for the amount of time taken to execute an M query – or rather it does, but things are more complicated than I initially understood. You may also need to look at the trace events that come immediately afterwards: Progress Report Begin/17 – Read Data and Progress Report Begin/17 – End Data. Akshai Mirchandani of the SSAS team gave me a great explanation of the significance of these events:
The SQL query traces indicate the begin/end of query execution against the data source – theoretically, once this is done, rows are ready to be pulled from the source.
That’s when you will see the “ReadData” begin event. Rows start getting pulled at that point, until all rows are read (and encoded/compressed) – then you see the end event.
It is not always predictable how each provider/source behaves. Some of them might defer the expensive query execution until the first “Read” operation, but typically the SQL execution step tries to ensure that the command is valid, prepare the command and (at the very least) start the execution of the command.
Sometimes, the command execution might compute all the results and buffer them too.
In effect, its up to the provider/source exactly how this actually behaves…
It looks like the Power Query engine can behave in both of the ways Akshai describes depending on what you’re doing in the query.