A quick point: while the Power Query Query Diagnostics functionality is relatively new, it’s based on Power BI/Power Query trace logging that has been around for a while. I’ve just realised that this means you can use the Diagnostics.Trace M function that I blogged about back in 2016 to add your own messages to the output of Query Diagnostics. Using the example query from that blog post, here’s what you’ll see in the detailed Query Diagnostics output query when that query is run (I’ve removed all but the important columns to make it easier to read):
The amount of detail you get with Query Diagnostics can be overwhelming; inserting your own messages should make it a lot easier to work out what is happening, when and how often – especially in scenarios where the query name and step name aren’t shown.
Do you have a Power BI report that has a row of card visuals on it, something like this:
It’s a very common thing to do, but if your report is too slow to render you may find that a design like this is part of the problem. Now look at the following report which contains a single matrix that has been configured to look as un-matrix-like as possible but shows the same data:
This is likely to perform faster, but why?
Before we go any further, I don’t want you to go and change your reports if you’re not going to get any benefit from doing so. Use Performance Analyzer (as shown here) to determine which visuals on your report are the cause of slow performance – there’s no point redesigning visuals that are fast anyway.
As a general rule the more visuals you put on a report page the slower it’s going to get. It’s logical if you think about it: the more visuals there are, the more queries have to be run against your dataset and the more work Power BI has to do to render the report. I know there is a tendency to try to pack as much information onto a page as possible and this often happens when someone else has designed the report you’re trying to build, but you should always try to resist this. Splitting a single large page into multiple smaller pages, using slicers or filters to reduce the amount of data shown at any one time and avoiding gigantic Excel-like tables are a good idea.
Once you’ve done that you need to see whether you can reduce the number of visuals needed to display the same amount of information: in the example above, the four separate cards show the same data as the single matrix. In part this is because the cards generate four separate DAX queries against the dataset to get the data they need whereas the matrix only generates one DAX query. There is a certain overhead to running a DAX query, so reducing the number of DAX queries needed to get the same amount of data is a good thing. More importantly, in this example Power BI can get the four values required for the single DAX query generated by the matrix much more efficiently than it can in the four separate DAX queries needed by the cards. This is because of something called “DAX Fusion”, which my colleague Phil Seamark blogged about in great detail here, and this can make a signficant difference to performance (Marco Russo’s post on why Analyze In Excel reports may be slower than Power BI reports here also contains some useful information on DAX Fusion).
This is only a simple example; if you want to see a really sophisticated demonstration of replacing several visuals with a matrix I suggest you watch this video by another colleague of mine, Miguel Myers. And of course it doesn’t just apply to cards: any time you can replace multiple visuals with a single visual (remember to be careful using custom visuals though) you’re likely to gain some performance.
Last week’s post showed an M function that took Power Query diagnostics data and formatted in a way that made it suitable for visualisation in a Power BI Decomposition Tree visual. This is great for understanding what’s going on at a high level, but by doing this you also lose a lot of detailed information from the diagnostics logs that could be useful for performance tuning. This week I have an M function for you that takes a different approach to solving the same problem that might be more appealing for the hardcore Power Query fans out there.
[Note: I recommend calling the function on the “detailed” diagnostics query and not the summarised view, because I’ve found what might be a bug with the summarised view]
The function will then return a table with one row for each query that is evaluated while the diagnostics were running:
[Note: You may find that some query evaluations have nulls instead of the name of the query; that’s an issue with the diagnostics data that will be addressed soon]
You could get the same thing simply by filtering the raw diagnostics data down to the rows where the Path column contains 0, but I’ve added two extra things that make the data much easier to understand.
First of all, the original data contains columns called “Exclusive Duration” and “Exclusive Duration (%)”, which tell you the amount of time each operation took in seconds and what percentage this was of the entire query evaluation. However, the data itself is in a parent/child structure so while a parent operation may have a very small duration it may have many slow descendant operations. Therefore my function adds two new columns called “Exclusive Duration (Including Child Operations)” and “Exclusive Duration (%)(Including Child Operations)”. The first of these new columns returns the total duration of each operation and all its child operations, while the second returns the percentage that the duration of each operation and all its child operations make up of the whole query duration. At the root level, this means the first of these new columns shows the total duration of each query in seconds and the second of these new columns should always show 100%.
Secondly, the function adds a new column called Child Rows that contains a nested table containing the child operations associated with each operation:
You can either click on the Table link in this column to navigate to a new table containing all the child operations, or use the Expand/Aggregate button in the top-right hand corner to keep the current operations and join the nested rows onto them (I strongly recommend you do the former). Experienced Power Query developers will be familiar with the concept of nested tables and I think this provides an easy way of navigating the parent/child operation structure that is similar to what you see with a Decomposition Tree but keeps all the detail columns that the diagnostics data provides you with and means you don’t have to leave the Power Query Editor.
I would love to hear your feedback about this and know if there are any bugs or other information that could be added. You can download a sample pbix file for this function here.
Recently I’ve been working with the Power Query team to come up with some ways to help developers understand the data returned by the new Power Query diagnostics functionality. In this, the first of two posts, I’m going to share a Power Query function that reformats diagnostics data in a way that makes it easy to visualise using the Power BI Decomposition Tree visual.
Copy the code above into a new blank query in the Power Query Editor window, and give the query a meaningful name like DiagnosticsForDecompTree
Pass in the name of a query created by using the Query Diagnostics functionality (you’ll be able to choose any query in your pbix file using the dropdown box) and invoke the function
Load the output of the new query that gets created by invoking the function into your Power BI dataset
Add a new Decomposition Tree visual to a report page
Drag the Exclusive Duration column into the Analyze well of the visual
Drag all the columns whose names start with “Level” (ie “Level.1”, “Level.2” and so on) into the Explain By well of the visual
Add a filter or slicer on the Query column to filter the data shown in the visual down to a single query evaluation.
The output will look something like this:
What does this function do? The original output of a diagnostics query has one row for each event; each event may have one or more child event, and the Path column shows the details of this parent/child structure. The function takes this structure and expands it out to multiple columns, so for each event you see all of its ancestor events in a separate column on the same row. Since each event’s duration only gives you the time taken for it to run and doesn’t include the time taken its child events, when you break down an event in the Decomposition Tree you’ll see some child events with the name (Blank) – these give you the duration of just the parent event and make sure everything adds up properly. This function should work for both the detailed diagnostics output as well as the summary output.
You can download a sample pbix file showing an example of how to use it here.
This is only a first draft of the function, so I’d really appreciate it if you could try this with your own diagnostics data and check if it works properly and also if it produces useful output. Please leave your feedback in a comment either here or on the Github Gist. There are almost certainly going to be bugs!
Of course the real question is is how you interpret what you see in the Decomposition Tree and how you can use it to improve the performance of your queries. That, my friends, is a journey we’re on together, but I think that being able to visualise diagnostics data easily is an important first step to being able to understand what’s going on inside the Power Query engine when a query is evaluated.
[Thanks to Jorge Gomez Basanta, Colin Popell and Matt Masson for their help with this code]