Visualising Power Query Diagnostics Data In A Power BI Decomposition Tree

Recently I’ve been working with the Power Query team to come up with some ways to help developers understand the data returned by the new Power Query diagnostics functionality. In this, the first of two posts, I’m going to share a Power Query function that reformats diagnostics data in a way that makes it easy to visualise using the Power BI Decomposition Tree visual.

Here’s the code for the function:

Here’s how to use it:

  1. Copy the code above into a new blank query in the Power Query Editor window, and give the query a meaningful name like DiagnosticsForDecompTree
  2. Pass in the name of a query created by using the Query Diagnostics functionality (you’ll be able to choose any query in your pbix file using the dropdown box) and invoke the function
  3. Load the output of the new query that gets created by invoking the function into your Power BI dataset
  4. Add a new Decomposition Tree visual to a report page
  5. Drag the Exclusive Duration column into the Analyze well of the visual
  6. Drag all the columns whose names start with “Level” (ie “Level.1”, “Level.2” and so on) into the Explain By well of the visual
  7. Add a filter or slicer on the Query column to filter the data shown in the visual down to a single query evaluation.

The output will look something like this:

What does this function do? The original output of a diagnostics query has one row for each event; each event may have one or more child events, and the Path column shows the details of this parent/child structure. The function takes this structure and expands it out into multiple columns, so for each event you see all of its ancestor events in separate columns on the same row. Since each event’s duration only gives you the time taken for it to run and doesn’t include the time taken by its child events, when you break down an event in the Decomposition Tree you’ll see some child events with the name (Blank) – these give you the duration of just the parent event and make sure everything adds up properly. This function should work for both the detailed diagnostics output and the summary output.
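To give you a flavour of how this works – and this is just a minimal sketch of the core idea, not the full function, and it assumes the diagnostics output has a Path column containing slash-delimited event ids – here’s how the Path column could be expanded into Level columns in M:

[sourcecode language=’text’]
//Sketch only: expand a slash-delimited Path column into Level.1,
//Level.2 etc columns for the Explain By well of a Decomposition Tree
(Diagnostics as table) as table =>
let
    //Find the maximum depth of the event parent/child hierarchy
    MaxDepth =
        List.Max(
            List.Transform(
                Diagnostics[Path],
                each List.Count(Text.Split(_, "/"))
            )
        ),
    //Generate one column name per level: Level.1, Level.2 and so on
    LevelNames =
        List.Transform(
            {1 .. MaxDepth},
            each "Level." & Text.From(_)
        ),
    //Split Path into the Level columns; events nearer the root of the
    //tree get nulls in the lower-level columns
    SplitPath =
        Table.SplitColumn(
            Diagnostics,
            "Path",
            Splitter.SplitTextByDelimiter("/"),
            LevelNames
        )
in
    SplitPath
[/sourcecode]

The real function does more than this – for example it’s also responsible for adding the (Blank) rows described above – but the Level columns are what the Decomposition Tree visual needs.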

You can download a sample pbix file showing an example of how to use it here.

This is only a first draft of the function, so I’d really appreciate it if you could try this with your own diagnostics data and check if it works properly and also if it produces useful output. Please leave your feedback in a comment either here or on the Github Gist. There are almost certainly going to be bugs!

Of course the real question is how you interpret what you see in the Decomposition Tree and how you can use it to improve the performance of your queries. That, my friends, is a journey we’re on together, but I think that being able to visualise diagnostics data easily is an important first step to being able to understand what’s going on inside the Power Query engine when a query is evaluated.

[Thanks to Jorge Gomez Basanta, Colin Popell and Matt Masson for their help with this code]

The “Visual Has Exceeded The Available Resources” Error In Power BI

[Update 18th June 2023: this post is about this error in the Power BI Service. If you’re running into this error in Power BI Desktop read this post]

Very occasionally you may find that a report that is slow to render – but still renders successfully – in Power BI Desktop shows the error “Visual has exceeded the available resources” for some visuals when published to the Power BI Service:

Clicking on “See details” will show a dialog that looks something like this:

This visual has exceeded the available resources. Try filtering to decrease the amount of data displayed.

Please try again later or contact support. If you contact support, please provide these details.

More details: Resource Governing: The query exceeded the maximum memory allowed for queries executed in the current workload group (Requested 1048580KB, Limit 1048576KB).

The official Power BI documentation gives similar advice to what’s shown in this dialog, but what’s really going on?

The information in the “More details” section of the dialog gives you a clue: in this case it’s resource governance. When you run a DAX query in Power BI it will always use a certain amount of memory, and inefficient DAX calculations can cause a query to try to grab a lot of memory. In Power BI Desktop these queries may run successfully, if slowly, but the Power BI Service can’t just let a query use as many resources as it wants (if it did, it might affect the performance of other queries being run by other users), so there is a resource governor that kills queries that are too resource-hungry. In the case of the visual above, the query behind it tried to use more than 1GB of memory and was killed by the resource governor.

The limits enforced by the resource governor cannot be changed in Power BI shared capacity. The limits are higher in a Premium capacity and vary depending on the capacity size, and if you are a Premium Capacity Admin there are a number of different settings on a capacity you can change that will affect this, described here. For example the Query Memory Limit (%) setting controls the amount of memory that any given query can use before the resource governor kicks in.

To demonstrate this I moved the workspace containing the report in the screenshot above to a Power BI Embedded A4 capacity and then changed the Query Memory Limit % setting to 1.

This resulted in the error dialog showing that the resource governor killed the query when it tried to use 1% of the 25GB of RAM available in an A4, which is 262144KB:
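That number is easy to verify: 25GB is 25 × 1,048,576KB = 26,214,400KB, and 1% of 26,214,400KB is 262,144KB.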

While it may be useful to change this setting if you’re the admin of an overloaded Premium capacity, the real solution is to tune the DAX calculations used in this visual (I suggest you read the section on materialisation starting on p568 of “The Definitive Guide to DAX” or this older article which covers the same ground) so they use less memory and return faster or change what the visual displays.

You may see the “Visual has exceeded the available resources” error in other scenarios where the resource governor is not involved. For example, setting the Query Timeout setting on the capacity to 5 seconds like so:

…means that the visual now times out after five seconds, and the “See details” dialog shows the following error instead:

More details: The XML for Analysis request timed out before it was completed. Timeout value: 5 sec.

Introduction To M In Power Query And Power BI

Since last week’s post proved very popular (it turns out people want video – who knew?) I thought I’d post the recording of a presentation I did earlier this year at the Power BI User Group in Wellington, New Zealand, called “Introduction to M in Power Query and Power BI”. It’s a session I’ve presented at various conferences and user groups over the years, but I think this is the best version of it:

It’s an hour-long introduction to the M language that sits behind Power Query in Power BI, Excel, dataflows, Analysis Services and other tools. It covers pretty much everything you need to know about the language, including let expressions; tables, records and lists; error handling; and writing custom M functions.
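If you want a quick taste of what some of those concepts look like, here’s a trivial, entirely artificial example that crams several of them into a few lines:

[sourcecode language=’text’]
let
    //A record with two fields, one of which is a list
    Person = [Name = "Chris", Scores = {1, 2, 3}],
    //A custom M function that adds one to a number
    AddOne = (x as number) as number => x + 1,
    //Error handling: if anything goes wrong, return 0 instead
    Result = try AddOne(List.Sum(Person[Scores])) otherwise 0
in
    //Returns 7
    Result
[/sourcecode]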

If you’re keen to watch even more Power BI-related videos, my colleague Matthew Roche has just started his own YouTube channel and published a video on dataflows vs datasets.

Why Does Power BI Query My Data Source More Than Once?

This week I was honoured to be a guest on the Guy In A Cube channel, and for my topic I decided to tackle one of life’s eternal questions: why does Power BI query my data source more than once when I refresh my dataset?

You can watch the video here:

Although I’ve never answered this question directly in a blog post before, almost every technique I showed in the video is something I’ve blogged about, so I thought it would be useful to collect all the links to these posts in one place to provide some background to what I show in the video.

The first thing to say is that this is really a Power Query question, not just a Power BI question; a lot of what I show will therefore be relevant to Excel’s Get&Transform, Power BI dataflows and every other manifestation of Power Query out there, although I can’t guarantee that all the options and behaviour will be the same in these places.

If you’re developing in Power BI Desktop and you think that refresh is taking a long time, you should definitely check whether the Power Query engine is hitting your data source more than once. There are lots of ways to do this. Some data sources have tools that show when they are queried, such as the Run History screen in Microsoft Flow that I show in the video or SQL Server Profiler. Other ways include using Fiddler for web services or Process Monitor for files.

Next, if you find that Power Query is indeed querying your data source multiple times, the most common explanation is “background refresh”. This happens by default when you refresh a dataset: Power Query refreshes all the data previews that you see when you click on a step in the Power Query Editor window. In most cases this is so quick you don’t notice it, but when you have lots of queries with lots of steps it can be a gigantic problem. You can turn it off in the Options dialog in Power BI Desktop, as shown in the video, or in the Excel Query Options dialog. I blogged about this issue here.

Another possible reason is the data privacy settings you have applied. This is a gigantic, complex topic and something that I and other people have blogged about many times. I wrote a multi-part series of posts explaining data privacy settings starting here and there’s also a great explanation of what’s going on in behind the scenes written by the dev team that I link to here; I also wrote a post here showing an example of the performance impact of data privacy checks. As I say in the post, however, do not play around with these settings unless you really know what you’re doing.

If you’re ok with writing some M code, using the technique I blogged about here to implement basic query folding on a web data source can make a big difference – in the example in the video, it was doing this that reduced the number of calls to the web service from six to three.
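The technique in that post is based on the Table.View M function. Here’s a rough sketch of the idea – not the code from that post, and it assumes a hypothetical web service that accepts a $top query parameter and returns a list of records as JSON:

[sourcecode language=’text’]
let
    //Hypothetical web service; $top limits the number of rows returned
    GetFromService = (optional top as nullable number) as table =>
        Table.FromRecords(
            Json.Document(
                Web.Contents(
                    "https://example.com/api/sales",
                    if top = null
                    then []
                    else [Query = [#"$top" = Text.From(top)]]
                )
            )
        ),
    //Table.View allows a downstream Table.FirstN to fold into the
    //web request, so only the rows needed are ever downloaded
    View =
        Table.View(
            null,
            [
                GetType = () =>
                    type table [Product = text, Sales = number],
                GetRows = () => GetFromService(null),
                OnTake = (count as number) => GetFromService(count)
            ]
        )
in
    View
[/sourcecode]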

Some transformations can also lead to data being read from a data source more than once. For example, in this post I showed how a merge transformation leads to multiple reads from the same Excel file and how to stop this happening and drastically improve performance.
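I won’t repeat the details of that post here, but one common way of stopping a merge from hitting the same source over and over again is to buffer one side of it in memory – a sketch, with made-up query names, not the exact code from that post:

[sourcecode language=’text’]
let
    //Customers and Sales are made-up queries that both read from
    //the same Excel file in this scenario
    //Buffer one side of the merge so it is only read once
    BufferedSales = Table.Buffer(Sales),
    //The merge now runs against the in-memory copy instead of
    //going back to the Excel file repeatedly
    Merged =
        Table.NestedJoin(
            Customers, {"CustomerID"},
            BufferedSales, {"CustomerID"},
            "SalesRows",
            JoinKind.LeftOuter
        )
in
    Merged
[/sourcecode]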

Moving on, another important lesson to learn is how referenced queries are evaluated. This is one of the most counter-intuitive things about Power Query! With the set of referenced queries used in my demo:

…where three queries called Referenced Query 1, Referenced Query 2 and Referenced Query 3 each reference a query called Call Web Service and are loaded into a dataset (and where Call Web Service is not loaded into the dataset), most people would assume that when a refresh takes place the following happens:

  1. The Call Web Service query is run, getting the data from the web service
  2. The data returned is then passed to Referenced Query 1, Referenced Query 2 and Referenced Query 3

This is wrong. In fact what happens is this:

  1. Referenced Query 1 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.
  2. Referenced Query 2 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.
  3. Referenced Query 3 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.

This explains why the Call Web Service query was still being evaluated three times even after the query folding code mentioned above had been added to it. You can read a detailed explanation of how referenced queries are evaluated here.

There is something we can do to help here though: the Power Query engine can cache the results returned from some types of data source, such as web services, but to take full advantage of this caching in this particular scenario you have to turn off the Enable Parallel Loading Of Tables option in the Options dialog, as I show here. With that done, the three referenced queries run one after the other instead of in parallel. When the first of these queries runs it calls the Call Web Service query, which gets data from the web service; when the next two queries run and trigger two more evaluations of Call Web Service, the data for the call to the web service has already been cached. This caching can be turned off using the IsRetry option in Web.Contents, as Curt Hagenlocher explains in a comment here.
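For completeness, here’s what that looks like – a one-step sketch with a made-up URL:

[sourcecode language=’text’]
let
    //IsRetry=true tells the Power Query engine not to use a cached
    //response for this call (the URL is made up)
    Source =
        Web.Contents(
            "https://example.com/api/sales",
            [IsRetry = true]
        )
in
    Source
[/sourcecode]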

The final thing shown in the video is that the infamous Table.Buffer M function does not improve performance in this case and in fact makes performance worse. If you use Table.Buffer inside the Call Web Service query it will indeed buffer the result of the call to the web service into memory, but the data that is buffered cannot be shared between separate query executions – so the buffering happens three times, once for each time Call Web Service is evaluated by the three referenced queries. Again this is something discussed in this post. Of course there are plenty of other scenarios where Table.Buffer will help performance, but these will only be in cases where the same query requests data from the same data source more than once.
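To be clear about what “inside the Call Web Service query” means, here’s a sketch – again with a made-up URL – showing where the Table.Buffer call goes and why it doesn’t help:

[sourcecode language=’text’]
let
    //Imagine this is the expensive call to the web service
    Source =
        Table.FromRecords(
            Json.Document(
                Web.Contents("https://example.com/api/sales")
            )
        ),
    //Table.Buffer keeps the result in memory for this evaluation only;
    //each of the three referenced queries triggers its own evaluation
    //of this query, so the buffering happens three times
    Buffered = Table.Buffer(Source)
in
    Buffered
[/sourcecode]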

Phew, that was a lot. I don’t pretend to know everything about when and why Power Query gets data from a data source multiple times but this is the sum total of my knowledge right now. I hope it’s useful!

[Update: it’s just been pointed out to me, quite rightly, that the names of my queries are confusing. For example, “Referenced Query 1” is the query doing the referencing, not the query being referenced – which is Call Web Service. I haven’t changed the names because I want the blog post to remain consistent with the video, and I hope the diagrams and the context make everything clear]

BIFocal Show Interview

Just a quick post to let you know that an interview I recorded for the BIFocal Show podcast at the Microsoft Business Applications Summit in Atlanta a few months ago is now available for your listening pleasure:

John White and Jason Himmelstein, like less-furry versions of Paginated Report Bear, speak to all the top people in the world of Microsoft BI (they spoke to Marco Russo last week) so I highly recommend subscribing if you don’t do so already.

Paginated Report Bear And The Future Of Analysis Services Multidimensional

“Who, or what, is Paginated Report Bear?” I hear you ask. Well, he’s the breakout social media star of 2019, a furry YouTube sensation whose incisive interviews of members of the Power BI development team have become renowned for their deep technical content and insights into the Power BI roadmap. If you’re not watching his videos on YouTube, you’re missing out. Guy In A Cube is so 2018.

[That’s me on the left and Paginated Report Bear on the right]

Anyway, one question I am asked all the time is what the future holds for Analysis Services Multidimensional. While there is no firm news on what’s happening here, two of Paginated Report Bear’s recent interviews have discussed this topic and are particularly revealing. If you’re an SSAS MD and MDX fan I strongly recommend you watch the interviews with Josh Caplan:

…and Amir Netz:

I also had the honour to be interviewed by him at the Microsoft Business Applications Summit this week and we discussed this topic too, although I would like to stress that unlike Amir and Josh I have absolutely no influence on the decisions made in this area; in any case, my opinions on this topic might surprise some of you.

I’m Joining Microsoft

I have an announcement: tomorrow (Monday June 3rd) I’m starting a new job on the Power BI CAT team at Microsoft. It won’t affect what happens here on my blog, but I wanted to write this post because so many people have asked me why I’m making this move.

First and foremost the job at Microsoft offers some exciting new challenges for me that I wouldn’t get as a self-employed person. I’ll get to work on some of the biggest, most complex Power BI implementations in the world, provide feedback to the Power BI development team, and still be able to speak at conferences and do many of the other things I love doing now. I’ll also have the pleasure of working with a truly stellar bunch of colleagues who I know I’ll learn a lot from. And of course, what better product to work on than Power BI and what better tech company to work for nowadays than Microsoft? Power BI is going from strength to strength and I want to make a direct contribution to its future success.

What’s more the offer from Microsoft came at a time when I was getting a bit bored with the work I’ve been doing. If you do any job for long enough it gets repetitive and in my case after thirteen years (over a quarter of my life!) of running my own company I felt like I needed a change. Also, as I have made the shift from being a SSAS/MDX guy to being a Power BI guy I’ve been doing less and less technical consultancy and more and more training, mostly in the form of introductory Power BI courses. I enjoy training, I’d like to think I’m fairly good at it and it has proved very lucrative indeed – I just don’t want to be a full-time trainer, teaching the same material week after week.

Training and consultancy also involve a lot of travel. Over the last few years I’ve averaged more than ten nights per month in hotels and on top of that there were many nights when I got home late after a long journey back from a customer site. My wife has been very supportive and it’s all my kids have ever known, but it’s tiring and I want to spend more time with my family before my kids grow up and leave home. I’ve been to some interesting places on business I would never have been to otherwise and worked with some great companies, so yes, I have enjoyed myself. Business travel is nowhere near as glamorous or thrilling as it may seem, though, and I’m happy that I’ll be doing less of it. There’s also the risk that Brexit (if and when and how it ever happens) will stop me from working in Europe as easily as I have done in the past, so travelling as much might not even have been an option going forward.

Being self-employed has been a great experience and it’s something I would recommend to anyone who is thinking of doing it. I’m immensely grateful to all my customers, business partners and fellow members of the SQL and Power BI communities for making Crossjoin Consulting so successful. However it’s time for me to move on and try something new. Wish me luck! I’ll be back to blogging about Power BI, Power Query, SSAS, DAX and M next week.

Extracting All The M Code From A Power BI Dataset Using The DISCOVER_M_EXPRESSIONS DMV

DMVs (Dynamic Management Views) are, as the Analysis Services documentation states, “queries that return information about model objects, server operations, and server health”. They’re also available in Azure Analysis Services, Power BI and Power Pivot, and are useful for a variety of reasons, for example for generating documentation.

Several as-yet undocumented DMVs have appeared in Power BI recently and one that caught my eye was DISCOVER_M_EXPRESSIONS. Unfortunately, when I tried to run it in DAX Studio against an open Power BI file I got an error saying it was only available in the Power BI Service:

Luckily, now that XMLA Endpoints are in preview and SQL Server Management Studio 18 has been released (which supports connections to Power BI via XMLA Endpoints), we can test it against a published dataset stored in a Premium capacity. The following query, run from a DAX query window in SQL Server Management Studio:

[sourcecode language=’text’ padlinenumbers=’true’]
select * from
$system.discover_m_expressions
[/sourcecode]

…returns a list of all the Power Query queries in the selected dataset and their M code:

If you don’t have Premium you can run the same query from an Excel table against any published dataset using the technique I blogged about here:

I know there are other methods for doing this (for example using copy/paste), but it’s useful to be able to do this via a DMV because it means you can automate the process of extracting all your M code easily.

Some of the other new DMVs look like they are worthy of a blog post too – I can guess what most of them do from their names, but others are more mysterious and perhaps hint at features that have not been announced yet.

Making Power BI Drillthrough Return The “Right” Rows When You Use It With Complex Measures

When an end user sees a strange value in a Power BI report, their first reaction is usually to want to see the detail-level data from the underlying table. Power BI’s drillthrough feature is a great way of letting them do this, but it only returns meaningful results if you use it on measures that do simple aggregations such as sums or counts; if you have more complex calculations then usually what the drillthrough returns won’t be the rows that go to make up the value the user has clicked on.

Here’s an example. Say you have a simple Power BI model with a Sales table that contains the following data:

There is also a Date table with date and month columns, and the entire model looks like this:

Let’s say you create a measure called Sales Value that sums up the contents of the Sales column:

[sourcecode language=’text’ padlinenumbers=’true’]
Sales Value = SUM('Sales'[Sales])
[/sourcecode]

You could use this in a column chart to show sales by month, like so:

If the user wants to see the underlying data for one of the bars in this chart, drillthrough will work well – you just need to create another page (called, in this case, Month Drillthrough), put a table on it that displays the full contents of the Sales table:

[It’s important to note that it’s the Date column from the Sales table that’s shown here, not the Date column from the Date table]

Then drag the Month column from the Date table into the Drillthrough filter area:

…and you will be able to drillthrough from one of the columns in the chart, in this case the bar for May 2018:

…and that filter will be passed over to the Date Drillthrough page, so you only see the row in the table showing sales for May 5th 2018:

But what happens if you want to display year-to-date values in your column chart? If you create the following measure:

[sourcecode language=’text’ ]
YTD Sales = CALCULATE([Sales Value], DATESYTD('Date'[Date]))
[/sourcecode]

…and use it in the bar chart, you will see the following:

The problem comes when the user does the same drillthrough on May 2018 – which now shows the value 16 – and gets exactly the same table that they did before, showing only the sales transactions for May:

In this case, because the user clicked on the year-to-date value for May 2018 they would expect to see all the rows from the Sales table that went to make up that YTD value for May, that’s to say all the rows from the Sales table where the date was in the range January 2018 to May 2018.

The solution is to use some DAX that takes the month filter passed by the drillthrough and ensures that it filters the table shown not by the selected month, but all months in the year-to-date (similar to, but not exactly the same as, what I describe here).

Here’s a measure that does the job:

[sourcecode language=’text’ ]
SalesIgnoringDate =
VAR CurrentDateFromSales =
    CALCULATE(
        SELECTEDVALUE('Sales'[Date]),
        CROSSFILTER('Date'[Date], 'Sales'[Date], None)
    )
RETURN
    IF(
        CONTAINS(
            DATESYTD('Date'[Date]),
            'Date'[Date],
            CurrentDateFromSales
        ),
        CALCULATE(
            [Sales Value],
            CROSSFILTER('Date'[Date], 'Sales'[Date], None)
        )
    )
[/sourcecode]

What this does is:

  • Uses the DAX CROSSFILTER() function to disable the relationship between the Date and Sales tables, then uses the SELECTEDVALUE() function to find the date from the Sales table shown on the current row of the table on the drillthrough report page, and stores it in the CurrentDateFromSales variable.
  • Constructs a table using the DATESYTD() function and the Date column of the Date table, which contains all of the dates from the beginning of the current year up to and including the last date in the filter context – which will be the last date in the month selected in the drillthrough.
  • Uses the CONTAINS() function to see if the date saved in CurrentDateFromSales appears in the table returned in the previous step.
  • If it does appear, returns the value of the Sales Value measure. Once again, this needs to have the relationship between the Sales and Date tables disabled using the CROSSFILTER() function.

This measure can be used in the table on the drillthrough page instead of the Sales Value measure. Last of all, since your users will not want to see a measure called SalesIgnoringDate in their report, you can rename the SalesIgnoringDate column on the table to Sales Value.

Here’s the end result (in this case I created a new drillthrough page called YTD Drillthrough with the new measure on):

You can download the sample pbix file here.

This is just one example, and different types of calculation on your source page will require vastly different DAX measures on your drillthrough page to ensure that a meaningful set of rows is returned. The basic concepts will remain the same whatever the calculation, though: you need to create a measure that ignores the filter applied by the drillthrough and instead returns a value when you want a row to appear in your drillthrough table and returns a blank value when you don’t want a row to appear.

It’s a shame that drillthrough in the SSAS Tabular sense is not available in Power BI, because being able to set the Detail Rows Expression property on a measure in Power BI would make this problem a lot easier to solve.

Listing Windows Language Code Identifiers And Their Associated Date And Number Formats With M In Power BI/Power Query

In a comment on my blog post about international date and number formats and changing data types with the “using locale” option in Power Query/Power BI, Jan Karel Pieterse asked if there was any way to get a list of the thousands and decimal separators used for number formatting by each language and region. Since this is exactly the kind of geeky question that fascinates me, I decided to write an M query to answer it and – for bonus points – to find the default date format used too.

To start off, I found a table of all Windows Language Code Identifiers on this page:

Of course this can be loaded into Power Query easily using the “From Web” source. After that it’s easy to add a column to the table that takes a sample date (March 22nd 2018) and number (one hundred thousand and one tenth) and converts them to text using the language code identifier on each row: the Text.From() function does this for dates, and for numbers you have to use Number.ToText() if you want to get thousands and decimal separators. There are a few minor problems to deal with, such as the fact that Power Query doesn’t know what to do with dates for the “Congo Swahili” language code identifier and that some rows have multiple language tags, but nothing serious.

Here’s the full code:

[sourcecode language=’text’]
let
    //Sample dates and numbers to show
    SampleDate = #date(2018, 3, 22),
    SampleNumber = 100000 + (1 / 10),
    //MS web page with list of language tags
    LocaleWebPage =
        Web.Page(
            Web.Contents(
                "https://msdn.microsoft.com/en-us/library/cc233982.aspx"
            )
        ),
    LocaleList = LocaleWebPage{1}[Data],
    RemoveColumns =
        Table.SelectColumns(
            LocaleList,
            {"Language", "Location (or type)", "Language tag"}
        ),
    SplitColumn =
        Table.SplitColumn(
            RemoveColumns,
            "Language tag",
            Splitter.SplitTextByAnyDelimiter(
                {",", "or"},
                QuoteStyle.Csv
            ),
            {"Language tag"}
        ),
    //Create example columns
    DateExample =
        Table.AddColumn(
            SplitColumn,
            "Date",
            each Text.From(SampleDate, [Language tag]),
            Text.Type
        ),
    NumberExample =
        Table.AddColumn(
            DateExample,
            "Number",
            each Number.ToText(SampleNumber, "N", [Language tag]),
            Text.Type
        ),
    //Remove any rows containing errors
    RemoveErrors = Table.RemoveRowsWithErrors(NumberExample)
in
    RemoveErrors
[/sourcecode]

Here’s some of the output:

So, if you’ve ever wondered how the Cornish speakers of south-west England like to format their dates or whether the Oromo speakers of Ethiopia use a comma or a full stop as a decimal separator, wonder no more. And if you are not interested in M at all and just want to download an Excel workbook with a list of all LCIDs and how numbers and dates are formatted for them, you can do so here.