Since last week’s post proved very popular (it turns out people want video – who knew?) I thought I’d post the recording of a presentation I did earlier this year at the Power BI User Group in Wellington, New Zealand, called “Introduction to M in Power Query and Power BI”. It’s a session I’ve presented at various conferences and user groups over the years but this is the best version of it I think:
It’s an hour-long introduction to the M language that sits behind Power Query in Power BI, Excel, dataflows, Analysis Services and other tools. It covers pretty much everything you need to now about the language including let expressions; tables, records and lists; error handling; and writing custom M functions.
This week I was honoured to be a guest on the Guy In A Cube channel, and for my topic I decided to tackle one of life’s eternal questions: why does Power BI query my data source more than once when I refresh my dataset?
You can watch the video here:
Although I’ve never answered this question directly in a blog post before, nevertheless almost every technique I showed in the video is something I’ve blogged about so I thought it would be useful to collect all the links to these posts in one place to provide some background to what I show in the video.
The first thing to say is that this is really a Power Query question, not just a Power BI question; a lot of what I show will therefore be relevant to Excel’s Get&Transform, Power BI dataflows and every other manifestation of Power Query out there, although I can’t guarantee that all the options and behaviour will be the same in these places.
If you’re developing in Power BI Desktop and you think that refresh is taking a long time, you should definitely check whether the Power Query engine is hitting your data source more than once. There are lots of ways to do this. Some data sources have tools that show when they are queried, such as the Run History screen in Microsoft Flow that I show in the video or SQL Server Profiler. Other ways include using Fiddler for web services or Process Monitor for files.
Next, if you find that Power Query is indeed querying your data source multiple times, the most common explanation is that it’s doing a “background refresh”. This happens by default when you refresh a dataset and is Power Query refreshing all the data previews that you see when you click on a step in the Power Query Editor window. In most cases this is so quick you don’t notice it but when you have lots of queries with lots of steps it can be a gigantic problem. You can turn it off in the Options dialog in Power BI Desktop, as shown in the video, or in the Excel Query Options dialog. I blogged about this issue here.
Another possible reason is the data privacy settings you have applied. This is a gigantic, complex topic and something that I and other people have blogged about many times. I wrote a multi-part series of posts explaining data privacy settings starting here and there’s also a great explanation of what’s going on in behind the scenes written by the dev team that I link to here; I also wrote a post here showing an example of the performance impact of data privacy checks. As I say in the post, however, do not play around with these settings unless you really know what you’re doing.
If you’re ok with writing some M code, using the technique I blogged about here to implement basic query folding on a web data source can be important – in the example in the video, it was doing this that reduced the number of calls to the web service from six to three.
Some transformations can also lead to data being read from a data source more than once. For example, in this post I showed how a merge transformation leads to multiple reads from the same Excel file and how to stop this happening and drastically improve performance.
Moving on, another important lesson to learn is how referenced queries are evaluated. This is one of the most counter-intuitive things about Power Query! With the set of referenced queries used in my demo:
…where three queries called Referenced Query 1, Referenced Query 2 and Referenced Query 3 each reference a query called Call Web Service and are loaded into a dataset (and where Call Web Service is not loaded into the dataset), most people would assume that when a refresh takes place the following happens:
The Call Web Service query is run, getting the data from the web service
The data returned is then passed to Referenced Query 1, Referenced Query 2 and Referenced Query 3
This is wrong. In fact what happens is this:
Referenced Query 1 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.
Referenced Query 2 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.
Referenced Query 3 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.
This explains why the Call Web Service query is still being evaluated three times once the M code above has been added to it. You can read a detailed explanation of how referenced queries are evaluated here.
There is something we can do to help here though: the Power Query engine can cache the results returned from some types of data source, such as web services, but to take full advantage of this caching in this particular scenario you have to turn off the Enable Parallel Loading Of Tables option on the Options dialog as I show here. With that done, each of the three referenced queries run one after the other, instead of in parallel. When the first of these queries runs its calls the Call Web Service query, which gets data from the web service; when the next two queries run, though, and they trigger two more evaluations of Call Web Service, the data for the call to the web service has been cached. This caching can be turned off using the IsRetry option in Web.Contents as Curt Hagenlocher explains in a comment here.
The final thing shown in the video is that the infamous Table.Buffer M function does not improve performance in this case and in fact makes performance worse. If you use Table.Buffer inside the Call Web Service query it will indeed buffer the result of the call to the web service into memory, but the data that is buffered cannot be shared between separate query executions – so the buffering happens three times, once for each time Call Web Service is evaluated by the three referenced queries. Again this is something discussed in this post. Of course there are plenty of other scenarios where Table.Buffer will help performance, but these will only be in cases where the same query requests data from the same data source more than once.
Phew, that was a lot. I don’t pretend to know everything about when and why Power Query gets data from a data source multiple times but this is the sum total of my knowledge right now. I hope it’s useful!
[Update: it’s just been pointed out to me, quite rightly, that the names of my queries are confusing. For example, “Referenced Query 1” is the query doing the referencing, not the query being referenced – which is Call Web Service. I haven’t changed the names because I want the blog post to remain consistent with the video, and I hope the diagrams and the context make everything clear]
Just a quick post to let you know that an interview I recorded for the BIFocal Show podcast at the Microsoft Business Applications Summit in Atlanta a few months ago is now available for your listening pleasure:
John White and Jason Himmelstein, like less-furry versions of Paginated Report Bear, speak to all the top people in the world of Microsoft BI (they spoke to Marco Russo last week) so I highly recommend subscribing if you don’t do so already.
“Who, or what, is Paginated Report Bear?” I hear you ask. Well, he’s the breakout social media star of 2019, a furry YouTube sensation whose incisive interviews of members of the Power BI development team have become renowned for their deep technical content and insights into the Power BI roadmap. If you’re not watching his videos on YouTube, you’re missing out. Guy In A Cube is so 2018.
[That’s me on the left and Paginated Report Bear on the right]
Anyway, one question I am asked all the time is what the future holds for Analysis Services Multidimensional. While there is no firm news on what’s happening here, two of Paginated Report Bear’s recent interviews have discussed this topic and are particularly revealing. If you’re an SSAS MD and MDX fan I strongly recommend you watch the interviews with Josh Caplan:
…and Amir Netz:
I also had the honour to be interviewed by him at the Microsoft Business Applications Summit this week and we discussed this topic too, although I would like to stress that unlike Amir and Josh I have absolutely no influence on the decisions made in this area; in any case, my opinions on this topic might surprise some of you.
I have an announcement: tomorrow (Monday June 3rd) I’m starting a new job on the Power BI CAT team at Microsoft. It won’t affect what happens here on my blog, but I wanted to write this post because so many people have asked me why I’m making this move.
First and foremost the job at Microsoft offers some exciting new challenges for me that I wouldn’t get as a self-employed person. I’ll get to work on some of the biggest, most complex Power BI implementations in the world, provide feedback to the Power BI development team, and still be able to speak at conferences and do many of the other things I love doing now. I’ll also have the pleasure of working with a truly stellar bunch of colleagues who I know I’ll learn a lot from. And of course, what better product to work on than Power BI and what better tech company to work for nowadays than Microsoft? Power BI is going from strength to strength and I want to make a direct contribution to its future success.
What’s more the offer from Microsoft came at a time when I was getting a bit bored with the work I’ve been doing. If you do any job for long enough it gets repetitive and in my case after thirteen years (over a quarter of my life!) of running my own company I felt like I needed a change. Also, as I have made the shift from being a SSAS/MDX guy to being a Power BI guy I’ve been doing less and less technical consultancy and more and more training, mostly in the form of introductory Power BI courses. I enjoy training, I’d like to think I’m fairly good at it and it has proved very lucrative indeed – I just don’t want to be a full-time trainer, teaching the same material week after week.
Training and consultancy also involve a lot of travel. Over the last few years I’ve averaged more than ten nights per month in hotels and on top of that there were many nights when I got home late after a long journey back from a customer site. My wife has been very supportive and it’s all my kids have ever known, but it’s tiring and I want to spend more time with my family before my kids grow up and leave home. I’ve been to some interesting places on business I would never have been to otherwise and worked with some great companies, so yes, I have enjoyed myself. Business travel is nowhere near as glamorous or thrilling as it may seem, though, and I’m happy that I’ll be doing less of it. There’s also the risk that Brexit (if and when and how it ever happens) will stop me from working in Europe as easily as I have done in the past, so travelling as much might not even have been an option going forward.
Being self-employed has been a great experience and it’s something I would recommend to anyone who is thinking of doing it. I’m immensely grateful to all my customers, business partners and fellow members of the SQL and Power BI communities for making Crossjoin Consulting so successful. However it’s time for me to move on and try something new. Wish me luck! I’ll be back to blogging about Power BI, Power Query, SSAS, DAX and M next week.
DMVs (Dynamic Management Views) are, as the Analysis Services documentation states, “queries that return information about model objects, server operations, and server health”. They’re also available in Azure Analysis Service, Power BI and Power Pivot and are useful for a variety of reasons, for example for generating documentation.
Several as-yet undocumented DMVs have appeared in Power BI recently and one that caught my eye was DISCOVER_M_EXPRESSIONS. Unfortunately, when I tried to run it in DAX Studio against an open Power BI file I got an error saying it was only available in the Power BI Service:
Luckily, now that XMLA Endpoints are now in preview and SQL Server Management Studio 18 has been released (which supports connections to Power BI via XMLA Endpoints) we can test it against a published dataset stored in a Premium capacity. The following query can be run from a DAX query window in SQL Server Management Studio:
select * from
…returns a list of all the Power Query queries in the selected dataset and their M code:
If you don’t have Premium you can run the same query from an Excel table against any published dataset using the technique I blogged about here:
I know there are other methods for doing this (for example using copy/paste) it’s useful to be able to do this via a DMV because it means you can automate the process of extracting all your M code easily.
Some of the other new DMVs look like they are worthy of a blog post too – I can guess what most of them do from their names, but others are more mysterious and perhaps hint at features that have not been announced yet.
When an end user sees a strange value in a Power BI report, their first reaction is usually to want to see the detail-level data from the underlying table. Power BI’s drillthrough feature is a great way of letting them do this, but it only returns meaningful results if you use it on measures that do simple aggregations such as sums or counts; if you have more complex calculations then usually what the drillthrough returns won’t be the rows that go to make up the value the user has clicked on.
Here’s an example. Say you have a simple Power BI model with a Sales table that contains the following data:
There is also a Date table with date and month columns, and the entire model looks like this:
Let’s say you create a measure called Sales Value that sums up the contents of the Sales column:
Sales Value = SUM('Sales'[Sales])
You could use this in a column chart to show sales by month, like so:
If the user wants to see the underlying data for one of the bars in this chart, drillthrough will work well – you just need to create another page (called, in this case, Month Drillthrough), put a table on it that displays the full contents of the Sales table:
[It’s important to note that it’s the Date column from the Sales table that’s shown here, not the Date column from the Date table]
Then drag the Month column from the Date table into the Drillthrough filter area:
…and you will be able to drillthrough from one of the columns in the chart, in this case the bar for May 2018:
…and that filter will be passed over to the Date Drillthrough page, so you only see the row in the table showing sales for May 5th 2018:
But what happens if you want to display year-to-date values in your column chart? If you create the following measure:
…and use it in the bar chart, you will see the following:
The problem comes when the user does the same drillthrough on May 2018 – which now shows the value 16 – and gets exactly the same table that they did before, showing only the sales transactions for May:
In this case, because the user clicked on the year-to-date value for May 2018 they would expect to see all the rows from the Sales table that went to make up that YTD value for May, that’s to say all the rows from the Sales table where the date was in the range January 2018 to May 2018.
The solution is to use some DAX that takes the month filter passed by the drillthrough and ensures that it filters the table shown not by the selected month, but all months in the year-to-date (similar to, but not exactly the same as, what I describe here).
Uses the DAX Crossfilter() function to disable the relationship between the Date and Sales table, and then use the SelectedValue() function to find the date from the Sales table shown on the current row of the table on the drillthrough report page, and store it in the CurrentDateFromSales variable.
Constructs a table using the DatesYTD() function and the Date column of the Date table, which contains all of the dates from the beginning of the current year up to and including the last date in the filter context – which will be the last date in the month selected in the drillthrough.
Uses the Contains() function to see if the date saved in the CurrentDateFromSales appears in the table returned in the previous step.
If it does appear, return the value of the Sales Value measure. Once again, this needs to have the relationship between the Sales and Date table disabled using the CrossFilter() function.
This measure can be used in the table on the drillthrough page instead of the Sales Value measure. Last of all, since your users will not want to see a measure called SalesIgnoringDate in their report, you can rename the SalesIgnoringDate column on the table to Sales Value.
Here’s the end result (in this case I created a new drillthrough page called YTD Drillthrough with the new measure on):
This is just one example, and different types of calculation on your source page will require vastly different DAX measures on your drillthrough page to ensure that a meaningful set of rows is returned. The basic concepts will remain the same whatever the calculation, though: you need to create a measure that ignores the filter applied by the drillthrough and instead returns a value when you want a row to appear in your drillthrough table and returns a blank value when you don’t want a row to appear.
It’s a shame that drillthrough in the SSAS Tabular sense is not available in Power BI, because being able to set a the Detail Rows Expression property on a measure in Power BI would make this problem a lot easier to solve.