When the new “Refresh a dataset” action for Power Automate (formerly Flow) was released last year I couldn’t help thinking there must be something really cool you could do with it – I just didn’t know what. There are lots of great resources explaining the basics (see Jon Levesque’s video here for example) and Adam Saxton did a nice demo here using the “When an item is modified” trigger to show how to refresh a dataset when a value in a SQL Server table is updated, but that’s it. So I got thinking about the types of problem it could solve and the fun I could have with it…
While Power BI’s scheduled refresh functionality is great, it doesn’t give you as much flexibility as you might need. For example:
You might want to schedule your dataset to refresh only on weekdays, not weekends, and you might also want to cancel refresh on certain days like public holidays. You might also only want to refresh a dataset on a monthly basis or at less than the half-hourly granularity that the UI allows. Why? Perhaps because it’s important to minimise the load you put on your source systems; it’s also the case that for many cloud data sources the more data you read, the more you pay.
If you have a lot of datasets to refresh you might want to control which datasets are refreshing in parallel, again to reduce the load on your data sources and if you’re using Premium, to reduce the load on your capacity. It’s hard to get an overview of when all your refreshes are scheduled in the Power BI Portal and manage what’s happening when.
The ideal way to view when multiple events are scheduled is a calendar and we’ve got great calendar functionality in Outlook. What if you could schedule refresh of your datasets from a calendar in Outlook? It turns out to be easier than you might think! Here’s how.
The first thing I did was create a new calendar in Outlook called Power BI Refreshes:
In this calendar I created appointments (either recurring or one-off) for every dataset refresh:
For each appointment, I entered the unique identifier of the dataset in the Title and the unique identifier of the workspace in the Location like so:
You can find these unique identifiers by going to the Settings screen for your dataset in the Power BI Portal and looking at the url:
Cathrine Wilhelmsen has more details on finding these ids here.
Last of all, I created a very simple Flow in Power Automate:
The “When an upcoming event is starting soon” trigger is fired when each of the appointments on the Power BI Refreshes calendar is about to start. It then passes the Location and Subject from the event – which of course contain the ids the workspace and dataset to be refreshed – to the Refresh a dataset action, which does the refresh.
This isn’t something I recommend putting into production but I think it’s very interesting as a proof-of-concept. I guess Logic Apps would be a more robust alternative than Power Automate and I would want to be 100% sure that events fired when I was expecting them to fire, so some thorough testing would be needed. I’m not experienced enough with Power Automate/Logic Apps to know if I’m doing the right thing, to be honest. I also feel like using ids in the meeting title and location is a bit hacky and there must be a nicer way of handling this.
On the Power BI side, it’s worth remembering that when a refresh is kicked off in Power BI the actual refreshing only starts when the Power BI Service has the required resources available, and especially in Shared capacity this can involve a wait of several minutes. What’s more the “Refresh a dataset” action does not know whether the refresh it kicks off succeeds or fails; I guess if you wanted to handle retries or notifications on failure then you would need to call the Power BI API get the refresh history of a dataset – there’s no built in action to do it, but it’s possible with a Power Automate custom connector.
If you have any thoughts about this, ideas on how to make this better or if you do put something like this into production, let me know – I would love to hear from you!
Following on from the series of posts I wrote last month (starting here) about testing the performance of Power BI reports in the browser, I would now like to highlight one possible cause of slow report rendering performance: the over-use of custom visuals. Before we go any further, though, let me be absolutely clear about something:
I am NOT saying that custom visuals are bad and that you should never use them.
However if you are using an older browser to view reports, have a slow PC and/or have a slow network connection, then having a large number of custom visuals on the same page can be bad for performance. In fact having a large number of any kind of visual on the same report page can be bad for performance, as Mike Carlo shows here. Even so there is an extra performance overhead when you use a custom visual instead of one of the built-in visuals.
Take for example, the following report. The dataset it uses has one table with one column and one row containing the value 1, so retrieving data from it is going to be super-fast. This page has 26 Advance Card custom visuals on it displaying the value 1 from the dataset:
Using Chrome DevTools and the functionality on the Network tab to simulate a cold cache and a fast 3G network (as I show here), this page takes 45 seconds to render. Another page on the same report has 26 instances of the built-in card visual to show the same data:
Under the same simulated network conditions, Chrome DevTools shows this page takes 25 seconds to render. So the version of the page with the custom visuals is 20 seconds slower to render than the version with the built-in visuals, although this is obviously an extreme example. Without the network throttling (but with the cold cache) the version with the Advance Card visuals loads in 8.5 seconds and the version with the built-in cards takes 5.5 seconds; still a noticeable difference but maybe not one that your users would complain about.
Let me be clear about something else here:
I am NOT saying that the Advance Card visual is badly implemented – it’s a great visual and one that is used by a lot of people without any problems. What I am saying is that any custom visual comes with some kind of performance overhead and the more custom visuals you use on a page, the more the overhead will be.
How much overhead there is will depend on a lot of different factors so you should test to see if there is any noticeable impact from using custom visuals in your own reports – it’s very likely that there won’t be one. What’s more, there are improvements being made to the Power BI Service all the time and it’s highly probable that some optimisations for the loading of custom visuals will be made in the future.
One last tip for you. If you have imported a custom visual in Power BI Desktop but are not using it, you should always delete it (as shown here). There is another, smaller overhead associated with even having a custom visual imported into a report – although, again, this may well change in the future.
One new feature that was introduced in the December 2019 release of Power BI Desktop, but which wasn’t mentioned in the release blog post, is the “Diagnose Step” button on the Tools tab of the Power Query Editor ribbon:
As you might have guessed, it’s closely related to the Query Diagnostics functionality that was introduced back in October. Whereas the existing Query Diagnostics functionality allows you to see what happens inside the Power Query when a query is executed, this new feature does something similar but allows you to run a query up to a specific step. This is useful in scenarios where you want to reduce the diagnostics data you are collecting to a subset of the steps in the query without having to comment out a lot of M code.
To illustrate this, consider the following super-simple query with three steps:
If you click the Start Diagnostics button on the Tools tab of the ribbon of the Power Query Editor, refresh the query from the main Power BI window (as I show here), then click the Stop Diagnostics button, then the query will be run, the result of Step3 will be loaded into the dataset and the diagnostic data collected will be for the whole query.
However, if you select Step2 in the Applied Steps pane:
…and then click the new Diagnose Step button on the Tools tab of the ribbon, the query will only be executed up to Step2 (meaning any steps that Step2 is dependent on will also be evaluated) and the diagnostics data collected will reflect that.
In part 1 of this series I talked about why you might want to test the performance of your Power BI report in the browser; in part 2 I showed how you could test report performance in the browser; and in this blog post I’ll walk through an example of a report that is slow and show how to find out what the problem is.
Let’s say you are building reports for a chain of fast food restaurants and you have just created a new one showing sales of your products. It consists of one page with one rather smart-looking column chart visual on it:
…so you go ahead and publish. You view the report after publishing and it still seems fast. Then the complaints start coming in: the report is slow!?! It seems to be users who are viewing the report on their phone who are having the most problems. So, following the instructions in my last post, you open up Chrome DevTools and run an audit using a simulated slow 4G connection:
You get the following results:
They look pretty awful – 30.5 seconds for the report to render! Scrolling down on the audit results you also see the likely cause:
A large network payload, with the number one culprit a large jpg? So next you go to the Network tab and do a hard refresh of the report, disabling the cache and simulating a fast 3G connection:
Here’s what the report render looks like:
As you can see, it not only takes over 30 seconds to render the report, but worst of all the column chart is only visible right at the end. The waterfall shows something is being downloaded that is 1.2MB in size and that this takes 7.23 seconds:
Hovering over this request displays a tooltip that gives the full filename which, again, is a jpg file. The only image on the report is the background image used in the column chart and it turns out it is 1.2MB in size:
So it’s the background image used in the column chart that is the main problem! You remove the background image from the visual:
…and as expected, when you re-run the audit using the same settings as before the report renders seven seconds faster, taking 23.4 seconds:
Still not great, I know, but remember this is worst-case performance: not only does the audit recreate slow network and hardware, but it also recreates a cold cache – something your users will rarely encounter.
Hopefully this fairly simple example shows how useful the tools and techniques shown in the first two parts of this series are for troubleshooting certain types of report performance problem. It also goes to show how important it is to make sure any images you use in your report are as small as possible – something that, in my experience, many Power BI developers don’t always do.
In the first part of this series I talked about why you might want to test the performance of a Power BI report in the browser, as opposed to using Performance Analyzer in Power BI Desktop, and the different factors you have to take into account when doing this kind of testing. In this post I’ll describe how you can find out how long it takes for your report to run in a browser.
You can think of Power BI as a low code/no code tool for creating web applications for reporting and analytics, and as such you can use the same tools that web developers use for measuring web site performance to measure Power BI report render time. Now I am not a web developer (I guess most of you aren’t either) so I’ve had a lot to learn about this subject, and I’m all too aware of how much more there is still to learn, but I wanted to write up what I have learned so far because I think it’s so useful. Please excuse any mistakes, misconceptions or over-simplifications!
I’m going to use Google Chrome as my browser in this post. Yes, I know, I work for Microsoft but this is the new Microsoft and I have used Chrome as my browser of choice for the last few years; anyway, Edge is moving to be Chromium-based so everything I show in this post works in exactly the same way in the current Edge beta I have installed on my PC. As I said last time, it’s also important to test performance in whatever browser your end users use to understand what their experience is, but any optimisations you make in your report that work for Chrome or Edge beta are likely to work for other browsers too.
Here’s the methodology that I have come up with for measuring report performance in the browser:
Choose the page you want to test
You can only test the performance of one page in a report at a time, so choose the page that is either the most used, used by the most important users, or takes the longest time to render.
Add a blank page to your report
Before you publish your report, in Power BI Desktop add a blank page with no visuals on to it. It doesn’t need to be the page that is opened when the report opens and you will be able to delete it later. Why do this? When you’re testing how long it takes for your report page to render, you’re probably doing so because you want to improve performance. Some things in the report page that influence performance you have the power to change, such as the design of the dataset, the DAX in the measures, the number and type of visuals on a page; some things will always happen when a report runs and you have to accept that overhead. Testing how long a blank page takes to render will give you an idea of how long this latter category of “things that always happen” takes, and you can subtract this time from the time your chosen report page takes to run.
Get the embed url for your report page and the blank page
After you have published your report, get the secure embedding urls for your chosen report page and the blank page. You can find out how to do this here. Getting the embed url for the report takes a few clicks, but you will need to do a bit of copy-and-pasting to get the embed urls to open the report on the blank page and your chosen report page.
Why do this? Once again, it’s all down to focusing on what you have control over. When a user views a report they normally do so in the Power BI Service portal and the time this takes consists of the time taken to render the portal plus the time taken to render the report. Since you don’t have any control over how long the portal takes to render you might as well ignore this in your testing, so you can get a clearer picture of the amount of time taken to run the report. When you paste the embed url for the report into a browser tab the portal will not be displayed and you’ll only see the report.
Create a new user profile in Chrome
This is something I do anyway for each customer I work with, because it makes handling multiple Power BI logins much easier, but for performance testing you should create and use a new user profile in Chrome. This allows you to delete any Chrome extensions just for this profile (apparently they can affect page render times) and delete the cache if you want, without affecting your main profile. You may also need to test from an incognito window if there are some extensions you can’t delete.
Run your report pages to warm the SSAS/Power BI cache
Copy the embed urls for the blank page and the report page into new tabs and let them load before you do anything else, to warm the SSAS/Power BI cache. I discussed testing performance on a cold or warm SSAS/Power BI cache in my last post, and as I said then, while I believe it is extremely important to understand the impact of SSAS/Power BI caching on report rendering performance, it’s only really feasible to do the kind of DAX performance testing that requires this in Power BI Desktop. For testing report page rendering performance in the browser it’s easier to assume that you are running on a warm SSAS/Power BI cache.
Create an audit
The easiest way to measure report load time is to open Chrome DevTools (you can do this by clicking Control+Shift+j in Chrome) while on the browser tab with the report page you want to test and use the functionality on the Audits tab. To get an understanding of what the Audits tab does I recommend you read this page and watch the accompanying video, paying particular attention to the steps in the section on how to establish a baseline. To keep things simple I suggest you choose the Desktop option in the Device section (choosing Mobile does not seem to render the Phone layout of your report if you have created one); check the Performance box in the Audits section; and then choose to apply one of the throttling options if you want:
When you click the Run audits button various tests will be run, and after a short wait you’ll see the results:
You can read more about what these results mean here but there is one number that a Power BI developer should be particularly interested in: Time to Interactive. As far as I can see this is pretty much the amount of time it takes for the report page to render (because, of course, defining what “render” means is complicated). Subtract the amount of time it takes to render the blank page from the amount of time it takes to render your chosen report page and you will know how much of the report page render time you can actually influence with any changes you make. I strongly recommend you run tests both with and without network throttling applied, because applying throttling really highlights any inefficiencies in your report design.
Use the Network tab
If you want to peak behind the scenes and see more detail about what happens when your report page renders, as well as have more control over things like network throttling, you can use the Network tab in Chrome DevTools. You can find out how to use it here (with more details here) but all you really need to do is:
Open up a new tab with the embed url of the report page you want to test
Open Chrome DevTools and go to the Network tab
Make sure the Disable Cache box is checked and set up any network throttling options you want to use for your tests. The ability to add custom network configurations is particularly useful:
Press and hold the refresh button on the browser toolbar and a dropdown menu will appear; click on Empty Cache and Hard Reload to refresh the page and simulate a user’s first visit to the report. This will automatically start data capture in the Network tab.
When your report page has finished rendering, click on the red stop button in the top left hand corner of the Network tab. You need to be quick but not that quick, for reasons I’ll explain in the next bullet.
Down at the bottom of the Network tab you’ll see some useful metrics, including the number of requests made, the amount of data transferred and most importantly, the finish time. This last metric is not the time elapsed from the start of the refresh until the point you clicked the stop button; it’s the time elapsed from the start of the refresh until the last request before you clicked stop, so this is another good way of finding the render time of your report. You’ll see that requests do get made after the report has finished rendering which is why you need to be fairly quick clicking the stop button, but this is still a lot more accurate than using a stopwatch to time report page rendering.
The majority of the pane will be taken up with a list of requests made and a waterfall chart that visualises how long each request took.
Don’t get too clever!
You’ll see a lot of really interesting information on both the Audits tab and the Network tab and I’m sure most of you will be able to interpret what you see there and use it to tune your reports. I hesitated about mentioning the waterfall chart of the Network tab in this post but decided to do because I have found things on there that have helped me while performance tuning that I would not have found any other way. However, remember everything that you see on the waterfall chart in the Network tab is internal, undocumented and subject to change at any time. As a result please, please, please do not send me messages like “I’ve seen x happening on the Network tab, can you tell me what it means?” or “I’m a web developer and I can’t believe the stupid stuff I can see Microsoft doing here, why are you doing this?” because I’m not able to provide any answers.
As always, if you have any feedback or suggestions please leave a comment. In the next part of this series I’ll show you a worked example of how you can use all this to solve a performance problem that does not manifest in Power BI Desktop.
If your Power BI report is slow, most of the time the reason is something to do with the way you have modelled your data or the way you have written the DAX in your measures. There are plenty of resources out there to help you troubleshoot this kind of problem (Marco’s video “My Power BI report is slow: what should I do?” is a great place to start) but it’s important to understand that there are several other possible reasons why a Power BI report is slow, and these types of performance problem may only manifest themselves after the report has been published when a user is viewing it in the browser. As a result, it’s important to be able to test how long a report takes to render in a browser. If you can’t do this, how can you know if any changes you make help or hurt performance?
It turns out that testing performance of a report in the browser is not as straightforward as it seems. In this post I’m going to describe some of the factors you have to take into account when doing this type of testing; in the next post I’ll go into more detail about how you actually measure report rendering times in the browser and how to see what happens when the report is rendered.
Power BI is always changing
The first thing to remember is that the Power BI Service is always being updated and improved, which means that even if everything else in your environment stays the same the performance of any given report may vary (and hopefully get better) from day to day. Most changes to the Power BI Service are not publicised and there’s no way of knowing how much impact they will have on your report. You need to take this into account when you do testing: don’t create a benchmark and then wait a week to see if any changes you have made have improved performance. Instead create a new benchmark every day you are testing and compare the changes you make to it.
Use the browser your users are using
Report performance can vary a lot depending on the browser you’re using. Before you ask, I don’t have any recommendations for the best or fastest browser to use for Power BI but generally speaking the more ‘modern’ your browser the better; this tweet from my colleague Chris Hamill showing how much faster the new Chromium-based version of Edge is compared to regular Edge is really striking. If you’re going to test performance of a report, though, you need to use the browser(s) your users are using, especially if that browser is IE.
The spec of the PC or mobile device running the browser is also a major factor in performance. As a Power BI developer it’s likely your hardware will be far superior to whatever device your users have to view your report with. It may be a good idea to borrow a PC, laptop, phone or tablet that is equivalent to what your users use, just for testing purposes.
Users in different parts of the world with different internet connections will experience different report performance, and network conditions and therefore report performance may also vary by time of day. Power BI has some basic functionality to show the network latency that users are experiencing but it’s better to get someone who really knows what they are doing to help you understand what’s happening at the network level (I am absolutely not an expert in this). To a certain extent you can reproduce a different network conditions in the browser for testing purposes, but ideally you should to try to reproduce any problems and do any testing wherever the users are based and at the times of day they are using Power BI.
Power BI query caching
Datasets in Power BI Premium have an option to cache queries and improve performance for the initial page of a report. If you have this turned on, it means that the initial report load might be faster than subsequent interactions such as changing a filter or a slicer, so you should not just test the initial render time.
The Power BI/Analysis Services cache
Even if your dataset is perfectly modelled and your DAX is tuned to perfection, the caching that takes place within the Power BI engine (or Analysis Services, if you are using a Live connection) will almost always mean that the first time you run a query it will be slower than subsequent executions.
With Analysis Services Live connections you can clear the cache, and get worst-case cold-cache execution timings by running and XMLA ClearCache command from SQL Server Management Studio or by using DAX Studio’s Clear Cache button. You should also ensure that you are running your tests on an instance of Analysis Services that no-one else is connected to or doing anything on: this ensures that other users’ queries do not populate the cache, or use resources that affect the performance of your queries.
Things aren’t so simple with Power BI Import mode datasets – I’m pretty sure you can use XMLA to clear the cache of a Premium capacity via XMLA endpoints although I’ve not tried it, but even then it’s not likely you can ensure that you’re the only user on that capacity. With Shared capacity there’s no way to clear the cache. As a result you should instead accept that you can only test warm cache performance in the browser by loading your report once before you test performance. In any case, I think the kind of DAX-focussed testing that requires a cold cache can be done more easily in Power BI Desktop.
If you’re using a DirectQuery dataset then you also need to worry about caching, security and other users running queries in the underlying data source. Again, as far as possible you should try to ensure that no-one else is running queries or doing anything else in the data source that could have an effect on the performance of your queries.
Simulating activity from other users
With DirectQuery connections, Live connections and Import mode, it might be that other users’ activity or processing/refreshing of cubes/models/datasets is causing the performance problem. I still think it’s important to do any testing when there are no other real users around – if you want to simulate other users running queries while you do your testing there are tools that can do that, including the Premium capacity load testing tool.
Do what your users do
Sometimes performance problems only occur in very specific circumstances, for example when a specific item has been selected in a filter or more than one item is selected, so it’s really important to create tests that reflect real-world patterns of interaction. Don’t forget that users might also be subject to row-level security that you as a developer are not affected by.
As you will have realised by now, creating realistic, repeatable tests for Power BI reports in the browser is not an easy task because there are so many different things to take into account. I’m not going to pretend this is an exhaustive list (I haven’t mentioned factors such as using a dataset that has the same data volumes as your production dataset, because that’s not specific to testing reports in the browser) but if you think I have forgotten something please leave a comment below – I would be interested to hear your thoughts.
Click here for part 2 in this series, where I show how to measure report render time in the browser. Click here for part 3, where I show a worked example of how to use the tools and techniques described in part 2.
Back in 2018 I wrote a blog post about how to troubleshoot web service refresh issues using a third party tool called Fiddler. It’s still relevant but Fiddler is a complex tool and installing it and giving it the right permissions to see everything it needs to see isn’t easy. Luckily, the new Power Query Diagnostics feature in Power BI (released in October 2019) means you don’t need to use Fiddler for this kind of work any more. Full details on how to use this feature are given here, but I thought it would be useful to focus on one particular use case for it.
Let’s take one of the Power Query queries from my previous post as an example:
Source = OData.Feed("https://api.parliament.uk/odata"),
#"Filtered Rows" =
each ([GroupName] = "Cabinet Office")
[Note that this is a public web service which requires no authentication, so you will be able to run this query yourself]
This query connects to the UK Houses of Parliament OData API and returns the table of government organisations filtered down to the row where the GroupName column equals the text “Cabinet Office”. What requests are made to the web service when this query runs?
The first thing to point out is that different things will happen depending on whether you refresh the preview window in the Power Query Editor or refresh the dataset inside the main Power BI window. If you’re doing performance tuning you should ignore what happens when you refresh the preview in the Power Query Editor and always refresh from the main Power BI window.
The second is that, when you refresh a dataset (or a table within a dataset) from the main Power BI window, you might find that more work is being done than you expect as a result of “background refresh” taking place. This is something I blogged about here and you should always turn it off before doing any performance tests, because it’s not something that will happen in the service and it can also slow things down – potentially a lot if there are lots of queries and steps. You can turn it off from the Options dialog in the Data Load tab by deselecting “Allow data preview to download in the background”.
The third is that the Query Diagnostics feature I’m going to use is, at the time of writing, in Preview, so you may need to enable it in the Preview features pane of the Options dialog:
With that done, here are the steps to find out what requests are made to the web service when the above query is run.
First, let the query load data into the dataset as usual without doing anything else. You’ll see the table that the query has loaded data into in the Fields pane of the main Power BI window:
Next, reopen the Power Query Editor window, go to the Tools tab and click on the Start Diagnostics button.
Do not do anything else in the Power Query Editor window, instead minimise (do not close) it and then go back to the main Power BI window. There, right-click on the table that holds the output of the query in the Fields pane, right-click and select “Refresh data”:
Once the refresh has taken place, go back to the Power Query Editor window and click the Stop Diagnostics button:
When you have done that, you’ll see two new queries created in a group called Diagnostics:
These queries contain the diagnostics information for the refresh you’ve just done. The query with “Detailed” in its name has all the detailed information; the other query contains summarised data.
Looking at the “Detailed” query, each row represents a single operation that takes place within the Power Query engine when the refresh took place and while I don’t know what all this means (I’m working on it!), if you go to the Data Source Query column you’ll see a list of all the HTTP requests made to the web service:
Clicking on an individual cell in this column allows you to see the full text of the request and response made in the preview pane underneath the results pane:
While Fiddler may still be necessary for some more advanced scenarios, this is a much easier way of troubleshooting web service data sources in the Power Query Editor, for example when you need to check the exact request made to see if query folding is taking place on an OData data source.
Among all the exciting announcements made at Ignite last week, one you may have missed (even in all the Excel-related announcements here) was that dynamic arrays have finally reached GA. Ever since they were announced I’ve been interested in how they can be used with Excel cube functions, which allow you to get data from Analysis Services, Power BI and the Excel Data Model/Power Pivot into cells on the worksheet, and I’ve blogged about this once already. Even though right now the CubeValue function can’t be used with dynamic arrays – alas – there are still some interesting ways dynamic arrays and cube functions can be used together and in this post I’ll show you another one I’ve found.
Say you have the following set of tables in the Excel Data Model:
And the following measure that counts the number of customers who bought something:
Let’s also say you have on a worksheet the following table of data showing the number of distinct customers broken down by product, created using the CubeMember and CubeValue functions:
Here are the formulas for this table:
So far, all very straightforward. Now let’s say you want to use a dynamic array to filter the rows in this table so you only see the products that have more than a certain number of distinct customers, and you want to see a grand total for this filtered list of customers, like so:
Here are the formulas to achieve this:
To explain what’s going on here:
Cell C2 contains the text “ThisWorkbookDataModel”, the name of the connection to the Excel Data Model. Doing this makes the other Excel formulas here much easier to read!
Cell F2 – which I’ve turned into a named range called FilterThreshold – contains the number of customers which a product must exceed to be shown.
Cell E5 contains the dynamic array Filter function that filters the rows shown in the first two screenshots above and only returns the products that exceed the threshold. The formula is:
Cell E6 does something similar to E5 but returns the number of distinct customers for each product. If all I wanted was the products and the number of distinct customers I wouldn’t have needed two separate formulas, I could have used one, but I need to add the grand total too…
Cell E4 contains the formula:
CUBESET($C$2, E5#, “Total”)
This creates a set from the spill reference E5#, which contains the cells containing CubeMember formulas returned by the Filter function. For some reason CubeSet returns an error if you try to use a dynamic array formula in its second parameter but using a spill reference works ok; this is why I had to use separate formulas in E5 and E6.
Finally, cell F4 contains the formula that returns the total distinct count for all products returned in the filtered list using CubeValue and the set created in cell E4:
CUBEVALUE($C$2, “[Measures].[Distinct Customers]”, E4)
Since this is a distinct count the only way to get the correct value here is using CubeSet, CubeValue and the DAX measure – there would be no way to calculate the correct value using Excel formulas. The same could be said of almost any other DAX measure that wasn’t a simple aggregation.
You can download the sample workbook for this post here.
Back in 2016 I wrote the following blog post about changes to the way Excel 365 generated MDX queries for PivotTables connected to Analysis Services, Power Pivot/the Excel Data Model and Power BI datasets:
I know it sounds boring and not something you need to worry about but trust me, this is important – these changes solved the vast majority of Excel PivotTable performance problems that I encountered when I was a consultant so you should read the above post before continuing.
Unfortunately, earlier this year these changes had to be partially rolled back because in some rare cases the queries generated returned incorrect results; this means that you may find that values for subtotals and grand totals are again being returned even when they aren’t being displayed. The good news is that you should still be able to get the improved performance with a few minor tweaks.
Using the example from the previous post, a PivotTable connected to SSAS MD and the Adventure Works cube (the queries generated for SSAS Tabular, Power Pivot and Power BI may be slightly different and slightly better but the basic problem is the same), if you build the following in Excel:
…even though the subtotals in the PivotTable are not displayed in Excel, the MDX query generated by Excel not only returns them but also returns others that are not needed at all – in fact 36 rows (although the query returns them as columns) of data rather than the 13 rows that are displayed. Here’s a screenshot of the results returned when you run the MDX query in SQL Server Management Studio:
Any time you see a row or column containing an All Member (in this case All Customers or All Products) you know you are looking at a subtotal or grand total.
In this case, to get an efficient query, you need to explicitly turn off subtotals and grand totals for the PivotTable in Excel:
In the PivotTable the only difference you’ll see is that the grand totals are now not displayed:
But the query generated now only returns the values that are actually needed, and as a result will be a lot more efficient and potentially a lot faster. In this case, the query now only returns the 12 rows of data displayed:
Even with subtotals and grand totals turned off there are still some cases where unwanted values will be returned. Take the following PivotTable, where I have used the Calendar hierarchy from the Date dimension on columns and filtered it so I only see the three months in Q1 CY 2012:
The MDX query generated by Excel requests requests four extra columns with subtotals and a grand total that aren’t displayed:
There is a fairly simple workaround though. By changing how the PivotTable is constructed – in this case by not putting a hierarchy on columns but just the Month Of Year attribute, then adding slicers for Calendar Year and Calendar Quarter to control which months are displayed – you can get an efficient query. This version of the PivotTable:
…results in this query being generated, which only returns the required values:
To sum up, then, if you have a PivotTable that seems a bit slow:
Turn off subtotals and grand totals unless you really need to see them
Look at the MDX query being generated by Excel and see if it is still returning unnecessary subtotals and grand totals, and if it is try rebuilding the PivotTable to get the same results in a different way (for example by not drilling down on hierarchies as in the last example) to get a more efficient query
[All the queries in this post were generated by Excel 365 version 1910 build 12130.20238]
This week I was honoured to be a guest on the Guy In A Cube channel, and for my topic I decided to tackle one of life’s eternal questions: why does Power BI query my data source more than once when I refresh my dataset?
You can watch the video here:
Although I’ve never answered this question directly in a blog post before, nevertheless almost every technique I showed in the video is something I’ve blogged about so I thought it would be useful to collect all the links to these posts in one place to provide some background to what I show in the video.
The first thing to say is that this is really a Power Query question, not just a Power BI question; a lot of what I show will therefore be relevant to Excel’s Get&Transform, Power BI dataflows and every other manifestation of Power Query out there, although I can’t guarantee that all the options and behaviour will be the same in these places.
If you’re developing in Power BI Desktop and you think that refresh is taking a long time, you should definitely check whether the Power Query engine is hitting your data source more than once. There are lots of ways to do this. Some data sources have tools that show when they are queried, such as the Run History screen in Microsoft Flow that I show in the video or SQL Server Profiler. Other ways include using Fiddler for web services or Process Monitor for files.
Next, if you find that Power Query is indeed querying your data source multiple times, the most common explanation is that it’s doing a “background refresh”. This happens by default when you refresh a dataset and is Power Query refreshing all the data previews that you see when you click on a step in the Power Query Editor window. In most cases this is so quick you don’t notice it but when you have lots of queries with lots of steps it can be a gigantic problem. You can turn it off in the Options dialog in Power BI Desktop, as shown in the video, or in the Excel Query Options dialog. I blogged about this issue here.
Another possible reason is the data privacy settings you have applied. This is a gigantic, complex topic and something that I and other people have blogged about many times. I wrote a multi-part series of posts explaining data privacy settings starting here and there’s also a great explanation of what’s going on in behind the scenes written by the dev team that I link to here; I also wrote a post here showing an example of the performance impact of data privacy checks. As I say in the post, however, do not play around with these settings unless you really know what you’re doing.
If you’re ok with writing some M code, using the technique I blogged about here to implement basic query folding on a web data source can be important – in the example in the video, it was doing this that reduced the number of calls to the web service from six to three.
Some transformations can also lead to data being read from a data source more than once. For example, in this post I showed how a merge transformation leads to multiple reads from the same Excel file and how to stop this happening and drastically improve performance.
Moving on, another important lesson to learn is how referenced queries are evaluated. This is one of the most counter-intuitive things about Power Query! With the set of referenced queries used in my demo:
…where three queries called Referenced Query 1, Referenced Query 2 and Referenced Query 3 each reference a query called Call Web Service and are loaded into a dataset (and where Call Web Service is not loaded into the dataset), most people would assume that when a refresh takes place the following happens:
The Call Web Service query is run, getting the data from the web service
The data returned is then passed to Referenced Query 1, Referenced Query 2 and Referenced Query 3
This is wrong. In fact what happens is this:
Referenced Query 1 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.
Referenced Query 2 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.
Referenced Query 3 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.
This explains why the Call Web Service query is still being evaluated three times once the M code above has been added to it. You can read a detailed explanation of how referenced queries are evaluated here.
There is something we can do to help here though: the Power Query engine can cache the results returned from some types of data source, such as web services, but to take full advantage of this caching in this particular scenario you have to turn off the Enable Parallel Loading Of Tables option on the Options dialog as I show here. With that done, each of the three referenced queries run one after the other, instead of in parallel. When the first of these queries runs its calls the Call Web Service query, which gets data from the web service; when the next two queries run, though, and they trigger two more evaluations of Call Web Service, the data for the call to the web service has been cached. This caching can be turned off using the IsRetry option in Web.Contents as Curt Hagenlocher explains in a comment here.
The final thing shown in the video is that the infamous Table.Buffer M function does not improve performance in this case and in fact makes performance worse. If you use Table.Buffer inside the Call Web Service query it will indeed buffer the result of the call to the web service into memory, but the data that is buffered cannot be shared between separate query executions – so the buffering happens three times, once for each time Call Web Service is evaluated by the three referenced queries. Again this is something discussed in this post. Of course there are plenty of other scenarios where Table.Buffer will help performance, but these will only be in cases where the same query requests data from the same data source more than once.
Phew, that was a lot. I don’t pretend to know everything about when and why Power Query gets data from a data source multiple times but this is the sum total of my knowledge right now. I hope it’s useful!
[Update: it’s just been pointed out to me, quite rightly, that the names of my queries are confusing. For example, “Referenced Query 1” is the query doing the referencing, not the query being referenced – which is Call Web Service. I haven’t changed the names because I want the blog post to remain consistent with the video, and I hope the diagrams and the context make everything clear]