The most exciting (at least for me) feature in the new Enhanced Refresh API (blog announcement | docs) is the ability to cancel a dataset refresh that’s currently in progress. Up until now, as this blog post by my colleague Michael Kovalsky shows, this has been quite difficult to do: not only do you need to use the XMLA Endpoint but you also need to take into account that in many cases the Power BI Service will automatically restart the refresh even after you’ve cancelled it. Now, though, if (and only if) you start the refresh using the Enhanced Refresh API you can also cancel it via the Enhanced Refresh API too. This is important because I’ve seen a few cases where rogue refreshes have consumed a lot of CPU on a Premium capacity and caused throttling, even after all CPU smoothing has taken place, and Power BI admins have struggled to cancel the refreshes.
This and all the other great functionality the new API includes (the ability to refresh individual tables or partitions! control over parallelism!) means that it can handle many of the advanced scenarios that, in the past, you’d have had to write some complex TMSL commands for; in my opinion anyone working on an enterprise-level dataset in Power BI Premium should be using it for their refreshes.
But Chris, I hear you say, I’m a data person and find working with APIs confusing and difficult! Yeah, me too – which is why, when I saw this tweet by Stephen Maguire about .NET interactive notebook for Visual Studio Code he’s built for the Enhanced Refresh API, I was interested:
It’s a really great set of examples for learning how to use the Enhanced Refresh API through PowerShell and the notebook format makes it a lot more user-friendly than just another bunch of scripts. I highly recommend that you check it out.
In the first post in this series I showed how any Power BI dataset refresh started via the Power BI portal or API is limited to 2 hours in Shared capacity and 5 hours in Premium capacity, and how you could work around that by running a refresh via Premium’s XMLA endpoint feature. In the second post in this series I showed how some M functions allow you to set timeouts. However, even if you initiate a refresh via the XMLA endpoint you may still get a timeout error and in this post I’ll discuss another reason why: the External Command Timeout.
This property is a hangover from Analysis Services (you can see it documented here). It represents the amount of time the Analysis Services engine inside Power BI will wait to get data from a data source. How it behaves exactly depends on the data source: it may limit the amount of time it takes to get the first row of data from the source or it may limit the amount of time it takes to get all the rows of data. In Power BI Premium it is set to five hours, which means that no single partition can take more than about five hours to refresh. In the first post in this series I worked around this by creating a dataset with multiple partitions, each of which took about an hour to refresh, but when trying to refresh a dataset with a single partition that takes more than five hours I got the following error when trying to refresh from SQL Server Management Studio through the XMLA Endpoint:
The error message here is:
Timeout expired. The timeout period elapsed prior to completion of the operation.. The exception was raised by the IDbCommand interface.
In this case I saw the same error in the Error event in Profiler:
…and in other cases, when testing a different source, I got a different error in Profiler in the Progress Report Error event:
There’s no way to avoid the External Command Timeout. Instead, what you need to do is either change your partitioning strategy so each partition refreshes in under five hours or tune your data source, M code or gateway (if you’re using one) so that data is returned to Power BI faster.
In Shared capacity I believe the External Command Timeout is set to two hours (again, to match the overall refresh timeout) but it’s much less important there because you can’t create partitions manually (the only way a dataset in Shared can be partitioned is by setting up incremental refresh) and there’s no XMLA Endpoint so there’s no way to work around the two hour overall refresh limit anyway.
[Thanks, as always, to Akshai Mirchandani for a lot of the information in this post]
If you’re working with a large Power BI dataset and/or a slow data source in Import mode it can be very frustrating to run into timeout errors after you have already waited a long time for a refresh to finish. There are a number of different types of timeout that you might run into, and in this series I’ll look at a few of them and discuss some of the ways you can work around them.
In this post I’ll look at one of the most commonly-encountered timeouts: the limit on the maximum length of time an Import mode dataset refresh can take. As documented here these limits are:
Two hours for an import-mode dataset in Shared capacity
Five hours for an import-mode dataset in a Premium or PPU capacity
Here’s an example of the message you will see in the Refresh History dialog if your dataset refresh takes more than five hours in PPU:
Once again you get more details in a Profiler trace. When the refresh times out you’ll see an Error event with long message that starts with the following text:
Timeout expired. The timeout period elapsed prior to completion of the operation.. The exception was raised by the IDbCommand interface. The command has been canceled..
There is a way to work around this limit if you’re using Premium or PPU: it only applies to refreshes (scheduled or manual) that you start from the Power BI portal or via the REST API. If you start your refresh by running a TMSL script via the XMLA Endpoint (for example via SQL Server Management Studio or from PowerShell) then the limit does not apply. This is because you’re executing your own TMSL script rather than having the Power BI Service execute its own refresh command – with a timeout set – like the one shown in the screenshot above.
For example, here’s the same dataset from the screenshot above but refreshed successfully from SQL Server Management Studio (notice the Type column says “Via XMLA Endpoint”) and with a duration of just over five hours:
There are a couple of blog posts out there showing how you can implement a refresh strategy using the XMLA Endpoint; this post from Marc Lelijveld and Paulien van Eijk is a great example of how to do this using Azure Automation and Azure Data Factory.
You should also consider tuning any refresh that takes a long time and it could be that after tuning it you fall well within the two/five hour limit. There are a lot of things to consider when tuning dataset refresh; I did a conference session on this topic last year (you can watch the recording here) covering issues like data modelling, query folding in Power Query and the use of dataflows but it doesn’t cover everything and I have learned a lot even since then. In my opinion one of the most important things you can do to improve refresh performance for very large Import datasets is to partition your fact tables and increase the amount of parallelism, as I describe here. One customer I worked with was able to reduce their refresh time from four hours down to one hour using this technique:
As I said, though, there are many different types of timeout that you may encounter – so even if you refresh via the XMLA Endpoint it may still time out for another reason.
In my last post I showed how to connect SQL Server Profiler up to a Power BI Premium dataset but I didn’t give you any examples of why this might be useful. In this post I’ll show you how you can use a Profiler trace to visualise all the queries run by a Power BI report, see when they start to run, see which ones run in parallel with each other and see what the overall time taken to run all the queries is.
Why is this important? When you’re tuning the performance of a Power BI report the first thing to do is to look at the performance of the individual DAX queries run and make them run as fast as possible. However when a Power BI report is rendered any one query is likely to be run at the same time as several other queries run for the same report, and this will have an impact on its performance. How much of an impact there is will depend on how many queries need to be run and the number of back-end v-cores available on your Premium capacity, or the number of QPUs available on your Azure Analysis Services instance if you’re using a Live connection to AAS. The more v-cores/QPUs you have available, the more of the work needed for a query that can be run in parallel; you can see a table listing the number of v-cores for each Premium SKU here, and the number of QPUs for each Azure Analysis Services SKU here. As a result of this if you have reports with a large number of visuals that generate slow DAX queries, scaling up your Power BI Premium capacity or AAS instance may improve overall report performance. Reducing the number of visuals on your report and/or reducing the number of visuals needed to display the same information will also reduce the number of queries that need to be run and therefore improve overall performance.
As I showed last week, SQL Server Profiler can be used to create a trace that logs all the queries run against a Power BI Premium dataset in the same way as it can be used with Azure Analysis Services. Assuming that you have a trace running that uses only the Query End event, this will give you a list of all the queries that are being run along with their start time, end time, duration and a lot of other interesting information. A table with all this data in can still be difficult to interpret though, so I built a Power BI template for a report that visualises all these queries and helps you understand the amount of parallelism that is taking place. You can download the template file here.
To use it, first you need a trace file. Make sure that no-one else is running reports on the Premium capacity you want to test (creating a Power BI Embedded capacity for testing purposes is a good idea) and then, when the trace is running, refresh your report using the technique I described in the “Use the network tab” section of this blog post. This will also allow you to correlate what you see in the trace with the information you see in the DevTools tab in the browser.
Then save the trace file you can created to XML by going to File/Save As/Trace XML File:
Next, open the Power BI template file and when prompted, enter the full path of the trace XML file you just created:
A new Power BI report will then be created. If you want to point the report to a different trace XML file all you need to do is change the value of the TraceXMLFile Power Query parameter.
On the first page you’ll see the name of the trace XML file you connected to plus a bar chart showing each Query End event (with each query identified by a number) on the y axis and the duration of each query on the x axis:
It’s not quite a simple bar chart though. What I’ve done is:
Found the start time of the first query run
Calculated the start time of every other query in the file relative to this first start time (although, unfortunately, Profiler only gives you start times rounded to the nearest second which means you can’t know exactly when a query starts)
Created a stacked bar chart where the first value in the stack is this relative start time and the second value is the duration of the query in seconds
Made the colour of the relative start time transparent, so you only see the blue sections of the bar for the query durations. This gives you a waterfall-like effect and allows you to see which queries are run in parallel. This also makes it easy to see the total amount of time taken to run your queries, from the start of the first query to the end of the last query, which is just as useful to know as the duration of any single query.
There’s also a drillthrough page so you can right-click on a bar and see a table with the DAX query for the query you clicked on, as well as its start time and duration.
It’s a very basic report, I know, and I would be interested to know if you have any ideas about other ways of visualising this data. What’s more, a visual like this raises more questions than I know how to answer… yet. For example, one thing I want to investigate is the effect that query interleaving has on this graph and both perceived and actual report performance. So stay tuned for more blog posts on this subject!
Back in December when I wrote a series of posts on testing the performance of Power BI reports in the browser, I mentioned that it was important to test in the browser because some aspects of the performance of a report may be different there compared to in Power BI Desktop. Following on from this, if you’re testing performance of a report in the browser you are also going to want to take a closer look at the DAX queries generated by your report, even if it is just to check that what you see there is the same as what you see in Performance Analyzer in Power BI Desktop. If your report uses a Live Connection to Analysis Services this is easy to do using either SQL Server Profiler, Azure Analysis Services’s diagnostic logging feature or XEvents. If you’re using a dataset stored in Power BI we have a range of options for monitoring what’s going on including Usage Metrics and the Premium Capacity Metrics Apps and of course there’s also DAX Studio, but for an old-school guy like me, connecting to a Power BI Premium workspace using SQL Server Profiler is a great way to go to get detailed information about what’s going on when queries run. In this blog post I’ll show you how to connect Profiler to Power BI Premium.
First you’ll need to install the latest version of SQL Server Management Studio, which includes SQL Server Profiler – you can get it here. If you have an older version installed you’ll probably need to upgrade. And before anyone leaves a comment about Profiler being deprecated, let me point you to the note on this page:
The feature that allows you to connect Profiler to a Power BI Premium workspace is XMLA Endpoints: it’s in preview right now but basically this allows you to connect any tool that works with Azure Analysis Services up to Power BI Premium. SQL Server Profiler wants to connect to an instance of Analysis Services; XMLA Endpoints mean that you can connect it to a dataset in a Power BI Premium workspace but for this to happen you need to know the url for Profiler to connect to. You can find this by going to your workspace, clicking on Settings:
and then going to the Premium tab and copying the Workspace Connection string:
You can then open up Profiler, go to the File menu and select New Trace and a connection dialog will appear:
In this dialog:
Set the Server type to Analysis Services
In Server name paste the Workspace Connection that you copied from Power BI earlier
In Authentication select Azure Active Directory – Universal with MFA and enter your username
Next click the Options button and go to the Connection Properties tab and on the Connect to database dropdown select <Browse server…>:
Click Yes on the dialog that appears and then choose the name of the dataset in your workspace that you want to connect to in the Connect to database dropdown. If you don’t do this you’ll get errors later on.
Next you’ll see the Properties dialog:
The default template is Blank, which means no events are selected to monitor; to select events go to the Events Selection tab:
…select the events you want, and click Run to start tracing.
Which events should you choose? That’s a big topic and not one that I have time to go into here, but the Query End event is perhaps the one I look at most – it’s fired every time a query finishes executing and gives you a lot of important information such as the start time, end time and duration of the query in milliseconds. Books such as “The Definitive Guide To DAX” have a lot of information on using Profiler with Analysis Services and a lot of that information is relevant to Power BI Premium too. In future blog posts I dare say I’ll show you some interesting things you can do using Profiler and Power BI too…
Very occasionally you may find that a report that is slow to render – but still renders successfully – in Power BI Desktop shows the error “Visual has exceeded the available resources” for some visuals when published to the Power BI Service:
Clicking on “See details” will show a dialog that looks something like this:
This visual has exceeded the available resources. Try filtering to decrease the amount of data displayed.
Please try again later or contact support. If you contact support, please provide these details.More details Resource Governing: The query exceeded the maximum memory allowed for queries executed in the current workload group (Requested 1048580KB, Limit 1048576KB).
The official Power BI documentation has similar advice to what’s shown in this dialog about what to do here, but what’s really going on?
The information in the “More details” section of the section dialog gives you a clue: in this case it’s resource governance. When you run a DAX query in Power BI it will always use a certain amount of memory; inefficient DAX calculations can cause a query to try to grab a lot of memory. In Power BI Desktop these queries may run successfully but be slow, but the Power BI Service can’t just let a query use as many resources as it wants (if it did, it may affect the performance of other queries being run by other users) so there is a resource governor that will kill queries that are too resource hungry. In the case of the visual above the query behind it tried to use more than 1GB of memory and was killed by the resource governor.
The limits enforced by the resource governor cannot be changed in Power BI shared capacity. The limits are higher in a Premium capacity and vary depending on the capacity size, and if you are a Premium Capacity Admin there are a number of different settings on a capacity you can change that will affect this, described here. For example the Query Memory Limit (%) setting controls the amount of memory that any given query can use before the resource governor kicks in.
To demonstrate this I moved the workspace containing the report in the screenshot above to a Power BI Embedded A4 capacity and then changed the Query Memory Limit % setting to 1.
This resulted in the error dialog showing that the resource governor killed the query when it tried to use 1% of the 25GB of RAM available in an A4, which is 262144KB:
While it may be useful to change this setting if you’re the admin of an overloaded Premium capacity, the real solution is to tune the DAX calculations used in this visual (I suggest you read the section on materialisation starting on p568 of “The Definitive Guide to DAX” or this older article which covers the same ground) so they use less memory and return faster or change what the visual displays.
You may see the “Visual has exceeded the available resources” error in other scenarios where the resource governor is not involved. For example, setting the Query Timeout setting on the capacity to 5 seconds like so:
…means that the visual now times out after five seconds instead, and the “See details” dialog shows the following error instead:
More details The XML for Analysis request timed out before it was completed. Timeout value: 5 sec.
It is very difficult for an organisation to decide whether to buy Power BI Premium or to stick with Power BI Pro. Power BI Premium represents a big financial commitment, but how do you know whether the benefits will be worth the costs involved? This was a question that Melissa Coates and I dealt with in great detail in the second version of our white paper “Planning A Power BI Enterprise Deployment”, and I strongly suggest you read the relevant section (pages 37-42) if you are considering buying Premium.
However, it’s one thing to read a white paper and another to actually test Premium yourself – and right now there is no way of trying Premium before you buy it. As more and more features like SSRS paginated reports or linked entities in dataflows get added to Premium, the more likely it is you are going to want to test these features before buying Premium. So what do you do? The answer is to use Power BI Embedded instead. It’s not exactly the same thing as Premium (the setup experience is different, for one thing), but there are three significant facts to point out:
I have been assured by several senior Microsoft employees that all new Premium features will be available in both the Premium (EM and P) SKUs and the Embedded (A) SKUs. So, for example, when the SSRS paginated report feature is released it will be available in both Premium and Embedded.
The resources available (in terms of the number of v-cores and memory) in the various Premium SKUs mirror those available in the Embedded SKUs. For example a P1 Premium SKU has the same resources available as an A4 Embedded SKU.
With Power BI Embedded, unlike Power BI Premium, you only need to pay for what you use: you can pause a Power BI Embedded capacity when you are not using it and pay nothing.
Therefore, to sum up, if you want to test Premium features before you buy, all you need to do is create a new Power BI Embedded capacity in the Azure portal and assign a Workspace to it – and you’ll get access to all the Premium features. When you’ve finished just pause the capacity. You’ll still need to pay while you’re testing but it will be a fraction of the cost of buying Premium.