A few weeks ago I wrote a post showing how you can use the new EvaluateAndLog DAX function to diagnose performance problems relating to the use of the Switch function. Did you know that calculation groups can experience similar performance problems though? In some scenarios limited evaluation takes place for all calculation items, not just the one you expect; luckily you can use EvaluateAndLog to diagnose this too. In this post I’ll show you a simple example.
Let’s say you have a dataset containing the following table, called Sales:
There’s also a calculation group with three calculation items:
The Measure Value calculation item returns the value of SelectedMeasure but also uses EvaluateAndLog:
EVALUATEANDLOG( SELECTEDMEASURE() )
The Measure Value * 2 and Measure Value * 3 calculation items do the same thing but multiply the value returned by SelectedMeasure by two and three respectively, for example:
EVALUATEANDLOG( SELECTEDMEASURE() * 2 )
There’s another table in the dataset, with no relationship to the Sales table, called Calc Item Names containing a single column listing the three calculation items’ names:
Last of all, there is a measure called Sales Amount:
Sales Amount = SUM(Sales[Sales])
…and a measure called Dynamic Sales Amount Unoptimised that takes the selection made on the Calc Item Names column of the Calc Item Names table and uses it to dynamically select the calculation item with that name and apply it to the Sales Amount measure:
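The definition of this measure isn’t reproduced here, but one way of writing this kind of measure – a sketch only, using hypothetical names for the calculation group (‘My Calc Group’) and its column ([Calc Item]), neither of which is given above – is:

Dynamic Sales Amount Unoptimised =
VAR SelectedItemName =
    SELECTEDVALUE ( 'Calc Item Names'[Calc Item Names] )
RETURN
    CALCULATE (
        [Sales Amount],
        -- 'My Calc Group'[Calc Item] is a hypothetical calculation group column name
        'My Calc Group'[Calc Item] = SelectedItemName
    )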
All this allows you to build the following report where a slicer controls the calculation item applied in the Dynamic Sales Amount Unoptimised measure:
As the name suggests, though, there’s a problem with the Dynamic Sales Amount Unoptimised measure that EvaluateAndLog can help uncover. Using Jeffrey Wang’s DAXDebugOutput tool to capture what happens when Measure Value is selected in the slicer, you can see three DAXEvaluationLog events. The first is the one you’d expect: since Measure Value is selected in the slicer, the Measure Value calculation item is evaluated:
The other two events, though, are for the two unselected calculation items – and, just as with the Switch problem, evaluation taking place for all calculation items can cause performance problems:
How do you avoid this problem? It turns out that if you use TreatAs to apply the calculation item inside Calculate then only the selected calculation item is evaluated and the extra DAXEvaluationLog events go away.
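The exact measure definition isn’t reproduced here, but a sketch of the TreatAs pattern – using the same hypothetical calculation group and column names as above, and a hypothetical measure name – looks like this:

Dynamic Sales Amount Optimised =
CALCULATE (
    [Sales Amount],
    TREATAS (
        VALUES ( 'Calc Item Names'[Calc Item Names] ),
        -- 'My Calc Group'[Calc Item] is a hypothetical calculation group column name
        'My Calc Group'[Calc Item]
    )
)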
Last week, while I was at the always-excellent dataMinds Connect conference in Belgium, I was interviewed by Heini Ilmarinen and Alexander Arvidsson for the ‘Knee-Deep In Tech’ podcast. You can listen to it here:
We had a very interesting chat about my career, the Power BI CAT team and what it does, why prioritising features is a lot harder than you think, technical training and the genius of Miss Excel, and more.
If you’re monitoring activity in Power BI, for example using Log Analytics, you’ll know that you can capture the DAX queries generated by your published Power BI reports. How do you make sense of them though? You may want to know which tables, columns or measures are being used by a query – maybe so you can work out which ones aren’t being used and can be deleted. I always thought the only way to do this would be to parse the DAX query, which would be incredibly difficult to do. Yesterday, though, Igor Cotruta pointed out on Twitter that there’s a really easy way to do this that I didn’t know about, using the DISCOVER_CALC_DEPENDENCY DMV. I’ve used this DMV before and blogged about it here, but what was new to me is that you can pass a DAX query into the QUERY restriction (it is all documented here). To see how it works let’s look at a simple example.
Consider the following dataset consisting of a single table with a calculated column and a few measures:
Now consider a table visual built on this dataset with a slicer linked to it:
You can get the DAX query generated by this visual using Performance Analyzer’s Copy Query feature (described in this article). You can then pass this DAX query into the WHERE clause of a query against the DISCOVER_CALC_DEPENDENCY DMV like so:
SELECT
    REFERENCED_OBJECT_TYPE,
    REFERENCED_TABLE,
    REFERENCED_OBJECT,
    REFERENCED_EXPRESSION
FROM
    $SYSTEM.DISCOVER_CALC_DEPENDENCY
WHERE
    QUERY =
'
DEFINE
    VAR __DS0FilterTable =
        TREATAS({"Apples"}, ''Sales''[Product])
    VAR __DS0Core =
        SUMMARIZECOLUMNS(
            ROLLUPADDISSUBTOTAL(''Sales''[Country], "IsGrandTotalRowTotal"),
            __DS0FilterTable,
            "Sales_Amount", ''Sales''[Sales Amount],
            "Effective_Tax_Rate", ''Sales''[Effective Tax Rate]
        )
    VAR __DS0PrimaryWindowed =
        TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, ''Sales''[Country], 1)
EVALUATE
    __DS0PrimaryWindowed
ORDER BY
    [IsGrandTotalRowTotal] DESC, ''Sales''[Country]
'
[Note that the DAX query I’m passing in contains single quotes that have to be escaped, which explains why all the table names are surrounded by two single quotes]
The query can be run from DAX Studio connected to either Power BI Desktop or a dataset published to Power BI Premium via the XMLA Endpoint. Here’s the output:
As you can see, this gives you the table, columns (including the hidden RowNumber column), measures and calculated column directly referenced by the query as well as the DAX definitions of the measures and calculated column. It does not mention the [Tax Amount] measure, which the [Effective Tax Rate] measure references; you can get the full list of measures and calculated columns in the dataset and the dependencies between them by querying the DMV without the WHERE clause though. Neither does the output tell you that the Product column is being sliced by the value “Apples”, which would also be useful. Overall, though, I think this is a really powerful feature.
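For example, a query like this returns every object and dependency in the dataset:

SELECT * FROM $SYSTEM.DISCOVER_CALC_DEPENDENCY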
It’s very easy to get confused between Live connections and DirectQuery mode in Power BI: the customers I work with frequently get them mixed up, partly because they are similar concepts and partly because other BI tools use the term “live connection” to refer to what Power BI calls DirectQuery mode. In this post I will explain the differences between these two terms.
Power BI Datasets, Azure Analysis Services and SQL Server Analysis Services
The first thing to understand is that when a Power BI report runs it can only get data from one of three places: a Power BI dataset; an Azure Analysis Services (AAS) database; or an on-premises instance of SQL Server Analysis Services (SSAS). There are some exceptions to this rule but we can ignore them for the purposes of this post. Each of these three sources runs a variation of the same engine, ie the Analysis Services engine, and can be queried in DAX. These sources may in turn get data from other sources when they are queried, but as far as the Power BI report is concerned the only sources it can talk to directly are Power BI datasets, AAS databases and SSAS databases.
Live connections
Normally, when you open up Power BI Desktop and start developing, your .pbix file contains the definition of a Power BI dataset and a report that connects to it. When you publish this .pbix file they are split up and you see separate dataset and report objects in your Power BI workspace.
However, in Power BI Desktop you may instead choose to connect to a Power BI dataset that has already been published to the Power BI Service, or an instance of AAS or SSAS, and use that as the source for the report. The main advantage of doing this is that different developers can work on the dataset/AAS database/SSAS database definition and the report at the same time. When you connect to an already-published dataset or an AAS or SSAS database from Power BI Desktop you are creating a Live connection.
To connect to a published Power BI dataset and create a Live connection choose the “Power BI datasets” option under the Get Data menu:
If you want to create a Live connection to AAS or SSAS choose the Analysis Services source and then make sure the “Connect live” option is selected:
After you’ve created the Live connection you’ll see a message telling you this in the bottom right-hand corner of the screen in Power BI Desktop:
The key thing to understand is that when you create a Live connection your .pbix file no longer contains a dataset definition – it only contains a report, and that report connects to a dataset, AAS database or SSAS database that already exists elsewhere.
The concept of a Live connection only really makes sense in Power BI Desktop: as I said, in the Power BI Service reports and datasets are separate objects anyway. A report can only have one Live connection because a report can only ever connect to a single dataset, AAS database or SSAS database.
If you publish a report with a Live connection to an already-published Power BI dataset, then the only new thing that will appear in the Power BI Service is the report. If you publish a report with a Live connection to AAS or SSAS then you will see a new dataset appear in the Power BI Service, but this isn’t really a dataset – it’s just a pointer to the AAS or SSAS database where the data is actually held.
DirectQuery mode
There are two fundamental ways a Power BI dataset, AAS database or SSAS Tabular database can work with data: Import mode and DirectQuery mode. In Import mode the data needed for your dataset is copied into the dataset and stored inside it. In DirectQuery mode, when a Power BI report requests data from the dataset by running a DAX query against it, the dataset in turn generates queries that are run on your data source to get the data it needs on demand. If your data source is a SQL Server database then Power BI will generate SQL queries to get the data it needs; for other sources it may generate queries in other query languages. A Power BI dataset is always a semantic layer, an easily consumable interface for your BI data: in Import mode it is a semantic layer and it also stores the data it needs; in DirectQuery mode it is just a semantic layer.
The only data sources that Power BI can use in DirectQuery mode are those that have some kind of query language that Power BI can generate. You can’t connect to a CSV file in DirectQuery mode, for example, because there’s no way of running queries against it unless you load the data in it into some other form of database. Other data sources may have rich query languages but Power BI will only be able to connect to them in DirectQuery mode if a suitable connector exists for that source. Most commonly-used relational databases can be used in DirectQuery mode. If your data source supports DirectQuery mode you’ll see that as an option on the dialog when you first connect:
The key thing to understand here is that the term “DirectQuery mode” describes where the data for a dataset is kept, ie outside the dataset as opposed to inside it in Import mode.
Composite models
Different tables in a single Power BI dataset can have different storage modes: some can be in Import mode and some can be in DirectQuery mode. Furthermore, different DirectQuery tables in the same Power BI dataset can connect to different data sources. When you have different tables using different storage modes, or different DirectQuery tables connecting to different data sources, you have what is called a “composite model”.
DirectQuery on Power BI datasets and Analysis Services
One final twist is that you can create Power BI datasets in DirectQuery mode and use other Power BI datasets, AAS databases or SSAS databases as a data source. This means you can have DirectQuery datasets that connect to other datasets that can potentially connect to other datasets. It’s useful because, for example, you can have a single, centrally-developed dataset containing all of your company’s sales data and then other Power BI developers can create their own datasets that connect to this central dataset and extend it by adding extra tables of data and new measures without ever duplicating the data in the central dataset.
Although the way things behave behind the scenes is somewhat different, a Power BI dataset in DirectQuery mode that connects to another Power BI dataset as its source is conceptually very similar to a Power BI dataset in DirectQuery mode that connects to a relational database like SQL Server: it just generates DAX queries to get the data it needs when it is queried rather than SQL queries.
Conclusion
Live connections and DirectQuery mode are two distinct concepts. Live connections occur when you create reports in Power BI Desktop that connect to already-published Power BI datasets, AAS databases or SSAS databases. DirectQuery mode is one of the two fundamental ways that a Power BI dataset can work with data (the other being Import mode) and is the mode where a dataset does not contain any data but instead queries other data sources to get the data it needs when it is itself queried.
[Thanks are due to my friend Peter Myers – this blog post expands on some explanations we developed last year in a session we co-presented at various Power BI user groups called “Demystifying Power BI Datasets”. Several recordings of this session exist on YouTube, for example here.]
Something I do all the time when performance tuning Power BI is use SQL Server Profiler to monitor query and refresh activity. There’s a handy external tool that lets you open up Profiler with a connection to Power BI Desktop; if you’re using Power BI Premium you can also connect Profiler up to a published dataset using the XMLA Endpoint. Profiler is a bit old-school though and likely to lead to snarky comments from SQL Server DBAs (it’s not deprecated for Analysis Services though!) who’ll say you should be using Extended Events (aka xEvents) instead. And guess what – did you know you can use Analysis Services xEvents to monitor activity in Power BI Desktop?
Doing this is quite simple. The only challenge to overcome is how to connect SQL Server Management Studio to an instance of Power BI Desktop, and the easiest way to do this is to install DAX Studio as an external tool, open it connected to Power BI Desktop, and then look in the bottom right-hand corner of the screen for the address of the instance of the Analysis Services engine inside Power BI Desktop that DAX Studio is connected to:
You can then use this address in the connection dialog of SQL Server Management Studio:
Once you’ve connected, under the Management folder in the Object Explorer pane you’ll find there’s already an xEvent session running you can look at:
You can also create a new session by following the instructions here.
All very interesting, but what’s the point of doing this? I admit that I find Profiler much quicker and easier to use but xEvents have two slight advantages over Profiler:
1. Frustratingly, all the timestamps for Analysis Services events in Profiler are rounded to the nearest second, which makes questions like “which event started or finished first?” much harder to answer. In contrast, the timestamps for xEvents are very granular:
2. There are some events that are only available in xEvents and not in Profiler traces. The ExecutionStatistics event shown in the screenshot above is a great example: the text column for it contains a lot of detailed information that I don’t know how to interpret (yet) and which isn’t documented, but apparently it can be used to determine the amount of time spent in the Formula Engine among other things. This is useful because the widely used method of doing this – add up the amount of time spent in the Storage Engine and subtract it from the overall query duration – is unreliable and will become even more unreliable in the future.
I love this kind of detailed information and I’m sure there are a lot of practical problems it can be used to solve. The bad news is that you can’t use this technique against Power BI Premium, at least at the time of writing. Look out for more blog posts on this subject in the future!
Over the years I’ve seen a few examples of how issues with an organisation’s corporate network can affect Power BI report performance. I’ve never blogged about them because, to be honest, I know next to nothing about networks and I’m not sure I could describe them properly. However, recently, I have seen a few instances of what I think could be a widespread but little-diagnosed problem – so I would like you to check if it’s happening to you and tell me what the cause is, if you can find out.
Some time ago I wrote a post about how Power BI report performance could be worse in Internet Explorer 11 and older Windows operating systems because some report performance optimisations we do are only available with HTTP/2. You can read it here:
I hope you’re not still on Windows 8.1 or using IE11. Some traffic could still be being forced to use HTTP/1.1 though and this could have a big impact on report performance – in some cases making reports 5-10 seconds slower to render. How can you tell? You will need to use the Developer Tools available in Chrome and Edge. I blogged about how to use the Network tab in Developer Tools to measure Power BI report performance here:
You will also need to make sure the Protocol column is visible by right-clicking in the central pane where the traffic is displayed:
Here’s an example of what you’ll see in the Network tab when the report renders and you have the problem:
There are two things to look out for:
In the Protocol column you’ll see the value “http/1.1” rather than “h2” for some events, most importantly the query events
You’ll see a long grey bar before the green bar in the Waterfall diagram for these events showing that the connection is stalled for several seconds
If possible you should compare the performance of running your report inside your corporate network and outside it, using the technique described in the blog post mentioned above to see how long the report takes to render in both cases. If the report is faster outside your corporate network, and you only see traffic being forced to use http/1.1 and connections stalled inside your corporate network, then you have a problem and need to talk to your networking team to find out what’s going on. I’ve heard ZScaler could be the culprit but I’m not sure, so if you do have this problem and do find out the cause and the solution, please leave a comment below.
The Switch function is often a cause of performance problems in DAX. Last year a number of optimisations were made that fixed the most common issues (and as a result you should ignore older blog posts that you may find on this subject) but some scenarios still remain where performance can be bad; Marco and Alberto wrote a great post recently outlining some of them that I encourage you to read. How do you even know whether your measure is slow because of your use of the Switch function though? Trying to read query plans, as Marco and Alberto do in their post, isn’t usually practical and commenting out branches in a Switch to see if there is a performance change can be very time-consuming. Luckily, the new EvaluateAndLog DAX function can help you diagnose Switch-related performance issues.
To illustrate this I built a very simple Power BI dataset with a fact table called Sales and a table called DisconnectedTable used for choosing the measure to display in a report (I know you can use field parameters for this but this is just a demo).
The Sales table looks like this:
…and the DisconnectedTable table looks like this:
There are two basic measures that sum up the values in the Sales and Tax columns:
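The definitions aren’t reproduced here, but assuming the columns are called Sales and Tax they will be something like this:

Sales Amount = SUM ( Sales[Sales] )
Tax Amount = SUM ( Sales[Tax] )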
There are also two measures that use Switch to display either Sales Amount or Tax Amount based on the selection made on DisconnectedTable: one is called Dynamic Measure Optimised and the other Dynamic Measure Unoptimised.
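Their exact definitions aren’t shown here, but based on the description below they follow this kind of pattern (the Key values used in the Unoptimised version are an assumption):

Dynamic Measure Optimised =
SWITCH (
    SELECTEDVALUE ( DisconnectedTable[Measure] ),
    "Sales Amount", [Sales Amount],
    "Tax Amount", [Tax Amount]
)

Dynamic Measure Unoptimised =
SWITCH (
    SELECTEDVALUE ( DisconnectedTable[Key] ),
    1, [Sales Amount], -- the Key values 1 and 2 are assumed
    2, [Tax Amount]
)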
Now let’s assume you have a report with a slicer linked to the Measure column on DisconnectedTable and a table showing the Product column and either the Dynamic Measure Optimised or Dynamic Measure Unoptimised measure:
Both measures return the same value but, as Marco and Alberto show in their post, since Dynamic Measure Optimised looks for the selected value on the column that is used in the slicer – the column called Measure on DisconnectedTable – it will perform better than Dynamic Measure Unoptimised, which looks for the selected value on the Key column of DisconnectedTable, a column that is not used in the slicer.
How can we prove this though? If you wrap the expressions used in each branch of Switch with EvaluateAndLog, like so:
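For example, here’s a sketch of the Unoptimised measure with EvaluateAndLog added to each branch:

Dynamic Measure Unoptimised =
SWITCH (
    SELECTEDVALUE ( DisconnectedTable[Key] ),
    1, EVALUATEANDLOG ( [Sales Amount] ),
    2, EVALUATEANDLOG ( [Tax Amount] )
)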
…and then run Jeffrey Wang’s DAXDebugOutput tool and connect it up to your Power BI Desktop file, you can see the difference between the two measures for the DAX query that populates the table visual in the report.
When using Dynamic Measure Optimised and with Sales Amount selected in the slicer, the DAX query for the table visual generates two DAXEvaluationLog events, one for the body of the table, one for the grand total row:
This is what you would expect and this is the optimal pattern: both DAXEvaluationLog events only mention the Sales Amount measure, the measure selected in the slicer. However, if you use Dynamic Measure Unoptimised with Sales Amount selected in the slicer, you’ll see two extra DAXEvaluationLog events being generated where Power BI pays the cost of analysing and executing the unselected branch for Tax Amount, only to produce an empty, unused result in the end:
It’s these evaluations of unused Switch branches that can cause query performance problems. So my advice is that if you’ve got a lot of complex measures that use Switch you should study Jeffrey’s blog carefully and use EvaluateAndLog to see if any optimisation is necessary.
[Thanks to Jeffrey Wang for his help writing this post]
The thing that got me excited about the Power BI Enhanced Refresh API, and which inspired me to start this series of posts, was the fact that for the first time it gives you a way of cancelling Power BI dataset refreshes. In this last post in the series I’ll show you how you can add an Action to your Power Automate custom connector to cancel a refresh and how you can use it in a Power Automate Flow.
Adding a new Action is straightforward and very similar to what I’ve shown in previous posts in this series. As before, click on the New action button on the Definition stage of the custom connector wizard, fill in the General section with appropriate names:
…then use the Import from sample option to populate the Request section using the DELETE verb and the following URL for the Cancel Refresh In Group API endpoint:
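At the time of writing the URL documented for this endpoint is as follows (check the Power BI REST API docs for the current definition; the values in braces are placeholders for your own workspace, dataset and refresh IDs):

https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes/{refreshId}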
The refreshId, which uniquely identifies the refresh operation you wish to cancel, can be obtained from the Action I described in part 4 of this series that kicks off a refresh.
Using this new Action in a Flow is more interesting. Here’s an example Flow that kicks off a dataset refresh but cancels it if it goes on for more than 120 seconds; it’s similar to, but more complex than, the Flow I described in this post that checks the status of a refresh. At the top level it looks like this:
The first two actions kick off a full refresh of a dataset:
The next two Actions initialise two variables:
ElapsedRefreshDurationSeconds is an integer used to hold the amount of time in seconds that the refresh has been running for each time it is checked
StopCheckingRefreshStatus is a boolean variable used to determine whether the upcoming Do until loop should be terminated or not
The contents of the Do until loop look like this:
After waiting 30 seconds the status of the refresh is checked. If the status is “Unknown” then the refresh is ongoing and the next step is to check how long it has been running for; otherwise the refresh has completed and the StopCheckingRefreshStatus variable can be set to true so the Do until loop can be terminated. Drilling into the “Is refresh ongoing” conditional Action:
If the refresh is still in progress then the current elapsed duration in seconds of the refresh is calculated using the following expression (this blog post showed me how to calculate durations in Power Automate):
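The screenshot showing the expression isn’t reproduced here, but a sketch of the kind of expression involved – assuming the refresh start time is read from the startTime property returned by the “Get status of one dataset refresh” Action – would be:

div(sub(ticks(utcNow()), ticks(body('Get_status_of_one_dataset_refresh')?['startTime'])), 10000000)

This takes the difference between the two times measured in ticks (100-nanosecond intervals) and divides it by 10,000,000 to get a number of seconds.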
If that elapsed duration is more than 120 seconds then the refresh should be cancelled; otherwise no action is taken. Drilling into the Yes branch of the conditional Action:
The refresh is cancelled by calling the Action created at the beginning of this post; after the cancellation the StopCheckingRefreshStatus variable is set to true. Here are details of the Actions from the Yes branch that do this:
That’s it for this post and this series. Since I started the series over a month ago a few other articles on similar subjects have been published on LinkedIn that you might want to check out, by my colleagues Romain Casteres and Rui Romano; as more enterprise BI solutions are built on Power BI, I know there will be a lot more people implementing complex refresh scenarios using Power Automate/Logic Apps and the Enhanced Refresh API.
In the last post in this series I showed how to get the status of an individual Power BI dataset refresh. In this post I’ll show how to get the history of a dataset’s refreshes and save it to a file in OneDrive for Business – useful if you want to do some analysis on refresh performance or the causes of failures.
This is one of the most straightforward things to do with the Power BI Enhanced Refresh API, especially once you’ve already got a custom connector created in the way I’ve shown in the previous posts in this series. You just need to add a new Action to your custom connector, fill in the General section:
…then in the Request section use Import from sample with the GET verb on the following URL (for Get Refresh History In Group):
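At the time of writing the URL documented for this endpoint is (with the values in braces as placeholders):

https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes?$top={$top}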
The only thing to do after this is edit the optional $top query parameter (which allows you to limit the number of refreshes whose history you return) so it takes an integer rather than a string:
Update the custom connector and you can now use it in a simple Flow to save the JSON returned by the call to the API into a JSON file:
You can then connect to this data very easily via Power Query, for example:
Rather than trigger a Flow like this manually, in the real world you’d want to run it on a schedule, perhaps every day. You’d then end up with a folder full of JSON files you could analyse and as much history stored as you wanted. It should be possible to load and dedupe this data (multiple JSON files will contain overlapping refresh history) using Power BI dataflows and/or datamarts but that’s something I’ll leave for another blog post…
So far in this series (see part 1, part 2 and part 3) I’ve looked at how you can create a Power Automate custom connector that uses the Power BI Enhanced Refresh API to kick off a dataset refresh. That’s only half the story though: once the refresh has been started you need to know if it has finished and, if so, whether it finished successfully or not. In this post I’ll show how to do this.
When you start a refresh using the Enhanced Refresh API Power BI returns a unique identifier for that refresh operation and you will need to modify your Power Automate custom connector to make it easy to capture and use that identifier. You can do this on the Definition stage of the custom connector wizard for an Action that kicks off a refresh (ie any of the Actions I built in the previous posts in this series) by going to the Response section, clicking on the Add default response button and pasting the following sample response (from here in the docs) into the Headers box and clicking Import:
It’s the x-ms-request-id custom header that will contain the unique identifier for the refresh.
The next step is to create a new Action to check the status of the refresh using the Get Refresh Execution Details In Group API endpoint. To do this, click on the New action button on the Definition stage of the custom connector wizard and fill in the details in the General section:
Then fill in the Request section. Using Import from sample, select the verb GET, enter the following URL:
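At the time of writing the URL documented for the Get Refresh Execution Details In Group endpoint is (with the values in braces as placeholders):

https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes/{refreshId}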
Then in the Response section click Add default response and paste in the (rather long) sample JSON response from the docs found here. The response should look like this:
You can now use this new Action in a Flow. Here’s a simple example:
The first two Actions here, “Manually trigger a flow” and “Incremental Refresh”, kick off a refresh in the way I’ve shown in the previous posts in this series. The “Do until” Action is where the interesting stuff happens:
What this does is:
First of all, the Delay Action waits for 30 seconds. There’s no point checking the status of a refresh immediately after it has been started, and in this case 30 seconds is a reasonable amount of time to wait. Depending on how long your refresh takes you may want to use a longer delay.
Then the Flow calls the new Action created above to check the status of the refresh that has just been started, using the x-ms-request-id value returned by the Incremental Refresh Action.
If the status (returned by the Get status of one dataset refresh Action) is not Unknown then the refresh has completed and the Do until loop can be terminated. If the status is Unknown then the refresh has either not started or is still in progress so the loop should be run again, ie it will wait for another 30 seconds and then check the status again.
Finally, once the refresh has completed the Condition Action sends an email telling me the outcome: if the status is Completed then the refresh has succeeded; if it is not then the status tells you at a high-level what went wrong (you can look at the extendedStatus and the contents of the objects JSON array for more details).