[This post was originally published on the official Power Query blog, which has now been taken down. I’m republishing all my posts there to this blog to ensure the content remains available.]
You may have noticed a new M function was added to Power Query recently: Web.Headers. This function allows you to make HEAD requests to web services – the existing Web.Contents function, which it resembles closely, only allows you to make GET and POST requests. You can use the Web.Headers function to return the HTTP headers that would be returned if the URL passed to it was instead used to make a GET request.
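For example, a minimal query using Web.Headers looks like this (the URL here is just a placeholder):

```m
let
    // Makes a HEAD request and returns a record containing
    // the HTTP response headers, without downloading the body
    Source = Web.Headers("https://bing.com")
in
    Source
```

The result is a record with one field per header (Content-Type, Cache-Control and so on), which you can then turn into a table or use in further transformations.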
Sometimes, when you’re analysing data, you need to be able to change variables and see what the impact is: for example, you might want to see what your profit margin looks like if tax rates are set at different levels. Power BI’s what-if parameter feature can do this, but it has the limitation that you can’t enter any value you like – you need to create a table containing all possible variable values in advance. The advantage the Excel Data Model/Power Pivot has over Power BI for this type of what-if analysis is the Excel worksheet itself: it’s not only a place to display your report, it can also be used as a data source for tables in the Excel Data Model, making it easy for users to enter whatever variable values they want. Up until recently, though, I assumed that if you were importing data from the worksheet into the Excel Data Model you would need to take some form of manual action – for example clicking a refresh button – to load the new data from the worksheet into the Excel Data Model when the data there changed. In this blog post I’ll show you how you can use VBA to solve this problem and build an elegant what-if analysis solution in Excel where no extra clicks are needed.
Let’s look at a simple example. Here’s a model in Excel with two tables, Sales (containing sales data) and TaxRate (which has just one row and one column: a number representing a tax rate):
The source for the Sales table doesn’t matter, but let’s assume that it’s so large that we don’t want to wait to reload the data if we don’t have to. The source for the TaxRate table is a named range on the worksheet, also called TaxRate:
The data from this named range is loaded into the Excel Data Model using a Power Query also called TaxRate:
let
    Source = Excel.CurrentWorkbook(),
    TaxRate = Source{[Name="TaxRate"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(TaxRate,{{"Column1", type number}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Column1", "Tax Rate"}})
in
    #"Renamed Columns"
Here’s the fun bit. What we want to do is automatically refresh just the TaxRate table in the Excel Data Model when someone changes the value in the TaxRate named range on the worksheet, and you can do that with the following VBA (shamelessly adapted from this example in the docs) on the worksheet containing the named range:
Private Sub Worksheet_Change(ByVal Target As Range)
    Dim KeyCells As Range
    ' The variable KeyCells contains the cells that will
    ' cause an alert when they are changed.
    Set KeyCells = Range("TaxRate")
    If Not Application.Intersect(KeyCells, Range(Target.Address)) _
        Is Nothing Then
        'Refresh the TaxRate Power Query query
        ActiveWorkbook.Queries("TaxRate").Refresh
    End If
End Sub
As I mentioned in this post from earlier this year, you can now refresh individual Power Query queries in VBA; what I’ve just realised is that this means you can refresh individual tables in the Excel Data Model from VBA, without having to refresh all the tables (which could be very slow). Actually, it looks like it has been possible to refresh individual tables in the Excel Data Model for a long time using the ModelTable.Refresh method but I guess I didn’t think about this problem until I knew about the Power Query refresh change.
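For comparison, the older ModelTable.Refresh approach would look something like this (a sketch, assuming the Data Model table has the same TaxRate name as in the example above):

```vba
'Refresh just the TaxRate table in the Excel Data Model,
'bypassing Power Query and leaving all other tables untouched
ActiveWorkbook.Model.ModelTables("TaxRate").Refresh
```

Both approaches refresh a single table; the Queries collection method shown earlier has the advantage of going through the Power Query query itself.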
Here’s the end result:
As you can see, as soon as the tax rate is changed in the worksheet it’s loaded automatically into the Excel Data Model and the new value is used in the measures shown in the PivotTable almost immediately. This makes what-if analysis much easier and I can think of several scenarios where this kind of automatic refresh would be useful too, but I’ll leave them for a future blog post.
Do you have a large dataset in Power BI Premium or Premium Per User? Do you have more than six tables that take a significant amount of time to refresh? If so, you may be able to speed up the performance of your dataset’s refresh by increasing the number of tables that are refreshed in parallel, using a feature that was released in August 2022 but which you may have missed.
Some of you may be thinking: haven’t you blogged about refresh parallelism before? Yes: last year I wrote a post about setting the maxParallelism property when refreshing through TMSL scripts, and a few months ago I showed how you could change the same property when refreshing using the Enhanced Refresh API. That was all fairly complex, though, and what I’m going to show you here is, in contrast, very easy to implement.
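For reference, setting maxParallelism in a TMSL refresh command looks something like this (the database name here is a placeholder):

```json
{
  "refresh": {
    "type": "full",
    "maxParallelism": 9,
    "objects": [
      {
        "database": "MyDataset"
      }
    ]
  }
}
```

The feature described below achieves the same effect without you having to write or run any TMSL yourself.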
To illustrate this I created an Import dataset containing nine tables, each of which loaded data from a different CSV file. Each table contained a couple of million rows and took 20-30 seconds to refresh. I then published to a PPU workspace in the Power BI Service and used the technique described in Phil Seamark’s “Visualise your Power BI refresh” blog post to capture what happened during refresh. The refresh took 44 seconds and, as you can see from this screenshot of Phil’s report, only six of the tables were refreshed in parallel at any one time:
[Ignore the y axis on this graph – I have no idea what’s going on with it]
This is as you would expect: the default number of objects that can be refreshed in parallel in Premium or Premium Per User is 6. This is also the maximum number of objects that can be refreshed in parallel in Shared capacity.
I then went to my original .pbix file, opened the Options dialog, went to the Current File/Data Load pane and changed the new “Parallel loading of tables” setting on my dataset from Default to Custom and entered the value 9:
This setting sets the maxParallelism property in the refresh commands generated by Power BI in the Power BI Service; it is fully documented here. Rerunning the refresh again in the Power BI Service I found that the overall time taken was down to 29 seconds and that all the tables were refreshed in parallel:
Here’s the Refresh command from the Profiler trace showing maxParallelism set to 9:
In summary: refresh performance got a third faster by making this simple change, so if you’re using Power BI Premium and not already doing advanced things with TMSL scripts or the Enhanced Refresh API, this is definitely something you should experiment with. As I said last year, you can try setting this property to values up to 30, but there’s no guarantee you’ll get the amount of parallelism you ask for (it depends on the Premium capacity SKU and a few other factors). What’s more, setting this property too high could result in slower refresh because Power BI might overload your data source with queries.
[Thanks to Akshai Mirchandani for answering my questions about this subject]
A few weeks ago I wrote a post showing how you can use the new EvaluateAndLog DAX function to diagnose performance problems relating to the use of the Switch function. Did you know that calculation groups can experience similar performance problems though? In some scenarios limited evaluation takes place for all calculation items, not just the one you expect; luckily you can use EvaluateAndLog to diagnose this too. In this post I’ll show you a simple example.
Let’s say you have a dataset containing the following table, called Sales:
There’s also a calculation group with three calculation items:
The Measure Value calculation item returns the value of SelectedMeasure but also uses EvaluateAndLog:
EVALUATEANDLOG( SELECTEDMEASURE() )
The Measure Value * 2 and Measure Value * 3 calculation items do the same thing but multiply the value returned by SelectedMeasure by two and three respectively, for example:
EVALUATEANDLOG( SELECTEDMEASURE() * 2 )
There’s another table in the dataset, with no relationship to the Sales table, called Calc Item Names containing a single column listing the three calculation items’ names:
Last of all, there is a measure called Sales Amount:
Sales Amount = SUM(Sales[Sales])
…and a measure called Dynamic Sales Amount Not Optimised that takes the selection made on the Calc Item Names column of the Calc Item Names table and uses it to dynamically select the calculation item with that name and apply it to the Sales Amount measure:
All this allows you to build the following report where a slicer controls the calculation item applied in the Dynamic Sales Amount Not Optimised measure:
As the name suggests, though, there’s a problem with the Dynamic Sales Amount Not Optimised measure that EvaluateAndLog can help uncover. Using Jeffrey Wang’s DAXDebugOutput tool to capture what happens when Measure Value is selected in the slicer, you can see three DAXEvaluationLog events. The first is the one you’d expect: since Measure Value is selected in the slicer, the Measure Value calculation item is evaluated:
The other two events are for the two unselected calculation items, though – and, similar to the Switch problem, the fact that some evaluation takes place for all calculation items can cause performance problems:
How do you avoid this problem? It turns out that the unwanted evaluations go away if you use TreatAs to apply the calculation item inside Calculate, like so:
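The original screenshot of the measure isn’t reproduced here, but a sketch of the pattern looks like this (the calculation group name, Sales Calcs, and its Name column are assumptions – use your own calculation group’s names):

```dax
Dynamic Sales Amount Optimised =
CALCULATE (
    [Sales Amount],
    // TREATAS maps the name selected in the slicer onto the
    // calculation group's name column, applying only that item
    TREATAS (
        VALUES ( 'Calc Item Names'[Calc Item Names] ),
        'Sales Calcs'[Name]
    )
)
```

With this pattern only the selected calculation item should be evaluated, avoiding the extra DAXEvaluationLog events shown above.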
Last week, while I was at the always-excellent dataMinds Connect conference in Belgium, I was interviewed by Heini Ilmarinen and Alexander Arvidsson for the ‘Knee-Deep In Tech’ podcast. You can listen to it here:
We had a very interesting chat about my career, the Power BI CAT team and what it does, why prioritising features is a lot harder than you think, technical training and the genius of Miss Excel, and more.
If you’re monitoring activity in Power BI, for example using Log Analytics, you’ll know that you can capture the DAX queries generated by your published Power BI reports. How do you make sense of them though? You may want to know which tables, columns or measures are being used by a query – maybe so you can work out which ones aren’t being used and can be deleted. I always thought the only way to do this would be to parse the DAX query, which would be incredibly difficult to do. Yesterday, though, Igor Cotruta pointed out on Twitter that there’s a really easy way to do this that I didn’t know about, using the DISCOVER_CALC_DEPENDENCY DMV. I’ve used this DMV before and blogged about it here, but what was new to me is that you can pass a DAX query into the QUERY restriction (it is all documented here). To see how it works let’s look at a simple example.
Consider the following dataset consisting of a single table with a calculated column and a few measures:
Now consider a table visual built on this dataset with a slicer linked to it:
You can get the DAX query generated by this visual using Performance Analyzer’s Copy Query feature (described in this article). You can then pass this DAX query into the WHERE clause of a query against the DISCOVER_CALC_DEPENDENCY DMV like so:
SELECT
    REFERENCED_OBJECT_TYPE,
    REFERENCED_TABLE,
    REFERENCED_OBJECT,
    REFERENCED_EXPRESSION
FROM
    $SYSTEM.DISCOVER_CALC_DEPENDENCY
WHERE
    QUERY =
'
DEFINE
    VAR __DS0FilterTable =
        TREATAS({"Apples"}, ''Sales''[Product])
    VAR __DS0Core =
        SUMMARIZECOLUMNS(
            ROLLUPADDISSUBTOTAL(''Sales''[Country], "IsGrandTotalRowTotal"),
            __DS0FilterTable,
            "Sales_Amount", ''Sales''[Sales Amount],
            "Effective_Tax_Rate", ''Sales''[Effective Tax Rate]
        )
    VAR __DS0PrimaryWindowed =
        TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, ''Sales''[Country], 1)
EVALUATE
    __DS0PrimaryWindowed
ORDER BY
    [IsGrandTotalRowTotal] DESC, ''Sales''[Country]
'
[Note that the DAX query I’m passing in contains single quotes that have to be escaped, which explains why all the table names are surrounded by two single quotes]
The query can be run from DAX Studio connected to either Power BI Desktop or a dataset published to Power BI Premium via the XMLA Endpoint. Here’s the output:
As you can see, this gives you the table, columns (including the hidden RowNumber column), measures and calculated column directly referenced by the query as well as the DAX definitions of the measures and calculated column. It does not mention the [Tax Amount] measure, which the [Effective Tax Rate] measure references; you can get the full list of measures and calculated columns in the dataset and the dependencies between them by querying the DMV without the WHERE clause though. Neither does the output tell you that the Product column is being sliced by the value “Apples”, which would also be useful. Overall, though, I think this is a really powerful feature.
It’s very easy to get confused between a Live connection and DirectQuery mode in Power BI: the customers I work with frequently get them mixed up, partly because they are similar concepts and partly because other BI tools use the term “live connection” to refer to what Power BI calls DirectQuery mode. In this post I will explain the difference between these two terms.
Power BI Datasets, Azure Analysis Services and SQL Server Analysis Services
The first thing to understand is that when a Power BI report runs it can only get data from one of three places: a Power BI dataset; an Azure Analysis Services (AAS) database; or an on-premises instance of SQL Server Analysis Services (SSAS). There are some exceptions to this rule but we can ignore them for the purposes of this post. Each of these three sources runs a variation of the same engine, ie the Analysis Services engine, and can be queried in DAX. These sources may in turn get data from other sources when they are queried, but as far as the Power BI report is concerned the only sources it can talk to directly are Power BI datasets, AAS databases and SSAS databases.
Live connections
Normally, when you open up Power BI Desktop and start developing, your .pbix file contains the definition of a Power BI dataset and a report that connects to it. When you publish this .pbix file they are split up and you see separate dataset and report objects in your Power BI workspace.
However, in Power BI Desktop you may instead choose to connect to a Power BI dataset that has already been published to the Power BI Service, or an instance of AAS or SSAS, and use that as the source for the report. The main advantage of doing this is that different developers can work on the dataset/AAS database/SSAS database definition and the report at the same time. When you connect to an already-published dataset or an AAS or SSAS database from Power BI Desktop you are creating a Live connection.
To connect to a published Power BI dataset and create a Live connection choose the “Power BI datasets” option under the Get Data menu:
If you want to create a Live connection to AAS or SSAS choose the Analysis Services source and then make sure the “Connect live” option is selected:
After you’ve created the Live connection you’ll see a message telling you this in the bottom right-hand corner of the screen in Power BI Desktop:
The key thing to understand is that when you create a Live connection your .pbix file no longer contains a dataset definition – it only contains a report, and that report connects to a dataset, AAS database or SSAS database that already exists elsewhere.
The concept of a Live connection only really makes sense in Power BI Desktop: as I said, in the Power BI Service reports and datasets are separate objects anyway. A report can only have one Live connection because a report can only ever connect to a single dataset, AAS database or SSAS database.
If you publish a report with a Live connection to an already-published Power BI dataset, then the only new thing that will appear in the Power BI Service is the report. If you publish a report with a Live connection to AAS or SSAS then you will see a new dataset appear in the Power BI Service, but this isn’t really a dataset – it’s just a pointer to the AAS or SSAS database where the data is actually held.
DirectQuery mode
There are two fundamental ways a Power BI dataset, AAS database or SSAS Tabular database can work with data: Import mode and DirectQuery mode. In Import mode the data needed for your dataset is copied into the dataset and stored inside it. In DirectQuery mode, when a Power BI report requests data from the dataset by running a DAX query against it, the dataset in turn generates queries that are run on your data source to get the data it needs on demand. If your data source is a SQL Server database then Power BI will generate SQL queries to get the data it needs; for other sources it may generate queries in other query languages. A Power BI dataset is always a semantic layer, an easily consumable interface for your BI data: in Import mode it is a semantic layer and it also stores the data it needs; in DirectQuery mode it is just a semantic layer.
The only data sources that Power BI can use in DirectQuery mode are those that have some kind of query language that Power BI can generate. You can’t connect to a CSV file in DirectQuery mode, for example, because there’s no way of running queries against it unless you load the data in it into some other form of database. Other data sources may have rich query languages but Power BI will only be able to connect to them in DirectQuery mode if a suitable connector exists for that source. Most commonly-used relational databases can be used in DirectQuery mode. If your data source supports DirectQuery mode you’ll see that as an option on the dialog when you first connect:
The key thing to understand here is that the term “DirectQuery mode” describes where the data for a dataset is kept, ie outside the dataset as opposed to inside it in Import mode.
Composite models
Different tables in a single Power BI dataset can have different storage modes: some can be in Import mode and some can be in DirectQuery mode. Furthermore, different DirectQuery tables in the same Power BI dataset can connect to different data sources. When you have different tables using different storage modes, or different tables using different DirectQuery sources, then you have what is called a “composite model”.
DirectQuery on Power BI datasets and Analysis Services
One final twist is that you can create Power BI datasets in DirectQuery mode and use other Power BI datasets, AAS databases or SSAS databases, as a data source. This means you can have DirectQuery datasets that connect to other datasets that can potentially connect to other datasets. It’s useful because, for example, you can have a single, centrally-developed dataset containing all of your company’s sales data and then other Power BI developers can create their own datasets that connect to this central dataset and extend it by adding extra tables of data and new measures without ever duplicating the data in the central dataset.
Although the way things behave behind the scenes is somewhat different, a Power BI dataset in DirectQuery mode that connects to another Power BI dataset as its source is conceptually very similar to a Power BI dataset in DirectQuery mode that connects to a relational database like SQL Server: it just generates DAX queries to get the data it needs when it is queried rather than SQL queries.
Conclusion
Live connections and DirectQuery mode are two distinct concepts. Live connections occur when you create reports in Power BI Desktop that connect to already-published Power BI datasets, AAS databases or SSAS databases. DirectQuery mode is one of the two fundamental ways that a Power BI dataset can work with data (the other being Import mode) and is the mode where a dataset does not contain any data but instead queries other data sources to get the data it needs when it is itself queried.
[Thanks are due to my friend Peter Myers – this blog post expands on some explanations we developed last year in a session we co-presented at various Power BI user groups called “Demystifying Power BI Datasets”. Several recordings of this session exist on YouTube, for example here.]
Something I do all the time when performance tuning Power BI is use SQL Server Profiler to monitor query and refresh activity. There’s a handy external tool that lets you open up Profiler with a connection to Power BI Desktop; if you’re using Power BI Premium you can also connect Profiler up to a published dataset using the XMLA Endpoint. Profiler is a bit old-school though and likely to lead to snarky comments from SQL Server DBAs (it’s not deprecated for Analysis Services though!) who’ll say you should be using Extended Events (aka xEvents) instead. And guess what – did you know you can use Analysis Services xEvents to monitor activity in Power BI Desktop?
Doing this is quite simple. The only challenge to overcome is how to connect SQL Server Management Studio to an instance of Power BI Desktop, and the easiest way to do this is to install DAX Studio as an external tool, open it connected to Power BI Desktop, and then look in the bottom right-hand corner of the screen for the address of the instance of the Analysis Services engine inside Power BI Desktop that DAX Studio is connected to:
You can then use this address in the connection dialog of SQL Server Management Studio:
Once you’ve connected, under the Management folder in the Object Explorer pane you’ll find there’s already an xEvent session running you can look at:
You can also create a new session by following the instructions here.
All very interesting, but what’s the point of doing this? I admit that I find Profiler much quicker and easier to use but xEvents have two slight advantages over Profiler:
1. Frustratingly, all the timestamps in Profiler for Analysis Services events are rounded to the nearest second, which makes questions like “which event started or finished first?” much harder to answer. In contrast, all the timestamps for xEvents are very granular:
2. There are some events that are only available in xEvents and not in Profiler traces. The ExecutionStatistics event shown in the screenshot above is a great example: its text column contains a lot of detailed information that I don’t know how to interpret (yet) and which isn’t documented, but which apparently can be used to determine the amount of time spent in the Formula Engine among other things. This is useful because the widely used method of doing this – adding up the amount of time spent in the Storage Engine and subtracting it from the overall query duration – is unreliable and will become even more unreliable in the future.
I love this kind of detailed information and I’m sure there are a lot of practical problems it can be used to solve. The bad news is that you can’t use this technique against Power BI Premium, at least at the time of writing. Look out for more blog posts on this subject in the future!
Over the years I’ve seen a few examples of how issues with an organisation’s corporate network can affect Power BI report performance. I’ve never blogged about them because, to be honest, I know next to nothing about networks and I’m not sure I could describe them properly. However, recently, I have seen a few instances of what I think could be a widespread but little-diagnosed problem – so I would like you to check if it’s happening to you and tell me what the cause is, if you can find out.
Some time ago I wrote a post about how Power BI report performance could be worse in Internet Explorer 11 and older Windows operating systems because some report performance optimisations we do are only available with HTTP/2. You can read it here:
I hope you’re not still on Windows 8.1 or using IE11. Some traffic could still be being forced to use HTTP/1.1 though and this could have a big impact on report performance – in some cases making reports 5-10 seconds slower to render. How can you tell? You will need to use the Developer Tools available in Chrome and Edge. I blogged about how to use the Network tab in Developer Tools to measure Power BI report performance here:
You will also need to make sure the Protocol column is visible by right-clicking in the central pane where the traffic is displayed:
Here’s an example of what you’ll see in the Network tab when the report renders and you have the problem:
There are two things to look out for:
In the Protocol column you’ll see the value “http/1.1” rather than “h2” for some events, most importantly the query events
You’ll see a long grey bar before the green bar in the Waterfall diagram for these events showing that the connection is stalled for several seconds
If possible you should compare the performance of running your report inside your corporate network and outside it, using the technique described in the blog post mentioned above to see how long the report takes to render in both cases. If the report is faster outside your corporate network, and you only see traffic being forced to use http/1.1 and connections stalled inside your corporate network, then you have a problem and need to talk to your networking team to find out what’s going on. I’ve heard ZScaler could be the culprit but I’m not sure, so if you do have this problem and do find out the cause and the solution, please leave a comment below.
The Switch function is often a cause of performance problems in DAX. Last year a number of optimisations were made that fixed the most common issues (and as a result you should ignore older blog posts that you may find on this subject) but some scenarios still remain where performance can be bad; Marco and Alberto wrote a great post recently outlining some of them that I encourage you to read. How do you even know whether your measure is slow because of your use of the Switch function though? Trying to read query plans, as Marco and Alberto do in their post, isn’t usually practical and commenting out branches in a Switch to see if there is a performance change can be very time-consuming. Luckily, the new EvaluateAndLog DAX function can help you diagnose Switch-related performance issues.
To illustrate this I built a very simple Power BI dataset with a fact table called Sales and a table called DisconnectedTable used for choosing the measure to display in a report (I know you can use field parameters for this but this is just a demo).
The Sales table looks like this:
…and the DisconnectedTable table looks like this:
There are two basic measures that sum up the values in the Sales and Tax columns:
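Their definitions aren’t shown here, but from the description they are something like:

```dax
Sales Amount = SUM ( Sales[Sales] )
Tax Amount = SUM ( Sales[Tax] )
```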
There are also two measures that display either Sales Amount or Tax Amount based on the selection made on DisconnectedTable. One is called Dynamic Measure Optimised and the other is called Dynamic Measure Unoptimised:
Now let’s assume you have a report with a slicer linked to the Measure column on DisconnectedTable and a table showing the Product column and either the Dynamic Measure Optimised or Dynamic Measure Unoptimised measure:
Both measures return the same value, but as Marco and Alberto show in their post since Dynamic Measure Optimised looks for the selected value on the column that is used in the slicer – the column called Measure on DisconnectedTable – it will perform better than Dynamic Measure Unoptimised, which looks for the selected value on the Key column of DisconnectedTable, which is not used in the slicer.
How can we prove this though? If you wrap the expressions used in each branch of Switch with EvaluateAndLog, like so:
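A sketch of what this might look like for the optimised measure (the exact branch expressions are assumptions based on the description above; the original screenshot isn’t reproduced here):

```dax
Dynamic Measure Optimised =
SWITCH (
    // Looks for the selected value on the Measure column,
    // the column actually used in the slicer
    SELECTEDVALUE ( DisconnectedTable[Measure] ),
    "Sales Amount", EVALUATEANDLOG ( [Sales Amount] ),
    "Tax Amount", EVALUATEANDLOG ( [Tax Amount] )
)
```

The unoptimised version would follow the same pattern except that it switches on SELECTEDVALUE over the Key column of DisconnectedTable, which is not used in the slicer.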
…and then run Jeffrey Wang’s DAXDebugOutput tool and connect it up to your Power BI Desktop file, you can see the difference between the two measures for the DAX query that populates the table visual in the report.
When using Dynamic Measure Optimised and with Sales Amount selected in the slicer, the DAX query for the table visual generates two DAXEvaluationLog events, one for the body of the table, one for the grand total row:
This is what you would expect and this is the optimal pattern: both DAXEvaluationLog events only mention the Sales Amount measure, the measure selected in the slicer. However, if you use Dynamic Measure Unoptimised with Sales Amount selected in the slicer, you’ll see two extra DAXEvaluationLog events being generated, where Power BI pays the cost of analysing and executing the unselected Tax Amount branch just to produce an empty, unused result:
It’s these evaluations of unused Switch branches that can cause query performance problems. So my advice is: if you’ve got a lot of complex measures that use Switch, you should study Jeffrey’s blog carefully and use EvaluateAndLog to see if any optimisation is necessary.
[Thanks to Jeffrey Wang for his help writing this post]