dataCoverageDefinition: A New Optimisation For Hybrid Tables In Power BI

Hybrid tables – tables which contain both Import mode and DirectQuery mode partitions to hold data from different time periods – have been around for a while. They are useful in cases where your historic data doesn’t change but your most recent data changes very frequently and you need to reflect those changes in your reports; you can also have “reverse hybrid tables” where the latest data is in Import mode but your historic data (which may not be queried often but still needs to be available) is in DirectQuery mode. Up to now they had a problem though: even when you were querying data that was in the Import mode partition, Power BI still sent a SQL query to the DirectQuery partition and that could hurt performance. That problem is now solved with the new dataCoverageDefinition property on the DirectQuery partition.

You can find full documentation here:
https://learn.microsoft.com/en-us/analysis-services/tom/table-partitions?view=asallproducts-allversions

What dataCoverageDefinition does is tell Power BI what data is present in the DirectQuery partition so it knows whether to generate a SQL query to get data from your DirectQuery source or not. It takes the form of a simple DAX expression that returns True for rows that are stored in the DirectQuery partition, for example:

RELATED('DimDate'[CalendarYear]) IN {2017,2018,2019}

If you’re working with large fact tables in DirectQuery mode and don’t have a religious objection to some of that data being stored in Import mode, you should check out hybrid tables because they can really improve report performance!

Query Caching In Power BI Premium

The more you monitor something the more likely it is that you’ll see something strange you can’t explain. This is true of several customers I’ve spoken to recently who saw DAX queries run in the middle of the night by people who were definitely not online at the time and who were worried about the resulting load on their Premium capacities. What is the cause – hackers? Ghosts? In a lot of cases the culprit is actually Power BI’s query caching feature.

Query caching is a feature that is available in Premium and Fabric capacities for Import mode semantic models. It allows Power BI to pre-cache the data needed by some report visuals and dashboard tiles so the reports and dashboards open faster. The documentation here is pretty good but, as always, a few examples are useful to help understand how it works and there are a couple of extra things to point out that are not on the docs page I just linked to.

Let’s start with a simple pbix file called FruitSalesModel.pbix containing a single blank report page and one table in the Import mode semantic model:

The different fruit names will make it easy to link different visuals to different DAX queries. There’s also one measure called Sales Amount that sums the values in the Sales column.

If you publish this pbix to an otherwise empty workspace hosted on a Premium capacity and you’ll see that a semantic model and a blank report are created from it.

Query caching can be enabled on the semantic model by going to the model’s Settings page, expanding the Query Caching section and selecting “On”:

Enabling query caching now means that the Power BI Service may execute DAX queries to populate the cache immediately after a semantic model refresh takes place. If a query has been cached then, when an end user opens a report, Power BI can use the cached resultset for visuals which means that the report will open faster.

You can see how long all these queries take to run by looking at the Refresh History dialog and the details for an individual refresh (as mentioned in this blog post). The Query Cache line tells you when this query activity took place and how long it took:

At this point in our example, however, no queries will be executed because the only report connected to this semantic model is completely blank.

Which queries get cached? This is documented but I always prefer to run my own tests to see what happens; you can see the DAX queries being run after a refresh by running a Profiler trace on the semantic model.

If you edit the currently-blank report in the workspace so that there are two pages, add a single card to the first page that shows the value for the Sales Amount measure for the product Apples and a single card to the second page that shows the value for the Sales Amount measure for the product Grapes, make sure the first page is the page that is displayed when a user opens the report, and then close the report without going back to Reading View, and then refresh the semantic model, then a Profiler trace shows… no queries being executed after the refresh. This is because only visuals that are viewed by an end user in Reading View (not Editing mode) in the browser have their queries cached.

If you then open the report in Reading View and view both pages of the report and then refresh the semantic model again, you’ll see one query appear in the trace:

This is the query for the card on the first page of the report, for Sales Amount for Apples. The query for the card on the second page is not cached – only the queries for the visuals on the page that the report opens on are cached.

It’s also interesting to note that the ApplicationContext column in the trace has a value for the DatasetId but not ReportId or VisualId for these queries. This makes sense because these queries are not executed as a result of a report rendering.

If you then edit the report so it opens on the second page (which shows Sales Amount for Grapes) and not the first page, then refresh the semantic model again, the Profiler trace shows queries for both cards on both pages. In fact, even if you delete the first page from the report and refresh, both queries are still run:

The query for Sales Amount for Grapes is run because it’s now the query on the page of the report that opens first. The query for Sales Amount for Apples is still run because, once a query has been added to the list of queries to be cached, it will stay there until you either delete the semantic model or you turn the Query Caching setting on the semantic model off and on again.

It’s not just the visuals on the opening page of a report that are cached. If you pin any visual, from any page of a report, to a dashboard and then view that dashboard, the query for that visual will be cached. Also, if you create a personal bookmark and make that your default view when you open a report, then the queries for that personal bookmark will also be cached. Finally, if an end user opens a report, changes a filter or slicer on the opening page of a report, then closes the report and reopens it, the report will show the filter or slicer selection that was made when the report was last closed (this behaviour, known as “persistent filters”, is explained here). Queries for each user’s persistent filters on the opening page of a report will also be cached.

The behaviour above applies to all reports connected to a semantic model.

As a result, if your end users like creating dashboards or personal bookmarks, or if you have a lot of filters or slicers on the opening page of your report, or if you have lots of reports connected to a single semantic model, you can end up with a lot of queries being generated after the semantic model has refreshed when query caching is turned on. The good news for Premium capacity admins is that the queries run to populate the cache are classed as background operations and not interactive operations (as is the case for queries generated when a report renders). The CU cost of background operations is smoothed over 24 hours which means that you won’t get big spikes in usage on your capacity when a refresh takes place and the cache is populated, although if a lot of expensive queries are run this is definitely something you still need to monitor. Here’s a screenshot from the Capacity Metrics App showing DAX queries appearing as background operations:

In conclusion, query caching is a very powerful feature. Turning it on can make a big difference to the performance of your reports but you do need to understand how caching works to get the full benefit of it. You also need to make sure you watch out for it when monitoring a Premium capacity to make sure that it doesn’t increase your background utilisation too much.

Performance Implications Of Using Calculated Columns In Composite Models On Power BI Semantic Models

I don’t have anything against the use of calculated columns in Power BI semantic models in general but you do need to be careful using them with DirectQuery mode. In particular when you have a DirectQuery connection to another Power BI semantic model – also known as a composite model on a Power BI semantic model – it’s very easy to cause serious performance problems with calculated columns. Let’s see a simple example of why this is.

Let’s say you have an Import mode semantic model called Source Model containing sales data:

Here’s the contents of the Sales table:

And here’s the definition of the Sales Amount measure:

Sales Amount = SUM(Sales[Sales])

Now, let’s say this semantic model gets published to the Service and you open Power BI Desktop, create a Live connection to this model and then click the “Make changes to this model” button to create a composite model. You now have a local composite model in Power BI Desktop that is connected to the Source Model semantic model.

Next, you decide to create a calculated column on the Sales table and after reading Marco and Alberto’s article on circular dependencies (because you’ll definitely run into circular dependency errors) come up with the following code:

Tax =
CALCULATE (
[Sales Amount] * 0.1,
ALLEXCEPT ( Sales, Sales[TransactionNumber] )
)

Great! You can now build a report that looks like this:

This is where the first warning comes though. In Import mode semantic models calculated columns are evaluated when the model is refreshed, but in DirectQuery mode they are evaluated at query time if the query needs to use them. Performance Analyzer shows two DAX queries generated for the visual above. The first is the query that the visual runs against the local semantic model in Power BI Desktop, which looks like this:

DEFINE
VAR __DS0Core =
SUMMARIZECOLUMNS(
ROLLUPADDISSUBTOTAL(ROLLUPGROUP('Country'[Country], 'Product'[Product]), "IsGrandTotalRowTotal"),
"Sales_Amount", 'Sales'[Sales Amount],
"SumTax", CALCULATE(SUM('Sales'[Tax]))
)

VAR __DS0PrimaryWindowed =
TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, 'Country'[Country], 1, 'Product'[Product], 1)

EVALUATE
__DS0PrimaryWindowed
ORDER BY
[IsGrandTotalRowTotal] DESC, 'Country'[Country], 'Product'[Product]

There’s nothing much interesting here. However, there’s a second DAX query generated: the one generated by the local model to get the data it needs from the Source Model semantic model in the Service. This query looks like this:

Define
COLUMN 'Sales'[ASDQ_Tax] = CALCULATE([Sales Amount] * 0.1, ALLEXCEPT(Sales, Sales[TransactionNumber]))

var ASDQ___DS0Core = SUMMARIZECOLUMNS( ROLLUPADDISSUBTOTAL(ROLLUPGROUP('Country'[Country], 'Product'[Product]), "IsGrandTotalRowTotal"),
"Sales_Amount", [Sales Amount],
"SumTax", CALCULATE(SUM('Sales'[ASDQ_Tax]))
)
var ASDQ___DS0PrimaryWindowed = TOPN(502, ASDQ___DS0Core, [IsGrandTotalRowTotal], 0, 'Country'[Country], 1, 'Product'[Product], 1)

EVALUATE
ASDQ___DS0PrimaryWindowed
ORDER BY
[IsGrandTotalRowTotal] DESC, 'Country'[Country], 'Product'[Product]

Notice, right at the top of the query, a DEFINE statement that defines the Tax calculated column. Every time this visual is rendered in your Power BI report the calculated column will be re-evaluated and that will have a performance overhead. If the calculated column had been created in Source Model (which is Import mode remember) it would have been evaluated when the model was refreshed and any queries that used it would probably be faster as a result.

There’s something else to watch out for though. Let’s say you define a second calculated column in the local model like so:

Apple Sales =
CALCULATE (
[Sales Amount],
FILTER ( ALLNOBLANKROW ( 'Sales'[Product] ), 'Sales'[Product] = "Apples" )
)

You can now create a table that looks like this, which does include the Apple Sales column but does not include the Tax column:

The DAX query sent by the visual to the local model is not worth looking at. However, the DAX query sent by the local model to Source Model does have something surprising in it:

Define
COLUMN 'Sales'[ASDQ_Apple Sales] = CALCULATE([Sales Amount], FILTER(ALLNOBLANKROW('Sales'[Product]), 'Sales'[Product]="Apples"))

COLUMN 'Sales'[ASDQ_Tax] = CALCULATE([Sales Amount] * 0.1, ALLEXCEPT(Sales, Sales[TransactionNumber]))

var ASDQ___DS0Core = SUMMARIZECOLUMNS( ROLLUPADDISSUBTOTAL(ROLLUPGROUP('Country'[Country], 'Product'[Product]), "IsGrandTotalRowTotal"),
"Sales_Amount", [Sales Amount],
"SumApple_Sales", CALCULATE(SUM('Sales'[ASDQ_Apple Sales]))
)

var ASDQ___DS0PrimaryWindowed = TOPN(502, ASDQ___DS0Core, [IsGrandTotalRowTotal], 0, 'Country'[Country], 1, 'Product'[Product], 1)

EVALUATE
ASDQ___DS0PrimaryWindowed
ORDER BY
[IsGrandTotalRowTotal] DESC, 'Country'[Country], 'Product'[Product]

There’s a DEFINE statement for the Apple Sales column as you would expect. However there is also a DEFINE statement for the Tax column, which means it is evaluated too even though it’s not shown in the visual and it seems as though the Apple Sales column doesn’t reference it – or does it? I’m not going to try to explain what’s going on here (again, this is best left to Marco and Alberto) but the important point is that both calculated columns are now being evaluated at query time which means the query will be even slower. If you have large data volumes and/or your calculated columns contain some complex DAX you could end up with some very slow reports. If you’re using Premium then you could also be putting a lot of extra load on your capacity.

Composite models are an important part of Power BI’s self-service story and it’s inevitable that at some point your developers will want to use build calculated columns in them. As always you need to monitor the performance of your reports carefully, both in terms of duration and CPU usage, while you are building them to make sure they are well optimised and keep the use of calculated columns in composite models to a minimum.

New Limits For The “Maximum Connections Per Data Source” Property In Power BI DirectQuery Mode

One of the most important properties you can set in a Power BI DirectQuery semantic model is the “Maximum connections per data source” property, which controls the number of connections that can be used to run queries against a data source. The good news is that the maximum value that you can set this property to has just been increased in Premium.

This property is important because the number of open connections to a data source acts as a limit on the number of concurrent queries that can be run by the semantic model against the source: each connection can only have one query running on it at any one time. If you have Power BI reports that have a large number of visuals on a page and/or a large number of users running reports at the same time then it is very easy for a DirectQuery semantic model to need to send lots of queries back to your data source at the same time. If some of the queries that your semantic model runs against your data source are slow – more than one or two seconds even – then the number of queries that need to be run at a given time will increase. The same is true if you have increased the Max Parallelism Per Query property to increase the number of parallel queries that can be generated by a single DAX query.

This property is documented in a number of places, including the DirectQuery guidance documentation and in data source-specific best practice documents such as this one for Snowflake. You can set the property in Power BI Desktop in the Current File/DirectQuery section of the Options dialog:

If you are not using Power BI Premium (ie you are using Power BI Shared capacity, also known as Power BI Pro) then the maximum value that you can set this property to is 10. If you are using Power BI Premium then the maximum value up to today was 30 but now that limit has been increased. The table on this page shows what the new limits per SKU are:

As you can see, for a P1/F64 the maximum limit is now 50 rather than 30 and this limit goes all the way up to 200 for a P4/F512 and higher.

I’ve seen plenty of cases where increasing the value of this property makes Power BI reports run a lot faster. However, this will only be true if your data source is able to handle the number of queries that Power BI is trying to run against it. As I showed in this post, if your data source can’t handle the number of queries you’re trying to run then performance will get worse and not better, so you should try different values to see which one works best.

Extracting Power BI Import Mode Job Graph Data To A Table

In my last post but one I showed how you could create a DGML file that contains detailed information on what happens during a Power BI Import mode refresh – the “job graph” – and visualise that in Visual Studio or VS Code, to help with performance tuning. In my last post, I explained the concepts of blocking and waiting which are key to understanding this data. In this post I’ll share and describe a Dataflow template that extracts the data contained in the DGML file to a table for other types of analysis – which turns out to be quite simple, because DGML is an XML-based format which is easy to work with in Power Query and Dataflows.

You can download the .pqt template file here (for more information on how to use templates see here). To use it, create a new Dataflow Gen2 and import the template file. You’ll see the following queries:

Next, change the values of the three parameters to contain the IDs of the refresh whose DGML file you want to load (the notebook in this post that creates the DGML file uses the refresh ID as the file’s name) and the workspace and lakehouse where that file is stored. Finally, set output destinations on either the RefreshJobs and/or RefreshJobsPivoted queries. The RefreshJobs query returns a table with one row per refresh job:

This gives you all the data in the DGML file that was visible using a DGML viewer but in table form. For each job you get the CreatedAt, RunnableAt, StartedAt and FinishedAt datetimes, the Blocked, Wait and Running durations, the Slot ID, the CPU used and whether the job is on the critical path.

The RefreshJobsPivoted query gives you exactly the same data but instead of giving you one row per job, you get three: one row for the blocked, waiting and running phases of each job, with the start and end times for each phase and the duration. This may be a more convenient format for visualisation and analysis:

The Links query gives you the dependencies between the jobs:

Having loaded all this data into a Lakehouse you can now build a report on it. As I said, I’m sure there’s a way of visualising all of this data in a way that shows all the durations and dependencies, but even a basic table report on the RefreshJobs table like this is really useful because it shows you which jobs were blocked, which ones had to wait, and which ones took a long time to run:

In this example (the Dependencies example from my previous post, but where the refresh has maxParallelism set to 1, so there is blocking as well as waiting) you can see that refreshing table X took 60 seconds, that the refresh for table Y had to wait for 60 seconds before it could start and took 10 seconds, and that refresh for table XY was blocked for 70 seconds before it could start. So, if you wanted to make this refresh run faster, you would want to understand why table X was so slow and also look at increasing the amount of parallelism so Y did not have to wait.

Understanding Blocking And Waiting In Power BI Import Mode Refreshes

Following on from my previous post showing how you can visualise the job graph for a Power BI Import mode semantic model refresh, I this post I will look at how you can interpret what the job graph tells you – specifically, explaining the concepts of blocking and waiting. As always, simple examples are the best way of doing this.

Blocking

Blocking occurs when one job can only start when one or more other jobs have completed because there is a dependency between them. Consider the following semantic model:

X and Y are tables that contain a single numeric column. X takes 1 minute to load while Y takes 10 seconds to load (I forced this delay using technique I blogged about here). XYUnion is a DAX calculated table that unions all the rows from X and Y with the following definition:

XYUnion = UNION(X,Y)

As you can imagine, the job that refreshes XYUnion can only start once the jobs that refresh both X and Y have finished; XYUnion will be blocked until both X and Y have refreshed. Here’s what the job graph for the refresh of this semantic model looks like:

At the centre is the job “Process Partition XYUnion[XYUnion]” which refreshes the calculated table XYUnion. The arrows going from this job to the jobs “Process Partition X[X]” and “Process Partition Y[Y]”, which refresh tables X and Y, show that this job depends on those two other jobs.

Hovering over the “Process Partition XYUnion[XYUnion” job shows the following popup:

There are four datetime values here: CreatedAt, RunnableAt, StartedAt and FinishedAt. There are also three durations:

  • Block is the elapsed time between CreatedAt and RunnableAt, and is the amount of time elapsed before all the jobs this job depends on were completed. Anything other than a zero here means that blocking has occurred.
  • Wait is the elapsed time between RunnableAt and StartedAt. A job becomes runnable when all the jobs it depends on have completed but even it still may not be able to start because Power BI only allows a certain number of jobs to refresh in parallel (see this post for more details and how to control the amount of parallelism). Waiting is described in the example below.
  • Run is the elapsed time between StartedAt and FinishedAt and is the amount of time the job itself took to run.

In this case you can see that the value for Block for XYUnion is 1 minute: X and Y have no preceding jobs so they kick off at the same time, X takes 1 minute to run and Y takes 10 seconds, so it is 1 minute before XYUnion can run. The popups for X and Y show 1 minute and 10 seconds respectively for Run, as you would expect:

One last thing to mention: in the full job graph diagram above you’ll see that certain nodes are highlighted in red. That’s because they are on the critical path, which is documented here; it’s the chain of jobs that dictates the overall length of the refresh, so if you want to make your refresh faster then you need to tune the jobs on the critical path. In this case the critical path goes through X to XYUnion: if you wanted the whole semantic model to refresh faster you would need to tune the refresh for either X or XYUnion; tuning the refresh for Y would make no difference to the overall semantic model refresh time.

Waiting

As I already mentioned, there is a limit on the number of jobs within a refresh that can run in parallel. The maximum number of jobs is represented by the number of “slots” – one slot can only run one job at a time – and in the screenshots of the popups above you can see each job has a slot number. If there are more jobs that could be run than there are slots available at a given time then some jobs have to wait for a slot.

Here’s another example: a semantic model with three tables, A, B and C, which each take 30 seconds to refresh.

There are no dependencies between the tables so in theory each of these three tables could refresh in parallel. However if you refresh with maxParallelism set to two then only two of the three can refresh at any one time. Here’s the job graph for a refresh that does that:

As you can see the critical path goes through the refresh jobs for table A, and hovering over the “Process Partition A[A]” job shows the following:

While this job was not blocked at all because there are no jobs it depends on, it had to wait 30 seconds for a slot to become available; it eventually ran in slot 0. Hovering over the nodes for “Process Partition B[B]” and “Process Partition C[C]” shows that they ran in slots 0 and slot 1 respectively and neither of them were blocked or had to wait.

The job graph isn’t always the best way of visualising this type of parallelism; Phil Seamark’s original Power BI report for visualising refreshes has a page which shows the slots and the jobs in them but I think there’s probably a better way of visualising all of this data that shows slots as well as dependencies. Maybe a Deneb visual is the answer? If anyone has ideas I’d be interested to hear them! In any case, the first step to doing this is to extract all of this data from the .DGML file into a table and that’s what I’ll demonstrate how to do in the next post in this series.

Visualising Power BI Import Mode Refresh Job Graphs

A few years ago a new pair of Profiler events was added for Power BI Import mode datasets (and indeed AAS models): the Job Graph events. I blogged about them here but they never got used by anyone because it was extremely difficult to extract useful data from them – you had to run a Profiler trace, save the trace file, run a Python script to generate a .dgml file, then open that file in Visual Studio – which was a shame because they contain a lot of really interesting, useful information. The good news is that with the release of Semantic Link in Fabric and the ability to run Profiler traces from a Fabric notebook it’s now much easier to access Job Graph data and in this blog post I’ll show you how.

Quick recap: what are the Job Graph events and why are they useful? Let’s say you have a Power BI Import mode semantic model and you want to optimise refresh performance. When you refresh a semantic model, that refresh is made up of multiple jobs which themselves are made up of multiple jobs: refreshing a semantic model involves refreshing all the tables in that model, refreshing a table involves refreshing all the partitions in that table, refreshing a partition involves loading the data and building attribute hierarchies, and so on. Some of these jobs can happen in parallel but in some cases there are dependencies between jobs, so one job can only start when another has completed. The Job Graph events give you information on these refresh jobs and the dependencies between them so you can work out which jobs you need to optimise. In order to capture information from them you need to run a trace while the semantic model is being refreshed; the data from some of these Job Graph events can be reconstituted into a Directed Graph Markup Language (DGML) file, which is an XML-based format, and once you’ve got that you can either visualise the DGML file using a suitable viewer or extract the data from it and analyse it further.

[Before I carry on I have to acknowledge that I’m extremely new at Python and a lot of the code in this post is adapted from the code in my colleague Phil Seamark’s excellent recent post on visualising Power BI refresh information with Semantic Link. Any feedback on ways to optimise the code is gratefully received.]

Here’s some Python code that you can use in a Fabric notebook to run a refresh and generate a DGML file. Each code snippet can be used in a separate code cell or combined into a single cell.

First of all you need to install Semantic Link:

%pip install semantic-link

Next you need to define the events you want in your trace, which in this case are just the Job Graph events:

import sempy.fabric as fabric
import pandas as pd
import time
import warnings

base_cols = ["EventClass", "EventSubclass", "TextData", "IntegerData"]

# define events to trace and their corresponding columns

event_schema = {
"JobGraph": base_cols
}

warnings.filterwarnings("ignore")

You then need to start a trace using this definition, refresh the semantic model, stop the trace and filter the events captured so you only have those with the EventSubclass GraphFinished, remove the event which contains the metadata (which has a value of 0 in the IntegerData column) and then finally sort the rows in ascending order by the values in the IntegerData column:

WorkspaceName = "Insert workspace name here"
SemanticModelName = "Insert semantic model name here"

with fabric.create_trace_connection(SemanticModelName,WorkspaceName) as trace_connection:
# create trace on server with specified events
with trace_connection.create_trace(event_schema, "Simple Refresh Trace") as trace:

trace.start()

# run the refresh
request_status_id = fabric.refresh_dataset(SemanticModelName, WorkspaceName, refresh_type="full")
print("Progress:", end="")

while True:
status = fabric.get_refresh_execution_details(SemanticModelName, request_status_id, WorkspaceName).status
if status == "Completed":
break

print("â–‘", end="")
time.sleep(2)

print(": refresh complete")
# allow ending events to collect
time.sleep(5)

# stop Trace and collect logs
final_trace_logs = trace.stop()



# only return GraphFinished events
final_trace_logs = final_trace_logs[final_trace_logs['Event Subclass'].isin(["GraphFinished"])]
# ignore metadata row
final_trace_logs = final_trace_logs[final_trace_logs['Integer Data'].ne(0)]
# sort in ascending order by Integer Data column
final_trace_logs = final_trace_logs.sort_values(by=['Integer Data'], ascending=True)

Finally, you need to take all the text from the EventText column of the remaining events and concatenate it to get the contents of the DGML file and then save that file to the Files section of the Lakehouse attached to your notebook:

# concatenate all text in TextData column
out = ''.join(final_trace_logs['Text Data'])
# change background colour of critical path nodes so it's easier to see in VS Code
out = out.replace("#263238", "#eba0a7")

# write dgml file
dgmlfile = open("/lakehouse/default/Files/" + request_status_id + ".dgml", 'x')
print (out, file=dgmlfile)
dgmlfile.close()

#dispose of trace connection
trace_connection.disconnect_and_dispose()

I found a nice Visual Studio Code extension called DGMLViewer which makes viewing DGML files easy. Rather than manually downloading the file, OneLake Explorer makes it easy to sync files in OneLake with your PC in a very similar way to OneDrive, which makes working with these DGML files in VS Code very straightforward because you can simply open the local copy when it syncs.

Here’s what one of thse DGML files, generated from the refresh of a very basic semantic model, looks like when viewed in DGML Viewer:

If you have Visual Studio you can also use it to view DGML files (you need to install the DGML Editor first); I found a VS extension called DgmlPowerTools 2022 which adds some advanced features. Here’s what a DGML file for a refresh looks like when visualised in Visual Studio 2022:

OK, so this looks cool but it also looks very complicated. What does it all mean? How can you interpret all this information and use it to optimise a refresh? That’s something for a future blog post!

[In my next post I look at how you can interpret this data and understand the concepts of blocking and waiting, and in the post after that show how you can extract the data in this DGML file to a table using a Dataflow]

IF, SWITCH And The Effect Of DAX Variables On Strict/Eager Evaluation

A lot of DAX performance problems relate to the use of strict and eager evaluation with the IF or SWITCH functions. It’s an incredibly complex, “it depends”, black-box type of topic and naturally Marco and Alberto have a great article on it here. Rather than describe any general rules – which would be pretty much impossible – in this blog post I want to show a specific scenario that illustrates how the use of DAX variables can influence whether strict or eager evaluation takes place.

Consider a Power BI semantic model that has the following three tables:

TableA and TableB are fact tables with numeric measure columns; Choice is a disconnected table containing text values that is intended for use in a slicer, so a report user can select an item in that slicer and that in turn influences what values a measure returns:

Here’s the definition of one such measure:

With Variables =
VAR a =
    SUM ( TableA[A] )
VAR b =
    SUM ( TableB[B] )
RETURN
    IF ( SELECTEDVALUE ( Choice[Choice] ) = "TableA", a, b )
)

And here’s my test report that includes a slicer and a card visual displaying the output of this measure:

Let’s look at the DAX query generated by the card visual containing the measure when “TableA” is selected in the slicer:

DEFINE
    VAR __DS0FilterTable =
        TREATAS ( { "TableA" }, 'Choice'[Choice] )

EVALUATE
SUMMARIZECOLUMNS (
    __DS0FilterTable,
    "With_Variables", IGNORE ( 'Choice'[With Variables] )
)

…and in particular what DAX Studio’s Server Timings feature shows us:

Even though only TableA is selected in the slicer and the query only returns the sum of the values in the A column of TableA, we can see that the Storage Engine is also querying TableB and getting the sum of the B column. It’s a great example of eager evaluation: both branches of the IF are being evaluated. Is this a bad thing? For this particular report it may be if the Storage Engine query for TableB is expensive.

How can you force strict evaluation to take place? You can force eager evaluation using the IF.EAGER function but there is no equivalent function to force strict evaluation. However you maybe be able to rewrite the measure to get strict evaluation to take place.

The key factor in this case is the use of variables in the measure definition. If you rewrite the measure to not use variables, like so:

No Variables =
IF (
    SELECTEDVALUE ( Choice[Choice] ) = "TableA",
    SUM ( TableA[A] ),
    SUM ( TableB[B] )
)

…then the query for the card visual query behaves differently:

DEFINE VAR __DS0FilterTable = 
TREATAS({"TableA"}, 'Choice'[Choice])

EVALUATE
SUMMARIZECOLUMNS(__DS0FilterTable, "No_Variables", IGNORE('Choice'[No Variables]))

Notice that there are now only two Storage Engine queries and that TableB is not now being queried, which will probably result in better performance. This is strict evaluation.

Why does the use of variables result in the use of eager evaluation here? Because it does, and it’s complicated. I need to stress that DAX uses lazy evaluation for variables which means that variables are not evaluated if they are not used – in the first measure above the IF is deliberately evaluating both branches. There are certainly other optimisations that may kick in and result in strict evaluation even when variables are used in IF/SWITCH. Indeed, the two measures in this example being from different fact tables is extremely important: if they had been from the same fact table then the behaviour would have been different and strict evaluation would have been used. In summary, though, if you are using variables and you want to try to force strict evaluation with IF/SWITCH then it’s worth rewriting your code to remove those variables to see if it makes a difference.

I also need to stress that these two measures will perform better or worse depending on how and where they are used. Consider these two table visuals that use the values from the Choice table on rows along with the two measures above:

Running the DAX query for the table on the left, which uses the measure with no variables and strict evaluation, gives the following in DAX Studio’s Server Timings:

Note that there are two Storage Engine queries for TableB, which is not a good thing.

On the other hand, the table on the right which uses the [No Variables] measure gives the following:

Note that there is only one Storage Engine query for TableB now, so in this case the tables have turned and the [With Variables] measure is likely to perform better.

When you use variables in the branches of IF/SWITCH, as in the [With Variables] measure, the variables are evaluated in a context at the place of the variable definition; if you don’t use variables in this way, as in the [No Variables] measure, the two SUM expressions used in the two branches are evaluated in the context of the branch, which adds a hidden filter context corresponding to the condition. This has two consequences which may hurt query performance:

  1. During evaluation, the hidden filter may be materialised into a large in-memory table
  2. Fusion cannot happen across common subexpressions in different branches because they have different contexts

In contrast the use of variables means the expressions used in them are evaluated in a context without the hidden filter corresponding to the branch, which can encourage fusion and discourage the materialisation of large in-memory tables.

[Thanks to Marius Dumitru and Jeffrey Wang for much of the information in this post]

Excel Workbook Layout And The Performance Of Reading Data With Power Query In Power BI

Excel workbooks are one of the slowest data sources you can use with Power Query in Excel or Power BI. Reading small amounts of data from small workbooks is usually fast; reading large amounts of data from large workbooks can be very slow. But what about reading small amounts of data from large Excel workbooks? I did some tests and it turns out that performance can vary a lot depending on where your data is in the workbook and how that workbook is structured.

[Note: in this post I’ll be looking at .xlsx files, rather than other Excel formats like .xls and .xlsb; Excel files stored on a local disk and accessed via the File.Contents M function rather than stored in SharePoint or any other location; data read from Excel tables rather than direct from the worksheet; and Power Query in Power BI. Other scenarios may perform differently.]

Let’s see a simple example to illustrate what I found out. I created a new Excel workbook with one worksheet in and put a small table of data on it:

At this point the workbook’s size was 11KB. I then opened Power BI Desktop and created a Power Query query that read this table of data from the Excel workbook:

let
    Source = Excel.Workbook(File.Contents("C:\MyWorkbook.xlsx"), null, true),
    Sales_Table = Source{[Item="Sales",Kind="Table"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Sales_Table,{{"Product", type text}, {"Sales", Int64.Type}})
in
    #"Changed Type"

Then I used this technique to measure how long it took to load the data from Excel into Power BI. Unsurprisingly, it was extremely fast: 63ms.

Then I added a new worksheet to the workbook, copied the same table onto it, added a large amount of random numbers underneath using the following Excel formula, and then copied and pasted the values returned by the formula over the output of the formula:

=RANDARRAY(9999,300)

Doing this meant the size of the workbook grew to 43MB. I then created a new Power Query query in Power BI Desktop, identical to the one above except that it connected to the new table. This time the query took 4918ms – almost 5 seconds.

Interestingly, even with the second worksheet with all the data on was added, the first query above (on the worksheet with no other data on) was still fast. I also tested refreshing a Power BI dataset that connected to two identical small tables on different worksheets in the same workbook, both with large amounts of other data on as in the second scenario above, and the performance of both queries was only slightly slower: it was clear two Power Query queries can read data from the same Excel workbook in parallel.

So: reading a small amount of data from a table on a worksheet with a large amount of other data on it is very slow.

What can we learn from this? Well, if you can influence the structure and layout of the Excel workbooks you are using as a data source – and that’s a big if, because in most cases you can’t – and you only need to read some of the data from them, you should put the tables of data you are using as a source on separate worksheets and not on the same worksheet as any other large ranges or tables of data.

It turns out that when the Power Query Excel connector reads data from an .xlsx file it can deserialise just some of the data in it rather than the whole thing, but what it can and can’t avoid deserialising depends a lot on the structure of the workbook and how the data is stored within the workbook .xlsx file. If you’re quick you can even see how much data is being read in Power BI Desktop in the refresh dialog:

You can also use Process Monitor, as I describe here, to see how much data is being read from any file used by Power Query.

Performance also depends on which application generated the .xlsx file (it’s not just Excel that creates .xlsx files, because other applications export data to .xlsx format without using Excel) or even which version of Excel saved the .xlsx file. This is because the same data can be stored in an .xlsx file in different ways, some of which may be more efficient to read than others. I found this blog post by Brendan Long on the .xlsx file format was very clear and it helped me understand how Power Query might go about reading data from an .xlsx file.

[Thanks to Curt Hagenlocher of the Power Query team for answering some of my questions relating to this post]

Exploring Power BI Run-Length Encoding With DMVs

Recently I was involved in an interesting discussion on Twitter X about how partitioning a table in a Power BI dataset semantic model can affect compression and therefore its size and query performance. You can read the thread here. It got me thinking: is there a way to get more detail on how well compression, and in particular run-length encoding (RLE from hereon), is working for a column when you’re using Import mode or Direct Lake? After a bit of research I found out there is, so let’s see some examples that illustrate what I learned.

First of all, consider the following M query that returns a table with one numeric column called MyNumbers:

let
    Source = {1..DistinctValues},
    Repeat = List.Transform(Source, each List.Repeat({_}, TotalRows/DistinctValues)),
    Combine = List.Combine(Repeat),
    #"Converted to Table" = Table.FromList(Combine, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "MyNumbers"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"MyNumbers", Int64.Type}})
in
    #"Changed Type"

It references two M parameters: TotalRows, which determines the total number of rows in the table, and DistinctValues, which determines the number of distinct numeric values in the MyNumbers column. With TotalRows=9 and DistinctValues=3, it returns the following output:

Note that in this case it returns 3 rows with the value 1 repeated, 3 rows with the value 2 repeated and 3 three rows with the value 3 repeated; three sets of repeated values in all. It’s fair to assume that repeated sets of values like this are a good candidate for RLE.

I created a semantic model in Import mode containing only this table and published it to the Power BI Service. Initially TotalRows was set to 1,000,000 and DistinctValues was set to 100 – so the table consisted of just 100 sets of 10,000 repeated values. I chose 1,000,000 rows because that’s the size of a segment in the Power BI Service with the “small semantic model format” setting and any compression that takes place always takes place within a segment.

When the Analysis Services engine inside Power BI compresses data it looks for sequences of repeated values to see if RLE can be used. If it finds them, these sequences result in “pure” RLE runs; if it doesn’t find these sequences they are called “impure” RLE runs and the values are stored using bitpack compression. Pure runs are generally a good thing, impure runs generally a bad thing. You can see how many pure and impure runs there are using the TMSCHEMA_COLUMN_STORAGES DMV, for example with the following DMV query:

select 
[Name], Statistics_DistinctStates, Statistics_RowCount, 
Statistics_RLERuns, Statistics_OthersRLERuns 
from $SYSTEM.TMSCHEMA_COLUMN_STORAGES

Running this query in DAX Studio on my published semantic model returned the following table:

[You can ignore all the rows except the one for the MyNumbers column in this table]

The Statistics_RLERuns column shows the number of pure RLE runs; the Statistics_OthersRLERuns column shows the number of impure RLE runs. In this case you can see, for the MyNumbers column, there were 100 pure RLE runs and no impure runs, so as expected RLE is working well.

Here’s what Vertipaq Analyzer showed for this table:

Unsurprisingly the size of the MyNumbers column is very small.

Then I changed DistinctValues to 100,000 (keeping TotalRows at 1,000,000), giving me 100,000 sets of 10 values, and refreshed the dataset. Here’s what the DMV query on TMSCHEMA_COLUMN_STORAGES returned:

And here’s what Vertipaq Analyzer showed:

As you can see, the column was a lot larger than before; there were no pure RLE runs and one impure RLE run. In this case the large number of distinct values in the column prevented RLE from taking place and this had a negative impact on the size of the column.

These are two extreme cases. What about a scenario that’s somewhere in between? I modified my M query as follows:

let  
    RepeatedNumbers = 
    let
    Source = {1..DistinctValues},
    Repeat = List.Transform(Source, each List.Repeat({_}, ((TotalRows/2)/DistinctValues))),
    Combine = List.Combine(Repeat),
    #"Converted to Table" = Table.FromList(Combine, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "MyNumbers"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"MyNumbers", Int64.Type}})
in
    #"Changed Type",

    RandomNumbers = 
    let
    Source = {1..TotalRows/2},
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Added Custom" = Table.AddColumn(#"Converted to Table", "MyNumbers", each Number.Round(Number.RandomBetween(TotalRows+1, TotalRows*2))),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"MyNumbers"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Removed Other Columns",{{"MyNumbers", Int64.Type}})
in
    #"Changed Type",

    Output = Table.Combine({RepeatedNumbers, RandomNumbers})
in
    Output

What this version of the code does is return a table where the first 50% of the rows are repeated numbers and the second 50% are random numbers. With TotalRows set to 12 and DistinctValues set to 2 it produces the following output:

With this version published to the Power BI Service I set TotalRows to 1,000,000 again and set DistinctValues to 2000, resulting in a table with 2000 sets of 250 repeating values followed by 500,000 random values. Here’s what the DMV query against TMSCHEMA_COLUMN_STORAGES returned:

As you can see there are now 2000 pure runs (I assume for the first 50% of rows with repeated values) and 1 impure run (I assume for the second 50% of rows with random values).

Here’s the output of Vertipaq Analyzer:

The column is now almost as large as in the second scenario above.

You can get a bit more detail about what’s happening in the impure runs with the DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS DMV. Running the following query against the latest version of the table:

select 
column_ID, partition_name, segment_number,
records_count, bits_count, used_size
from $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS

…returns the following values:

To get a rough idea of the number of rows in the impure runs you can use the following formula:

(used_size * 8)/bits_count

In this case for the MyNumbers column (1349720 * 8)/21 = 514,179 which makes sense since my code returns 500,000 rows of random numbers. The records_count column in this query returns the total number of rows in the segment, so the higher the numberof rows in impure runs relative to the total, the worse compression you’re getting.

What practical use is this information? Probably not much as you might think, interesting as it is. It can tell you how well RLE is working for a column but it doesn’t tell you much about how to optimise it, or if it is possible to optimise it, or if optimising it is a good idea – that’s a subject for another blog post.

[Thanks to Marius Dumitru and Akshai Mirchandani for the information in this post]