Which Columns Are Loaded Into Memory When In Power BI Direct Lake Mode?

As you probably know, in Power BI Direct Lake mode column data is only loaded into memory when it is needed by a query. I gave a few examples of this – and how to monitor it using DMVs – in this blog post from last year. But which columns are loaded into memory in which circumstances? I was thinking about this recently and realised I didn’t know for sure, so I decided to do some tests. Some of the results were obvious, some were a surprise.

Test semantic model and methodology

For my tests I loaded the following tables of data into a Fabric Lakehouse:

…and created a Direct Lake custom semantic model that looked like this:

Before every test I refreshed the model to make sure there was no data in memory (be aware that in the future it may be that refreshing a model does not purge all column data from memory). Then, for each test, I ran a single DAX query from DAX Studio and afterwards ran the following query to see whether the dictionaries for each column were resident in memory (again, see my previous post for background):

EVALUATE
SELECTCOLUMNS(
    INFO.STORAGETABLECOLUMNS(),
    "TABLE_ID", [TABLE_ID],
    "COLUMN_ID", [COLUMN_ID],
    "DICTIONARY_ISRESIDENT", [DICTIONARY_ISRESIDENT]
)

Note that I’m using one of the new DAX Info functions, INFO.STORAGETABLECOLUMNS(), instead of the older DMV syntax I was using last year; I think the DAX Info functions are a lot more convenient to use.
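Because INFO.STORAGETABLECOLUMNS() is an ordinary DAX table function you can also filter its output; for example, here's a variation on the query above (my own sketch, not part of the original tests) that returns only the columns whose dictionaries are currently resident:

```
EVALUATE
FILTER(
    SELECTCOLUMNS(
        INFO.STORAGETABLECOLUMNS(),
        "TABLE_ID", [TABLE_ID],
        "COLUMN_ID", [COLUMN_ID],
        "DICTIONARY_ISRESIDENT", [DICTIONARY_ISRESIDENT]
    ),
    [DICTIONARY_ISRESIDENT] = TRUE
)
```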

[I also looked at whether column segments were paged into memory using the INFO.STORAGETABLECOLUMNSEGMENTS() function, but found that its results were consistent with those of INFO.STORAGETABLECOLUMNS(), so I haven't included them in this post and have assumed that if a column's dictionary was resident in memory, so were the associated column segments. It's also worth mentioning that the latest versions of DAX Studio have some great new functionality in the Model Metrics for telling you which columns are resident in memory in a Direct Lake model.]

Here’s what the query above returned immediately after a refresh, before any other queries had been run:

The DICTIONARY_ISRESIDENT column tells you whether a column’s dictionary is resident in memory. Each table in a Direct Lake model (as in an Import mode model) has a hidden column called RowNumber, and for a Direct Lake model this column is always resident in memory. As you can see, all other columns are not resident in memory at this point.

Counting rows in a table

For my first test I created a measure that counted the rows in the Sales table:

Order Count = COUNTROWS('Sales')

I then ran a query that returned just the result of this measure:

EVALUATE
CALCULATETABLE(
    ROW(
    "Order Count", [Order Count]
    )
)

I then ran my query to see what had been loaded into memory, and…

…nothing had changed! No new column data had been loaded into memory at all. I assume this is because Power BI can resolve this query using the RowNumber column from the Sales table.

I then added the Order_Status column from the Sales table to the query:

EVALUATE
SUMMARIZECOLUMNS(
    Sales[Order_Status],
    "Order Count", [Order Count]
)
ORDER BY 
    Sales[Order_Status] ASC

After this query, the Order_Status column was resident in memory as you might expect, but no other column was:

Relationships

Instead of using the Order_Status column, I then looked at the impact of using a column from a different table. I created a query that showed Order Count by Customer Name:

EVALUATE
SUMMARIZECOLUMNS(
    Customer[Customer_Name],
    "Order Count", [Order Count]
)
ORDER BY 
    Customer[Customer_Name] ASC

After this, the Customer_Name column was in memory along with the two CustomerID columns used in the relationship between the Customer and Sales tables:

Any time your query references columns in different tables, the columns used in the relationships between those tables must also be resident in memory.
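If you want to check which columns your relationships use, another of the new Info functions can help; for example (a sketch, assuming you're on a version that supports the Info functions):

```
-- returns one row per relationship in the model, including the
-- IDs of the columns on each side of the relationship
EVALUATE
INFO.RELATIONSHIPS()
```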

Measures

I then defined another measure:

Sales Amount = SUM(Sales[Sales_Amount])

And ran the following query:

EVALUATE
SUMMARIZECOLUMNS(
    Sales[Order_Status],
    "Sales Amount", [Sales Amount]
)
ORDER BY 
    Sales[Order_Status] ASC

After this, the Order_Status and Sales_Amount columns were resident in memory:

No surprises here: as you would expect, if a column is referenced by a measure then it needs to be resident in memory.

Measures that use relationships also work as expected. I created the following measure:

Order Count Citrus = 
CALCULATE([Order Count], 'Product'[Category]="Citrus")

…and the following query:

EVALUATE
SUMMARIZECOLUMNS(
    Customer[Customer_Name],
    "Order Count Citrus", [Order Count Citrus]
)
ORDER BY 
    Customer[Customer_Name] ASC

After this query, all the columns used in the measure and all the relationships needed by the query and the measure were resident in memory:

I was curious, though, about the effect that efficient versus inefficient DAX in measures might have on which columns were loaded into memory. I created the following two measures to see the impact of the antipattern of filtering on an entire table in the second parameter of Calculate():

Order Count Status X V1 = 
CALCULATE([Order Count], 'Sales'[Order_Status]="X")

Order Count Status X V2 = 
CALCULATE([Order Count], FILTER('Sales', 'Sales'[Order_Status]="X"))

My first query used the first of these measures, the efficient version:

EVALUATE
SUMMARIZECOLUMNS(
    Customer[Customer_Name],
    "Order Count Status X V1", [Order Count Status X V1]
)
ORDER BY 
    Customer[Customer_Name] ASC

After this query the Customer_Name column, the two Customer_ID columns and the Order_Status column were resident in memory:

But what about the second, inefficient version of the measure?

EVALUATE
SUMMARIZECOLUMNS(
    Customer[Customer_Name],
    "Order Count Status X V2", [Order Count Status X V2]
)
ORDER BY 
    Customer[Customer_Name] ASC

The same columns were resident in memory:

I don't think this means the inefficient version of the measure performs any better in Direct Lake mode than it does in Import mode; it just shows that both versions of the measure need the same columns to be resident in memory in order to run.

Conclusion

Knowing which columns your query needs to have resident in memory is important in Direct Lake mode for two reasons. First, it helps you understand query performance, because loading column data into memory takes time (Teo Lachev published some test results in a blog post a few days ago which show this clearly). Second, it helps you understand model memory usage and where you stand relative to the memory limits for the capacity SKU you're using. As these results show, it's better to test which columns your Power BI reports need to have resident in memory than to rely on guesswork.

It is also possible to use the DISCOVER_CALC_DEPENDENCIES DMV (or the INFO.CALCDEPENDENCY function) to see which tables, columns, measures and relationships are referenced by a DAX query as I described here; I’ve been playing around with this for Direct Lake too, and will report my findings in a future blog post.
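For reference, here's a minimal sketch of the DMV approach, which you can run from DAX Studio against your model (see the linked post for how to restrict the results to a single query):

```
SELECT 
    [OBJECT_TYPE],
    [REFERENCED_OBJECT_TYPE],
    [REFERENCED_TABLE],
    [REFERENCED_OBJECT]
FROM $SYSTEM.DISCOVER_CALC_DEPENDENCIES
```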

Monitoring The Opening Of DirectQuery Connections In Power BI

In the past I have blogged about how important the number of connections from a Power BI DirectQuery semantic model to a data source is for performance. It’s also true that in some cases opening a connection, or some of the operations associated with opening a connection, can be very slow. As a result it can be useful to see when your semantic model opens connections to a data source, and you can do this with Log Analytics.

To show this I published a DirectQuery semantic model to a workspace with Log Analytics enabled and opened a report with one table visual on it. Because the model and report had only just been published I could be sure there were no existing connections back to the data source used by the DirectQuery model. I then ran the following KQL query:

PowerBIDatasetsWorkspace
| where TimeGenerated>ago(1hr)
| where OperationName in ('Notification', 'QueryBegin', 'QueryEnd', 'DirectQueryBegin', 'DirectQueryEnd')
| project TimeGenerated, OperationName, OperationDetailName, EventText, DurationMs
| order by TimeGenerated asc

Here’s the output:

Some things to note:

  • I have highlighted in red the events of type Notification and subtype OpenedConnection. These events are fired when the semantic model opens connections back to the data source. There are three of these events so three connections are being opened.
  • There is one DAX query generated for the single visual on the report page. As you would expect there is a single pair of QueryBegin/QueryEnd events.
  • After the QueryBegin event there are two DirectQueryBegin/DirectQueryEnd pairs, which means that for this DAX query Power BI ran two SQL queries against the source.
  • Immediately before these two DirectQueryBegin/DirectQueryEnd pairs, and after the QueryBegin event, there are two Notification events which indicate that two connections are being opened to run these queries.
  • Before the DAX query runs, another connection is opened and there is a DirectQueryBegin/DirectQueryEnd pair, highlighted in blue, that is not associated with a DAX query. The EventText column contains M code rather than SQL, and a look at this M code shows that it uses the DirectQueryCapabilities.From function which is marked for “internal use only” in the docs. The DirectQueryEnd event from this pair has a duration of 3 seconds, which shows that in this case it's quite expensive. This query is the Power Query engine working out the DirectQuery capabilities of the source, and one of the side effects of this is that it needs to read metadata from the source, which can sometimes be quite slow. Doing things like disabling relationship columns in the connector and, for some sources like Snowflake, connecting with an identity or a security role that can only see the tables or views used by the model (as mentioned here), can make this a lot faster.
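If you just want to watch how often connections are being opened over time, you can aggregate the Notification events; here's a sketch (I'm assuming the OpenedConnection subtype appears in the OperationDetailName column, as it does in the output above):

```
PowerBIDatasetsWorkspace
| where TimeGenerated > ago(1hr)
| where OperationName == 'Notification' and OperationDetailName == 'OpenedConnection'
| summarize ConnectionsOpened = count() by bin(TimeGenerated, 5m)
| order by TimeGenerated asc
```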

For some reason the Notification events associated with opening connections do not appear in Profiler traces run on models in the Service, although they do appear if you run a trace on a model in Power BI Desktop. This is why I’ve used Log Analytics here.

Power BI pools connections once they are opened and keeps them alive for a certain amount of time, and it also caches data source metadata after it has been retrieved, so you may not see the events I have highlighted here when you run a report.

[Thanks to Curt Hagenlocher and Akshai Mirchandani for some of the information in this post]

Power BI DirectQuery, Gateways And SQL Queries That Return Lots Of Rows

Recently I was working with a customer using DirectQuery mode where all traffic to the data source had to go through an on-premises data gateway for security reasons. They noticed that report performance got worse when traffic went through the gateway, and this was particularly true when Power BI generated SQL queries that returned hundreds of thousands of rows. Looking in the gateway logs we found that spooling (see here and here for more details on spooling, and here for information on monitoring spooling in the logs) was taking place, so I advised the customer to set the StreamBeforeRequestCompletes property on the gateway to true. This eliminated the delays from spooling on the gateway (I'm told it helps performance inside the model too) and resulted in some substantial improvements in report performance. As a result, if you're using DirectQuery and a gateway, you should experiment with the StreamBeforeRequestCompletes property to see if it can help you too.
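For reference, this property is set in the gateway's configuration file, Microsoft.PowerBI.DataMovement.Pipeline.GatewayCore.dll.config, on the gateway machine; the entry looks something like this (check the gateway docs for your version before editing, and restart the gateway service afterwards):

```
<setting name="StreamBeforeRequestCompletes" serializeAs="String">
    <value>True</value>
</setting>
```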

That said, if you’re using DirectQuery mode you should try to avoid situations where Power BI generates SQL queries that return a large number of rows such as table visuals with scrollbars. Regardless of whether you’re using a gateway or not, moving large amounts of data from your data source to your semantic model can be slow – even if your data source tells you the SQL query itself is fast. You can use Performance Analyzer in Power BI Desktop (see here) to find out how many rows the SQL queries Power BI generates return and how long it takes to read that data.

Power BI/AI Book Roundup

Here's another one of my occasional posts about books I've been sent free copies of. Full disclosure: as always, these aren't reviews as such, they're more like free publicity in return for the free books and I don't pretend to be unbiased; also, the Amazon UK links have an affiliate code in them that gives me a kickback if you buy any of these books.

The AI Value Playbook, Lisa Weaver-Lambert

What am I doing covering an AI book here? Lisa is an ex-colleague of mine at Microsoft and I respect her opinions. Also, I suspect like a lot of you, I have mixed feelings about the current AI boom: I can see the value in AI but I can also see the vast amount of hype and the obviously ridiculous claims being made. More than anything I see senior executives talking confidently about a subject I’m sure they don’t understand, and that is clearly a big problem. This book aims to help solve that problem by providing a practical guide to AI for non-technical leaders, in the form of a series of case studies and interviews with entrepreneurs and C-level people in the AI space. This is a very readable book – Lisa has talked to a lot of interesting, knowledgeable people – and the format makes it a lot more palatable for the target audience of your boss’s boss’s boss than your average tech book. As a technical person who isn’t by any means an AI expert I also enjoyed reading it.

The Complete Power BI Interview Guide, Sandielly Ortega Polanco, Gogula Aryalingam and Abu Bakar Nisa Alvi

Spend any time on public Power BI forums and you'll see a lot of questions from people who want to know how to start a career in Power BI or get tips for Power BI interviews; as a result I'm sure there's a big market for a book like this. It's a mix of technical topics (the type that you might be asked about in a technical interview for a Power BI job) and non-technical advice such as how to network on LinkedIn, negotiate salaries and accept or reject a job offer. That might seem a bit of a strange combination but it works, and the advice is both detailed and very sensible, so I would have no hesitation in recommending this to anyone trying to get a job as a Power BI developer.

Power BI Refresh, Memory Usage And Semantic Model Scale Out

In my recent posts on the Command Memory Limit error and the partialBatch mode for Power BI semantic model refresh, I mentioned that one way to avoid memory errors when refreshing large semantic models was to run a refresh of type clearValues followed by a full refresh – but that the downside of doing this was that your model would not be queryable until the full refresh had completed. Immediately afterwards some of my colleagues (thank you Alex and Akshai) pointed out that there was in fact a way to ensure a model remained queryable while using this technique: using Semantic Model Scale Out. How? Let me explain…

Problem Recap

As I described here, when you run a full refresh on a semantic model that already has data loaded into it, the original version of the model stays in memory while the new data is loaded in, which means that the overall amount of memory required for the refresh is around double the amount needed to store the model. Since there are limits on the amount of memory that a model can use, which vary depending on whether or not you're using a capacity and, if so, what size capacity you're using, if you have a large model then you might find refresh fails with a memory error. Running a refresh of type clearValues first removes all the data from the model, reducing its memory usage, and improves the chances of a full refresh using less than the allowed maximum amount of memory and therefore succeeding. Removing all the data from your model, though, means that your reports won't show any data until you do a full refresh.
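The arithmetic here can be sketched as follows (my own illustration of the rule of thumb above, not an official formula; real memory usage varies):

```python
# During a full refresh of a model that already contains data, the existing
# copy of the model stays in memory (so it can still answer queries) while
# the new data is loaded, so peak memory is roughly double the model size.
# A clearValues refresh first removes the existing data, so only the newly
# loaded copy counts towards the peak.
def peak_refresh_memory_gb(model_size_gb, cleared_first=False):
    existing = 0.0 if cleared_first else model_size_gb
    return existing + model_size_gb

# Hypothetical example: a 15GB model refreshed under a 25GB memory limit
limit_gb = 25
print(peak_refresh_memory_gb(15) <= limit_gb)                      # False: plain full refresh fails
print(peak_refresh_memory_gb(15, cleared_first=True) <= limit_gb)  # True: clearValues then full succeeds
```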

How can Semantic Model Scale Out help?

Semantic Model Scale Out (also known as Query Scale Out) is primarily a performance feature: by creating multiple replicas of your semantic model, one of which is used for refreshes and one or more of which are used to answer queries, it prevents refreshes from interfering with report performance and allows report queries to be distributed over multiple physical nodes. All of these replicas of the semantic model need to be kept in sync, which means that when the model that is used for refresh has new data loaded into it, all the other versions of the model need to be replaced by this new version. By default this synchronisation happens automatically but you can also turn this off and control the synchronisation manually.

As a result, if you turn on Semantic Model Scale Out and turn off automatic synchronisation, you can run a refresh of type clearValues to clear the data from your model and then run a full refresh without affecting your end users – who won’t see the new data until you do a manual synchronisation.

Example

Let’s see a simple illustration of how this works. Using the same semantic model I used in this post, hosted on an F64 capacity, I created a report with a single card visual:

With Scale Out turned off, I ran a refresh of type clearValues. This was very fast and after it had finished – as you would expect – the card visual showed a blank value because all of the data had been cleared from the model:

I then ran a full refresh and when it completed the card showed data again.

Next, I turned on Scale Out in the settings pane of the semantic model:

…and turned off automatic replica synchronisation using the script here.

My colleague Michael Kovalsky has very kindly added functionality to Semantic Link Labs to handle manual replica synchronisation, which means it was extremely simple to run the refreshes and synchronise replicas from a Fabric notebook. After installing Semantic Link Labs:

%pip install semantic-link-labs

…I used the following Python code in a notebook cell to run a refresh of type clearValues, followed by a full refresh, followed by a manual replica synchronisation:

import sempy_labs as labs
WorkspaceName = "Model Memory Tests"
SemanticModelName = "ModelMemoryDemo3"
# run a refresh of type clearValues first
labs.refresh_semantic_model(dataset=SemanticModelName, workspace=WorkspaceName, refresh_type="clearValues")
# then a refresh of type full
labs.refresh_semantic_model(dataset=SemanticModelName, workspace=WorkspaceName, refresh_type="full")
# then a manual replica sync
labs.qso_sync(dataset=SemanticModelName, workspace=WorkspaceName)

And as expected the card visual shown above showed data all through the refreshes.

SQLBits Power BI/Fabric Session Recordings Now Available

One of the great things about the SQLBits conference in the UK is the way that all the session recordings are made available for anyone to view for free on their YouTube channel. All the recordings from the 2024 conference have now been posted and it's a goldmine for anyone who wants to deepen their Power BI or Fabric knowledge.

Here are the recordings for my three sessions:

Fabric Direct Lake Deep Dive

What’s new in Power Query in Power BI, Fabric and Excel?

Connect Excel to Power BI data with Live Connected Tables

There are so many other great sessions that it’s hard to pick out any highlights, but there are two that you might easily miss which I think are ones you should watch.

First, this session on DAX fusion by my colleague Phil Seamark is one I learned a lot from. It sounds like an obscure topic but it’s extremely important for anyone trying to optimise their measures (including for DirectQuery models):

Second, this session by another colleague of mine, Kasper de Jonge, on how security works with Fabric – not just data security but security of the platform as a whole – is a must-watch because so many of the assumptions people have are not relevant for a SaaS platform like Fabric:

Why Power BI Table Visuals With Scrollbars Can Cause Problems

Concluding my series of blog posts on seemingly harmless things you can do in a Power BI report that can lead to performance problems and the “This visual has exceeded the available resources” error (see also “Calculate(), Filter() and DAX memory usage” and “DAX measures that never return blank“), in this post I’ll show how table visuals with vertical scrollbars that potentially show thousands of rows can be a Bad Thing.

Using the Import mode semantic model I’ve used in the last two posts, consider this table visual:

The two measures displayed are trivial counts and averages. However there are 538753 combinations of County and Postcode that have data, and so in theory the table can display 538753 rows if you scroll down far enough. That can’t be good, can it?

Those of you who know how Power BI generates DAX queries for table visuals like this might disagree, though. Here’s the DAX query for this table visual:

DEFINE
	VAR __DS0Core = 
		SUMMARIZECOLUMNS(
			'Property Transactions'[County],
			'Property Transactions'[Postcode],
			"Count_Of_Sales", 'Property Transactions'[Count Of Sales],
			"Average_Price_Paid", 'Property Transactions'[Average Price Paid]
		)

	VAR __DS0PrimaryWindowed = 
		TOPN(501, __DS0Core, 'Property Transactions'[County], 1, 'Property Transactions'[Postcode], 1)

EVALUATE
	__DS0PrimaryWindowed

ORDER BY
	'Property Transactions'[County], 'Property Transactions'[Postcode]

As you can see, the query contains a TOPN filter which means that it actually only returns the first 501 rows out of those potential 538753. It’s only if you scroll down far enough that another query is triggered to get the next 501 rows. So maybe it’s not so bad?

Actually it can be bad, even with the TOPN filter. Here are the Execution Metrics for the query above:

{
	"timeStart": "2024-07-10T22:54:30.330Z",
	"timeEnd": "2024-07-10T22:54:31.064Z",

	"durationMs": 734,
	"vertipaqJobCpuTimeMs": 219,
	"queryProcessingCpuTimeMs": 516,
	"totalCpuTimeMs": 734,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 66599,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 501
}

Notice the durationMS and approximatePeakMemConsumptionKB metrics, which tell you how long the query takes to run and how much memory it uses.

Now consider the following version of a report with a slicer for the County column and the county “Bath and North East Somerset” selected.

Here’s the DAX query for the table visual now:

DEFINE
	VAR __DS0FilterTable = 
		TREATAS({"BATH AND NORTH EAST SOMERSET"}, 'Property Transactions'[County])

	VAR __DS0Core = 
		SUMMARIZECOLUMNS(
			'Property Transactions'[County],
			'Property Transactions'[Postcode],
			__DS0FilterTable,
			"Count_Of_Sales", 'Property Transactions'[Count Of Sales],
			"Average_Price_Paid", 'Property Transactions'[Average Price Paid]
		)

	VAR __DS0PrimaryWindowed = 
		TOPN(501, __DS0Core, 'Property Transactions'[County], 1, 'Property Transactions'[Postcode], 1)

EVALUATE
	__DS0PrimaryWindowed

ORDER BY
	'Property Transactions'[County], 'Property Transactions'[Postcode]

As you would expect, this query now contains a filter on the county selected in the slicer. It also contains a TOPN filter, but the interesting thing is that since there are more than 501 postcodes in the selected county, the query returns the same number of rows as before. Of course you can't now scroll down and see data for counties other than the one selected in the slicer, but you can always select a different county in the slicer.

Here are the Execution Metrics for the new query:

{
	"timeStart": "2024-07-10T22:58:29.725Z",
	"timeEnd": "2024-07-10T22:58:29.741Z",

	"durationMs": 16,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 16,
	"totalCpuTimeMs": 16,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 2564,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 501
}

Notice that the duration and peak memory usage of the query are both much, much lower: duration has gone from 734ms to 16ms, while peak memory usage has gone from 66599KB to 2564KB. If you have more measures, or measures that are more complex and/or inefficiently written, the difference could be much bigger.
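To put the figures from the two Execution Metrics payloads side by side (simple arithmetic on the numbers reported above):

```python
# Metrics taken from the two Execution Metrics payloads shown above
unfiltered = {"durationMs": 734, "peakMemKB": 66599}
filtered = {"durationMs": 16, "peakMemKB": 2564}

duration_ratio = unfiltered["durationMs"] / filtered["durationMs"]
memory_ratio = unfiltered["peakMemKB"] / filtered["peakMemKB"]
print(f"~{duration_ratio:.0f}x faster, ~{memory_ratio:.0f}x less memory")  # prints "~46x faster, ~26x less memory"
```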

Why is there such a big difference? Well, if you look at the physical query plan generated for both queries you can see that before the TOPN filter is applied, the Vertipaq engine still has to spool the entire, unfiltered table. In the first query this unfiltered table has 538753 rows, which is why it's relatively slow and memory hungry, whereas in the second query this table, with the County slicer, has only 1869 rows (the number of postcodes in the selected county). As a result slicing on County, even though it doesn't affect which rows are returned after the TOPN filter is applied, leads to a more efficient query. If you're using DirectQuery mode something similar happens because, as I noted here, the TOPN filter can only be pushed back to the relational source in a few basic scenarios, which means the SQL queries generated can return a very large number of rows and can hit the Max Intermediate Row Set Count limit.

What can you learn from this? Table visuals that can display a large number of rows should be avoided. I'm no report design guru but I would go so far as to say that any table with a scrollbar is a mistake: if your users need to scroll down to see all the rows or scroll right to see all the columns, why not use a slicer to let them select the rows they want to see more easily, or use field parameters to let them select the columns they want to see? Not only will this be better from a usability point of view, performance will be a lot better and memory usage will be lower. And yes, I know that your end users ask for tables with lots of rows and columns because that's what they are used to seeing in Excel, but giving them what they want risks slow performance and memory errors. So they need to make a choice.

DAX Measures That Never Return Blank

Following on from my earlier post on the Query Memory Limit in Power BI, and as companion to last week’s post on how a DAX antipattern using Calculate() and Filter() can lead to excessive memory consumption by queries (and therefore lead to you hitting the Query Memory Limit), in this post I want to look at the effects of another DAX antipattern on performance and memory usage: measures that can never return a blank value.

Adding zero to the result returned by a measure

What do I mean by a measure that never returns blank? Well using the same model that I used in last week’s post, let’s say I have a simple measure that counts the rows in my fact table:

Count Of Sales = COUNTROWS('Property Transactions')

I can use this measure in a matrix visual with Property Type on columns and Postcode on rows, like so:

Here’s what Execution Metrics says about the DAX query for this visual:

{
	"timeStart": "2024-07-03T10:16:16.251Z",
	"timeEnd": "2024-07-03T10:16:19.345Z",

	"durationMs": 3094,
	"vertipaqJobCpuTimeMs": 188,
	"queryProcessingCpuTimeMs": 2656,
	"totalCpuTimeMs": 2844,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 179404,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 230
}

And here’s what DAX Studio’s Server Timings shows:

It's not fast (there are a lot of Postcodes, even though the query has a TOPN filter and only returns 230 rows – that's a subject for another blog post though) but the thing to look at is the peak memory consumption, which is 179404KB or 175MB, and the fact that there is just one Storage Engine query.

Now let’s say your boss says she hates all that white space in the matrix and wants to see zeroes instead. The obvious way to fix this is to add a zero to the result of the count in the measure, like so:

Count Of Sales = COUNTROWS('Property Transactions')+0

It’s a minor change so what could go wrong? Here’s what Execution Metrics shows after making that change:

{
	"timeStart": "2024-07-03T10:22:01.343Z",
	"timeEnd": "2024-07-03T10:22:10.390Z",

	"durationMs": 9047,
	"vertipaqJobCpuTimeMs": 313,
	"queryProcessingCpuTimeMs": 8578,
	"totalCpuTimeMs": 8891,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 379432,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 612
}

And here are the Server Timings:

The query is now three times slower, has three Storage Engine queries, and uses 370MB of memory – twice as much as before! This increase in memory can easily push you over the Query Memory Limit and result in the “This visual has exceeded the available resources” error, as this real-life example shows.

Why does this change make such a big difference? Adding zero to the result of the count creates a “dense” measure, that's to say a measure that never returns a blank value. The white space in the first visual was the result of the measure not returning any values, whereas in the second visual the measure is returning values for every cell in the matrix. What is a blank value? I strongly recommend you read Marco's article here for a detailed introduction to this subject; blanks are somewhat similar to nulls in SQL but there are some very important differences, and in this case the thing to point out is that the expression BLANK()+0 returns 0 in DAX. As a result a measure which used to return a blank value in some cases now never returns a blank. This can be extremely bad for DAX performance and memory usage because the Vertipaq engine inside Power BI can do a lot of optimisations when it knows a measure will return blank in some circumstances. Nikola Ilic has a great post here going into more detail about this.
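You can see this behaviour for yourself with a trivial query:

```
EVALUATE
ROW(
    "JustBlank", BLANK(),            -- returns a blank value
    "BlankPlusZero", BLANK() + 0     -- returns 0, so never blank
)
```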

Using FORMAT() in measures

There are a few other common ways you can run into the same problem. One is the use of the FORMAT() function to return a formatted string from a measure, for example:

Count Of Sales = FORMAT(COUNTROWS('Property Transactions'),"Standard")

Here’s what Execution Metrics shows for the visual above with the new version of the measure:

{
	"timeStart": "2024-07-04T09:48:43.655Z",
	"timeEnd": "2024-07-04T09:48:53.499Z",

	"durationMs": 9844,
	"vertipaqJobCpuTimeMs": 281,
	"queryProcessingCpuTimeMs": 9359,
	"totalCpuTimeMs": 9656,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 380480,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 612
}

As you can see, the duration and memory usage are very similar to those of the version of the measure above that added a zero to the result of the count: slow and memory hungry.

The FORMAT() function returns a text value that represents a formatted version of the value passed in; blank values are converted to empty strings, so again this new version of the measure can never return a blank value. My recommendation is to always try to avoid using FORMAT() in measures because of the impact it can have on performance. Instead, try to use format strings (for more examples see my posts here and here) or dynamic format strings (you should also read the SQLBI article on dynamic format strings to be aware of the implications of using them) wherever possible. It’s a shame that format strings don’t have built in support for formatting blank values; I’m told there are some good technical reasons why this hasn’t happened yet.
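You can check the blank-to-empty-string conversion with a trivial query:

```
EVALUATE
ROW(
    -- FORMAT() turns a blank into an empty string...
    "FormattedBlank", FORMAT(BLANK(), "Standard"),
    -- ...and an empty string is not a blank value
    "IsStillBlank", ISBLANK(FORMAT(BLANK(), "Standard"))
)
```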

IFERROR() and DIVIDE()

Some other DAX functions can also trigger this performance issue. The IFERROR() and ISERROR() functions are one example, since they allow you to trap errors in DAX expressions and return a value instead. However, I can say quite confidently that you should never, ever use IFERROR() or ISERROR() when writing DAX anyway, and there is some guidance documentation to back me up on this.

Another example which is less obviously bad is the use of the third parameter of the DIVIDE() function. As you probably know, the DIVIDE() function allows you to handle division by zero and division by blank safely, and by default it will return a blank value when division by zero or division by blank is detected. However there is an optional third parameter which allows you to return a value instead of a blank when dividing by zero or blank, and this can also trigger the problem we’re looking at.
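Again, a trivial query shows the difference:

```
EVALUATE
ROW(
    "DefaultDivide", DIVIDE(1, 0),          -- returns blank
    "DivideWithAlternate", DIVIDE(1, 0, 0)  -- returns 0, so the result is never blank
)
```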

Consider the following measure:

Division Test = 
DIVIDE(
    1,
    COUNTROWS('Property Transactions')
)

Here it is used in our visual:

Here are the Execution Metrics:

{
	"timeStart": "2024-07-04T11:42:29.636Z",
	"timeEnd": "2024-07-04T11:42:32.386Z",

	"durationMs": 2750,
	"vertipaqJobCpuTimeMs": 234,
	"queryProcessingCpuTimeMs": 2516,
	"totalCpuTimeMs": 2750,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 210960,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 230
}

Now, here’s the same measure with the third parameter of DIVIDE() set:

Division Test = 
DIVIDE(
    1,
    COUNTROWS('Property Transactions'),
    0
)

And here are the Execution Metrics:

{
	"timeStart": "2024-07-04T11:44:24.997Z",
	"timeEnd": "2024-07-04T11:44:32.247Z",

	"durationMs": 7250,
	"vertipaqJobCpuTimeMs": 359,
	"queryProcessingCpuTimeMs": 6891,
	"totalCpuTimeMs": 7250,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 379436,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 612
}

Once again, the second version of the measure, which never returns a blank, is slower and uses more memory.

Summary

As you have seen, Power BI measures that can never return a blank value can be very bad for query performance and memory usage – so when you are writing DAX you should always be careful to avoid this pattern.

Calculate(), Filter() And DAX Memory Usage

Following on from my last post on the Query Memory Limit in Power BI, in this post I want to look at one of the most common DAX antipatterns and its effect on query memory usage: filtering on a whole table, rather than an individual column, in the filter parameters of the Calculate() function. A lot has already been written on this particular antipattern from the point of view of query performance – for example see here and here – but it’s only with the recent addition of the Execution Metrics Profiler/Log Analytics event that you can see how bad it is for memory usage too.

Here’s a simple example. Using a semantic model based on the UK Land Registry’s Price Paid data, with two dimension tables (Date and Property Type) and a fact table called Property Transactions with just over a million rows:

…let’s say that you write a measure that counts the number of property transactions for detached properties, ie counting the rows in the Property Transactions table where the Property Type column contains the value “D”. Here’s a first attempt at that:

Detached Count V1 =
CALCULATE (
    COUNTROWS ( 'Property Transactions' ),
    FILTER ( 'Property Transactions', 'Property Transactions'[Property Type] = "D" )
)

Notice that this measure follows the antipattern of filtering on the whole Property Transactions table with the Filter() function just to filter on the Property Type column.

Here’s a DAX query that shows the results of this measure for all Postcodes. It returns around 153,000 rows, which helps show the memory impact of the way the measure is written.

EVALUATE
SUMMARIZECOLUMNS(
    'Property Transactions'[Postcode],
    "Detached Count", [Detached Count V1]
)

Here’s what the Execution Metrics event returns for this query:

{
	"timeStart": "2024-06-28T15:41:59.951Z",
	"timeEnd": "2024-06-28T15:42:00.388Z",

	"durationMs": 438,
	"vertipaqJobCpuTimeMs": 47,
	"queryProcessingCpuTimeMs": 391,
	"totalCpuTimeMs": 438,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 22246,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 152583
}

In this case the memory usage of the query, as given by the approximatePeakMemConsumptionKB metric, is 22246KB or about 22MB.

Now consider this version of the measure:

Detached Count V2 =
CALCULATE (
    COUNTROWS ( 'Property Transactions' ),
    'Property Transactions'[Property Type] = "D"
)

Running the same DAX query as before but with this new measure you get the same results but with a much lower memory usage of 8734KB or 9MB:

{
	"timeStart": "2024-06-28T15:56:09.431Z",
	"timeEnd": "2024-06-28T15:56:09.602Z",

	"durationMs": 172,
	"vertipaqJobCpuTimeMs": 31,
	"queryProcessingCpuTimeMs": 125,
	"totalCpuTimeMs": 172,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 8734,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 152583
}

Query duration and CPU usage are reduced too, but as I said above, this highlights how different DAX patterns can result in very different memory footprints. Filtering on just the column you want to filter by, rather than the whole table, is a lot more memory-efficient! If you’re getting the “This visual has exceeded the available resources” error because you’ve hit the Query Memory Limit, then it’s worth checking to see whether you have used this antipattern in your measures and rewriting them accordingly.
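VertiPaq doesn’t literally build Python lists, of course, but a toy model gives the intuition for why the two patterns differ (the data, sizes and variable names here are purely illustrative): FILTER() over the whole table conceptually has to work with entire rows, putting every column in play, while a column filter only needs the values of the one column being tested – even though both produce exactly the same set of matching rows.

```python
import sys

n = 100_000  # toy fact table, far smaller than the real Property Transactions
property_type = ["D" if i % 4 == 0 else "S" for i in range(n)]
price = [100_000 + i for i in range(n)]
postcode = [f"PC{i % 1000}" for i in range(n)]

# FILTER('Property Transactions', ...) conceptually iterates full rows,
# so every column is materialised for every row tested...
table_scan = list(zip(property_type, price, postcode))
# ...while a column filter only needs the Property Type column
column_scan = property_type

# Both approaches find exactly the same matching rows
detached_from_table = [row for row in table_scan if row[0] == "D"]
detached_from_column = [value for value in column_scan if value == "D"]
print(len(detached_from_table) == len(detached_from_column))  # True

# ...but the row-based scan carries a much larger memory footprint
table_bytes = sys.getsizeof(table_scan) + sum(sys.getsizeof(r) for r in table_scan)
column_bytes = sys.getsizeof(column_scan)
print(table_bytes > column_bytes)  # True
```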

Footnote: it’s worth mentioning that two other variations on the rewritten measure, one using the Property Type column on the Property Type dimension table instead of the Property Type column on the fact table:

Detached Count V3 =
CALCULATE (
    COUNTROWS ( 'Property Transactions' ),
    'Property Type'[Property Type] = "D"
)

…and one that added the KeepFilters() function (which is commonly used in this type of measure):

Detached Count V4 =
CALCULATE (
    COUNTROWS ( 'Property Transactions' ),
    KEEPFILTERS ( 'Property Type'[Property Type] = "D" )
)

…both had the same memory usage as the V2 version of the measure, 9MB.

Power BI Semantic Model Memory Errors, Part 4: The Query Memory Limit

Continuing my series on Power BI memory errors (see part 1, part 2 and part 3), in this post I’ll look at the query memory limit, which controls the amount of memory that an individual query can consume. This is a subject I’ve talked about in a few blog posts in the past: hitting this limit is one of the reasons you’ll see the “This visual has exceeded the available resources” error which I blogged about here and here. There’s also some official documentation here which is pretty good.

As I mentioned in part 1 of this series, when you run a DAX query against a Power BI semantic model it will use memory. How much memory it uses will depend on your data, how you have modelled it, what data your query is requesting and how you have written any DAX calculations used by your query. For obvious reasons we at Microsoft can’t let queries use an infinite amount of memory in the Power BI Service so there is a limit – the query memory limit – on how much memory an individual query can use. This limit varies depending on whether you are using Shared (also known as Power BI Pro) or Premium/Fabric capacity, and by the size/SKU of the capacity you’re using. The different limits for each SKU are documented here.

While you can’t increase the query memory limit over the published thresholds, capacity admins can reduce the query memory limit in the admin portal by setting the “Query Memory Limit %” property on the capacity.

The default value of this property is 0, which means the maximum value for the capacity SKU you’re using will be applied; any other value will be interpreted as a percentage of that maximum value. Reducing this value can be a good way to stop your developers from using inefficient DAX in their reports and consuming too many resources on your capacity – it won’t directly reduce a query’s CU usage but DAX expressions that are very memory-intensive often use a lot of CUs too.
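The way the effective limit is derived from the “Query Memory Limit %” property can be sketched like this (the 10240MB maximum used below is just an example value – check the documentation for the actual limit for your SKU):

```python
def effective_query_memory_limit(sku_max_limit_mb, query_memory_limit_pct=0):
    """Effective per-query limit: 0 means the SKU maximum applies;
    any other value is interpreted as a percentage of that maximum."""
    if query_memory_limit_pct == 0:
        return sku_max_limit_mb
    return sku_max_limit_mb * query_memory_limit_pct / 100

# Hypothetical SKU maximum of 10GB (10240MB), purely for illustration
print(effective_query_memory_limit(10240))      # 10240 - default, full maximum
print(effective_query_memory_limit(10240, 50))  # 5120.0 - half the maximum
```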

If the DAX query generated by a visual in a Power BI report hits the query memory limit you’ll see an error in your report. The first reaction many Power BI developers have when they hit the limit is to ask if there is a way to increase it – which you can only do by buying a larger capacity. But to be brutally honest, if you are hitting the query memory limit you have done something wrong and you need to fix your model, your DAX calculations or your report to reduce memory usage. Doing this is likely to improve report performance too. If you’re encountering the error in Power BI Desktop you can raise the limit there, but this is only possible in Desktop, and if you do you may end up getting the error again after you publish.

Here’s the error you’ll see in Power BI Desktop if you hit the query memory limit:

This visual has exceeded the available resources. Try filtering to decrease the amount of data displayed.

If you hit the query memory limit in a published report you’ll see this error message:

In this case you not only get the “Visual has exceeded the available resources” error but going to the details shows a more helpful message that tells you what the current query memory limit is:

Resource Governing: This query uses more memory than the configured limit. The query — or calculations referenced by it — might be too memory-intensive to run. Either simplify the query or its calculations, or if using Power BI Premium, you may reach out to your capacity administrator to see if they can increase the per-query memory limit. More details: consumed memory 1 MB, memory limit 1 MB. See https://go.microsoft.com/fwlink/?linkid=2159752 to learn more.

The error number associated with this error is 0xC13E0004 (-1052901372).

How do you know how much memory your query is using, and how near you are to the query memory limit? The new Execution Metrics event in Profiler and Log Analytics will tell you. If you find the Execution Metrics event associated with the Query End event for your query (something I discussed here), the approximatePeakMemConsumptionKB metric gives the peak memory usage reached while the query was running, and it is this value that must not exceed the query memory limit. For example:

{
	"timeStart": "2024-06-21T16:24:16.608Z",
	"timeEnd": "2024-06-21T16:24:18.326Z",

	"durationMs": 1719,
	"vertipaqJobCpuTimeMs": 1078,
	"queryProcessingCpuTimeMs": 531,
	"totalCpuTimeMs": 1609,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 46662,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 502
}
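Since the Execution Metrics payload is just JSON, it’s easy to pull the peak memory figure out programmatically and compare it to a limit. Here’s a minimal Python sketch (the helper functions are my own, not an official API; the payload is trimmed to the fields used):

```python
import json

# Trimmed Execution Metrics payload, as captured from Profiler/Log Analytics
metrics_json = """{
    "durationMs": 1719,
    "approximatePeakMemConsumptionKB": 46662,
    "queryResultRows": 502
}"""

def peak_memory_mb(payload):
    """Extract the peak query memory usage from an Execution Metrics payload, in MB."""
    metrics = json.loads(payload)
    return metrics["approximatePeakMemConsumptionKB"] / 1024

def exceeds_limit(payload, limit_mb):
    """Did the query's peak memory usage exceed a given limit in MB?"""
    return peak_memory_mb(payload) > limit_mb

print(round(peak_memory_mb(metrics_json), 1))  # 45.6 - about 46MB
print(exceeds_limit(metrics_json, 1024))       # False - well under a 1GB limit
```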

If you can’t use Profiler or Log Analytics there’s no direct way of knowing how much memory a query is using. You can, however, set a custom memory limit in Power BI Desktop and keep changing it until you see, or don’t see, the query memory limit error, and in that way estimate how much memory your query is using.
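This trial-and-error approach is really a binary search, and can be sketched as follows. The runs_ok_at_limit callback is hypothetical – it stands in for “set the custom limit in Desktop, rerun the visual, and note whether it errors” – and the bounds and tolerance are arbitrary illustrative values:

```python
def estimate_query_memory(runs_ok_at_limit, low_mb=1, high_mb=10240, tolerance_mb=16):
    """Binary search for the smallest custom limit at which the query succeeds.
    runs_ok_at_limit(limit_mb) is a hypothetical callback standing in for
    rerunning the visual in Desktop with that custom memory limit set."""
    while high_mb - low_mb > tolerance_mb:
        mid = (low_mb + high_mb) // 2
        if runs_ok_at_limit(mid):
            high_mb = mid  # query fits - try a lower limit
        else:
            low_mb = mid   # hit the limit error - the query needs more memory
    return high_mb         # approximate peak memory usage of the query

# Simulate a query that actually needs about 700MB
print(estimate_query_memory(lambda limit_mb: limit_mb >= 700))  # 700
```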

How can you reduce the amount of memory that your queries use? That’s a question that’s too big to answer in any one blog post and this one is already long enough. In my next few posts I will look at some common causes of excessive memory usage and what you can do to fix them.