Memory Overhead Of Distinct Count Measures In Power BI

The series of blog posts I wrote last year on semantic model memory usage, in particular this post on the query memory limit and the “This query uses more memory than the configured limit” error in Power BI, gets a lot of traffic. Since writing that post on the query memory limit I’ve written a few follow-ups on common mistakes that lead to increased query memory usage, such as this one on measures that never return a blank. Today’s post is sort of in that series but it isn’t about a design mistake – it’s just to point out that distinct count measures can be surprisingly memory-hungry.

To illustrate this I built a semantic model consisting of a single table with two columns and 99,999,000 rows, published it and ensured the Large Semantic Model format was enabled:

I created two measures:

Number Of Rows = COUNTROWS('MyTable')
Distinct Customers = DISTINCTCOUNT(MyTable[CustomerID])

Here’s what the model metrics looked like in DAX Studio:

The total model size in memory was 255MB.

I then ran the following DAX query to get the number of rows in the table for each of the 1800 dates in the Date column:

EVALUATE
SUMMARIZECOLUMNS(
    MyTable[Date],
    "Number Of Rows", [Number Of Rows]
)

The approximatePeakMemConsumptionKB from the Execution Metrics Profiler event showed that this query consumed 2109KB – so not much.

Then I ran a similar query to get the number of distinct customer IDs by date:

EVALUATE
SUMMARIZECOLUMNS(
    MyTable[Date],
    "Distinct Customers", [Distinct Customers]
)

The approximatePeakMemConsumptionKB metric for this query was 800325KB – so a lot more than the previous query. In fact even though this model was well under the 1GB size limit for a model not in Premium capacity, the query here used a lot more memory (782MB) than the size of the model itself in memory and it came close to the 1GB limit on the amount of memory a query can consume when the model is not in Premium capacity.

Is there something wrong here? Can the query or model be tuned to reduce memory usage? Not really, no – distinct count queries are almost always more memory intensive than other types of measures. I tested a number of different things such as forcing the use of hash encoding on the CustomerID column, partitioning (Phil Seamark suggested creating one partition for each of the 1800 dates and actually that did reduce memory consumption but it also made the queries extremely slow), changing the ordering of the source data to change how well each column was compressed, calculating the distinct count using the SUMX method, and nothing resulted in lower query memory usage.

What I did find for the model above was that the number of rows returned by the query influenced the memory consumption of the query. So reducing the number of dates returned on rows in my DAX query from 1800 to 366 resulted in approximatePeakMemConsumptionKB going down to 200278KB. So if you’re running into memory errors when running queries with distinct count measures the first thing you should ask yourself is whether you need to show so many distinct counts: I recently ran into this problem with a customer that wanted to plot a line chart of distinct values with dates on the x axis, and we solved the problem by only plotting one day per week for the time period shown on the chart instead of every date. The chart looked almost identical, the DAX query was a lot faster and the memory usage of the DAX query was a lot lower. Distinct count measures combined with table visuals with lots of rows can be dangerous.

The other thing you can do is see if you can remodel your data to turn a distinct count into a count because, as shown above, counts are a lot faster and memory efficient than distinct counts. For example, if you have a fact table containing line items for orders and you need to find the distinct count of order ids, then consider creating a second fact table at the order granularity so you can count the number of rows in it to find the number of distinct orders. This may increase the size of your model but it should certainly reduce your query memory consumption for many queries because you won’t need to do a distinct count.

Limit The Impact Of Expensive Power BI Queries On Your Capacity By Reducing The Query Timeout

The recent announcement of Surge Protection gives Fabric/Power BI capacity admins a way to restrict the impact of background operations on a capacity, preventing them from causing throttling. However, at the time of writing, Surge Protection does not prevent users that are running expensive DAX or MDX queries – which are interactive operations – from causing problems on your capacity. Indeed, right now, there is no direct way to stop runaway queries from consuming a lot of CUs, although there is something you can do which will help a lot: reducing the query timeout.

Surge Protection doesn’t address the problem of expensive queries yet because Power BI only knows the CU usage of a DAX or MDX query when it has finished running – by which time it’s too late to do anything about it. In many cases, though, DAX or MDX queries that consume a lot of CUs are also slow. Therefore reducing the query timeout, which will kill any query that runs longer than a specified duration, will stop these queries from consuming so many CUs.

There are two default query timeouts that you should be aware of in Power BI. First, all DAX queries generated by a Power BI report have a 225 second timeout applied by the report itself. This timeout can be changed in Power BI Desktop but it cannot be changed on a published report in the Power BI Service. Second, you can set a timeout at the capacity level by changing the Query Timeout property in the admin portal. The default setting here is 3600 seconds (one hour). Unlike the first timeout, which only applies to the DAX queries generated by a Power BI report, this timeout applies to all queries run on any semantic model associated with the capacity, including the MDX queries generated by Excel PivotTables via Analyze In Excel. Setting this second timeout to less than 225 seconds means that it will take precedence over the first timeout. Therefore it’s the Query Timeout property on your capacity that you should set.

Hitting a timeout in a Power BI report will give the user a “Query has exceeded the available resources” error; clicking See Details/More Details will give you a message like this:

The XML for Analysis request timed out before it was completed. Timeout value: 10 sec.

Hitting the query timeout in an Excel PivotTable will give you the same message:

What value should you set the Query Timeout to? In my opinion no query should ever run for more than 30 seconds because anything slower will result in a poor experience for your end users – no-one wants to sit around for ages waiting for a report to render. I also think it should be possible to tune any semantic model so all queries run under 30 seconds if you know what you’re doing. That said, in the real world, setting a timeout of 30 seconds may be unrealistic: developers may not have the skills to tune their semantic models. As a result I find a timeout of 100 seconds is often a good compromise but you should experiment with different timeouts to see what the minimum value you can get away with is.

It’s important to note that reducing the query timeout will not stop every expensive query. This is because it’s perfectly possible to have very fast queries that consume a lot of CUs – for example when distinct count measures are used, and/or when there are very large data volumes and/or when there are complex but highly-optimised measures. Also there relatively rare cases where a query will carry on running beyond the duration specified by the timeout, because the Vertipaq engine only checks if the timeout has been exceeded at certain points in the code and depending on the query there could be several seconds (sometimes more) between these checks. Equally, some very slow queries may not use a lot of CUs and having them time out might cause unnecessary disruption. Overall, though, in my experience setting a timeout will stop enough expensive queries to make doing so worthwhile.

[Update: my colleague Akshai Mirchandani has just reminded me that you can also set the Query Timeout at the workspace level as a Server Property using SQL Server Management Studio, as detailed here. The property is called ServerTimeout. This gives you more flexibility than setting it for the whole capacity.]

Why DAX Is Better Than MDX For Bulk Extracts Of Data From Power BI

This is a post I’ve avoided writing for many years, and before I carry on let me make one thing clear:

Doing bulk extracts of data from a Power BI semantic model is a **really** bad idea

My colleague Matthew Roche wrote a great post on this topic a couple of years ago that is still relevant: using Power BI (or Analysis Services) as a data source for other systems, including other Power BI Import mode semantic models, is an anti-pattern. Power BI is optimised for small, analytical queries that return the amount of data that can be visualised on a single page. It is not optimised for queries that return millions of rows. Running this kind of query on a Power BI semantic model will be slow, is likely to run into timeouts and memory errors, and is also likely to cause CU spikes – and perhaps throttling – on a Premium capacity. If you want the data from a semantic model it’s much better to go back to the original data sources that the semantic model uses.

But

People still use Power BI semantic models as data sources all the time. This is either because they don’t know any better, because they can’t get access to the underlying data sources, or because they want to get the result of any DAX calculations on the model.

If you do need to extract large amounts of data from a semantic model I have one important piece of advice: write a DAX query to get the data and not an MDX query. There are two reasons for this:

Writing a DAX query to get granular data is usually a lot simpler than writing an MDX query
DAX queries that return large amounts of data are typically faster (and so less likely to hit timeouts), more CPU efficient (and therefore less likely to cause throttling on a capacity) and more memory efficient (and so less likely to cause memory errors)

The bad news is that the two client tools most often used to bulk extract data from Power BI, Excel PivotTables and Power Query using the Analysis Services connector and its query builder, generate MDX queries. What’s more, they don’t always generate the most efficient MDX queries either.

Let’s see an example. I have a semantic model in a Premium workspace with a table called Property Transactions with around a million rows in it. I connected to the model via the XMLA Endpoint using the “From SQL Server Analysis Services Database (Import)” option in Power Query in Excel:

…and then created a query to get the data from all the columns on the Property Transactions table plus one measure, called Count of Sales, using Power Query’s query builder:

While the query builder generated the MDX for me, you can see that it was not a simple query:

I ran a Profiler trace while this query ran and from the Execution Metrics I saw that:

The query took 54 seconds to complete
CPU Time was also 54 seconds
The approximate peak memory usage of the query was 626292KB

I then created a second Power Query query that used the following DAX query to get the same data, which I think you’ll agree is much more straightforward:

EVALUATE 
ADDCOLUMNS('Property Transactions', "Count of Sales", [Count of Sales])

[You have the option of entering a customer MDX or DAX query when you create your Power Query query]

This time, Execution Metrics showed me that:

The query took 6 seconds to complete
CPU Time was 6 seconds too
The approximate peak memory usage was 142493KB

So the DAX query was simple to write and maintain, took 11% of the time that the MDX query to run, used 11% of the CPU and 22% of the memory. That’s a big improvement. Even though I might be able to rewrite the MDX generated by Power Query to be more efficient there’s no way it would be as simple or as efficient as the DAX query.

[Thanks to Akshai Mirchandani for the information in this post]

TMDL View And Power BI Developer Productivity: An Example Using The Detail Rows Definition Property

For me the biggest new feature in the January 2025 release of Power BI Desktop is the new TMDL View; many other people like Marco are excited about it too. For more advanced Power BI developers (and honestly, I don’t think you need to be that advanced to get value out of it) it makes certain editing tasks for semantic models much simpler, and while I won’t be abandoning the main Power BI Desktop UI completely or stopping using external tools like Tabular Editor it is something I see myself using on a regular basis from now on.

One of the things it allows is the editing of semantic model properties and features that are not exposed by the Power BI Desktop UI but which are nonetheless supported by the engine, and which up to now you’d have had to use Tabular Editor to set. The announcement blog post mentions a few of these – perspectives (useful for the Personalize Visual feature for example) and the isAvailableInMdx property – but my favourite underused property is the Detail Rows Definition property of a measure, also known as Detail Rows Expression. If you have end users querying your model using Analyze In Excel it allows you to customise the results returned by an Excel PivotTable’s Show Details feature; as you might expect Marco and Alberto have an excellent, detailed article on it here. Setting this property allows you to control which columns and rows (so the correct rows are shown for non-trivial measures, as I described here) are returned, and if you can educate your users to use Show Details it can perform a lot better than a gigantic PivotTable to show detail-level data from your model.

What does the workflow for setting this property on a model in Power BI Desktop look like now? What benefits do TMDL View and all the other recent pro developer enhancements in Desktop bring for someone like me? Let’s say I have a measure called Count Of Sales in my semantic model and that I want to customise the columns and their names that are returned by Show Details for this measure. The Detail Rows Definition property takes a DAX expression that returns a table so the first step is to write that expression; now that we have DAX Query View I can do that without leaving Power BI Desktop and because I’m lucky enough to have access to Power BI Copilot (one of the perks of working for Microsoft) I can use that to write my DAX expression easily. I gave Copilot the prompt:

Write a query that gives me the Date, County, Town and Postcode columns columns from the Property Transactions table along with the Count of Sales measure. Rename the Date column to be "Sales Date".

…and it immediately gave me the DAX query I wanted without needing to faff around looking up the syntax to the SELECTCOLUMNS() function:

EVALUATE
  SELECTCOLUMNS(
    'Property Transactions',
    "Sales Date", 'Property Transactions'[Date], // Renaming Date column to Sales Date
    "County", 'Property Transactions'[County],
    "Town", 'Property Transactions'[Town],
    "Postcode", 'Property Transactions'[Postcode],
    "Count of Sales", [Count Of Sales] // Including the Count of Sales measure
  )

Next, I copied the DAX query minus the EVALUATE statement, switched over to the TMDL View pane, dragged and dropped the Count of Sales measure into a Script pane:

And then, underneath the existing measure definition, started typing detailRowsDefinition – the intellisense picked up what I was typing before I even had to finish:

I then tried to paste the DAX query into TMDL View and realised it didn’t like line breaks. Rather than trying to look up how to do this with the editor – which I’m sure is possible – I just switched back to DAX Query View, highlighted the query, entered the prompt:

format this query so there are no line breaks

…and it did the job for me! I copied the DAX table expression again and pasted it into TMDL View after the detailRowsDefinition property:

[I was pleasantly surprised that I didn’t have to escape any characters or mess around with double quotes]

Next I hit the Apply button and tested Analyze in Excel with a PivotTable and Show Details:

And bingo, I got the results I wanted:

Was all of this possible before? Absolutely. Is it much quicker and easier now with TMDL View, DAX Query View and Copilot? Absolutely. I’m a pretty experienced Power BI developer and I could certainly have written the DAX expression without Copilot, I have DAX Studio installed (which has pretty good query builder too) to write and test the query, and I have Tabular Editor to set the property. But being able to do all this just using Power BI Desktop makes me so much more productive.

Finding The Size Of All Of The Columns Touched By A DAX Query

I had meant to follow up my recent post on how to find the columns touched by a DAX query by writing one on how to use this technique to find the size of these columns in memory, so you can find the total size of the columns that need to be paged into memory when a DAX query runs on a Direct Lake semantic model. Before I could do that, though, my colleague Michael Kovalsky messaged me to say that not only had he taken the query from that first post and incorporated it in Semantic Link Labs, he’d done the work to get column sizes too. All that’s left for me to do, then, is give you some simple examples of how to use it.

To use Semantic Link Labs you just need to create a new Fabric notebook and install the library:

%pip install semantic-link-labs

After that you can use sempy_labs.get_dax_query_dependencies to get the columns touched by any DAX query, for example:

import sempy_labs as labs
labs.get_dax_query_dependencies(
        dataset = 'InsertSemanticModelName',
        workspace = 'InsertWorkspaceName',
        dax_string = "InsertDAXQuery",
    )

This returns a dataframe with one row for each column touched by the query, plus various statistics about the size of each column in memory.

If you’re working with a Direct Lake semantic model, though, in order to get the correct sizes of each column in memory the query itself will need to have been run beforehand; you can ensure that this happens by setting the optional parameter put_in_memory to True:

import sempy_labs as labs
labs.get_dax_query_dependencies(
        dataset = 'InsertSemanticModelName',
        workspace = 'InsertWorkspaceName',
        dax_string = "InsertDAXQuery",
        put_in_memory = True
    )

Last of all, if you don’t want a dataframe but just want a single number representing the total memory needed by all columns touched by a query, you can use sempy_labs.get_dax_query_memory_size, for example like this:

import sempy_labs as labs
labs.get_dax_query_memory_size(
        dataset = 'InsertSemanticModelName',
        workspace = 'InsertWorkspaceName',
        dax_string = "InsertDAXQuery"
    )

Yet more evidence that, for any Power BI user, Semantic Link and Semantic Link Labs are the best reasons for you to flip the switch to enable Fabric. To find out more about what they are capable check out this user group presentation.

Different Ways To Replace Blanks With Zeros In DAX

My post from a few months ago about the dangers of DAX measures that never return blank attracted quite a lot of attention – this is a hot topic on the forums and adding zeros to measures is a common cause of memory errors in Power BI. In that post, though, I didn’t talk about what the best way to replace blanks with zeros is if you absolutely have no choice but to do so. One of the comments on that post mentioned that visual calculations is an option and this is something I hadn’t thought about before; now, after conversing with the Gods of DAX (no, not the Italians, I mean Akshai, Marius and Jeffrey!) and doing some testing I can reveal that Visual Calculations can be a good choice sometimes, while more traditional DAX approaches are fine at other times.

Let’s see some examples. I created the following model using the AdventureWorksDW 2017 sample data:

There’s a Product dimension, a Customer dimension and a Date dimension plus a fact table containing sales data. The most important thing to note is that individual customers only buy a few products on a few dates. I also created two measures with the following definitions:

Sales Amount = SUM('FactInternetSales'[SalesAmount])

Mountain-100 Black 38 Sales =
CALCULATE (
    [Sales Amount],
    'DimProduct'[EnglishProductName] = "Mountain-100 Black, 38"
)

The [Sales Amount] measure just sums up the values in the SalesAmount column on the fact table; the [Mountain-100 Black 38 Sales] returns the value of [Sales Amount] for just one product.

Now consider a table visual showing LastName from the DimCustomer table, FullDateAlternateKey from DimDate and the [Sales Amount] and [Mountain-100 Black 38 Sales] measures:

There are a lot of rows here because every combination of LastName and FullDateAlternateKey where there is a value for [Sales Amount] is shown. Connecting Profiler to Power BI Desktop and capturing the Execution Metrics trace event (DAX Studio also shows this now) shows that this query has a peak memory consumption of 2063KB

{
	"timeStart": "2024-11-03T19:07:26.831Z",
	"timeEnd": "2024-11-03T19:07:26.844Z",

	"durationMs": 13,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 0,
	"totalCpuTimeMs": 0,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 2063,

	"commandType": "Statement",
	"queryDialect": 3,
	"queryResultRows": 502
}

As you can see the [Mountain-100 Black 38 Sales] is mostly empty, and let’s say you need to replace the blanks in this column with zeros.

Changing the measure definition to add zero to the result of the Calculate(), as follows:

Mountain-100 Black 38 Sales =
CALCULATE (
    [Sales Amount],
    'DimProduct'[EnglishProductName] = "Mountain-100 Black, 38"
) + 0

Doesn’t do what you want because now you get a row in the table for every combination of LastName and FullDateAlternateKey, which means the rows which have non-zero values are hard to find:

Instead, only adding zero when there is a value for [Sales Amount], something like this:

Mountain-100 Black 38 Sales =
IF (
    NOT ( ISBLANK ( [Sales Amount] ) ),
    CALCULATE (
        [Sales Amount],
        'DimProduct'[EnglishProductName] = "Mountain-100 Black, 38"
    ) + 0
)

…does the trick. What does memory usage look like? Here are the Execution Metrics:

{
	"timeStart": "2024-11-03T19:17:22.470Z",
	"timeEnd": "2024-11-03T19:17:22.500Z",

	"durationMs": 30,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 31,
	"totalCpuTimeMs": 31,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 3585,

	"commandType": "Statement",
	"queryDialect": 3,
	"queryResultRows": 502
}

Memory usage has increased only slightly, to 3585KB.

What about using Visual Calculations instead? Reverting to the original definition of the [Mountain-100 Black 38 Sales] measure and then creating a Visual Calculation like so:

No Blanks = [Mountain-100 Black 38 Sales]+0

…shows that this doesn’t solve the problem because again you get unwanted rows with no sales. Using:

No Blanks = 
IF (
    NOT ( ISBLANK ( [Sales Amount] ) ),
    [Mountain-100 Black 38 Sales] + 0
)

…does solve the problem and you can of course hide the original [Mountain-100 Black 38 Sales] measure column so it doesn’t appear in your table:

But Execution Metrics shows that memory usage is in fact a lot higher, at 11295KB, which is because the resultset now has one extra column in it and Visual Calculations make a copy of the original resultset in memory when they are calculated:

{
	"timeStart": "2024-11-03T19:31:22.858Z",
	"timeEnd": "2024-11-03T19:31:22.980Z",

	"durationMs": 122,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 109,
	"totalCpuTimeMs": 109,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 11295,

	"commandType": "Statement",
	"queryDialect": 3,
	"queryResultRows": 502

Does this mean that Visual Calculations should never be used? No, not at all. Consider the following matrix visual which only contains the [Mountain-100 Black 38 Sales] measure and has LastName on rows and FullDateAlternateKey on columns:

Memory usage for this visual is 1091KB:

{
	"timeStart": "2024-11-03T19:51:02.966Z",
	"timeEnd": "2024-11-03T19:51:02.974Z",

	"durationMs": 8,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 16,
	"totalCpuTimeMs": 16,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 1091,

	"commandType": "Statement",
	"queryDialect": 3,
	"queryResultRows": 139
}

The resultset returned for the DAX query used to populate this visual only contains one row for each combination of Last Name and Date that has a value for [Mountain-100 Black 38 Sales], 139 rows in all, but because a matrix visual is used to display the results this introduces all the blanks you can see in the screenshot. You could could try to replace these blanks with some very complex DAX but I’m not even going to try. Instead, Visual Calculations solve this problem very easily:

No Blanks Matrix = [Mountain-100 Black 38 Sales]+0

Here’s the matrix with [Mountain-100 Black 38 Sales] hidden and the Visual Calculation applied:

Execution Metrics reveal that peak memory consumption is just 2054KB and the number of rows returned is higher but still just 2070 rows:

{
	"timeStart": "2024-11-03T19:45:49.298Z",
	"timeEnd": "2024-11-03T19:45:49.337Z",

	"durationMs": 39,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 31,
	"totalCpuTimeMs": 31,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 2054,

	"commandType": "Statement",
	"queryDialect": 3,
	"queryResultRows": 2070
}

Overall, both traditional DAX solutions and Visual Calculations are effective in different scenarios, so I suggest that you test the query performance and memory usage of different solutions yourself.

Finding The Columns Touched By A DAX Query

Some time ago I wrote a post about how you can use the DISCOVER_CALC_DEPENDENCY DMV to get a list of all the tables, columns and relationships referenced by a DAX query. That’s really useful and indeed the new INFO.CALCDEPENDENCY DAX function allows you to do the same thing without all the complexity of calling a DMV. However this doesn’t quite give you everything you need to solve one important problem: which columns are touched when a DAX query runs? The INFO.CALCDEPENDENCY function gives you a list of all the columns used directly in a DAX query and it also gives you a list of all the columns needed by all the measures used in a DAX query. However it only gives you a list of the relationships used and these relationships also rely on columns. In this post I’ll show you how to get a list of all the columns that are touched by a DAX query.

For illustration, consider the following model:

Note that the relationship between the Product table and the Sales table is inactive. The definitions of the three measures are as follows:

Units = SUM(Sales[SalesUnits])
Value = SUM(Sales[SalesValue])
Apples Value =
CALCULATE (
    [Value],
    KEEPFILTERS ( 'Product'[ProductName] = "Apples" ),
    USERELATIONSHIP ( 'Product'[ProductID], 'Sales'[ProductID] )
)

There is also a calculated column called CustomerID and Name defined as follows:

Customer ID and Name = 
"(" & 'Customer'[CustomerID] & ") " & Customer[CustomerName]

Now consider the following visual:

The DAX query for this is as follows:

// DAX Query
DEFINE
	VAR __DS0Core = 
		SUMMARIZECOLUMNS(
			ROLLUPADDISSUBTOTAL('Country'[CountryName], "IsGrandTotalRowTotal"),
			"Apples_Value", 'Sales'[Apples Value]
		)

	VAR __DS0PrimaryWindowed = 
		TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, [Apples_Value], 0, 'Country'[CountryName], 1)

EVALUATE
	__DS0PrimaryWindowed

ORDER BY
	[IsGrandTotalRowTotal] DESC, [Apples_Value] DESC, 'Country'[CountryName]

Passing this query to INFO.CALCDEPENDENCY like so:

EVALUATE
VAR Source_Query =
"
DEFINE
	VAR __DS0Core = 
		SUMMARIZECOLUMNS(
			ROLLUPADDISSUBTOTAL('Country'[CountryName], ""IsGrandTotalRowTotal""),
			""Apples_Value"", 'Sales'[Apples Value]
		)

	VAR __DS0PrimaryWindowed = 
		TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, [Apples_Value], 0, 'Country'[CountryName], 1)

EVALUATE
	__DS0PrimaryWindowed

ORDER BY
	[IsGrandTotalRowTotal] DESC, [Apples_Value] DESC, 'Country'[CountryName]
"
VAR All_Dependencies =
INFO.CALCDEPENDENCY(
	"QUERY",
	Source_Query
)
RETURN All_Dependencies

[Note that all double quotes in the query being passed in have been escaped]

…returns all the measures, tables, columns and relationships used:

It knows that the Apples Value measure needs the SalesValue column and the ProductName column and it knows that the relationships between the Country table and the Sales table, and the Product table and the Sales table, are also required but it doesn’t tell you which columns those relationships in turn need. Luckily the new INFO.VIEW.RELATIONSHIPS() function can be used to get that extra information easily.

Here’s a more complex DAX query that returns a list of all the measures touched by a DAX query, including those needed by all the measures and all the relationships needed by that query:

EVALUATE
VAR Source_Query =
"
DEFINE
	VAR __DS0Core = 
		SUMMARIZECOLUMNS(
			ROLLUPADDISSUBTOTAL('Country'[CountryName], ""IsGrandTotalRowTotal""),
			""Apples_Value"", 'Sales'[Apples Value]
		)

	VAR __DS0PrimaryWindowed = 
		TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, [Apples_Value], 0, 'Country'[CountryName], 1)

EVALUATE
	__DS0PrimaryWindowed

ORDER BY
	[IsGrandTotalRowTotal] DESC, [Apples_Value] DESC, 'Country'[CountryName]
"
VAR All_Dependencies =
INFO.CALCDEPENDENCY(
	"QUERY",
	Source_Query
)
VAR Referenced_Cols = 
SELECTCOLUMNS(
	FILTER(
		All_Dependencies, 
		[REFERENCED_OBJECT_TYPE] IN {"COLUMN", "CALC_COLUMN"} 
		),
	"FullyQualifiedName",
	"'" & [REFERENCED_TABLE] & "'[" & [REFERENCED_OBJECT] & "]"
)
VAR Referenced_Relationships =
SELECTCOLUMNS(
FILTER(
	All_Dependencies, 
	[REFERENCED_OBJECT_TYPE] IN {"ACTIVE_RELATIONSHIP", "RELATIONSHIP"} ),
"Name",
[REFERENCED_OBJECT])
VAR Relationship_Details =
FILTER(
	INFO.VIEW.RELATIONSHIPS(),
	[Name] IN Referenced_Relationships
)
VAR Relationship_Columns = 
UNION(
	SELECTCOLUMNS(
		Relationship_Details,
		"FullyQualifiedName",
		"'" & [FromTable] & "'[" & [FromColumn] & "]"
	),
		SELECTCOLUMNS(
		Relationship_Details,
		"FullyQualifiedName",
		"'" & [ToTable] & "'[" & [ToColumn] & "]"
	)
)
VAR All_Referenced_Columns = 
UNION(
	Referenced_Cols,
	Relationship_Columns
)
RETURN 
All_Referenced_Columns

Here’s what it returns:

As you can see, this returns a list of all the columns used directly by the query, any measures referenced by the query, and all the relationships involved too.

You can use this query very easily in DAX Query View: just paste it in, then capture the DAX query generated by your own visuals using Performance Analyzer, escape any double quotes and paste it into the Source_Query variable in the query above.

Here’s one more example of how it works, a visual that touches a lot more columns:

I won’t post the full DAX query of the visual or my query again, but here’s the list of columns returned by my query:

One thing to notice about this is that while it shows the calculated column [Customer ID and Name] is used, it doesn’t break down this column’s dependencies and therefore doesn’t list the [CustomerName] column. This is a deliberate decision on my part.

Why is all this useful? The obvious answer is that it would allow you to see whether a query would break or return a different result if you deleted a column from your model. More interestingly – and this is something I need to research more – given that in Direct Lake mode columns are paged into memory when needed by a query, and given that it’s possible to use other DMVs/INFO functions to get the size of a column in memory (for example as DAX Studio’s Model Metrics feature does), it should be possible to use this approach to find the total memory used by all the columns needed by a DAX query on a Direct Lake model. This would be very helpful when diagnosing memory errors in Direct Lake mode.

What Can You Do With Copilot In DAX Query View?

On my LinkedIn feed this morning I saw an amazing demo from Brian Julius showing how you can use AI to generate code for Python visuals in Power BI, immediately followed by this video from Marco Russo warning us not to believe all the hype about AI. I guess I, like most people, am waiting to see what turns out to be useful in the real world as a result of all of the current investment in AI. Certainly every Power BI customer I talk to these days is very interested in Copilot and the focus always seems to be using Copilot to understand the data shown on a report or to ask questions about the data in a semantic model. That’s understandable because these are the features that will benefit the largest number of users. However, I think the often-overlooked Copilot in DAX Query View is where you’ll get the most value today.

I have to admit that I’m a relatively late convert to DAX Query View, probably because I’ve been a hardcore DAX Studio user for so long. The ease with which you can write, test and update the definitions of measures in your model with DAX Query View is what lured me over and once I was there I started testing its Copilot features too. The documentation gives you the basics of what is possible with Copilot in DAX Query View but it turns out it’s capable of a lot more. So what are the practical benefits of Copilot in DAX Query View?

Writing measures

DAX is incredibly powerful but it’s also hard to learn. Up until recently if you were struggling to write a DAX measure you’d probably go to your favourite search engine, search for blog posts or questions on forums that seemed to describe what you wanted to achieve, copy some code from them and hope you can adapt it to your own needs. Nowadays more and more people are using LLMs to write DAX and while this doesn’t replace the need to learn the language (don’t throw away that copy of “The Definitive Guide to DAX” yet) it is a faster way to get working code than using search engines alone. Copilot in DAX Query View gives you this experience more conveniently right inside Power BI Desktop and with none of the security and privacy concerns about using a free public AI service. It also knows about the structure of your semantic model without you needing to explicitly pass that information in through your prompts, leading to better quality results.

Let’s see an example. Consider the following model:

In DAX Query View, the following Copilot prompt

Write a measure that returns Order Amount for the employee Gabi and show that measure in a query broken down by product

…returns the following measure definition and DAX query:

DEFINE
  MEASURE 'Orders'[Gabi's Order Amount] =
    CALCULATE(
      [Order Amount],
      'Employee'[Employee Name] = "Gabi"
    )

EVALUATE
  SUMMARIZECOLUMNS(
    'Product'[Product Name],
    "Gabi's Order Amount", [Gabi's Order Amount]
  )

As you can see, it’s successfully generated the measure definition I wanted and also written a DAX query that shows the results. I’ll admit it’s not the most complex measure in the world but all the same I know plenty of users find Calculate() hard to understand and would find this very helpful. I will also admit that the quality of the results is heavily dependent on whether you have followed best practices for Power BI semantic model design.

Explain DAX topics and apply them to your code

Looking at the measure Copilot just wrote you might think “should I be using the KeepFilters() function here?”. Entering the prompt:

Explain the DAX KeepFilters function

…gives you a nice explanation of what the KeepFilters() function does:

That’s great, but I could get that from a search engine. What is impressive is that the explanation includes a version of the measure above rewritten to use KeepFilters():

DEFINE
  MEASURE 'Orders'[Gabi's Order Amount] =
    CALCULATE(
      [Order Amount],
      KEEPFILTERS('Employee'[Employee Name] = "Gabi")
    )

You can then prompt:

Rewrite the "Gabi's Order Amount" measure so that it uses the DAX KeepFilters function

…and see the rewritten version side-by-side with the original:

…so you can see what exactly has changed and decide whether to accept the change or not.

Explain DAX code

Taking over the ownership of a semantic model that someone else built and understanding their code is tough. Copilot in DAX Query View can help here too. Let’s say you have the following measure in your model:

Avg Order Amount = 
 //Chris said to use the Divide() function
 //to avoid possible division by zero
 DIVIDE(
    [Order Amount],
    [Order Count]
 )

The prompt:

Explain the avg order amount measure

…gives you an explanation of what the measure, and all of the measures it references, does:

It takes code comments as well as the code itself into account and although at the time of writing it doesn’t look at the description properties of your tables, columns or measures it will soon.

Suggest improvements to your measures

Let’s say you have a measure with the following definition:

Order Amount Apples = 
CALCULATE(
    [Order Amount],
    FILTER('Product', 'Product'[Product Name]="Apples")
)

An experienced Power BI developer will read this and immediately hear an (Italian accented) voice in their head saying “Filter columns not tables!” even if in this case the query optimiser will ensure this measure performs well. Most Power BI developers will not see a problem though, and while I’m fairly sure that Tabular Editor’s best practices analyser would also pick up the problem not everyone knows about or is allowed to use Tabular Editor.

In DAX Query View you can enter the prompt:

Optimise the Order Amount Apples measure and explain the suggested changes

…and get a pretty good summary of how the measure could be improved to avoid the use of the Filter() function:

Conclusion

Copilot in DAX Query View is not magic. It’s not perfect and it will hallucinate or make bad suggestions occasionally, although I didn’t cherry-pick any of its output while writing this post and it is getting better and better at an impressive rate. It will not replace all the other tools and techniques you use today to write DAX and it will not put Marco and Alberto out of a job. But there’s a saying inside Microsoft that “Copilot is not a pilot”, which sums up nicely how to think about it: Copilot won’t do your job for you but will help you do your job faster and better. As these examples demonstrate, Copilot in DAX Query View can provide a significant productivity boost, especially for less experienced Power BI developers. If you’re the Power BI admin for a large organisation with a lot of self-service usage then I think that productivity boost more than justifies the cost of the Fabric capacity required to run it.

[And yes, before you comment, I know that the fact Power BI Copilot is only available for F64/P1 and above capacities and has geographic restrictions isn’t ideal, but trust me we’re working on ways to make it more affordable and more widely available. We want as many people as possible to use it but we can’t give it away for free. If you’re comparing this with a free service then remember the old saying: if a product is free then you are the product. You may be ok with that but your employer’s lawyers may not be.]

Why Power BI Table Visuals With Scrollbars Can Cause Problems

Concluding my series of blog posts on seemingly harmless things you can do in a Power BI report that can lead to performance problems and the “This visual has exceeded the available resources” error (see also “Calculate(), Filter() and DAX memory usage” and “DAX measures that never return blank“), in this post I’ll show how table visuals with vertical scrollbars that potentially show thousands of rows can be a Bad Thing.

Using the Import mode semantic model I’ve used in the last two posts, consider this table visual:

The two measures displayed are trivial counts and averages. However there are 538753 combinations of County and Postcode that have data, and so in theory the table can display 538753 rows if you scroll down far enough. That can’t be good, can it?

Those of you who know how Power BI generates DAX queries for table visuals like this might disagree, though. Here’s the DAX query for this table visual:

DEFINE
	VAR __DS0Core = 
		SUMMARIZECOLUMNS(
			'Property Transactions'[County],
			'Property Transactions'[Postcode],
			"Count_Of_Sales", 'Property Transactions'[Count Of Sales],
			"Average_Price_Paid", 'Property Transactions'[Average Price Paid]
		)

	VAR __DS0PrimaryWindowed = 
		TOPN(501, __DS0Core, 'Property Transactions'[County], 1, 'Property Transactions'[Postcode], 1)

EVALUATE
	__DS0PrimaryWindowed

ORDER BY
	'Property Transactions'[County], 'Property Transactions'[Postcode]

As you can see, the query contains a TOPN filter which means that it actually only returns the first 501 rows out of those potential 538753. It’s only if you scroll down far enough that another query is triggered to get the next 501 rows. So maybe it’s not so bad?

Actually it can be bad, even with the TOPN filter. Here are the Execution Metrics for the query above:

{
	"timeStart": "2024-07-10T22:54:30.330Z",
	"timeEnd": "2024-07-10T22:54:31.064Z",

	"durationMs": 734,
	"vertipaqJobCpuTimeMs": 219,
	"queryProcessingCpuTimeMs": 516,
	"totalCpuTimeMs": 734,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 66599,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 501
}

Notice the durationMS and approximatePeakMemConsumptionKB metrics, which tell you how long the query takes to run and how much memory it uses.

Now consider the following version of a report with a slicer for the County column and the county “Bath and North East Somerset” selected.

Here’s the DAX query for the table visual now:

DEFINE
	VAR __DS0FilterTable = 
		TREATAS({"BATH AND NORTH EAST SOMERSET"}, 'Property Transactions'[County])

	VAR __DS0Core = 
		SUMMARIZECOLUMNS(
			'Property Transactions'[County],
			'Property Transactions'[Postcode],
			__DS0FilterTable,
			"Count_Of_Sales", 'Property Transactions'[Count Of Sales],
			"Average_Price_Paid", 'Property Transactions'[Average Price Paid]
		)

	VAR __DS0PrimaryWindowed = 
		TOPN(501, __DS0Core, 'Property Transactions'[County], 1, 'Property Transactions'[Postcode], 1)

EVALUATE
	__DS0PrimaryWindowed

ORDER BY
	'Property Transactions'[County], 'Property Transactions'[Postcode]

As you would expect, this query now contains a filter on the county selected in the slicer. It also contains a TOPN filter but the interesting thing is that since there are more than 501 postcodes in the selected county, the query returns exactly the same rows as before. Of course you can’t now scroll down and see data for other counties than the one selected in the slicer but you can always select a different county in the slicer.

Here are the Execution Metrics for the new query:

{
	"timeStart": "2024-07-10T22:58:29.725Z",
	"timeEnd": "2024-07-10T22:58:29.741Z",

	"durationMs": 16,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 16,
	"totalCpuTimeMs": 16,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 2564,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 501
}

Notice that both the duration and peak memory usage of the query are both much, much lower: duration has gone from 734ms to 16ms, while peak memory usage has gone from 66599KB to 2564KB. If you have more measures which are more complex and/or inefficiently written the difference could be much bigger.

Why is there such a big difference? Well if you look at the physical query plan generated for both queries you can see that before the TOPN filter is applied, the Vertipaq engine still has to spool the entire, unfiltered table. In the first query this unfiltered table has 538753 rows, which is why it’s relatively slow and memory hungry, whereas in the second query this table, with the County slicer, has only 1869 rows (the number of postcodes in the selected county). As a result slicing on County, even though it doesn’t affect which rows are returned after TOPN filter is applied, leads to a more efficient query. If you’re using DirectQuery mode something similar happens because, as I noted here, the TOPN filter can only be pushed back to the relational source in a few basic scenarios which means the SQL queries generated can return a very large number of rows and which can hit the Max Intermediate Row Set Count limit.

What can you learn from this? Table visuals that can display a large number of rows should be avoided. I’m not report design guru but I would go so far as to say that any table with a scrollbar on is a mistake: if your user needs to scroll down to see all the rows or scroll right to see all the columns, why not use a slicer to allow them to select the rows they want to see more easily or use field parameters to allow them to select the columns they want to see? Not only will this be better from a usability point of view, performance will be a lot better and memory usage will be lower. And yes, I know that your end users ask for tables with lots of rows and columns because that’s what they are used to seeing in Excel, but giving them what they want risks slow performance and memory errors. So they need to make a choice.

DAX Measures That Never Return Blank

Following on from my earlier post on the Query Memory Limit in Power BI, and as companion to last week’s post on how a DAX antipattern using Calculate() and Filter() can lead to excessive memory consumption by queries (and therefore lead to you hitting the Query Memory Limit), in this post I want to look at the effects of another DAX antipattern on performance and memory usage: measures that can never return a blank value.

Adding zero to the result returned by a measure

What do I mean by a measure that never returns blank? Well using the same model that I used in last week’s post, let’s say I have a simple measure that counts the rows in my fact table:

Count Of Sales = COUNTROWS('Property Transactions')

I can use this measure in a matrix visual with Property Type on columns and Postcode on rows, like so:

Here’s what Execution Metrics says about the DAX query for this visual:

{
	"timeStart": "2024-07-03T10:16:16.251Z",
	"timeEnd": "2024-07-03T10:16:19.345Z",

	"durationMs": 3094,
	"vertipaqJobCpuTimeMs": 188,
	"queryProcessingCpuTimeMs": 2656,
	"totalCpuTimeMs": 2844,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 179404,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 230
}

And here’s what DAX Studio’s Server Timings shows:

It’s not fast (there are a lot of Postcodes even though the query has a Topn N filter and only returns 230 rows – that’s a subject for another blog post though) but the thing to look at is the peak memory consumption which is 179404KB or 175MB and the fact that there is just one Storage Engine query.

Now let’s say your boss says she hates all that white space in the matrix and wants to see zeroes instead. The obvious way to fix this is to add a zero to the result of the count in the measure, like so:

Count Of Sales = COUNTROWS('Property Transactions')+0

It’s a minor change so what could go wrong? Here’s what Execution Metrics shows after making that change:

{
	"timeStart": "2024-07-03T10:22:01.343Z",
	"timeEnd": "2024-07-03T10:22:10.390Z",

	"durationMs": 9047,
	"vertipaqJobCpuTimeMs": 313,
	"queryProcessingCpuTimeMs": 8578,
	"totalCpuTimeMs": 8891,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 379432,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 612
}

And here are the Server Timings:

The query is now three times slower, has three Storage Engine queries, and uses 370MB of memory – twice as much as before! This increase in memory can easily push you over the Query Memory Limit and result in the “This visual has exceeded the available resources” error, as this real-life example shows.

Why does this change make such a big difference? Adding zero to the result of the count results in the creation of a “dense” measure, that’s to say a measure that never returns a blank value. The white space in the first visual was the result of the measure not returning any values, whereas in the second visual the measure is returning values for every cell in the matrix. What is a blank value? I strongly recommend you read Marco’s article here for a detailed introduction to this subject; blanks are somewhat similar to nulls in SQL but there are some very important differences, and in this case the thing to point out is that the expression BLANK()+0 returns 0 in DAX. As a result a measure which used to return a blank value in some cases now never returns a blank. This can be extremely bad for DAX performance and memory usage because the Vertipaq engine inside Power BI can do a lot of optimisations when it knows a measure will return blank in some circumstances. Nikola Illic has a great post here going into more detail about this.

Using FORMAT() in measures

There are a few other common ways you can run into the same problem. One is the use of the FORMAT() function to return a formatted string from a measure, for example:

Count Of Sales = FORMAT(COUNTROWS('Property Transactions'),"Standard")

Here’s what Execution Metrics shows for the visual above with the new version of the measure:

{
	"timeStart": "2024-07-04T09:48:43.655Z",
	"timeEnd": "2024-07-04T09:48:53.499Z",

	"durationMs": 9844,
	"vertipaqJobCpuTimeMs": 281,
	"queryProcessingCpuTimeMs": 9359,
	"totalCpuTimeMs": 9656,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 380480,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 612
}

As you can see, the duration and memory usage is very similar to the version of the measure above that added a zero to the result of the COUNT(): slow and memory hungry.

The FORMAT() function returns a text value that represents a formatted version of the value passed in; blank values are converted to empty strings, so again this new version of the measure can never return a blank value. My recommendation is to always try to avoid using FORMAT() in measures because of the impact it can have on performance. Instead, try to use format strings (for more examples see my posts here and here) or dynamic format strings (you should also read the SQLBI article on dynamic format strings to be aware of the implications of using them) wherever possible. It’s a shame that format strings don’t have built in support for formatting blank values; I’m told there are some good technical reasons why this hasn’t happened yet.

IFERROR() and DIVIDE()

Some other DAX functions can also trigger this performance issue. The IFERROR() or ISERROR() functions are one example, since they allow you to trap errors in DAX expressions and return a value instead. However I can say quite confidently that you should never, ever use IFERROR() or ISERROR() when writing DAX anyway and there is some guidance documentation to back me up on this.

Another example which is less obviously bad is the use of the third parameter of the DIVIDE() function. As you probably know, the DIVIDE() function allows you to handle division by zero and division by blank safely, and by default it will return a blank value when division by zero or division by blank is detected. However there is an optional third parameter which allows you to return a value instead of a blank when dividing by zero or blank, and this can also trigger the problem we’re looking at.

Consider the following measure:

Division Test = 
DIVIDE(
    1,
    COUNTROWS('Property Transactions')
)

Here it is used in our visual:

Here are the Execution Metrics:

{
	"timeStart": "2024-07-04T11:42:29.636Z",
	"timeEnd": "2024-07-04T11:42:32.386Z",

	"durationMs": 2750,
	"vertipaqJobCpuTimeMs": 234,
	"queryProcessingCpuTimeMs": 2516,
	"totalCpuTimeMs": 2750,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 210960,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 230
}

Now, here’s the same measure with the third parameter of DIVIDE() set:

Division Test = 
DIVIDE(
    1,
    COUNTROWS('Property Transactions'),
    0
)

And here are the Execution Metrics:

{
	"timeStart": "2024-07-04T11:44:24.997Z",
	"timeEnd": "2024-07-04T11:44:32.247Z",

	"durationMs": 7250,
	"vertipaqJobCpuTimeMs": 359,
	"queryProcessingCpuTimeMs": 6891,
	"totalCpuTimeMs": 7250,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 379436,

	"commandType": 27,
	"queryDialect": 3,
	"queryResultRows": 612
}

Once again, the second version of the measure that never returns a blank is slower and uses more memory.

Summary

As you have seen, Power BI measures that can never return a blank value can be very bad for performance and query memory usage – so as a result, when you are writing DAX you should always be careful to avoid doing this.

Category: DAX

Memory Overhead Of Distinct Count Measures In Power BI

Like this:

Limit The Impact Of Expensive Power BI Queries On Your Capacity By Reducing The Query Timeout

Like this:

Why DAX Is Better Than MDX For Bulk Extracts Of Data From Power BI

Like this:

TMDL View And Power BI Developer Productivity: An Example Using The Detail Rows Definition Property

Like this:

Finding The Size Of All Of The Columns Touched By A DAX Query

Like this:

Different Ways To Replace Blanks With Zeros In DAX

Like this:

Finding The Columns Touched By A DAX Query

Like this:

What Can You Do With Copilot In DAX Query View?

Like this:

Why Power BI Table Visuals With Scrollbars Can Cause Problems

Like this:

DAX Measures That Never Return Blank

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this: