Power BI, Parallelism And Dependencies Between SQL Queries In DirectQuery Mode

This is going to sound strange, but one of the things I like about tuning Power BI DirectQuery semantic models is that their generally slower performance, and the fact that you can see the SQL queries generated to get data, make it much easier to understand some of the innermost workings of the Power BI engine. For example, this week I was trying to tune a DAX query on a DirectQuery model using DAX Studio and the Server Timings showed me something like this:

As I described here, Power BI can send SQL queries in parallel in DirectQuery mode, and you can see from the Timeline column that there is some parallelism happening here – the last two SQL queries generated by the DAX query run at the same time – but everything has to wait for the first SQL query to complete. Why? Can this be tuned?

Here’s the scenario that produced the query above. I have a DirectQuery semantic model built from the ContosoDW SQL Server sample database:

There are three base measures defined:

Distinct Customers = DISTINCTCOUNT(FactOnlineSales[CustomerKey])
January Customers =
CALCULATE([Distinct Customers],
KEEPFILTERS('DimDate'[CalendarMonthLabel]="January"))
Monday Customers =
CALCULATE([Distinct Customers],
KEEPFILTERS('DimDate'[CalendarDayOfWeekLabel]="Monday"))

Note that these measures are written specifically to prevent fusion from taking place: each measure generates a separate SQL query. Here’s what DAX Studio’s Server Timings shows for the DAX query generated for the table shown above:

As you can see, the three SQL queries generated by the DAX query are run in parallel.

Now consider the following measure:

IF Test = IF([Distinct Customers]>3000, [January Customers], [Monday Customers])

Here’s what this measure returns:

If you run the query generated for this visual in DAX Studio, Server Timings shows what I showed in the first screenshot in this post:

The last two substantial SQL queries, on lines 4 and 5, can only run when the first SQL query, on line 1, has finished. The details of the SQL queries tell you more about what’s going on here. The first SQL query, on line 1, just gets the values of the [Distinct Customers] measure for all rows in the table:

The WHERE clauses for the SQL queries on line 4:

…and line 5:

…show that these last two queries only get the values for the [January Customers] and [Monday Customers] measures for the rows where the [IF Test] measure needs to display them. And this explains why the first SQL query has to finish before these last two SQL queries can be run: the WHERE clauses of these last two SQL queries are constructed using the results returned by the first SQL query.

There is another way of evaluating the IF condition in the [IF Test] measure. Instead of “strict” evaluation, where the engine only gets the value of [January Customers] for the rows in the table where [Distinct Customers] is greater than 3000 and only gets the value of [Monday Customers] for the remaining rows, it can get the values of [January Customers] and [Monday Customers] for all rows in the table and then throw away the values it doesn’t need. This is “eager” evaluation and, as you would expect, Marco and Alberto have a great article explaining strict and eager evaluation here that is worth reading. Power BI can decide to use either strict or eager evaluation with the IF function depending on which one it thinks will be more efficient, but you can force the use of eager evaluation by using the IF.EAGER DAX function instead of IF:

IF EAGER Test =
IF.EAGER([Distinct Customers]>3000, [January Customers], [Monday Customers])

Here’s what Server Timings shows for the DAX query that uses IF.EAGER:

As you can see, the use of IF.EAGER means that the three substantial SQL queries generated by Power BI for this DAX query can now be run in parallel because there are no dependencies between them: they get the values of [Distinct Customers], [January Customers] and [Monday Customers] for all rows in the table. However, even though these three SQL queries are now run in parallel, there is no performance benefit here: it looks like the three queries are slower as a result of all running at the same time. Power BI made the right call to use strict evaluation with the IF function in this case, but if you see it using strict evaluation I think it’s worth experimenting with IF.EAGER to see if it performs better – especially in DirectQuery mode, where Power BI knows less about the performance characteristics of the database you’re using as your data source.
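If you want to reproduce this test yourself, the DAX query behind a simple table visual looks something like the sketch below. Note that the grouping column is my assumption – the exact column used in my visual isn’t shown here – so substitute whatever column your visual uses:

EVALUATE
SUMMARIZECOLUMNS(
    'DimProductCategory'[ProductCategoryName],
    //Swap [IF Test] for [IF EAGER Test] to compare the two evaluation strategies
    "Test Result", [IF Test]
)

Run it once with each measure in DAX Studio and compare the Server Timings results.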

[Thanks to Phil Seamark for helping me understand this behaviour]

New Books: “The Definitive Guide To DAX” 3rd Edition And “Microsoft Power BI Visual Calculations”

For some reason I haven’t had any free copies of books to review recently; maybe the market for tech books has finally collapsed with AI? Books are still being published though and luckily, as someone who once published a book via an O’Reilly imprint, I have a lifetime subscription to O’Reilly online learning which gives me free access to all the tech books I ever need. Two books were published in the last few months that I was curious to read: the third edition of “The Definitive Guide To DAX” by my friends Marco Russo and Alberto Ferrari, and “Microsoft Power BI Visual Calculations” by my colleague Jeroen ter Heerdt, Madzy Stikkelorum and Marc Lelijveld. As I’ve said many times, I don’t write book reviews here (least of all reviews of books by friends or colleagues, where I could never be unbiased), but I think there’s some value in sharing my thoughts on these books.

“The Definitive Guide To DAX”, 3rd Edition

It’s generally accepted that the one book anyone who is serious about Power BI should own is “The Definitive Guide To DAX”. If you don’t already own a copy you should buy one, but since most people who read my blog probably have one already, the more interesting questions are what’s new in the third edition and whether it’s worth upgrading – especially since I’d seen Marco say that the book had been completely rewritten. I’ve heard the “completely rewritten” line before and I was sceptical, but it turns out that it really is a very different book. It’s not literally completely rewritten – there is material carried over from previous editions – but there are a lot of changes.

First of all, as you would expect, all the new additions to DAX since the second edition was published are covered, including user-defined functions, visual calculations, calendar-based time intelligence functionality and window functions. These are all really important features that you will want to use in your semantic models and reports, so this is the main reason to buy a copy of this edition.

Secondly, the main (and justified) criticism of the previous editions was that they were, as we say in the UK, “heavy going”. They had absolutely all the information you would ever need but they were not the easiest books to read or understand. That has been addressed in the third edition: the tone is a little more friendly and difficult concepts are now explained visually as well as in text. As a result it’s easier to recommend the book to beginners.

Thirdly, some advanced topics (for example around performance tuning) have been dropped. For example I searched for the term “callback” in this new edition and found no mentions; that’s not true of the second edition. I have mixed feelings about this because it means the book isn’t as “definitive” as it used to be, but I can understand why it’s happened: with so much new content to add, keeping these advanced topics would have made an already long book too long. And let’s be honest, how often do you look at the details of a DAX query plan? If the aim is to teach DAX then cutting content means it’s easier for the reader to focus on the core concepts.

In summary, then, another great piece of work from Marco and Alberto and worth buying even if you have a copy of an earlier edition.

“Microsoft Power BI Visual Calculations”

A whole book about visual calculations? As I mentioned above, they’re covered in one chapter of “The Definitive Guide To DAX” but that book focuses on DAX; this one takes more time to explain the concepts and, crucially, includes a lot of practical examples of how to use them. As with user-defined functions, when visual calculations were released there was an explosion of community content showing how they can be used to solve problems that were difficult to solve in Power BI before – problems that no-one could have anticipated would be solved with visual calculations. The real value of this book is showing how to build things like a bump chart or a tornado chart with visual calculations, and that makes it worth checking out.

Closing thoughts: why buy a book?

As you would expect, a lot of the information contained in these books is already available for free somewhere on the internet. And with AI you don’t even need to know how to search for it or stitch it all together – you can ask a question and get an answer customised to your exact scenario. So why buy books any more? I guess it depends on whether you only want to get your problems solved or understand how to solve problems yourself. For me (even though my attention span has eroded in recent years, just like everyone else’s) the only way to grasp really difficult concepts is through long-form written explanations or training courses, not fragments found in blog posts or 10-minute videos. I suspect that AI is the final nail in the coffin of the tech publishing industry but the tech book industry not being viable any more is not the same thing as tech books not being useful any more. Or maybe I’m just old-fashioned.

Calculate(), DAX Fusion And Filters On 0 In Power BI

Here’s a fun DAX performance tip that I found this week. Do you have measures that use Calculate() with a filter on a numeric column? Is one of the filters on the value 0? If so, this may affect you.

As always, a simple example is the best way of explaining the problem and the solution. Consider the following table in an Import mode semantic model:

Here are some measures that reference that table:

'Sales Amount' = SUM('Sales'[SalesAmount])
'Apples Sales' = CALCULATE([Sales Amount], Sales[ProductID]=0)
'Oranges Sales' = CALCULATE([Sales Amount], Sales[ProductID]=1)
'Pears Sales' = CALCULATE([Sales Amount], Sales[ProductID]=2)
'Grapes Sales' = CALCULATE([Sales Amount], Sales[ProductID]=3)

Let’s say you then have a report which shows the value of the ‘Oranges Sales’ measure:

Running the DAX query generated by this table in DAX Studio with Server Timings enabled shows that there is just one Storage Engine query generated by this DAX query:

Here’s the xmSQL for that single SE query:

SET DC_KIND="AUTO";
SELECT
    SUM ( 'Sales'[SalesAmount] )
FROM 'Sales'
WHERE
    'Sales'[ProductID] = 1;

The WHERE clause here filters on the ProductID 1, which is the ID of the product Oranges.

Now if you add the measures ‘Pears Sales’ and ‘Grapes Sales’ to the table visual:

…and run the query in DAX Studio again, you’ll still see a single SE query:

Here’s the xmSQL:

SET DC_KIND="AUTO";
SELECT
    'Sales'[ProductID],
    SUM ( 'Sales'[SalesAmount] ),
    SUM ( 'Sales'[SalesAmount] ),
    SUM ( 'Sales'[SalesAmount] )
FROM 'Sales'
WHERE
    'Sales'[ProductID] IN ( 1, 2, 3 ) ;

As you can see, the WHERE clause now filters on the ProductIDs 1, 2 or 3: the IDs of the three products used in the three measures. This is DAX fusion – specifically horizontal fusion – in action. It’s an optimisation where multiple filters on the same column can be combined into a single SE query. Fewer SE queries is generally better for performance. So far so good.

Now let’s add the measure ‘Apples Sales’ to the table visual:

Running the DAX query in DAX Studio now shows there are two SE queries:

The first SE query has the same xmSQL as the previous DAX query:

SET DC_KIND="AUTO";
SELECT
    'Sales'[ProductID],
    SUM ( 'Sales'[SalesAmount] ),
    SUM ( 'Sales'[SalesAmount] ),
    SUM ( 'Sales'[SalesAmount] )
FROM 'Sales'
WHERE
    'Sales'[ProductID] IN ( 1, 2, 3 ) ;

The second SE query has the following xmSQL and a WHERE clause that indicates it is retrieving the data for just Apples:

SET DC_KIND="AUTO";
SELECT
    SUM ( 'Sales'[SalesAmount] )
FROM 'Sales'
WHERE
    'Sales'[ProductID] IN ( null, 0 ) ;

Two SE queries can mean worse overall performance. Why is the filter on 0 in the ‘Apples Sales’ measure special? Why does it result in a second SE query, why does this second SE query filter on 0 or null, and why doesn’t horizontal fusion take place for Apples?

The answer lies in how DAX handles blanks and zeroes, something discussed in depth in this article by Marco Russo. The filter condition in the ‘Apples Sales’ measure:

'Apples Sales' = CALCULATE([Sales Amount], Sales[ProductID]=0)

…actually filters on 0 or blank, and that’s why the xmSQL of the second SE query filters on 0 or null. That in turn explains why horizontal fusion does not take place: all the other measures filter on a specific number, while the ‘Apples Sales’ measure filters on the number 0 or blank.
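You can see the difference between the two operators for yourself with a pair of test measures like these (a minimal sketch using the same table and measure as above):

//The = operator treats 0 and blank as equivalent, so this includes blank rows
'Zero Or Blank Sales' = CALCULATE([Sales Amount], Sales[ProductID]=0)
//The == operator matches only the value 0 and excludes blanks
'Zero Only Sales' = CALCULATE([Sales Amount], Sales[ProductID]==0)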

The solution is to update the measures in the model to use the strictly equal to operator (==) like so:

'Sales Amount' = SUM('Sales'[SalesAmount])
'Apples Sales' = CALCULATE([Sales Amount], Sales[ProductID]==0)
'Oranges Sales' = CALCULATE([Sales Amount], Sales[ProductID]==1)
'Pears Sales' = CALCULATE([Sales Amount], Sales[ProductID]==2)
'Grapes Sales' = CALCULATE([Sales Amount], Sales[ProductID]==3)

After this change the DAX query that returns the ‘Apples Sales’, ‘Oranges Sales’, ‘Pears Sales’ and ‘Grapes Sales’ measures generates a single SE query, meaning that horizontal fusion is taking place for all measures:

Here’s the xmSQL for that query:

SET DC_KIND="AUTO";
SELECT
    'Sales'[ProductID],
    SUM ( 'Sales'[SalesAmount] ),
    SUM ( 'Sales'[SalesAmount] ),
    SUM ( 'Sales'[SalesAmount] ),
    SUM ( 'Sales'[SalesAmount] )
FROM 'Sales'
WHERE
    'Sales'[ProductID] IN ( 0, 1, 2, 3 ) ;

As you can see, the WHERE clause now filters on the ProductIDs 0, 1, 2 or 3.

This example uses an Import mode model but the tip also applies to DirectQuery mode. Since additional SE queries mean additional SQL queries, which can have more of an impact on performance in DirectQuery mode, ensuring horizontal fusion takes place is even more important there.

I think this tip could benefit a lot of semantic models out there, because a lot of measures use Calculate() and filter on numeric columns. The customer I was working with this week had measures that filtered on a year offset column on their date dimension table (so filtering on offset 0 meant this year, filtering on offset 1 meant last year and so on), and I reproduced the problem on my sample DirectQuery semantic model based on the ContosoDW sample database with a filter on the NumberChildrenAtHome column of the Customer dimension table. Also, I can’t remember the last time I saw the strictly equal to operator in the wild. If you find this tip helps you, please let me know by leaving a comment!

A Look At The Impact Of Calendar Based Time Intelligence On Power BI DirectQuery Performance

Calendar-based time intelligence (see here for the announcement and here for Marco and Alberto’s more in-depth article) is at least the second-most exciting thing to happen in DAX in the last few months: it makes many types of time intelligence calculation much easier to implement. But as far as I know only Reid Havens, in this video, has mentioned the performance impact of using this new feature, and that was for Import mode. So I wondered: do these benefits also apply to DirectQuery mode? The answer is, on balance, yes – but it’s not clear-cut.

To illustrate what I mean, I built a simple DirectQuery model against the Adventure Works DW sample database in SQL Server:

This model used the old “Mark as date table” time intelligence.

Here are the definitions of some of the measures:

Sales Amount =
SUM ( 'Internet Sales'[SalesAmount] )

YTD Sales Amount =
CALCULATE ( [Sales Amount], DATESYTD ( 'Date'[FullDateAlternateKey] ) )

PY YTD Sales Amount =
CALCULATE (
    [YTD Sales Amount],
    SAMEPERIODLASTYEAR ( 'Date'[FullDateAlternateKey] )
)

The Sales Amount measure returns the sum of the values in the SalesAmount column; YTD Sales Amount finds the year-to-date sum of Sales Amount; and PY YTD Sales Amount finds the value of YTD Sales Amount in the same period of the previous year.

I then created a matrix visual showing the PY YTD Sales Amount measure with EnglishProductName from the Product dimension on columns and CalendarYear and EnglishMonthName from the Date dimension on rows:

I copied the DAX query for this visual from Performance Analyzer, pasted it into DAX Studio and then ran it on a cold cache with Server Timings enabled. Here’s what Server Timings showed:

A total duration of 1.9 seconds and 5 SE queries doesn’t look too bad. But here are the Execution Metrics for this query with some important metrics highlighted:

{ 
"timeStart": "2025-11-29T17:17:16.461Z", 
"timeEnd": "2025-11-29T17:17:18.350Z",  
"durationMs": 1890, 
"datasourceConnectionThrottleTimeMs": 0, 
"directQueryConnectionTimeMs": 24, 
"directQueryIterationTimeMs": 166, 
"directQueryTotalTimeMs": 1681, 
"externalQueryExecutionTimeMs": 1493, 
"queryProcessingCpuTimeMs": 16, 
"totalCpuTimeMs": 828, 
"executionDelayMs": 3,  
"approximatePeakMemConsumptionKB": 20977,  
"directQueryTimeoutMs": 3599000, 
"tabularConnectionTimeoutMs": 3600000,  
"commandType": "Statement", 
"queryDialect": 3, 
"queryResultRows": 1613, 
"directQueryRequestCount": 5, 
"directQueryTotalRows": 33756 
}

The important thing to notice is that while the DAX query returns 1613 rows (see the queryResultRows metric), the SQL queries generated for that DAX query return 33756 rows between them (see the directQueryTotalRows metric). Why the big difference? To do the year-to-date calculation using the old time intelligence functionality, Power BI has to run a query at the date granularity, which explains why so many more rows are returned by the SQL queries. For example here’s a snippet of the last SQL query generated:

Yuck. What’s more, bringing this number of rows from the source can be time-consuming and even after these rows have made it to Power BI, they need to be iterated over (see the directQueryIterationTimeMs metric of 166ms) and aggregated up to get the final result of the calculation. This requires memory (see the approximatePeakMemConsumptionKB metric of 20977KB) and CPU (see the totalCpuTimeMs metric of 828ms) as well as adding to the overall duration of the DAX query.

I then created a copy of this model and set up a calendar using the new calendar-based time intelligence feature like so:

I then modified the measures above to use this new calendar:

Sales Amount =
SUM ( 'Internet Sales'[SalesAmount] )

YTD Sales Amount =
CALCULATE ( [Sales Amount], DATESYTD ( 'Gregorian' ) )

PY YTD Sales Amount =
CALCULATE ( [YTD Sales Amount], SAMEPERIODLASTYEAR ( 'Gregorian' ) )

I then reran the same DAX query from my matrix visual in DAX Studio for this model. Here are the Server Timings:

The good news is that the query is now much faster: 0.5 seconds instead of 1.9 seconds. But there are more SE queries! I’m told this is because some fusion optimisations (this presentation by Phil Seamark is an excellent introduction to the subject) haven’t yet been implemented for the new calendar-based time intelligence functionality, which means more SQL queries are generated than you might expect. Indeed some of the SQL queries run are identical. And since there is a limit on the number of connections that Power BI can use to run SQL queries in DirectQuery mode, and you can run into performance problems when you hit those limits (see here for more details), more SQL queries can be a bad thing – especially when there are many visuals on a page or a lot of concurrent users using the same semantic model.

However there is more good news if you look closely. Here are the Execution Metrics for this second run:

{ 
"timeStart": "2025-11-29T17:34:56.223Z", 
"timeEnd": "2025-11-29T17:34:56.754Z",  
"durationMs": 531, 
"datasourceConnectionThrottleTimeMs": 0, 
"directQueryConnectionTimeMs": 41, 
"directQueryIterationTimeMs": 38, 
"directQueryTotalTimeMs": 465, 
"externalQueryExecutionTimeMs": 410, 
"queryProcessingCpuTimeMs": 16, 
"totalCpuTimeMs": 141, 
"executionDelayMs": 0,  
"approximatePeakMemConsumptionKB": 3812,  
"directQueryTimeoutMs": 3600000, 
"tabularConnectionTimeoutMs": 3600000,  
"commandType": "Statement", 
"queryDialect": 3, 
"queryResultRows": 1613, 
"directQueryRequestCount": 11, 
"directQueryTotalRows": 3369 
}

Even though there are more SQL queries now the total number of rows returned by them is much less: the directQueryTotalRows metric is only 3369, so about 10% of what it was before. Why? Because instead of having to go down to the date granularity to do the calculations, the new calendar-based time intelligence functionality allows Power BI to do the calculation at the month granularity. Here’s a snippet of one of the SQL queries generated that shows this:

This in turn means that directQueryIterationTimeMs (now only 38ms), totalCpuTimeMs (now only 141ms) and approximatePeakMemConsumptionKB (now only 3812KB) are all much less than before. Also, this could mean you’re less likely to run into the Max Intermediate Row Set Count limit on the maximum number of rows that a DirectQuery SQL query can return and it opens up more opportunities to use aggregations to improve performance.

As a result, if you’re running into query performance, CU or memory-related problems in DirectQuery mode, you should experiment with using the new calendar-based time intelligence feature to see if it can help even if it results in more SQL queries being generated. Hopefully when those fusion optimisations are implemented in the future the benefits will be even greater.

Finally, it’s also worth mentioning that using Visual Calculations or Window functions (as discussed here) has very similar benefits when tuning DirectQuery mode, so you should check them out too and consider using them in combination with calendar-based time intelligence.
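For example, a running-total visual calculation can stand in for a YTD measure in a matrix with years and months nested on rows. A minimal sketch, assuming the standard RUNNINGSUM visual calculation function, where HIGHESTPARENT resets the accumulation at the start of each year:

//Visual calculation: running total that resets at the highest parent level (the year)
YTD Sales Amount VC = RUNNINGSUM([Sales Amount], HIGHESTPARENT)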

Linking Queries Run From DAX Studio To Workspace Monitoring And The Capacity Metrics App

A few weeks ago I wrote a blog post about how you can now link event data from Workspace Monitoring to data in the Fabric Capacity Metrics App using OperationId values. In the latest (3.4.0) release of DAX Studio there’s a new feature that you might have missed that links queries run from there to Workspace Monitoring and the Capacity Metrics App in the same way.

I connected DAX Studio to a published Power BI semantic model, turned on Server Timings and ran a DAX query. When the query finished I clicked the Info button at the top of the Server Timings pane, which opened the Query Information dialog shown below, and then I copied the Request ID value from there:

The Request ID is the same thing as the OperationId value in Workspace Monitoring and the Capacity Metrics app. I was then able to use this value in a KQL query in Workspace Monitoring like so:

SemanticModelLogs
| where Timestamp > ago(1h)
| where toupper(OperationId) == "5D4A4F47-370D-4634-B67B-E4B58CB067A8"
| project Timestamp, OperationName, OperationDetailName, EventText, Status, StatusCode
| order by Timestamp asc

[Note that you need to convert the OperationId value in Workspace Monitoring to uppercase to match it to the Request ID from DAX Studio]

What’s the value of doing this? Probably not much for an individual query because the information you see in Workspace Monitoring is exactly the same as you get in Server Timings in DAX Studio, and DAX Studio shows this information in a way that’s much easier to understand. If you’re testing a number of DAX queries, though, then having all this data available in Workspace Monitoring means you can do detailed comparisons of different runs using KQL.

What is really useful is being able to find the CU usage on a capacity of a DAX query run from DAX Studio – something that is very important when tuning DAX queries. While DAX Studio (and Workspace Monitoring) will give you a CPU Time value for a DAX query, for reasons I won’t go into here you won’t be able to reverse engineer the algorithm that converts CPU Time to CUs. However, by linking the Request ID from DAX Studio to the OperationIds seen on the Timepoint Detail page and the new Timepoint Item Detail page in newer versions of the Capacity Metrics app (as discussed here), you don’t need to care about that. You can simply take the Request ID from DAX Studio and find it in the Capacity Metrics app. For example, on the Timepoint Item Detail (preview) page you can either find the value in the OperationId slicer or add the OperationId column to the lower table using the Select Optional Column(s) slicer and look for it there:

Calling DAX UDFs From Power BI Copilot

Can you call a DAX UDF from Power BI Copilot? I was asked this question by Jake Duddy during the livestream on Power BI Copilot I did with Reid Havens last week. I already knew it was possible because one of the customers I work with had already tried it, but I hadn’t tried it myself. So I did, and it is possible, and here’s the blog post.

A few months ago I wrote a post about how you can put template DAX queries in your AI Instructions to show Copilot how to solve more complex problems that can only be solved with a custom DAX query. I took some of the code from that post and turned it into the following DAX UDF:

createOrReplace

	function ABC = ```
			(
				AUpperBoundary: SCALAR int64,
				BUpperBoundary: SCALAR int64,
				AnalysisDate: SCALAR datetime
			) => 
			VAR ApplyAnalysisDate = 
			CALCULATETABLE(
				'Transactions',
				'Date'[Date] = AnalysisDate
			)
			VAR AddGroupColumn = 
			ADDCOLUMNS(	
				ApplyAnalysisDate, 
				"Group",
				SWITCH(
					TRUE(),
					//If the price is less than or equal to AUpperBoundary
					//then return the value "A"
					Transactions[Price]<=AUpperBoundary, "A (<=£" & AUpperBoundary & ")",
					//If the price is less than or equal to BUpperBoundary
					//then return the value "B"
					Transactions[Price]<=BUpperBoundary, "B (>£" & AUpperBoundary & " and <=£" & BUpperBoundary & ")",
					//Otherwise return the value "C"
					"C (>£" & BUpperBoundary & ")"
				)
			)
			RETURN
			SUMMARIZE(
			    AddGroupColumn,
				[Group],
			    "Count Of Transactions", [Count Of Transactions]
			)
			```
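As a quick sanity check you can call the function directly in a DAX query in DAX Studio; this call uses the same parameter values as the example I give Copilot in the AI Instructions below:

EVALUATE
ABC(250000, 700000, DATE(2025, 1, 1))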

This UDF does a basic form of ABC analysis on the semantic model I’ve used in all my recent Copilot posts containing UK Land Registry data on real estate transactions:

Note: this is not great quality code and it’s certainly not a general purpose solution for ABC analysis in DAX, but it’s a UDF. I then added the following to my AI Instructions describing what the UDF does and how to use it:

This semantic model contains a DAX user-defined function called ABC that does an ABC analysis on the data in the Transactions table. It takes three parameters defined as follows:
AUpperBoundary - an integer value which is the upper boundary of transactions in the A group
BUpperBoundary - an integer value which is the upper boundary of transactions in the B group
AnalysisDate - a datetime value which is the date to filter transactions on
The function returns a table which can be used in an EVALUATE statement in a DAX query.

For example if I wanted to see the number of transactions which took place on 1st January 2025 divided into three groups:
A - transactions between £0 up to and including £250000
B - transactions above £250000 up to and including £700000
C - transactions above £700000
I could call the function as follows:
ABC(250000, 700000, DATE(2025,1,1))

Then I tried the following prompt:

Do an ABC analysis on the Transactions table, grouping all transactions into three groups named A, B and C based on the Price column where the upper boundary for the first group is £290000 and the upper boundary for the second group is £780000. Filter the transactions to just 14th January 2025.

Here’s the response I got:

As in that previous post, Copilot generated a DAX query to answer the question. Apart from that weird Sort Key column it’s the right answer; an additional AI Instruction telling Copilot to sort on the Group column might help here. The difference between this response and the one I got in my previous post is that this time the DAX query generated by Copilot simply calls the UDF. Here it is:

// DAX query generated by Fabric Copilot with "Perform an ABC analysis on the Transactions table, grouping all transactions into three groups named A, B, and C based on the Price column, where group A includes transactions with Price up to £290,000, group B includes transactions with Price above £290,000 and up to £780,000, and group C includes transactions with Price above £780,000. Only include transactions from 14th January 2025."
// ABC analysis for Transactions on 14-Jan-2025 with the specified price boundaries.
// A: Price <= 290,000
// B: 290,000 < Price <= 780,000
// C: Price > 780,000
// The ABC function is user-defined in the model and returns a table.
// Since the column names returned by ABC are not specified, add a dummy sort column to satisfy the ORDER BY requirement.
EVALUATE
  ADDCOLUMNS(
    ABC(290000, 780000, DATE(2025, 1, 14)),
    "Sort Key", 0
  )
ORDER BY
  [Sort Key] ASC

I guess the advantage of this over the DAX template approach is that it’s likely to be a lot more reliable and consistent: the less DAX Copilot has to generate, the more likely it is to do what you expect. On the other hand it’s a lot less flexible. For example I tried this prompt from my previous blog post which asks for an ABC analysis with four rather than three groups:

Do an ABC analysis on the Transactions table, grouping all transactions into four groups named A, B, C and D based on the Price column where the upper boundary for the first group is £300000, the upper boundary for the second group is £750000 and the upper boundary for the third group is £900000. Filter the transactions to just 16th January 2025.

I got the correct result from Copilot but the DAX query generated didn’t use the UDF, because the UDF is hard-coded to only return three groups; I suppose I was lucky in this case.

Based on these – admittedly fairly basic – tests I think using DAX UDFs with Power BI Copilot could be very useful when you need Copilot to generate complex measures or calculations where you know the general DAX pattern to use.

Monitoring The DAX Queries Generated When The Power BI Copilot Index Is Built

In my last post I talked about the text index that Power BI Copilot builds to help it answer data questions. You might be wondering if you can monitor the index build process and the bad news is that – at least at the time of writing – you can’t do so directly. However you can monitor it indirectly, because the index build process runs DAX queries to get text values from the semantic model and you can see those DAX queries being run using Workspace Monitoring, Log Analytics or Profiler. While this is described quite well in the docs here, let’s see what these DAX queries actually look like.

I published the semantic model from my last post (which has four text columns: CustomerId and CustomerName from the Customers table, and CustomerId and TransactionId from the Orders table; none of these columns are hidden or excluded from indexing using the Simplify The Data Schema feature) to a workspace where Workspace Monitoring was enabled:

I then ran the following KQL query to look at the DAX queries run in the workspace I published to:

SemanticModelLogs
| where Timestamp > ago(1h)
| where OperationName == "QueryEnd"
| project Timestamp, EventText, DurationMs, CpuTimeMs

I saw that the index build generated three DAX queries with the same pattern:

Here’s the first of these queries:

EVALUATE
SELECTCOLUMNS (
    FILTER (
        VALUES ( 'Customers'[CustomerName] ),
        LEN ( 'Customers'[CustomerName] ) <= 100
    ),
    "valueColumn", 'Customers'[CustomerName]
)

In line with what is described in the documentation, this query gets all the distinct text values from the CustomerName column on the Customers table that are no more than 100 characters long. I assume there are only three DAX queries, even though there are four text columns in the model, because one of the columns, TransactionId, has 5 million distinct values in it and Copilot cannot currently index more than 5 million text values in a single model.
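If you want to check in advance how many values a given column will contribute to the index, you can reuse the same pattern yourself – a sketch for the CustomerName column:

EVALUATE
ROW(
    "Indexable Values",
    COUNTROWS(
        FILTER(
            VALUES('Customers'[CustomerName]),
            LEN('Customers'[CustomerName]) <= 100
        )
    )
)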

[Interestingly, I saw that one of these queries failed with a memory error which I have asked to be investigated]

Each query took between 300ms and 600ms and there was a small gap between queries, so you can see how a large model with a lot of text columns could generate a lot of queries that, all together, take a noticeable amount of time to run. Finding the amount of time between the start of the first of these queries and the end of the last one will give you a rough idea of how long it takes to build the index, although there is some extra work that needs to be done after the last query has run which can sometimes take some time too.

As described here, indexing takes place for Import mode models with Q&A enabled every time the model is published, as well as every time it is refreshed so long as either Copilot or Q&A has been used in the last 14 days. For DirectQuery or Direct Lake models indexing also takes place every time the model is published, but because it’s impossible to say whether the underlying data has changed when a refresh takes place, indexing happens every 24 hours instead, again so long as either Copilot or Q&A has been used in the last 14 days. As a result, if you’re an admin, you might see a lot of DAX queries similar to the one above being run.

[Thanks to Aaron Meyers for much of the information in this post]

Power BI Copilot, AI Instructions And DAX Query Templates

At the end of my last post I showed how, if you put the definition of a measure in the AI Instructions of your Power BI semantic model, Copilot can use it as a template to create new measures with slightly different logic. That got me wondering – can you do the same thing for DAX queries to solve even more complex problems? It turns out you can.

First of all, why would you want to do this? There are some problems that can be solved relatively easily by writing a DAX query but which are quite hard to solve if you’re limited to writing measures. ABC classification is one example: Marco and Alberto have a great article on it here. When you want to increase the number of buckets/clusters/groups you use, or change the rules for which transactions go into which buckets, and make all this dynamic for end users building reports in Power BI Desktop, your semantic model can get very complicated very quickly. In a DAX query you can solve these problems much more easily, in part because you can define a calculated column at query time and use it to do your grouping. And since Copilot can generate and run DAX queries, it can be taught how to do ABC analysis by giving it an example DAX query and some instructions telling it how the query works. Let’s see an example.
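The key construct is the ability to define a calculated column at query scope with DEFINE COLUMN, so it exists only while the query runs. A minimal sketch – the PriceBand column name and the boundary value here are purely illustrative:

DEFINE
COLUMN 'Transactions'[PriceBand] =
    //Query-scoped column: exists only for the duration of this query
    IF('Transactions'[Price] <= 250000, "Low", "High")
EVALUATE
SUMMARIZECOLUMNS(
    'Transactions'[PriceBand],
    "Transaction Count", COUNTROWS('Transactions')
)

The full query below uses exactly this technique to do the ABC grouping.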

The sample semantic model I’ve been using in this series has a fact table called Transactions where each row represents a real estate property sale in England and Wales. Let’s say you want to group these rows into different groups based on the price paid for the property. With nothing in the AI Instructions for the model, the following prompt:

Do an ABC analysis on the Transactions table, grouping all transactions into three groups named A, B and C based on the Price column where the upper boundary for the first group is £250000 and the upper boundary for the second group is £700000. Filter the transactions to just 1st January 2025.

…will sometimes generate the right result but will also sometimes generate a slightly strange or even incorrect result. Here’s an example where the information in the summary is correct but the DAX query is far from ideal:

EVALUATE
  // Create a table with transactions on 1st January 2025 and assign an ABC group based on Price
  SELECTCOLUMNS(
    FILTER(
      'Transactions',
      'Transactions'[Date] = DATE(2025, 1, 1)  // Filter transactions to 1st January 2025
    ),
    "Price", 'Transactions'[Price],
    "ABC Group", 
      SWITCH(
        TRUE(),
        'Transactions'[Price] <= 250000, "A",      // Group A: Price up to £250,000
        'Transactions'[Price] <= 700000, "B",      // Group B: Price over £250,000 and up to £700,000
        "C"                                       // Group C: Price above £700,000
      )
  )
ORDER BY
  [ABC Group] ASC,
  [Price] ASC

This query returns one row per transaction rather than one row per group:

Not something you can give to an end user for sure. Here’s a DAX query that gives me exactly the result I want:

DEFINE
//Create a new calculated column at query time 
//to create the groups for ABC classification
COLUMN 'Transactions'[Group] =
//Upper boundary for price for group A
VAR AUpperBoundary = 250000
//Upper boundary for price for group B
VAR BUpperBoundary = 700000
RETURN
//Return a different letter representing a group name
//based on where the value in the Price column sits in
//the boundaries defined
SWITCH(
TRUE(),
//If the price is less than or equal to the variable AUpperBoundary
//then return the value "A"
Transactions[Price]<=AUpperBoundary, "A (<=£250,000)",
//If the price is less than or equal to the variable BUpperBoundary
//then return the value "B"
Transactions[Price]<=BUpperBoundary, "B (>£250,000 and <=£700,000)",
//Otherwise return the value "C"
"C (>£700,000)"
)
//Returns the results of the classification
EVALUATE
SUMMARIZECOLUMNS(
    'Transactions'[Group],
    //Filter by a given date
     KEEPFILTERS( TREATAS( {DATE(2025,1,1)}, 'Date'[Date] )),
    "Count Of Transactions", [Count Of Transactions]
)
ORDER BY 
    'Transactions'[Group] ASC

Here’s what this query returns in DAX Studio:

Putting this query in the semantic model’s AI Instructions with some explanatory text, like so:

The following DAX query does an ABC analysis on the Transactions table, grouping transactions into three groups called A, B and C, for the 1st January 2025.  The transactions with the lowest prices always go into group A, then subsequent letters represent higher price ranges. If the user asks for an ABC analysis use this query as a template.

DEFINE
//Create a new calculated column at query time 
//to create the groups for ABC classification
COLUMN 'Transactions'[Group] =
//Upper boundary for price for group A
VAR AUpperBoundary = 250000
//Upper boundary for price for group B
VAR BUpperBoundary = 700000
RETURN
//Return a different letter representing a group name
//based on where the value in the Price column sits in
//the boundaries defined
SWITCH(
TRUE(),
//If the price is less than or equal to the variable AUpperBoundary
//then return the value "A"
Transactions[Price]<=AUpperBoundary, "A (<=£250,000)",
//If the price is less than or equal to the variable BUpperBoundary
//then return the value "B"
Transactions[Price]<=BUpperBoundary, "B (>£250,000 and <=£700,000)",
//Otherwise return the value "C"
"C (>£700,000)"
)
//Returns the results of the classification
EVALUATE
SUMMARIZECOLUMNS(
    'Transactions'[Group],
    //Filter by a given date
     KEEPFILTERS( TREATAS( {DATE(2025,1,1)}, 'Date'[Date] )),
    "Count Of Transactions", [Count Of Transactions]
)
ORDER BY 
    'Transactions'[Group] ASC

…means that the prompt above not only returns what you would expect:

…but more importantly, variations on the prompt such as:

Do an ABC analysis on the Transactions table, grouping all transactions into four groups named A, B, C and D based on the Price column where the upper boundary for the first group is £300000, the upper boundary for the second group is £750000 and the upper boundary for the third group is £900000. Filter the transactions to just 16th January 2025.

…take the template query and modify it as you would expect:

Note that the prompt has not only changed the date to filter by but increased the number of groups from three to four, and that Copilot has modified the DAX query to handle this – which I think is very impressive. Here’s the DAX query Copilot generated:

// DAX query generated by Fabric Copilot with "Group all transactions from the Transactions table into four groups named A, B, C, and D based on the Price column, where group A includes transactions with Price up to £300,000, group B includes transactions with Price above £300,000 and up to £750,000, group C includes transactions with Price above £750,000 and up to £900,000, and group D includes transactions with Price above £900,000. Filter the transactions to just 16th January 2025. Show the count of transactions in each group."
DEFINE
// Create a new calculated column at query time 
// to classify transactions into groups A, B, C, and D based on the Price column
COLUMN 'Transactions'[Group] =
    // Define the upper boundaries for each group
    VAR AUpperBoundary = 300000
    VAR BUpperBoundary = 750000
    VAR CUpperBoundary = 900000
    RETURN
        SWITCH(
            TRUE(),
            'Transactions'[Price] <= AUpperBoundary, "A (<=£300,000)",
            'Transactions'[Price] <= BUpperBoundary, "B (>£300,000 and <=£750,000)",
            'Transactions'[Price] <= CUpperBoundary, "C (>£750,000 and <=£900,000)",
            "D (>£900,000)"
        )
        
EVALUATE
    SUMMARIZECOLUMNS(
        'Transactions'[Group],
        // Filter transactions to include only the 16th January 2025
        KEEPFILTERS( TREATAS( { DATE(2025,1,16) }, 'Date'[Date] ) ),
        "Count Of Transactions", [Count Of Transactions]
    )
ORDER BY 
    'Transactions'[Group] ASC

Tools that convert natural language to SQL, like Fabric Data Agents, all allow you to specify example SQL queries like this, so it’s hardly a surprise that the same approach works for DAX. What’s interesting is the type of problems that giving Copilot templates of DAX measures and queries allows you to solve, and I’m only just starting to think of the possibilities.

Memory Overhead Of Distinct Count Measures In Power BI

The series of blog posts I wrote last year on semantic model memory usage, in particular this post on the query memory limit and the “This query uses more memory than the configured limit” error in Power BI, gets a lot of traffic. Since writing that post on the query memory limit I’ve written a few follow-ups on common mistakes that lead to increased query memory usage, such as this one on measures that never return a blank. Today’s post is sort of in that series but it isn’t about a design mistake – it’s just to point out that distinct count measures can be surprisingly memory-hungry.

To illustrate this I built a semantic model consisting of a single table with two columns and 99,999,000 rows, published it and ensured the Large Semantic Model format was enabled:

I created two measures:

Number Of Rows = COUNTROWS('MyTable')
Distinct Customers = DISTINCTCOUNT(MyTable[CustomerID])

Here’s what the model metrics looked like in DAX Studio:

The total model size in memory was 255MB.

I then ran the following DAX query to get the number of rows in the table for each of the 1800 dates in the Date column:

EVALUATE
SUMMARIZECOLUMNS(
    MyTable[Date],
    "Number Of Rows", [Number Of Rows]
)

The approximatePeakMemConsumptionKB metric from the Execution Metrics Profiler event showed that this query consumed 2109KB – so not much.

Then I ran a similar query to get the number of distinct customer IDs by date:

EVALUATE
SUMMARIZECOLUMNS(
    MyTable[Date],
    "Distinct Customers", [Distinct Customers]
)

The approximatePeakMemConsumptionKB metric for this query was 800325KB – a lot more than for the previous query. In fact, even though this model was well under the 1GB size limit for a model not in Premium capacity, the query used a lot more memory (782MB) than the size of the model itself in memory, and it came close to the 1GB limit on the amount of memory a query can consume when the model is not in Premium capacity.

Is there something wrong here? Can the query or model be tuned to reduce memory usage? Not really, no – distinct count measures are almost always more memory-intensive than other types of measure. I tested a number of different things, such as forcing the use of hash encoding on the CustomerID column, partitioning (Phil Seamark suggested creating one partition for each of the 1800 dates and that did actually reduce memory consumption, but it also made the queries extremely slow), changing the ordering of the source data to change how well each column compressed, and calculating the distinct count using the SUMX method, and nothing resulted in lower query memory usage.
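(For reference, the SUMX method mentioned above is a pattern along these lines – a sketch that returns the same result as DISTINCTCOUNT by iterating over the distinct values of the column:

//Sums 1 for each distinct CustomerID value, giving a distinct count
Distinct Customers SUMX = SUMX(VALUES(MyTable[CustomerID]), 1)

…and, as I said, it didn’t reduce memory usage either.)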

What I did find for the model above was that the number of rows returned by the query influenced the memory consumption of the query. So reducing the number of dates returned on rows in my DAX query from 1800 to 366 resulted in approximatePeakMemConsumptionKB going down to 200278KB. So if you’re running into memory errors when running queries with distinct count measures the first thing you should ask yourself is whether you need to show so many distinct counts: I recently ran into this problem with a customer that wanted to plot a line chart of distinct values with dates on the x axis, and we solved the problem by only plotting one day per week for the time period shown on the chart instead of every date. The chart looked almost identical, the DAX query was a lot faster and the memory usage of the DAX query was a lot lower. Distinct count measures combined with table visuals with lots of rows can be dangerous.

The other thing you can do is see if you can remodel your data to turn a distinct count into a count because, as shown above, counts are a lot faster and more memory-efficient than distinct counts. For example, if you have a fact table containing line items for orders and you need to find the distinct count of order IDs, consider creating a second fact table at the order granularity so you can count the number of rows in it to find the number of distinct orders. This may increase the size of your model but it should certainly reduce your query memory consumption for many queries because you won’t need to do a distinct count.
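A sketch of what that remodelling looks like, assuming a line-item fact table called ‘Order Lines’ and a new fact table called ‘Orders’ with one row per order (both table names are illustrative):

//Before: a distinct count over the line-item fact table
Distinct Orders = DISTINCTCOUNT('Order Lines'[OrderID])
//After: a plain row count over the order-granularity fact table
Distinct Orders Count = COUNTROWS('Orders')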

Limit The Impact Of Expensive Power BI Queries On Your Capacity By Reducing The Query Timeout

The recent announcement of Surge Protection gives Fabric/Power BI capacity admins a way to restrict the impact of background operations on a capacity, preventing them from causing throttling. However, at the time of writing, Surge Protection does not prevent users that are running expensive DAX or MDX queries – which are interactive operations – from causing problems on your capacity. Indeed, right now, there is no direct way to stop runaway queries from consuming a lot of CUs, although there is something you can do which will help a lot: reducing the query timeout.

Surge Protection doesn’t address the problem of expensive queries yet because Power BI only knows the CU usage of a DAX or MDX query when it has finished running – by which time it’s too late to do anything about it. In many cases, though, DAX or MDX queries that consume a lot of CUs are also slow. Therefore reducing the query timeout, which will kill any query that runs longer than a specified duration, will stop these queries from consuming so many CUs.

There are two default query timeouts that you should be aware of in Power BI. First, all DAX queries generated by a Power BI report have a 225 second timeout applied by the report itself. This timeout can be changed in Power BI Desktop but it cannot be changed on a published report in the Power BI Service. Second, you can set a timeout at the capacity level by changing the Query Timeout property in the admin portal. The default setting here is 3600 seconds (one hour). Unlike the first timeout, which only applies to the DAX queries generated by a Power BI report, this timeout applies to all queries run on any semantic model associated with the capacity, including the MDX queries generated by Excel PivotTables via Analyze In Excel. Setting this second timeout to less than 225 seconds means that it will take precedence over the first timeout. Therefore it’s the Query Timeout property on your capacity that you should set.

Hitting a timeout in a Power BI report will give the user a “Query has exceeded the available resources” error; clicking See Details/More Details will give you a message like this:

The XML for Analysis request timed out before it was completed. Timeout value: 10 sec.

Hitting the query timeout in an Excel PivotTable will give you the same message:

What value should you set the Query Timeout to? In my opinion no query should ever run for more than 30 seconds, because anything slower will result in a poor experience for your end users – no-one wants to sit around for ages waiting for a report to render. I also think it should be possible to tune any semantic model so that all queries run in under 30 seconds if you know what you’re doing. That said, in the real world, setting a timeout of 30 seconds may be unrealistic: developers may not have the skills to tune their semantic models. As a result I find a timeout of 100 seconds is often a good compromise, but you should experiment with different timeouts to find the minimum value you can get away with.

It’s important to note that reducing the query timeout will not stop every expensive query. This is because it’s perfectly possible to have very fast queries that consume a lot of CUs – for example when distinct count measures are used, when there are very large data volumes, or when there are complex but highly-optimised measures. There are also relatively rare cases where a query will carry on running beyond the duration specified by the timeout, because the Vertipaq engine only checks whether the timeout has been exceeded at certain points in the code and, depending on the query, there could be several seconds (sometimes more) between those checks. Equally, some very slow queries may not use a lot of CUs and having them time out might cause unnecessary disruption. Overall, though, in my experience setting a timeout will stop enough expensive queries to make doing so worthwhile.

[Update: my colleague Akshai Mirchandani has just reminded me that you can also set the Query Timeout at the workspace level as a Server Property using SQL Server Management Studio, as detailed here. The property is called ServerTimeout. This gives you more flexibility than setting it for the whole capacity.]