A Look At The Impact Of Calendar Based Time Intelligence On Power BI DirectQuery Performance

Calendar-based time intelligence (see here for the announcement and here for Marco and Alberto’s more in-depth article) is at least the second-most exciting thing to happen in DAX in the last few months: it makes many types of time intelligence calculation much easier to implement. But as far as I know only Reid Havens, in this video, has mentioned the performance impact of using this new feature, and that was for Import mode. So I wondered: do these benefits also apply to DirectQuery mode? The answer is, on balance, yes – but it’s not clear-cut.

To illustrate what I mean, I built a simple DirectQuery model against the Adventure Works DW sample database in SQL Server:

This model used the old approach to time intelligence, with the Date table marked as a date table using “Mark as date table”.

Here are the definitions of some of the measures:

Sales Amount =
SUM ( 'Internet Sales'[SalesAmount] )

YTD Sales Amount =
CALCULATE ( [Sales Amount], DATESYTD ( 'Date'[FullDateAlternateKey] ) )

PY YTD Sales Amount =
CALCULATE (
    [YTD Sales Amount],
    SAMEPERIODLASTYEAR ( 'Date'[FullDateAlternateKey] )
)

The Sales Amount measure returns the sum of the values in the SalesAmount column; the YTD Sales Amount finds the year-to-date sum of Sales Amount; and PY YTD Sales Amount finds the value of this measure in the same period of the previous year.

I then created a matrix visual showing the PY YTD Sales Amount measure with EnglishProductName from the Product dimension on columns and CalendarYear and EnglishMonthName from the Date dimension on rows:
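The DAX query that Power BI generates for a visual like this is more verbose than anything you’d write by hand (it uses auto-generated variable names, TOPN and so on), but it is broadly equivalent to the following simplified sketch:

EVALUATE
SUMMARIZECOLUMNS (
    'Date'[CalendarYear],
    'Date'[EnglishMonthName],
    'Product'[EnglishProductName],
    "PY YTD Sales Amount", [PY YTD Sales Amount]
)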

I copied the DAX query for this visual from Performance Analyzer, pasted it into DAX Studio and then ran it on a cold cache with Server Timings enabled. Here’s what Server Timings showed:

A total duration of 1.9 seconds and 5 SE queries doesn’t look too bad. But here are the Execution Metrics for this query with some important metrics highlighted:

{ 
"timeStart": "2025-11-29T17:17:16.461Z", 
"timeEnd": "2025-11-29T17:17:18.350Z",  
"durationMs": 1890, 
"datasourceConnectionThrottleTimeMs": 0, 
"directQueryConnectionTimeMs": 24, 
"directQueryIterationTimeMs": 166, 
"directQueryTotalTimeMs": 1681, 
"externalQueryExecutionTimeMs": 1493, 
"queryProcessingCpuTimeMs": 16, 
"totalCpuTimeMs": 828, 
"executionDelayMs": 3,  
"approximatePeakMemConsumptionKB": 20977,  
"directQueryTimeoutMs": 3599000, 
"tabularConnectionTimeoutMs": 3600000,  
"commandType": "Statement", 
"queryDialect": 3, 
"queryResultRows": 1613, 
"directQueryRequestCount": 5, 
"directQueryTotalRows": 33756 
}

The important thing to notice is that while the DAX query returns 1613 rows (see the queryResultRows metric), the SQL queries generated for it return 33756 rows between them (see the directQueryTotalRows metric). Why the big difference? To do the year-to-date calculation using the old time intelligence functionality, Power BI has to query the source at the date granularity, which is why the SQL queries return so many more rows. For example, here’s a snippet of the last SQL query generated:

Yuck. What’s more, bringing this number of rows back from the source can be time-consuming, and even after these rows have made it to Power BI they need to be iterated over (see the directQueryIterationTimeMs metric of 166ms) and aggregated up to get the final result of the calculation. This requires memory (see the approximatePeakMemConsumptionKB metric of 20977KB) and CPU (see the totalCpuTimeMs metric of 828ms) as well as adding to the overall duration of the DAX query.

I then created a copy of this model and set up a calendar using the new calendar-based time intelligence feature like so:

I then modified the measures above to use this new calendar:

Sales Amount =
SUM ( 'Internet Sales'[SalesAmount] )

YTD Sales Amount =
CALCULATE ( [Sales Amount], DATESYTD ( 'Gregorian' ) )

PY YTD Sales Amount =
CALCULATE ( [YTD Sales Amount], SAMEPERIODLASTYEAR ( 'Gregorian' ) )

I then reran the same DAX query from my matrix visual in DAX Studio for this model. Here are the Server Timings:

The good news is that the query is now much faster: 0.5 seconds instead of 1.9 seconds. But there are more SE queries! I’m told this is because some fusion optimisations (this presentation by Phil Seamark is an excellent introduction to the subject) haven’t yet been implemented for the new calendar-based time intelligence functionality, which means more SQL queries are generated than you might expect. Indeed some of the SQL queries run are identical. And since there is a limit on the number of connections that Power BI can use to run SQL queries in DirectQuery mode, and you can run into performance problems when you hit that limit (see here for more details), more SQL queries can be a bad thing – especially when there are many visuals on a page or a lot of concurrent users using the same semantic model.

However there is more good news if you look closely. Here are the Execution Metrics for this second run:

{ 
"timeStart": "2025-11-29T17:34:56.223Z", 
"timeEnd": "2025-11-29T17:34:56.754Z",  
"durationMs": 531, 
"datasourceConnectionThrottleTimeMs": 0, 
"directQueryConnectionTimeMs": 41, 
"directQueryIterationTimeMs": 38, 
"directQueryTotalTimeMs": 465, 
"externalQueryExecutionTimeMs": 410, 
"queryProcessingCpuTimeMs": 16, 
"totalCpuTimeMs": 141, 
"executionDelayMs": 0,  
"approximatePeakMemConsumptionKB": 3812,  
"directQueryTimeoutMs": 3600000, 
"tabularConnectionTimeoutMs": 3600000,  
"commandType": "Statement", 
"queryDialect": 3, 
"queryResultRows": 1613, 
"directQueryRequestCount": 11, 
"directQueryTotalRows": 3369 
}

Even though there are more SQL queries now the total number of rows returned by them is much less: the directQueryTotalRows metric is only 3369, so about 10% of what it was before. Why? Because instead of having to go down to the date granularity to do the calculations, the new calendar-based time intelligence functionality allows Power BI to do the calculation at the month granularity. Here’s a snippet of one of the SQL queries generated that shows this:

This in turn means that directQueryIterationTimeMs (now only 38ms), totalCpuTimeMs (now only 141ms) and approximatePeakMemConsumptionKB (now only 3812KB) are all much lower than before. It also means you’re less likely to run into the Max Intermediate Row Set Count limit on the maximum number of rows that a DirectQuery SQL query can return, and it opens up more opportunities to use aggregations to improve performance.

As a result, if you’re running into query performance, CU consumption or memory-related problems in DirectQuery mode, you should experiment with the new calendar-based time intelligence feature to see if it helps, even if it results in more SQL queries being generated. Hopefully, when those fusion optimisations are implemented in the future, the benefits will be even greater.

Finally, it’s also worth mentioning that Visual Calculations and window functions (as discussed here) have very similar benefits when tuning DirectQuery mode, so you should check them out too and consider using them in combination with calendar-based time intelligence.

Linking Queries Run From DAX Studio To Workspace Monitoring And The Capacity Metrics App

A few weeks ago I wrote a blog post about how you can now link event data from Workspace Monitoring to data in the Fabric Capacity Metrics App using OperationId values. In the latest (3.4.0) release of DAX Studio there’s a new feature you might have missed that links queries run from there to Workspace Monitoring and the Capacity Metrics App in the same way.

I connected DAX Studio to a published Power BI semantic model, turned on Server Timings and ran a DAX query. When the query finished I clicked the Info button at the top of the Server Timings pane, which opened the Query Information dialog shown below, and then I copied the Request ID value from there:

The Request ID is the same thing as the OperationId value in Workspace Monitoring and the Capacity Metrics app. I was then able to use this value in a KQL query in Workspace Monitoring like so:

SemanticModelLogs
| where Timestamp > ago(1h)
| where toupper(OperationId) == "5D4A4F47-370D-4634-B67B-E4B58CB067A8"
| project Timestamp, OperationName, OperationDetailName, EventText, Status, StatusCode
| order by Timestamp asc

[Note that you need to convert the OperationID value in Workspace Monitoring to uppercase to match it to the Request ID from DAX Studio]

What’s the value of doing this? Probably not much for an individual query, because the information you see in Workspace Monitoring is exactly the same as you get in Server Timings in DAX Studio, and DAX Studio shows this information in a way that’s much easier to understand. If you’re testing a number of DAX queries, though, having all this data available in Workspace Monitoring means you can do detailed comparisons of different runs using KQL.

What is really useful is being able to find the CU usage on a capacity of a DAX query run from DAX Studio – something that is very important when tuning DAX queries. While DAX Studio (and Workspace Monitoring) will give you a CPU Time value for a DAX query, for reasons I won’t go into here you won’t be able to reverse engineer the algorithm that converts CPU Time to CUs. However, by linking the Request ID from DAX Studio to the OperationIds seen on the Timepoint Detail page and the new Timepoint Item Detail page in newer versions of the Capacity Metrics app (as discussed here), you don’t need to care about that. You can simply take the Request ID from DAX Studio and find it in the Capacity Metrics app. For example, on the Timepoint Item Detail (preview) page you can either find the value in the OperationId slicer or add the OperationId column to the lower table using the Select Optional Column(s) slicer and look for it there:

Calling DAX UDFs From Power BI Copilot

Can you call a DAX UDF from Power BI Copilot? I was asked this question by Jake Duddy during the livestream on Power BI Copilot I did with Reid Havens last week. I already knew it was possible because one of the customers I work with had already tried it, but I hadn’t tried it myself. So I did, and it is possible, and here’s the blog post.

A few months ago I wrote a post about how you can put template DAX queries in your AI Instructions to show Copilot how to solve more complex problems that can only be solved with a custom DAX query. I took some of the code from that post and turned it into the following DAX UDF:

createOrReplace

	function ABC = ```
			(
				AUpperBoundary: SCALAR int64,
				BUpperBoundary: SCALAR int64,
				AnalysisDate: SCALAR datetime
			) => 
			VAR ApplyAnalysisDate = 
			CALCULATETABLE(
				'Transactions',
				'Date'[Date] = AnalysisDate
			)
			VAR AddGroupColumn = 
			ADDCOLUMNS(	
				ApplyAnalysisDate, 
				"Group",
				SWITCH(
					TRUE(),
					//If the price is less than or equal to AUpperBoundary
					//then return the value "A"
					Transactions[Price]<=AUpperBoundary, "A (<=£" & AUpperBoundary & ")",
					//If the price is less than or equal to BUpperBoundary
					//then return the value "B"
					Transactions[Price]<=BUpperBoundary, "B (>£" & AUpperBoundary & " and <=£" & BUpperBoundary & ")",
					//Otherwise return the value "C"
					"C (>£" & BUpperBoundary & ")"
				)
			)
			RETURN
			SUMMARIZE(
			    AddGroupColumn,
				[Group],
			    "Count Of Transactions", [Count Of Transactions]
			)
			```

This UDF does a basic form of ABC analysis on the semantic model I’ve used in all my recent Copilot posts containing UK Land Registry data on real estate transactions:
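Before wiring it up to Copilot, you can test the UDF directly in DAX Query View or DAX Studio with a query like this (a minimal usage sketch, using the same example parameter values described in the AI Instructions below):

EVALUATE
ABC ( 250000, 700000, DATE ( 2025, 1, 1 ) )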

Note: this is not great quality code and it’s certainly not a general purpose solution for ABC analysis in DAX, but it’s a UDF. I then added the following to my AI Instructions describing what the UDF does and how to use it:

This semantic model contains a DAX user-defined function called ABC that does an ABC analysis on the data in the Transactions table. It takes three parameters defined as follows:
AUpperBoundary - an integer value which is the upper boundary of transactions in the A group
BUpperBoundary - an integer value which is the upper boundary of transactions in the B group
AnalysisDate - a datetime value which is the date to filter transactions on
The function returns a table which can be used in an EVALUATE statement in a DAX query.

For example if I wanted to see the number of transactions which took place on 1st January 2025 divided into three groups:
A - transactions between £0 up to and including £250000
B - transactions above £250000 up to and including £700000
C - transactions above £700000
I could call the function as follows:
ABC(250000, 700000, DATE(2025,1,1))

Then I tried the following prompt:

Do an ABC analysis on the Transactions table, grouping all transactions into three groups named A, B and C based on the Price column where the upper boundary for the first group is £290000 and the upper boundary for the second group is £780000. Filter the transactions to just 14th January 2025.

Here’s the response I got:

As in that previous post, Copilot generated a DAX query to answer the question. Apart from that weird Sort Key column it’s the right answer; an additional AI Instruction telling Copilot to sort on the Group column might help here. The difference between this response and the one I got in my previous post is that this time the DAX query generated by Copilot simply calls the UDF. Here it is:

// DAX query generated by Fabric Copilot with "Perform an ABC analysis on the Transactions table, grouping all transactions into three groups named A, B, and C based on the Price column, where group A includes transactions with Price up to £290,000, group B includes transactions with Price above £290,000 and up to £780,000, and group C includes transactions with Price above £780,000. Only include transactions from 14th January 2025."
// ABC analysis for Transactions on 14-Jan-2025 with the specified price boundaries.
// A: Price <= 290,000
// B: 290,000 < Price <= 780,000
// C: Price > 780,000
// The ABC function is user-defined in the model and returns a table.
// Since the column names returned by ABC are not specified, add a dummy sort column to satisfy the ORDER BY requirement.
EVALUATE
  ADDCOLUMNS(
    ABC(290000, 780000, DATE(2025, 1, 14)),
    "Sort Key", 0
  )
ORDER BY
  [Sort Key] ASC

I guess the advantage of this over the DAX template approach is that it’s likely to be a lot more reliable and consistent: the less DAX Copilot has to generate, the more likely it is to do what you expect. On the other hand it’s a lot less flexible. For example I tried this prompt from my previous blog post which asks for an ABC analysis with four rather than three groups:

Do an ABC analysis on the Transactions table, grouping all transactions into four groups named A, B, C and D based on the Price column where the upper boundary for the first group is £300000, the upper boundary for the second group is £750000 and the upper boundary for the third group is £900000. Filter the transactions to just 16th January 2025.

I got the correct result from Copilot but the DAX query generated didn’t use the UDF because the UDF is hard coded to only return three groups; I suppose I was lucky in this case.

Based on these – admittedly fairly basic – tests I think using DAX UDFs with Power BI Copilot could be very useful when you need Copilot to generate complex measures or calculations where you know the general DAX pattern to use.

Monitoring The DAX Queries Generated When The Power BI Copilot Index Is Built

In my last post I talked about the text index that Power BI Copilot builds to help it answer data questions. You might be wondering if you can monitor the index build process, and the bad news is that – at least at the time of writing – you can’t do so directly. However you can monitor it indirectly, because the index build process runs DAX queries to get text values from the semantic model and you can see those DAX queries being run using Workspace Monitoring, Log Analytics or Profiler. While this is described quite well in the docs here, let’s see what these DAX queries actually look like.

I published the semantic model from my last post (which has four text columns – CustomerId and CustomerName from the Customers table, and CustomerId and TransactionId from the Orders table – none of which are hidden or excluded from indexing using the Simplify The Data Schema feature) to a workspace where Workspace Monitoring was enabled:

I then ran the following KQL query to look at the DAX queries run in the workspace I published to:

SemanticModelLogs
| where Timestamp > ago(1h)
| where OperationName == "QueryEnd"
| project Timestamp, EventText, DurationMs, CpuTimeMs

I saw that the index build generated three DAX queries with the same pattern:

Here’s the first of these queries:

EVALUATE
SELECTCOLUMNS (
    FILTER (
        VALUES ( 'Customers'[CustomerName] ),
        LEN ( 'Customers'[CustomerName] ) <= 100
    ),
    "valueColumn", 'Customers'[CustomerName]
)

In line with what is described in the documentation, this query gets all the distinct text values from the CustomerName column on the Customers table that are no more than 100 characters long. I assume there are only three DAX queries even though there are four text columns in the model because one of the columns, TransactionId, has 5 million distinct values in it and Copilot cannot currently index more than 5 million text values in a single model.
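If you want to check whether a text column in your own model is likely to hit that limit, a quick way to do so (an illustrative sketch, using the table and column names from this model) is to run a DAX query that returns the distinct counts of the candidate columns:

EVALUATE
ROW (
    //TransactionId is the column with around 5 million distinct values mentioned above
    "Distinct TransactionIds", DISTINCTCOUNT ( 'Orders'[TransactionId] ),
    "Distinct CustomerNames", DISTINCTCOUNT ( 'Customers'[CustomerName] )
)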

[Interestingly, I saw that one of these queries failed with a memory error which I have asked to be investigated]

Each query took between 300 and 600ms and there was a small gap between queries, so you can see how a large model with a lot of text columns could generate a lot of queries that, taken together, take a noticeable amount of time to run. Finding the amount of time between the start of the first of these queries and the end of the last one will give you a rough idea of how long it takes to build the index, although there is some extra work that needs to be done after the last query has run which can sometimes take some time too.

As described here, indexing takes place for Import mode models with Q&A enabled every time the model is published, as well as every time it is refreshed, so long as either Copilot or Q&A has been used in the last 14 days. For DirectQuery or Direct Lake models indexing also takes place every time the model is published, but because it’s impossible to say whether the underlying data has changed when a refresh takes place, indexing instead happens every 24 hours so long as either Copilot or Q&A has been used in the last 14 days. As a result, if you’re an admin, you might see a lot of DAX queries similar to the one above being run.

[Thanks to Aaron Meyers for much of the information in this post]

Power BI Copilot, AI Instructions And DAX Query Templates

At the end of my last post I showed how, if you put the definition of a measure in the AI Instructions of your Power BI semantic model, Copilot can use it as a template to create new measures with slightly different logic. That got me wondering – can you do the same thing for DAX queries to solve even more complex problems? It turns out you can.

First of all, why would you want to do this? There are some problems that can be solved relatively easily by writing a DAX query but which are quite hard to solve if you’re limited to writing measures. ABC classification is one example of this. Marco and Alberto have a great article on this problem here; the problem is that when you want to increase the number of buckets/clusters/groups you use, change the rules for which transactions go into which buckets, or make all this dynamic for end users building reports in Power BI Desktop, your semantic model can get very complicated very quickly. In a DAX query you can solve these problems much more easily, in part because you can define a calculated column at query time and use it to do your grouping. And since Copilot can generate and run DAX queries, you can teach it how to do ABC analysis by giving it an example DAX query and some instructions telling it how the query works. Let’s see an example.

The sample semantic model I’ve been using in this series has a fact table called Transactions where each row represents a real estate property sale in England and Wales. Let’s say you want to group these rows into different groups based on the price paid for the property. With nothing in the AI Instructions for the model, the following prompt:

Do an ABC analysis on the Transactions table, grouping all transactions into three groups named A, B and C based on the Price column where the upper boundary for the first group is £250000 and the upper boundary for the second group is £700000. Filter the transactions to just 1st January 2025.

…will sometimes generate the right result but will also sometimes generate a slightly strange or even incorrect result. Here’s an example where the information in the summary is correct but the DAX query is far from ideal:

EVALUATE
  // Create a table with transactions on 1st January 2025 and assign an ABC group based on Price
  SELECTCOLUMNS(
    FILTER(
      'Transactions',
      'Transactions'[Date] = DATE(2025, 1, 1)  // Filter transactions to 1st January 2025
    ),
    "Price", 'Transactions'[Price],
    "ABC Group", 
      SWITCH(
        TRUE(),
        'Transactions'[Price] <= 250000, "A",      // Group A: Price up to £250,000
        'Transactions'[Price] <= 700000, "B",      // Group B: Price over £250,000 and up to £700,000
        "C"                                       // Group C: Price above £700,000
      )
  )
ORDER BY
  [ABC Group] ASC,
  [Price] ASC

This query returns one row per transaction rather than one row per group:

Not something you can give to an end user for sure. Here’s a DAX query that gives me exactly the result I want:

DEFINE
//Create a new calculated column at query time 
//to create the groups for ABC classification
COLUMN 'Transactions'[Group] =
//Upper boundary for price for group A
VAR AUpperBoundary = 250000
//Upper boundary for price for group B
VAR BUpperBoundary = 700000
RETURN
//Return a different letter representing a group name
//based on where the value in the Price column sits in
//the boundaries defined
SWITCH(
TRUE(),
//If the price is less than or equal to the variable AUpperBoundary
//then return the value "A"
Transactions[Price]<=AUpperBoundary, "A (<=£250,000)",
//If the price is less than or equal to the variable BUpperBoundary
//then return the value "B"
Transactions[Price]<=BUpperBoundary, "B (>£250,000 and <=£700,000)",
//Otherwise return the value "C"
"C (>£700,000)"
)
//Returns the results of the classification
EVALUATE
SUMMARIZECOLUMNS(
    'Transactions'[Group],
    //Filter by a given date
     KEEPFILTERS( TREATAS( {DATE(2025,1,1)}, 'Date'[Date] )),
    "Count Of Transactions", [Count Of Transactions]
)
ORDER BY 
    'Transactions'[Group] ASC

Here’s what this query returns in DAX Studio:

Putting this query in the semantic model’s AI Instructions with some explanatory text, like so:

The following DAX query does an ABC analysis on the Transactions table, grouping transactions into three groups called A, B and C, for the 1st January 2025.  The transactions with the lowest prices always go into group A, then subsequent letters represent higher price ranges. If the user asks for an ABC analysis use this query as a template.

DEFINE
//Create a new calculated column at query time 
//to create the groups for ABC classification
COLUMN 'Transactions'[Group] =
//Upper boundary for price for group A
VAR AUpperBoundary = 250000
//Upper boundary for price for group B
VAR BUpperBoundary = 700000
RETURN
//Return a different letter representing a group name
//based on where the value in the Price column sits in
//the boundaries defined
SWITCH(
TRUE(),
//If the price is less than or equal to the variable AUpperBoundary
//then return the value "A"
Transactions[Price]<=AUpperBoundary, "A (<=£250,000)",
//If the price is less than or equal to the variable BUpperBoundary
//then return the value "B"
Transactions[Price]<=BUpperBoundary, "B (>£250,000 and <=£700,000)",
//Otherwise return the value "C"
"C (>£700,000)"
)
//Returns the results of the classification
EVALUATE
SUMMARIZECOLUMNS(
    'Transactions'[Group],
    //Filter by a given date
     KEEPFILTERS( TREATAS( {DATE(2025,1,1)}, 'Date'[Date] )),
    "Count Of Transactions", [Count Of Transactions]
)
ORDER BY 
    'Transactions'[Group] ASC

…means that the prompt above not only returns what you would expect:

…but more importantly, variations on the prompt such as:

Do an ABC analysis on the Transactions table, grouping all transactions into four groups named A, B, C and D based on the Price column where the upper boundary for the first group is £300000, the upper boundary for the second group is £750000 and the upper boundary for the third group is £900000. Filter the transactions to just 16th January 2025.

…take the template query and modify it as you would expect:

Note that the prompt has not only changed the date to filter by but also increased the number of groups from three to four, and that Copilot has modified the DAX query to handle this – which I think is very impressive. Here’s the DAX query Copilot generated:

// DAX query generated by Fabric Copilot with "Group all transactions from the Transactions table into four groups named A, B, C, and D based on the Price column, where group A includes transactions with Price up to £300,000, group B includes transactions with Price above £300,000 and up to £750,000, group C includes transactions with Price above £750,000 and up to £900,000, and group D includes transactions with Price above £900,000. Filter the transactions to just 16th January 2025. Show the count of transactions in each group."
DEFINE
// Create a new calculated column at query time 
// to classify transactions into groups A, B, C, and D based on the Price column
COLUMN 'Transactions'[Group] =
    // Define the upper boundaries for each group
    VAR AUpperBoundary = 300000
    VAR BUpperBoundary = 750000
    VAR CUpperBoundary = 900000
    RETURN
        SWITCH(
            TRUE(),
            'Transactions'[Price] <= AUpperBoundary, "A (<=£300,000)",
            'Transactions'[Price] <= BUpperBoundary, "B (>£300,000 and <=£750,000)",
            'Transactions'[Price] <= CUpperBoundary, "C (>£750,000 and <=£900,000)",
            "D (>£900,000)"
        )
        
EVALUATE
    SUMMARIZECOLUMNS(
        'Transactions'[Group],
        // Filter transactions to include only the 16th January 2025
        KEEPFILTERS( TREATAS( { DATE(2025,1,16) }, 'Date'[Date] ) ),
        "Count Of Transactions", [Count Of Transactions]
    )
ORDER BY 
    'Transactions'[Group] ASC

Tools that convert natural language to SQL, like Fabric Data Agents, all allow you to specify example SQL queries like this, so it’s hardly a surprise that the same approach applies to DAX. What’s interesting is the type of problems that giving Copilot templates of DAX measures and queries allows you to solve, and I’m only just starting to think of the possibilities.

Memory Overhead Of Distinct Count Measures In Power BI

The series of blog posts I wrote last year on semantic model memory usage, in particular this post on the query memory limit and the “This query uses more memory than the configured limit” error in Power BI, gets a lot of traffic. Since writing that post on the query memory limit I’ve written a few follow-ups on common mistakes that lead to increased query memory usage, such as this one on measures that never return a blank. Today’s post is sort of in that series but it isn’t about a design mistake – it’s just to point out that distinct count measures can be surprisingly memory-hungry.

To illustrate this I built a semantic model consisting of a single table with two columns and 99,999,000 rows, published it and ensured the Large Semantic Model format was enabled:

I created two measures:

Number Of Rows = COUNTROWS('MyTable')
Distinct Customers = DISTINCTCOUNT(MyTable[CustomerID])

Here’s what the model metrics looked like in DAX Studio:

The total model size in memory was 255MB.

I then ran the following DAX query to get the number of rows in the table for each of the 1800 dates in the Date column:

EVALUATE
SUMMARIZECOLUMNS(
    MyTable[Date],
    "Number Of Rows", [Number Of Rows]
)

The approximatePeakMemConsumptionKB from the Execution Metrics Profiler event showed that this query consumed 2109KB – so not much.

Then I ran a similar query to get the number of distinct customer IDs by date:

EVALUATE
SUMMARIZECOLUMNS(
    MyTable[Date],
    "Distinct Customers", [Distinct Customers]
)

The approximatePeakMemConsumptionKB metric for this query was 800325KB – a lot more than the previous query. In fact, even though this model was well under the 1GB size limit for a model not in a Premium capacity, the query used far more memory (782MB) than the size of the model itself in memory, and it came close to the 1GB limit on the amount of memory a query can consume when the model is not in a Premium capacity.

Is there something wrong here? Can the query or model be tuned to reduce memory usage? Not really, no – distinct count measures are almost always more memory-intensive than other types of measure. I tested a number of different things: forcing the use of hash encoding on the CustomerID column; partitioning (Phil Seamark suggested creating one partition for each of the 1800 dates, which did reduce memory consumption but also made the queries extremely slow); changing the ordering of the source data to change how well each column compressed; and calculating the distinct count using the SUMX method. Nothing resulted in lower query memory usage.
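For reference, the SUMX-based alternative I tested looks something like this (a sketch of the pattern rather than the exact measure I used):

Distinct Customers SUMX =
SUMX ( DISTINCT ( MyTable[CustomerID] ), 1 )

It returns the same values as DISTINCTCOUNT but, as noted above, it didn’t reduce query memory usage either.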

What I did find for the model above was that the number of rows returned by the query influenced its memory consumption: reducing the number of dates returned on rows in my DAX query from 1800 to 366 resulted in approximatePeakMemConsumptionKB going down to 200278KB. So if you’re running into memory errors when running queries with distinct count measures, the first thing you should ask yourself is whether you need to show so many distinct counts. I recently ran into this problem with a customer that wanted to plot a line chart of distinct values with dates on the x axis, and we solved it by only plotting one day per week for the time period shown on the chart instead of every date. The chart looked almost identical, the DAX query was a lot faster and its memory usage was a lot lower. Distinct count measures combined with table visuals with lots of rows can be dangerous.
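As a sketch of what that looks like in a query (my own illustration, not the exact query we used), you can filter the dates down to one day per week like this:

EVALUATE
SUMMARIZECOLUMNS (
    MyTable[Date],
    //Only keep dates falling on the first day of the week
    FILTER ( ALL ( MyTable[Date] ), WEEKDAY ( MyTable[Date] ) = 1 ),
    "Distinct Customers", [Distinct Customers]
)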

The other thing you can do is see if you can remodel your data to turn a distinct count into a count because, as shown above, counts are a lot faster and more memory-efficient than distinct counts. For example, if you have a fact table containing line items for orders and you need to find the distinct count of order IDs, consider creating a second fact table at the order granularity so you can count the number of rows in it to find the number of distinct orders. This may increase the size of your model but it should certainly reduce query memory consumption for many queries because you won’t need to do a distinct count.
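As a sketch of what that remodelling means for the measures involved (the 'Order Lines' and 'Orders' table names here are hypothetical), a distinct count over the line-item fact table gets replaced by a plain row count over an order-grain fact table:

//Distinct count over the line-item grain fact table: memory-hungry at query time
Distinct Order Count =
DISTINCTCOUNT ( 'Order Lines'[OrderID] )

//Equivalent measure over an order-grain fact table: a cheap row count
Order Count =
COUNTROWS ( 'Orders' )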

Limit The Impact Of Expensive Power BI Queries On Your Capacity By Reducing The Query Timeout

The recent announcement of Surge Protection gives Fabric/Power BI capacity admins a way to restrict the impact of background operations on a capacity, preventing them from causing throttling. However, at the time of writing, Surge Protection does not prevent users who are running expensive DAX or MDX queries – which are interactive operations – from causing problems on your capacity. Indeed, right now, there is no direct way to stop runaway queries from consuming a lot of CUs, although there is something you can do that will help a lot: reducing the query timeout.

Surge Protection doesn’t address the problem of expensive queries yet because Power BI only knows the CU usage of a DAX or MDX query when it has finished running – by which time it’s too late to do anything about it. In many cases, though, DAX or MDX queries that consume a lot of CUs are also slow. Therefore reducing the query timeout, which will kill any query that runs longer than a specified duration, will stop these queries from consuming so many CUs.

There are two default query timeouts that you should be aware of in Power BI. First, all DAX queries generated by a Power BI report have a 225 second timeout applied by the report itself. This timeout can be changed in Power BI Desktop but it cannot be changed on a published report in the Power BI Service. Second, you can set a timeout at the capacity level by changing the Query Timeout property in the admin portal. The default setting here is 3600 seconds (one hour). Unlike the first timeout, which only applies to the DAX queries generated by a Power BI report, this timeout applies to all queries run on any semantic model associated with the capacity, including the MDX queries generated by Excel PivotTables via Analyze In Excel. Setting this second timeout to less than 225 seconds means that it will take precedence over the first timeout. Therefore it’s the Query Timeout property on your capacity that you should set.

Hitting a timeout in a Power BI report will give the user a “Query has exceeded the available resources” error; clicking See Details/More Details will give you a message like this:

The XML for Analysis request timed out before it was completed. Timeout value: 10 sec.

Hitting the query timeout in an Excel PivotTable will give you the same message:

What value should you set the Query Timeout to? In my opinion no query should ever run for more than 30 seconds, because anything slower will result in a poor experience for your end users – no-one wants to sit around for ages waiting for a report to render. I also think it should be possible to tune any semantic model so all queries run in under 30 seconds if you know what you’re doing. That said, in the real world, setting a timeout of 30 seconds may be unrealistic: developers may not have the skills to tune their semantic models. As a result I find a timeout of 100 seconds is often a good compromise, but you should experiment with different timeouts to find the minimum value you can get away with.

It’s important to note that reducing the query timeout will not stop every expensive query. This is because it’s perfectly possible to have very fast queries that consume a lot of CUs – for example when distinct count measures are used, when there are very large data volumes, or when there are complex but highly-optimised measures. Also, there are relatively rare cases where a query will carry on running beyond the duration specified by the timeout, because the Vertipaq engine only checks whether the timeout has been exceeded at certain points in the code and, depending on the query, there could be several seconds (sometimes more) between these checks. Equally, some very slow queries may not use a lot of CUs and having them time out might cause unnecessary disruption. Overall, though, in my experience setting a timeout will stop enough expensive queries to make doing so worthwhile.

[Update: my colleague Akshai Mirchandani has just reminded me that you can also set the Query Timeout at the workspace level as a Server Property using SQL Server Management Studio, as detailed here. The property is called ServerTimeout. This gives you more flexibility than setting it for the whole capacity.]

Why DAX Is Better Than MDX For Bulk Extracts Of Data From Power BI

This is a post I’ve avoided writing for many years, and before I carry on let me make one thing clear:

Doing bulk extracts of data from a Power BI semantic model is a **really** bad idea

My colleague Matthew Roche wrote a great post on this topic a couple of years ago that is still relevant: using Power BI (or Analysis Services) as a data source for other systems, including other Power BI Import mode semantic models, is an anti-pattern. Power BI is optimised for small, analytical queries that return the amount of data that can be visualised on a single page. It is not optimised for queries that return millions of rows. Running this kind of query on a Power BI semantic model will be slow, is likely to run into timeouts and memory errors, and is also likely to cause CU spikes – and perhaps throttling – on a Premium capacity. If you want the data from a semantic model it’s much better to go back to the original data sources that the semantic model uses.

But

People still use Power BI semantic models as data sources all the time. This is either because they don’t know any better, because they can’t get access to the underlying data sources, or because they want to get the result of any DAX calculations on the model.

So

If you do need to extract large amounts of data from a semantic model I have one important piece of advice: write a DAX query to get the data and not an MDX query. There are two reasons for this:

  • Writing a DAX query to get granular data is usually a lot simpler than writing an MDX query
  • DAX queries that return large amounts of data are typically faster (and so less likely to hit timeouts), more CPU efficient (and therefore less likely to cause throttling on a capacity) and more memory efficient (and so less likely to cause memory errors)

The bad news is that the two client tools most often used to bulk extract data from Power BI, Excel PivotTables and Power Query using the Analysis Services connector and its query builder, generate MDX queries. What’s more, they don’t always generate the most efficient MDX queries either.

Let’s see an example. I have a semantic model in a Premium workspace with a table called Property Transactions with around a million rows in it. I connected to the model via the XMLA Endpoint using the “From SQL Server Analysis Services Database (Import)” option in Power Query in Excel:

…and then created a query to get the data from all the columns on the Property Transactions table plus one measure, called Count of Sales, using Power Query’s query builder:

While the query builder generated the MDX for me, you can see that it was not a simple query:

I ran a Profiler trace while this query ran and from the Execution Metrics I saw that:

  • The query took 54 seconds to complete
  • CPU Time was also 54 seconds
  • The approximate peak memory usage of the query was 626292KB

I then created a second Power Query query that used the following DAX query to get the same data, which I think you’ll agree is much more straightforward:

EVALUATE 
ADDCOLUMNS('Property Transactions', "Count of Sales", [Count of Sales])

[You have the option of entering a custom MDX or DAX query when you create your Power Query query]

This time, Execution Metrics showed me that:

  • The query took 6 seconds to complete
  • CPU Time was 6 seconds too
  • The approximate peak memory usage was 142493KB

So the DAX query was simple to write and maintain, took 11% of the time the MDX query took to run, used 11% of the CPU and 22% of the memory. That’s a big improvement. Even though I might be able to rewrite the MDX generated by Power Query to be more efficient, there’s no way it would be as simple or as efficient as the DAX query.

[Thanks to Akshai Mirchandani for the information in this post]

TMDL View And Power BI Developer Productivity: An Example Using The Detail Rows Definition Property

For me the biggest new feature in the January 2025 release of Power BI Desktop is the new TMDL View; many other people, like Marco, are excited about it too. For more advanced Power BI developers (and honestly, I don’t think you need to be that advanced to get value out of it) it makes certain editing tasks for semantic models much simpler, and while I won’t be abandoning the main Power BI Desktop UI completely or stopping using external tools like Tabular Editor, it is something I see myself using on a regular basis from now on.

One of the things it allows is the editing of semantic model properties and features that are not exposed by the Power BI Desktop UI but which are nonetheless supported by the engine, and which up to now you’d have had to use Tabular Editor to set. The announcement blog post mentions a few of these – perspectives (useful for the Personalize Visual feature, for example) and the isAvailableInMdx property – but my favourite underused property is the Detail Rows Definition property of a measure, also known as Detail Rows Expression. If you have end users querying your model using Analyze In Excel, it allows you to customise the results returned by an Excel PivotTable’s Show Details feature; as you might expect, Marco and Alberto have an excellent, detailed article on it here. Setting this property allows you to control which columns and rows are returned (so the correct rows are shown for non-trivial measures, as I described here), and if you can educate your users to use Show Details it can perform a lot better than a gigantic PivotTable for showing detail-level data from your model.

What does the workflow for setting this property on a model in Power BI Desktop look like now? What benefits do TMDL View and all the other recent pro developer enhancements in Desktop bring for someone like me? Let’s say I have a measure called Count Of Sales in my semantic model and that I want to customise the columns, and their names, that are returned by Show Details for this measure. The Detail Rows Definition property takes a DAX expression that returns a table, so the first step is to write that expression; now that we have DAX Query View I can do that without leaving Power BI Desktop, and because I’m lucky enough to have access to Power BI Copilot (one of the perks of working for Microsoft) I can use it to write my DAX expression easily. I gave Copilot the prompt:

Write a query that gives me the Date, County, Town and Postcode columns from the Property Transactions table along with the Count of Sales measure. Rename the Date column to be "Sales Date".

…and it immediately gave me the DAX query I wanted without needing to faff around looking up the syntax to the SELECTCOLUMNS() function:

EVALUATE
  SELECTCOLUMNS(
    'Property Transactions',
    "Sales Date", 'Property Transactions'[Date], // Renaming Date column to Sales Date
    "County", 'Property Transactions'[County],
    "Town", 'Property Transactions'[Town],
    "Postcode", 'Property Transactions'[Postcode],
    "Count of Sales", [Count Of Sales] // Including the Count of Sales measure
  )

Next, I copied the DAX query minus the EVALUATE statement, switched over to the TMDL View pane, dragged and dropped the Count of Sales measure into a Script pane:

And then, underneath the existing measure definition, started typing detailRowsDefinition – the intellisense picked up what I was typing before I even had to finish:

I then tried to paste the DAX query into TMDL View and realised it didn’t like line breaks. Rather than trying to look up how to do this in the editor – which I’m sure is possible – I just switched back to DAX Query View, highlighted the query, and entered the prompt:

format this query so there are no line breaks

…and it did the job for me! I copied the DAX table expression again and pasted it into TMDL View after the detailRowsDefinition property:

[I was pleasantly surprised that I didn’t have to escape any characters or mess around with double quotes]
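At this point the relevant fragment of the TMDL script looked something like this (a sketch: the Count Of Sales measure expression shown here is an assumption on my part, and the exact indentation may differ in your model):

measure 'Count Of Sales' = COUNTROWS ( 'Property Transactions' ) //assumed measure definition for illustration
    detailRowsDefinition = SELECTCOLUMNS('Property Transactions', "Sales Date", 'Property Transactions'[Date], "County", 'Property Transactions'[County], "Town", 'Property Transactions'[Town], "Postcode", 'Property Transactions'[Postcode], "Count of Sales", [Count Of Sales])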

Next I hit the Apply button and tested Analyze in Excel with a PivotTable and Show Details:

And bingo, I got the results I wanted:

Was all of this possible before? Absolutely. Is it much quicker and easier now with TMDL View, DAX Query View and Copilot? Absolutely. I’m a pretty experienced Power BI developer: I could certainly have written the DAX expression without Copilot, I have DAX Studio installed (which has a pretty good query builder too) to write and test the query, and I have Tabular Editor to set the property. But being able to do all this just using Power BI Desktop makes me so much more productive.

Finding The Size Of All Of The Columns Touched By A DAX Query

I had meant to follow up my recent post on how to find the columns touched by a DAX query with one on how to use this technique to find the size of those columns in memory, so you can find the total size of the columns that need to be paged into memory when a DAX query runs on a Direct Lake semantic model. Before I could do that, though, my colleague Michael Kovalsky messaged me to say that not only had he taken the query from that first post and incorporated it in Semantic Link Labs, he’d also done the work to get column sizes. All that’s left for me to do, then, is give you some simple examples of how to use it.

To use Semantic Link Labs you just need to create a new Fabric notebook and install the library:

%pip install semantic-link-labs

After that you can use sempy_labs.get_dax_query_dependencies to get the columns touched by any DAX query, for example:

import sempy_labs as labs
labs.get_dax_query_dependencies(
        dataset = 'InsertSemanticModelName',
        workspace = 'InsertWorkspaceName',
        dax_string = "InsertDAXQuery",
    )

This returns a dataframe with one row for each column touched by the query, plus various statistics about the size of each column in memory.

If you’re working with a Direct Lake semantic model, though, in order to get the correct sizes of each column in memory the query itself will need to have been run beforehand; you can ensure that this happens by setting the optional parameter put_in_memory to True:

import sempy_labs as labs
labs.get_dax_query_dependencies(
        dataset = 'InsertSemanticModelName',
        workspace = 'InsertWorkspaceName',
        dax_string = "InsertDAXQuery",
        put_in_memory = True
    )

Last of all, if you don’t want a dataframe but just want a single number representing the total memory needed by all columns touched by a query, you can use sempy_labs.get_dax_query_memory_size, for example like this:

import sempy_labs as labs
labs.get_dax_query_memory_size(
        dataset = 'InsertSemanticModelName',
        workspace = 'InsertWorkspaceName',
        dax_string = "InsertDAXQuery"
    )

Yet more evidence that, for any Power BI user, Semantic Link and Semantic Link Labs are the best reasons to flip the switch to enable Fabric. To find out more about what they are capable of, check out this user group presentation.