Visualising Power BI Performance Analyzer Data With A Vibe-Coded Custom Visual

Performance Analyzer is now available in the browser, not just in Power BI Desktop! Actually, everyone got excited about this back in September when it was announced and then forgot about it because it didn’t appear immediately, but now, if you are in Edit mode for a published report, you can see Performance Analyzer in the View menu. Why should you care though? Personally, while I’ve used Performance Analyzer in Desktop often enough over the years to capture DAX queries for performance tuning, I’ve always used DAX Studio, Profiler and Workspace Monitoring/Log Analytics for most of my performance tuning work. In part this is because Performance Analyzer was only available in Desktop – and the performance and behaviour of Power BI reports can be substantially different in Desktop compared to when they are published, and published reports are what your users care about. Now that Performance Analyzer is available for published reports I thought it was time to take another look at it, specifically at the detailed information it gives you when you export its data to json (which is documented here), and write a series of posts on when it can be useful for troubleshooting performance issues.

It’s very easy to use Power Query to load this export data into Excel or Power BI for analysis – so easy it was never worth writing a blog post about it. Visualising this data is another matter: none of the native Power BI visuals are suited to the problem, and I was never able to find a custom visual that did the job satisfactorily either. Yet visualising this data is essential to understanding it properly because of the parent/child relationships between events. I really needed to build my own custom visual or use a tool like Deneb to do the job, but I didn’t have the time or skills to do so. However, a few months ago Phil Seamark showed me how to use GitHub Copilot to create custom visuals (see his blog here) and after a few hours of playing around I had something I was happy with.

You can download a sample pbix file with a Power Query query that extracts the data from a Performance Analyzer json export file and visualises it with my custom visual here. I’m not going to publish the code for any of this officially (at least not yet) because it’s still very much a rough draft; as I write more posts in this series I know I’ll need to fix bugs and add functionality to the M code and the visual, so things will change a lot. There are also some quirks in the data I need to understand better. Why am I writing this post if I’m not sharing the code, you may ask? Because I want to explain how I’m visualising Performance Analyzer data when I show screenshots of me doing so in future posts in this series. The point of the series will be to troubleshoot performance problems; the fact I’m doing so using a hacky, vibe-coded custom visual is important for context but it’s not the main aim. It’s also a great example of how AI enables a completely new type of workflow and allows someone like me to do stuff I couldn’t do before.

To illustrate what I’ve got so far, I built a report from an Import mode model with a slicer, a line chart, a scatter plot and an Azure Map visual then recorded events in Performance Analyzer in the browser when I changed the slicer selection:

After exporting the Performance Analyzer data to json and importing that data into my pbix file, here’s what my custom visual showed:

The whole interaction took around 0.7 seconds. Scrolling down shows the events for each visual grouped together, with the first event for a visual being its Visual Container Lifecycle event and the other events – relating to things like rendering and the queries being run – displayed underneath; a grey horizontal line marks the end of a visual’s events. I noticed that in this case some of the events associated with a visual seem to take place after the associated Visual Container Lifecycle event has finished. I think this is because these events are executed on different physical machines whose clocks may be slightly out of sync – something that is called out in the docs.

I also used tooltips in the custom visual to display information for specific events like the DAX query:

That’s enough for now. In the next posts in this series I’ll show examples of how you can use Performance Analyzer and this visual to help troubleshoot specific problems.

Power BI Copilot And Report Filters And Slicers

In my last post I talked about how to push Power BI Copilot to get answers from the semantic model rather than the report you’re looking at. If you want to do this it’s probably because you’re getting worse answers when Copilot goes to the report than when it goes to the semantic model; before you try to bypass the report, though, it’s worth spending some time tuning how Copilot works with reports, and to do that you need to understand how it works. In this post I will describe one important aspect of this that I’ve recently learned about: how Copilot behaves when filters and slicers are present on a report.

Using the same semantic model I’ve used in all my recent posts on Copilot, I created a report with a single card visual on it showing the value of a measure called Count of Transactions, with no other visuals or filters:

Using the prompt:

What is the value of Count of Transactions?

…gives the same value shown in the card, as you would expect:

The fact that the result comes in text form and the presence of a citation (the [1] at the end of the response which, when you click it, spotlights the card visual) tells me that Copilot answered this question using data from the report. Changing the prompt to filter by a County, like so:

What is the value of Count of Transactions for the county Devon?

…now gives me a result in the form of a visual:

This indicates that the result came from the semantic model because it could not be derived from the report.

What if the County field is added to the report as a slicer like so?

The second prompt above now gives the same answer but in a different way:

This time the textual answer and the presence of a citation show that Copilot derived the response from the report. Clicking on the citation now not only spotlights the card visual but also shows that Copilot selected the county Devon in the slicer to get that result:

Also, when you click on the citation in this response in the Service (but not in Desktop), a message is displayed at the top of the report telling the user “Copilot filters temporarily applied”:

The same thing happens if there is no slicer but basic page-level and report-level filters are present (but not visual-level filters – the docs explicitly call out that this is not supported, and it looks like there is a bug here at the moment that results in incorrect results). Here’s the report edited to remove the slicer and replace it with a page-level filter:

And here’s what the second prompt above returns for this new version of the report, with the citation clicked to spotlight the card visual and show the result:

What’s more, editing the report so the filter is an Advanced Filter on the first letter of the name of the County like so:

…means that prompts like this:

Show the Count of Transactions for counties whose name begins with the letter H

…can be answered from the report too. Here’s the response to the prompt above with the citation clicked, the card spotlit and the new filter applied by Copilot shown:

I’m sure I’ve seen all this happen a hundred times but it’s only now that I’ve done these tests that I understand this behaviour, and now that I understand it I can use it to design reports that work better with Copilot and troubleshoot problems.

[Thanks to Carly Newsome for telling me about this]

Stopping Power BI Copilot From Answering Questions From Report Visuals

When you ask Power BI Copilot a data question, the first thing it will do is try to answer that question using information from report visuals; if it can’t find the answer on a report page it will then go on to try to build a new visual or generate a DAX query. Most of the time you’ll find that answering the question from data already displayed on a report is the method that is most likely to give you the correct answer, but occasionally – depending on the report, the measure and the filters applied to the visual – it can result in incorrect answers. In those situations you can use AI Instructions to influence how Power BI Copilot answers questions.

Consider the following report built on the semantic model I have used for most of my recent posts on Copilot containing real estate price data from the UK Land Registry:

There are two measures displayed in the visuals here: Count Of Transactions and Average Price Paid. Asking questions whose answers are clearly displayed on the page such as:

What is the Count Of Transactions?
What is the Average Price Paid for Flats?

…means that Copilot gets those answers from the report, as you would expect:

You can tell that the question has been answered from a report visual by the presence of Citations (underlined in red in the screenshot above) in the answers which point back to the visual used.

In this case the answers are both correct, but let’s pretend that the answer to the question about the Average Price Paid for Flats is not and that you want Copilot to bypass the bar chart visual when generating its answer. To do that you can use an AI Instruction like this:

If a user asks a question about the Average Price Paid measure, ignore any visuals on report pages and do not use them to answer the question because they may give a misleading answer. Instead, always generate a new visual to answer the question.

After applying these AI Instructions, the question:

What is the Average Price Paid for Flats?

…is now answered with a new card visual:

In this case Copilot has now ignored the visual on the page containing the answer and instead gone to the semantic model.

While this is a useful trick to know, if you find Copilot is not giving you the results you expect when it answers questions using report content it’s much better to try to understand why that’s happening and tune your report appropriately before trying to bypass the report altogether. How you do that is a topic I will address in a future post.

[Thanks to Celia Bayliss for the information in this post]

A Look At The Impact Of Calendar Based Time Intelligence On Power BI DirectQuery Performance

Calendar-based time intelligence (see here for the announcement and here for Marco and Alberto’s more in-depth article) is at least the second-most exciting thing to happen in DAX in the last few months: it makes many types of time intelligence calculation much easier to implement. But as far as I know only Reid Havens, in this video, has mentioned the performance impact of using this new feature and that was for Import mode. So I wondered: do these benefits also apply to DirectQuery mode? The answer is on balance yes but it’s not clear-cut.

To illustrate what I mean, I built a simple DirectQuery model against the Adventure Works DW sample database in SQL Server:

This model used the old “Mark as date table” approach to time intelligence.

Here are the definitions of some of the measures:

Sales Amount =
SUM ( 'Internet Sales'[SalesAmount] )

YTD Sales Amount =
CALCULATE ( [Sales Amount], DATESYTD ( 'Date'[FullDateAlternateKey] ) )

PY YTD Sales Amount =
CALCULATE (
    [YTD Sales Amount],
    SAMEPERIODLASTYEAR ( 'Date'[FullDateAlternateKey] )
)

The Sales Amount measure returns the sum of the values in the SalesAmount column; the YTD Sales Amount measure returns the year-to-date sum of Sales Amount; and the PY YTD Sales Amount measure returns the value of YTD Sales Amount for the same period in the previous year.

I then created a matrix visual showing the PY YTD Sales Amount measure with EnglishProductName from the Product dimension on columns and CalendarYear and EnglishMonthName from the Date dimension on rows:

I copied the DAX query for this visual from Performance Analyzer, pasted it into DAX Studio and then ran it on a cold cache with Server Timings enabled. Here’s what Server Timings showed:

A total duration of 1.9 seconds and 5 SE queries doesn’t look too bad. But here are the Execution Metrics for this query with some important metrics highlighted:

{ 
"timeStart": "2025-11-29T17:17:16.461Z", 
"timeEnd": "2025-11-29T17:17:18.350Z",  
"durationMs": 1890, 
"datasourceConnectionThrottleTimeMs": 0, 
"directQueryConnectionTimeMs": 24, 
"directQueryIterationTimeMs": 166, 
"directQueryTotalTimeMs": 1681, 
"externalQueryExecutionTimeMs": 1493, 
"queryProcessingCpuTimeMs": 16, 
"totalCpuTimeMs": 828, 
"executionDelayMs": 3,  
"approximatePeakMemConsumptionKB": 20977,  
"directQueryTimeoutMs": 3599000, 
"tabularConnectionTimeoutMs": 3600000,  
"commandType": "Statement", 
"queryDialect": 3, 
"queryResultRows": 1613, 
"directQueryRequestCount": 5, 
"directQueryTotalRows": 33756 
}

The important thing to notice is that while the DAX query returns 1613 rows (see the queryResultRows metric), the SQL queries generated for that DAX query return 33756 rows between them (see the directQueryTotalRows metric). Why the big difference? To do the year-to-date calculation using the old time intelligence functionality, Power BI has to query the source at the date granularity, which is why the SQL queries return so many more rows than the DAX query itself. For example, here’s a snippet of the last SQL query generated:

Yuck. What’s more, bringing this number of rows from the source can be time-consuming and even after these rows have made it to Power BI, they need to be iterated over (see the directQueryIterationTimeMs metric of 166ms) and aggregated up to get the final result of the calculation. This requires memory (see the approximatePeakMemConsumptionKB metric of 20977KB) and CPU (see the totalCpuTimeMs metric of 828ms) as well as adding to the overall duration of the DAX query.

I then created a copy of this model and set up a calendar using the new calendar-based time intelligence feature like so:

I then modified the measures above to use this new calendar:

Sales Amount =
SUM ( 'Internet Sales'[SalesAmount] )

YTD Sales Amount =
CALCULATE ( [Sales Amount], DATESYTD ( 'Gregorian' ) )

PY YTD Sales Amount =
CALCULATE ( [YTD Sales Amount], SAMEPERIODLASTYEAR ( 'Gregorian' ) )

I then reran the same DAX query from my matrix visual in DAX Studio for this model. Here are the Server Timings:

The good news is that the query is now much faster: 0.5 seconds instead of 1.9 seconds. But there are more SE queries! I’m told this is because some fusion optimisations (this presentation by Phil Seamark is an excellent introduction to the subject) haven’t yet been implemented for the new calendar-based time intelligence functionality, which means more SQL queries are generated than you might expect. Indeed, some of the SQL queries run are identical. And since there is a limit on the number of connections that Power BI can use to run SQL queries in DirectQuery mode, and since you can run into performance problems when you hit those limits (see here for more details), more SQL queries can be a bad thing – especially when there are many visuals on a page or a lot of concurrent users using the same semantic model.

However there is more good news if you look closely. Here are the Execution Metrics for this second run:

{ 
"timeStart": "2025-11-29T17:34:56.223Z", 
"timeEnd": "2025-11-29T17:34:56.754Z",  
"durationMs": 531, 
"datasourceConnectionThrottleTimeMs": 0, 
"directQueryConnectionTimeMs": 41, 
"directQueryIterationTimeMs": 38, 
"directQueryTotalTimeMs": 465, 
"externalQueryExecutionTimeMs": 410, 
"queryProcessingCpuTimeMs": 16, 
"totalCpuTimeMs": 141, 
"executionDelayMs": 0,  
"approximatePeakMemConsumptionKB": 3812,  
"directQueryTimeoutMs": 3600000, 
"tabularConnectionTimeoutMs": 3600000,  
"commandType": "Statement", 
"queryDialect": 3, 
"queryResultRows": 1613, 
"directQueryRequestCount": 11, 
"directQueryTotalRows": 3369 
}

Even though there are more SQL queries now the total number of rows returned by them is much less: the directQueryTotalRows metric is only 3369, so about 10% of what it was before. Why? Because instead of having to go down to the date granularity to do the calculations, the new calendar-based time intelligence functionality allows Power BI to do the calculation at the month granularity. Here’s a snippet of one of the SQL queries generated that shows this:

This in turn means that directQueryIterationTimeMs (now only 38ms), totalCpuTimeMs (now only 141ms) and approximatePeakMemConsumptionKB (now only 3812KB) are all much less than before. Also, this could mean you’re less likely to run into the Max Intermediate Row Set Count limit on the maximum number of rows that a DirectQuery SQL query can return and it opens up more opportunities to use aggregations to improve performance.

As a result, if you’re running into query performance, CU or memory-related problems in DirectQuery mode, you should experiment with using the new calendar-based time intelligence feature to see if it can help even if it results in more SQL queries being generated. Hopefully when those fusion optimisations are implemented in the future the benefits will be even greater.

Finally, it’s also worth mentioning that using Visual Calculations or Window functions (as discussed here) has very similar benefits when tuning DirectQuery mode, so you should check them out too and consider using them in combination with calendar-based time intelligence.

Power BI, Build Permissions And Security

If there is sensitive data in your Power BI semantic model that you don’t want some users to see then you need to use row-level security or object-level security to control access to that data. You’re an experienced Power BI developer – you know that, right? But what about Build permissions? If an end-user only has access to a report you’ve built and doesn’t have Build permissions on the underlying semantic model, and if there’s no other security on the semantic model, can they access data in the semantic model that isn’t visible in the report? The answer is potentially yes: you can’t rely on Build permissions for security.

Ever since Power BI was released there have been people who have published a semantic model and report, not used RLS or OLS, and then been surprised and upset when end users have been able to see all the data in the semantic model. A few years ago I wrote a blog post on one way this was possible, the “Show Data Point As Table” feature, and recent changes such as users with View permissions being able to use the Explore feature and the rise of Copilot have caused similar situations. But the fundamental truth is that RLS and OLS have always been necessary to make sure data is secure. Indeed if you check out the documentation for semantic model permissions you’ll see a note explaining that Build permissions are not a security feature:

Build permission is primarily a discoverability feature. It enables users to easily discover semantic models and build Power BI reports and other consumable items based on the discovered models, such as Excel PivotTables and non-Microsoft data visualization tools, using the XMLA endpoint. Users who have Read permission without Build permission can consume and interact with existing reports that have been shared with them. Granting Read permission without Build permission should not be relied upon to secure sensitive data. Users with Read permission, even without Build permission, are able to access and interact with data in the semantic model.

Build permissions do allow you to control whether users can create their own reports in Power BI Desktop or Excel and that’s useful when you want to stop the proliferation of reports, but it’s not security. I understand this is news to a lot of people – especially to many self-service developers who don’t read the docs closely – but that’s why I’ve written this post. If you have published semantic models containing sensitive data and were assuming that users would only be able to see the data displayed in reports then you need to go and implement RLS and OLS in your semantic model right now.

Incidentally, this is also why you are never going to see true page-level security implemented in Power BI reports even though a lot of people ask for it. The semantic model, not the report, is where security is applied in Power BI. All those blog posts and videos out there showing “workarounds” for page-level security are misleading because they are not truly secure. There’s a saying that “you can’t be half pregnant” and the same is true for security: you’re either secure or you aren’t and security-through-obscurity isn’t security. Whether or not a user can see a report page is irrelevant, what is important is whether the user can access the data in the semantic model.

Linking Queries Run From DAX Studio To Workspace Monitoring And The Capacity Metrics App

A few weeks ago I wrote a blog post about how you can now link event data from Workspace Monitoring to data in the Fabric Capacity Metrics App using OperationId values. In the latest (3.4.0) release of DAX Studio there’s a new feature that you might have missed that links queries run from there to Workspace Monitoring and the Capacity Metrics App in the same way.

I connected DAX Studio to a published Power BI semantic model, turned on Server Timings and ran a DAX query. When the query finished I clicked the Info button at the top of the Server Timings pane, which opened the Query Information dialog shown below, and then I copied the Request ID value from there:

The Request ID is the same thing as the OperationId value in Workspace Monitoring and the Capacity Metrics app. I was then able to use this value in a KQL query in Workspace Monitoring like so:

SemanticModelLogs
| where Timestamp > ago(1h)
| where toupper(OperationId) == "5D4A4F47-370D-4634-B67B-E4B58CB067A8"
| project Timestamp, OperationName, OperationDetailName, EventText, Status, StatusCode
| order by Timestamp asc

[Note that you need to convert the OperationID value in Workspace Monitoring to uppercase to match it to the Request ID from DAX Studio]

What’s the value of doing this? Probably not much for an individual query because the information you see in Workspace Monitoring is exactly the same as you get in Server Timings in DAX Studio, and DAX Studio shows this information in a way that’s much easier to understand. If you’re testing a number of DAX queries, though, then having all this data available in Workspace Monitoring means you can do detailed comparisons of different runs using KQL.
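For example, here’s a minimal sketch of the kind of KQL you could use to compare two test runs, assuming you’ve copied the Request ID for each run from DAX Studio’s Query Information dialog (the first GUID below is the one used earlier in this post; the second is just a placeholder for another Request ID):

SemanticModelLogs
| where Timestamp > ago(1h)
| where OperationName == "QueryEnd"
// replace the second GUID with the Request ID of your other test run
| where toupper(OperationId) in ("5D4A4F47-370D-4634-B67B-E4B58CB067A8", "00000000-0000-0000-0000-000000000000")
| project OperationId, Timestamp, DurationMs, CpuTimeMs, Status
| order by DurationMs desc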

What is really useful is being able to find the CU usage of a DAX query run from DAX Studio on a capacity – something that is very important when tuning DAX queries. While DAX Studio (and Workspace Monitoring) will give you a CPU Time value for a DAX query, for reasons I won’t go into here you won’t be able to reverse-engineer the algorithm that converts CPU Time to CUs. However, by linking the Request ID from DAX Studio to the OperationIds seen on the Timepoint Detail page and the new Timepoint Item Detail page in newer versions of the Capacity Metrics app (as discussed here), you don’t need to care about that. You can simply take the Request ID from DAX Studio and find it in the Capacity Metrics app. For example, on the Timepoint Item Detail (preview) page you can either find the value in the OperationId slicer or add the OperationId column to the lower table using the Select Optional Column(s) slicer and look for it there:

Calling DAX UDFs From Power BI Copilot

Can you call a DAX UDF from Power BI Copilot? I was asked this question by Jake Duddy during the livestream on Power BI Copilot I did with Reid Havens last week. I already knew it was possible because one of the customers I work with had already tried it, but I hadn’t tried it myself. So I did, and it is possible, and here’s the blog post.

A few months ago I wrote a post about how you can put template DAX queries in your AI Instructions to show Copilot how to solve more complex problems that can only be solved with a custom DAX query. I took some of the code from that post and turned it into the following DAX UDF:

createOrReplace

	function ABC = ```
			(
				AUpperBoundary: SCALAR int64,
				BUpperBoundary: SCALAR int64,
				AnalysisDate: SCALAR datetime
			) => 
			VAR ApplyAnalysisDate = 
			CALCULATETABLE(
				'Transactions',
				'Date'[Date] = AnalysisDate
			)
			VAR AddGroupColumn = 
			ADDCOLUMNS(	
				ApplyAnalysisDate, 
				"Group",
				SWITCH(
					TRUE(),
					//If the price is less than or equal to AUpperBoundary
					//then return the value "A"
					Transactions[Price]<=AUpperBoundary, "A (<=£" & AUpperBoundary & ")",
					//If the price is less than or equal to BUpperBoundary
					//then return the value "B"
					Transactions[Price]<=BUpperBoundary, "B (>£" & AUpperBoundary & " and <=£" & BUpperBoundary & ")",
					//Otherwise return the value "C"
					"C (>£" & BUpperBoundary & ")"
				)
			)
			RETURN
			SUMMARIZE(
			    AddGroupColumn,
				[Group],
			    "Count Of Transactions", [Count Of Transactions]
			)
		```

This UDF does a basic form of ABC analysis on the semantic model I’ve used in all my recent Copilot posts containing UK Land Registry data on real estate transactions:

Note: this is not great quality code and it’s certainly not a general purpose solution for ABC analysis in DAX, but it’s a UDF. I then added the following to my AI Instructions describing what the UDF does and how to use it:

This semantic model contains a DAX user-defined function called ABC that does an ABC analysis on the data in the Transactions table. It takes three parameters defined as follows:
AUpperBoundary - an integer value which is the upper boundary of transactions in the A group
BUpperBoundary - an integer value which is the upper boundary of transactions in the B group
AnalysisDate - a datetime value which is the date to filter transactions on
The function returns a table which can be used in an EVALUATE statement in a DAX query.

For example if I wanted to see the number of transactions which took place on 1st January 2025 divided into three groups:
A - transactions between £0 up to and including £250000
B - transactions above £250000 up to and including £700000
C - transactions above £700000
I could call the function as follows:
ABC(250000, 700000, DATE(2025,1,1))

Then I tried the following prompt:

Do an ABC analysis on the Transactions table, grouping all transactions into three groups named A, B and C based on the Price column where the upper boundary for the first group is £290000 and the upper boundary for the second group is £780000. Filter the transactions to just 14th January 2025.

Here’s the response I got:

As in that previous post, Copilot generated a DAX query to answer the question. Apart from that weird Sort Key column it’s the right answer; an additional AI Instruction telling Copilot to sort on the Group column might help here. The difference between this response and the one I got in my previous post is that this time the DAX query generated by Copilot simply calls the UDF. Here it is:

// DAX query generated by Fabric Copilot with "Perform an ABC analysis on the Transactions table, grouping all transactions into three groups named A, B, and C based on the Price column, where group A includes transactions with Price up to £290,000, group B includes transactions with Price above £290,000 and up to £780,000, and group C includes transactions with Price above £780,000. Only include transactions from 14th January 2025."
// ABC analysis for Transactions on 14-Jan-2025 with the specified price boundaries.
// A: Price <= 290,000
// B: 290,000 < Price <= 780,000
// C: Price > 780,000
// The ABC function is user-defined in the model and returns a table.
// Since the column names returned by ABC are not specified, add a dummy sort column to satisfy the ORDER BY requirement.
EVALUATE
  ADDCOLUMNS(
    ABC(290000, 780000, DATE(2025, 1, 14)),
    "Sort Key", 0
  )
ORDER BY
  [Sort Key] ASC

I guess the advantage of this over the DAX template approach is that it’s likely to be a lot more reliable and consistent: the less DAX Copilot has to generate, the more likely it is to do what you expect. On the other hand it’s a lot less flexible. For example I tried this prompt from my previous blog post which asks for an ABC analysis with four rather than three groups:

Do an ABC analysis on the Transactions table, grouping all transactions into four groups named A, B, C and D based on the Price column where the upper boundary for the first group is £300000, the upper boundary for the second group is £750000 and the upper boundary for the third group is £900000. Filter the transactions to just 16th January 2025.

I got the correct result from Copilot but the DAX query generated didn’t use the UDF because the UDF is hard coded to only return three groups; I suppose I was lucky in this case.

Based on these – admittedly fairly basic – tests I think using DAX UDFs with Power BI Copilot could be very useful when you need Copilot to generate complex measures or calculations where you know the general DAX pattern to use.

Monitoring The DAX Queries Generated When The Power BI Copilot Index Is Built

In my last post I talked about the text index that Power BI Copilot builds to help it answer data questions. You might be wondering if you can monitor the index build process and the bad news is that – at least at the time of writing – you can’t do so directly. However you can monitor it indirectly because the index build process runs DAX queries to get text values from the semantic model and you can see the DAX queries being run using Workspace Monitoring, Log Analytics or Profiler. While this is described quite well in the docs here let’s see what these DAX queries actually look like.

I published the semantic model from my last post (which has four text columns: CustomerId and CustomerName from the Customers table, and CustomerId and TransactionId from the Orders table, none of which are hidden or excluded from indexing using the Simplify The Data Schema feature) to a workspace where Workspace Monitoring was enabled:

I then ran the following KQL query to look at the DAX queries run in the workspace I published to:

SemanticModelLogs
| where Timestamp > ago(1h)
| where OperationName == "QueryEnd"
| project Timestamp, EventText, DurationMs, CpuTimeMs

I saw that the index build generated three DAX queries with the same pattern:

Here’s the first of these queries:

EVALUATE
SELECTCOLUMNS (
    FILTER (
        VALUES ( 'Customers'[CustomerName] ),
        LEN ( 'Customers'[CustomerName] ) <= 100
    ),
    "valueColumn", 'Customers'[CustomerName]
)

In line with what is described in the documentation, this query gets all the distinct text values from the CustomerName column on the Customers table that are no more than 100 characters long. I assume there are only three DAX queries even though there are four text columns in the model because one of the columns, TransactionId, has 5 million distinct values in it and Copilot cannot currently index more than 5 million text values in a single model.

[Interestingly, I saw that one of these queries failed with a memory error which I have asked to be investigated]

Each query took between 300ms and 600ms and there was a small gap between queries, so you can see how a large model with a lot of text columns could generate a lot of queries that, all together, take a noticeable amount of time to run. Finding the amount of time between the start of the first of these queries and the end of the last query will give you a rough idea of how long it takes to build the index, although there is some extra work that needs to be done after the last query has been run which can sometimes take some time too.
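If you want to put a rough number on that, here’s a hedged KQL sketch that estimates the index build window, assuming the index-build queries can be identified by the “valueColumn” alias seen in the query above and that the Timestamp of a QueryEnd event is logged when the query finishes:

SemanticModelLogs
| where Timestamp > ago(1h)
| where OperationName == "QueryEnd"
// assumes only the Copilot index-build queries project a column aliased "valueColumn"
| where EventText contains "valueColumn"
// approximate each query's start time by subtracting its duration from the QueryEnd timestamp
| extend QueryStartTime = Timestamp - (DurationMs * 1ms)
| summarize ApproxIndexBuildStart = min(QueryStartTime), ApproxIndexBuildEnd = max(Timestamp)
| extend ApproxIndexBuildDuration = ApproxIndexBuildEnd - ApproxIndexBuildStart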

As described here, indexing takes place for Import mode models with Q&A enabled every time the model is published, as well as every time it is refreshed so long as either Copilot or Q&A has been used in the last 14 days. For DirectQuery or Direct Lake models indexing also takes place every time the model is published but, because it’s impossible to say whether the underlying data has changed when a refresh takes place, indexing happens instead every 24 hours so long as either Copilot or Q&A has been used in the last 14 days. As a result, if you’re an admin, you might see a lot of DAX queries similar to the one above being run.
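To get a feel for how often that is happening in a workspace, here’s a similar hedged sketch that counts these index-build queries per day over the last week, again assuming the “valueColumn” alias only appears in those queries:

SemanticModelLogs
| where Timestamp > ago(7d)
| where OperationName == "QueryEnd"
| where EventText contains "valueColumn"
| summarize IndexBuildQueries = count() by bin(Timestamp, 1d)
| order by Timestamp asc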

[Thanks to Aaron Meyers for much of the information in this post]

Understanding The “Copilot Analyzed Only Part Of The Model Due To Its Size” Warning In Power BI Copilot

If you’re using Power BI Copilot you may have seen the following warning message:

Copilot analyzed only part of the model due to its size, which may affect these results

What does it mean? What causes it? What can you do to avoid it? Let’s find out.

In order to answer your questions better, Power BI Copilot creates an index of the values in all the text columns in your semantic model that it thinks you’re likely to ask questions about. This indexing process is described in the docs here in great detail and I recommend you read them before carrying on (incidentally, another common warning message that you might have seen, “Copilot is currently syncing with the data model”, is displayed while that indexing is taking place). There is a limit on the number of text values that can be indexed and when Copilot hits that limit you see the “Copilot analyzed only part of the model due to its size” warning.

At the time of writing (and this may change in the future) there are two limits on index size in place which are documented here:

  1. Copilot can only index 1000 text columns – but if you have 1000 columns in your model I would argue you’re already doing something wrong, because a model that large is going to be very difficult to use.
  2. Copilot can only index up to 5 million distinct text values across all text columns – and this is the limit you’re most likely to hit if you’re not careful.

Only text values that are less than 100 characters are indexed; longer text values are ignored and do not cause the warning to appear.

For an example of how you can run into the 5 million distinct text values limit, consider the following simple semantic model that I have designed so that the warning message appears:

Here’s what the contents of the Customers table looks like (the CustomerName column contains random text values):

Here’s what the contents of the Orders table looks like:

The Customers dimension table has 1 million rows in it; the Orders fact table has 5 million rows in it. If you ask Copilot a question about this semantic model like:

Show the Sales Value for the Customer Name WAMUQT

…it may work well but you will see the “Copilot analyzed only part of the model” message:

In other cases you might find that Copilot falls back to generating a DAX query or asks for clarification because, if it can’t find a text value you’ve used in your prompt in its index, it’s less sure about how it can answer the question:

In these cases you may find Copilot’s responses are less reliable, even if you still get correct results in many cases.

There are a couple of flaws in the design of this semantic model that either individually or combined cause that warning message to appear. Remember that it only appears when Copilot thinks it needs to index more than 5 million text values, so what is causing it to go over that limit?

First of all I deliberately made all the key columns (CustomerId on Customers and Orders, TransactionId on Orders) text columns. In this case they can easily be set to be of type Whole Number and the problem will be solved because the only remaining text column, CustomerName, contains 906925 distinct values which is well under the limit. It’s a best practice to use integer keys in Power BI whenever you can anyway. What if you can’t change the data type of your key columns though?

Well the second thing to consider is whether you need the TransactionId column on the fact table at all – primary keys on a fact table are rarely useful and can be very expensive in terms of model size, and in this case there are 5 million distinct values in that column so removing it completely will have a number of other benefits apart from reducing the overall number of text values that need to be indexed.

Thirdly, though, why are these key columns even visible? The CustomerId columns are almost certainly not useful to end users and TransactionId might not be either. Setting all these columns’ Is Hidden property to true means that Copilot does not try to index them, which means that only the CustomerName column is indexed and the warning message goes away. Again, hiding key columns is a best practice regardless of whether you’re using Copilot.

Finally, if you don’t want to hide a text column, you have another option: you can use the Simplify The Data Schema page in the Prep Data For AI dialog to prevent Copilot considering it in its responses. As I said, hidden columns are never considered anyway, but deselecting a column in this page has the same effect:

Here’s Copilot successfully answering the question above without the warning after deselecting the key columns in the Simplify The Data Schema page:

For most Power BI semantic models you should be able to use one or more of these techniques to make sure only the text columns that need to be indexed are indexed, stop the warning message from appearing, and get more reliable results from Copilot.

Monitor Fabric Costs With Fabric Cost Analysis

Following on from my blog post a few months ago about cool stuff in the Fabric Toolbox, there is now another really useful solution available there that anyone with Fabric capacities should check out: Fabric Cost Analysis (or FCA). If you have Fabric capacities it’s important to be able to monitor your Azure costs relating to them, so why not monitor your Fabric costs using a solution built using Fabric itself? This is what the folks behind FCA (who include Romain Casteres, author of this very useful blog post on FinOps for Fabric, plus Cédric Dupui, Manel Omani and Antoine Richet) decided to build and share freely with the community.

A lot of enhancements are planned for the future – I’m told there’s another big release planned for the end of October 2025 – but it’s already quite mature. Not only do you get all the ETL needed to extract cost data from Azure and the Power BI reports to analyse it, but there’s also a Fabric Data Agent so you can query your data in natural language!

Aren’t Fabric costs straightforward? Don’t you just pay a flat fee for each capacity? In a lot of cases yes, but there can be complications: you might be pausing and resuming capacities or scaling them up or down. There can also be other costs you might not realise you’re incurring. For example if you’ve ever read any of Matthew Farrow’s excellent content on Fabric costs, such as this post on the cost implications of pausing a capacity, and wondered whether you’ve been charged extra for pausing a throttled capacity, then FCA can answer that question:

Like FUAM, this is not an official Microsoft product but a solution accelerator with no official support; even so, it’s definitely a much better option than building something yourself or not monitoring your costs at all.