Power BI Copilot, AI Instructions And Dealing With Ambiguity

One of the most common questions I hear about Power BI Copilot is how you can stop it from guessing what a user means when they ask an ambiguous question, and instead get it to ask for clarification. This is an interesting problem because Copilot already does this and what you really want is a way to control the level of tolerance for ambiguity. What’s more, if Copilot guesses what the user means correctly you’re probably not going to complain or even notice; it’s only when it guesses incorrectly that you’re going to wish it had asked what the user meant.

Using the semantic model containing UK real estate sales data that I’ve used throughout this series of posts, with no AI Instructions added to the model, consider the following prompt:

where is the nicest place to live in England?

It’s a great example of a question that could potentially be answered from the data but only with more information on what the user means by “nicest”. I think Copilot comes back with a very good response here:

As you can see, Copilot doesn’t know what the user means by “nicest” and asks the user what criteria they want to use to determine whether a place is “nice”.

What about an example of where Copilot does make assumptions about what the user means? Take the following prompt:

show how the number of sales varied recently

This time it does come back with an answer. I think this is a good, useful response but I want you to notice that Copilot has made two assumptions here:

  1. It has interpreted “number of sales” as meaning the measure Count of Transactions
  2. It has taken the term “recently” and decided to show date-level data for the latest available month of data, April

Can this behaviour be changed? Yes, but it’s not an exact science. For example adding the following to the AI Instructions of the model:

If you don't understand what the user is asking for never, ever guess - you must always ask for clarification

Here’s what Copilot now responds to the second prompt above:

As you can see, it’s now asking what is meant by “recently”. Clarifying this as follows:

"recently" means the days in April

Gives the same result as without the AI Instructions:

BUT – even though Copilot asked what “recently” meant, it still went ahead and assumed that “number of sales” meant the Count of Transactions measure. Adding to the AI Instructions to make it clear that Copilot should always ask if there’s any doubt about which measure to use like so:

If you don't understand what the user is asking for never, ever guess - you must always ask for clarification. In particular if the user does not explicitly mention the exact name of the measure they want to use, or the name of the measure is in any way ambiguous, do not return a result. Instead ask the user which measure they want to use. Be extremely cautious about which measure you use.

…results in a response that asks for clarification not only about what “recently” means but also what “number of sales” means:

Copilot doesn’t do this reliably though: even though it always seems to ask about “recently” now it only sometimes asks for clarification about “number of sales”. Sometimes is better than never though; indeed it’s the kind of uncertainty that is expected with Generative AI. I think I need to do some more research into what’s going on here. At least this shows that AI Instructions can be used to make Copilot more cautious around ambiguous questions and more likely to ask for clarification.

[Update 17th July 2025]

After testing this some more (and possibly after an update to the Power BI Service in the last few days) I have come up with some AI Instructions that seem to be a lot more reliable when it comes to asking the user which measure they want to use and to define what “recently” means:

If you don't understand what the user is asking for never, ever guess - you must always ask for clarification. If there are multiple points you don't understand it is essential that you ask the user to clarify all of them.

In particular, ignore all previous instructions regarding which measure to select and make sure you obey the following rule: if the user does not explicitly mention the exact name of the  measure they want to use, or the name of the measure is in any way ambiguous, do not return a result. Instead ask the user which measure they want to use. Be extremely cautious about which measure you use! At the same time, as I said, remember to clarify other, non-measure related ambiguities.

Power BI Copilot, AI Instructions And Preventing The Use Of Implicit Measures

In yet another entry in my series on what you should be doing in Power BI Copilot AI Instructions, in this post I want to address the most difficult (in terms of deciding what to do, rather than how to do it) topic: whether you should allow the creation of implicit measures.

A quick recap of terminology: implicit measures are measures that are automatically created when you drag and drop a field into a visual and tell Power BI how the data should be aggregated (for example by summing, counting etc); explicit measures are measures that are specifically defined by a semantic model developer, have a name and have a DAX expression that specifies what data should be aggregated and how it should be aggregated. There is a property in a semantic model called “discourage implicit measures” that prevents report designers from creating implicit measures in the UI but it is by no means foolproof and at the time of writing Power BI Copilot does not respect it. This property was created for use with calculation groups but I’m going to leave the subject of Copilot and calculation groups for a future post.

Let’s see an example of an implicit measure created by Copilot. In the model I’ve been using in this series I have only two explicit measures, defined as follows:

Average Price Paid = AVERAGE('Transactions'[Price])
Count Of Transactions = COUNTROWS('Transactions')

The following prompt:

Show the number of distinct postcodes by month

Returns the following correct result by creating an implicit measure that does a distinct count on the Postcode column:

This is great – so why would you ever want to prevent this from happening and stop the use of implicit measures?

First of all, there’s always a danger with implicit measures that an end user – whether they are using Copilot or not – will try to aggregate data in a way that it should not be aggregated and therefore end up with incorrect or misleading results. For example the prompt:

Show the sum of month number by county

Returns a result, but not one that makes sense because you should never sum up values in the month number column:

[And this is despite the Summarization property of the Month Number column being set to “Don’t summarize”, which I need to talk to someone about]

Second, in complex semantic models, it may be necessary to include business logic in every explicit measure to ensure they always return the results you want. Implicit measures will not contain this logic and will therefore not return correct results.

As a result, if you’re sure you have created all the explicit measures an end user could ever need (and that’s definitely an achievable goal for many semantic models) then preventing the use of implicit measures could be a good thing.

Adding the following text to the model’s AI Instructions to prevent the use of implicit measures:

Never aggregate data using implicit measures. Only ever use the explicit measures defined in the semantic model to show aggregated data. If the user asks to aggregate data from a column, tell them that you aren't allowed to do this because of the risk of showing incorrect data.

Means that the following prompt from above:

Show the number of distinct postcodes by month

Now returns a message saying that Copilot isn’t allowed to aggregate data by itself:

As you can see, it is possible to use AI Instructions to prevent the use of implicit measures but this is something you really need to think hard about: it may help stop users getting incorrect results but it could also stop users from getting the correct results they need in some cases. It’s the old struggle between centralised, “the developer knows best” BI and decentralised self-service BI all over again.

Data Validation In Power BI Copilot AI Instructions

Here’s yet another post in my series on things I think you should be doing in Power BI Copilot AI Instructions. Today: validating values that users enter as filters in their prompts. It’s something of a companion piece to last week’s post about helping users understand what data is and isn’t in the semantic model, because the more I think about it, the biggest problem users have when trying to query data in natural language is knowing about the data that is there to be queried. As I said last week, if you’re an end user interacting with a Power BI report via a slicer or a filter you know what values are available to choose because you can see them listed in the slicer or the filter – but you don’t see them when composing a prompt. As a developer or someone doing a demo it’s easy to forget this because you know the data so well but for an end user it’s not so easy and so they need all the help that the model developer can give them.

Let’s see an example using the semantic model that I’ve been using in this series containing UK real estate sales data. The Transactions table in my semantic model contains one row for each property sold; each property’s address is given and each address has a UK postcode (something like a US zip code – I’m sure all countries have an equivalent).

Everyone in the UK knows their postcode and a postcode contains a wealth of geographic information, as this section of the Wikipedia article on postcodes shows. There’s no need to get too detailed on their format though because I want to point out one important feature of all of the properly-formatted postcodes in the Transactions table shown above: they all have a space in the middle of them. And people being people, when they use postcodes, they usually forget that and write a postcode without the space.

This has consequences for Power BI Copilot. For example, the prompt:

show count of transactions for the postcode YO89XG

Returns a message saying that Copilot can’t find any data for the postcode “YO89XG”. This is because the postcode doesn’t contain a space. This is what you might expect as a developer but it will not make much sense to an end user.

On the other hand if the postcode in the prompt does contain a space in the right place, like so:

show count of transactions for the postcode YO8 9XG

…it returns the desired result:

How can we address this specific issue? Fairly easily, it turns out, because UK postcode formats are well documented and I would imagine Copilot has been trained on the same Wikipedia page on postcodes that I linked to above. As a result, adding the following to the AI Instructions for my semantic model:

The postcode column contains postcodes for locations in England and Wales. If the user enters a value to filter by for postcode that returns no data and the value is in an invalid format for a postcode, tell them they appear to have made a mistake, explain why the postcode format is wrong and suggest some changes to the value entered by the user that might result in a valid postcode.

Means that when I use the first prompt above, for the postcode without the space, I get a much more helpful response:

Clicking on the first option in this screenshot alters the prompt to include a space in the right place, which results in the user seeing the desired data:

I was encouraged by this, but there’s one obvious problem here: this only works for data like UK postcodes where the format is widely known. The format of your company’s invoice numbers is unlikely to be something that Copilot knows about.

So I experimented with using regular expressions in my AI Instructions and guess what, they seemed to work really well! But then I stopped to think – could I really trust an LLM to use a regex to validate values? The good thing about working at Microsoft is that I have a bunch of friendly colleagues who know way more about AI than I do so I asked them this question. One of them told me that for Copilot to properly validate data using regexes it would need to write some code and it can’t do that yet; instead it’s probably interpreting what the regex is looking for and trying to match the value against the interpretation. So while it might appear to work it would be prone to making errors.

Damn. That meant that if the LLM made a mistake when validating the data before running the query it would run the risk of preventing the user from filtering by a valid postcode, which would not be good. But then I thought, what if I applied the validation after it was clear that the user had entered a postcode that returned no data? That way it would be less important if the LLM made a mistake in its check because it would only happen when it was clear the user needed extra help.

Writing the AI Instruction to only validate the data after checking to see if the value the end user was filtering on didn’t exist seemed to work. Here’s the AI Instruction using a regex I found here to validate UK postcodes:

The postcode column contains postcodes for locations in England and Wales. UK postcodes must follow the format described in the following regular expression:
^([A-Za-z]{2}[\d]{1,2}[A-Za-z]?)[\s]+([\d][A-Za-z]{2})$
If the user enters a value to filter by for postcode that returns no data and the value is in an invalid format for a postcode, tell them they appear to have made a mistake, explain why the postcode format is wrong and suggest some changes to the value entered by the user that might result in a valid postcode.

Note how I say “If the user enters a value to filter by for postcode that returns no data…”

Here’s the result for the followng prompt asking for data for an invalid postcode:

show count of transactions for the postcode E48QJJ

I did some other tests on sample data and it does indeed suggest that the wording you use in the AI Instruction can control whether Copilot tries to validate the data before checking if the value the user is filtering on exists (which, as I said, would be bad because of the risk of it making a mistake when trying to validate data) or after (which, as I said, is a lot less dangerous).

All in all it seems that putting some thought into data validation in AI Instructions can result in a much friendlier end user experience in Copilot. That said, I doubt that however good your AI Instructions the experience will ever match the experience of seeing a list of possible values in a filter or slicer. Maybe what we need is something like IntelliSense when writing a prompt so you can see and search for values in your data?

Power BI Copilot AI Instructions: Helping Users Understand The Scope Of The Data

Continuing my series of posts on Power BI Copilot and the type of things you should be including in AI Instructions, today I want to talk about helping the end user understand the scope of the data: what data your model does contain, what data it could contain but doesn’t, and what data it could never contain.

Let me show you some examples to explain what I mean, using the semantic model I introduced in the first post in this series containing real estate transaction data from the Land Registry in the UK. On a model with no AI Instructions added, if I give Power BI Copilot the prompt:

show average price paid for the city manchester

I get the correct answer back:

That’s good and as you would expect. Now consider the following two prompts which both return no data:

show average price paid for the city edinburgh
show average price paid for the locality Boode

A user asking a question and not getting an answer is unavoidable in many cases but it still represents a failure: the user wanted information and the model couldn’t help. As a semantic model developer you need to minimise the impact of this and help the user understand why they didn’t get an answer. In both cases Copilot does the right thing and tells the user that it can’t find data for these places. However, there’s an important difference between these two places.

Boode is a very small village (somewhere I found at random) in England. There’s no data for it in the semantic model because there happen to be no real estate sales for it. Edinburgh on the other hand is a large city, so why is there no data? It’s because this semantic model only contains data for England and Wales, not Scotland, Northern Ireland or anywhere else. An end user might expect that a dataset from a UK government agency would contain data for the whole of the UK but this one doesn’t. If the user was looking at a report displaying this data then this information would probably be displayed somewhere in a textbox on an information page, or be obvious from the values visible in slicers or filters, but with Copilot we have to tell the user this in other ways.

Similarly, consider the following two prompts:

show average price paid for January 2025
show average price paid for January 2024

Why is a blank displayed for January 2024? Were there no sales in January 2024? The answer is that the semantic model only contains data between January 1st 2025 and April 30th 2025, so there’s no point asking for data before or after that – and we need to tell the user so.

Here is some text that I added to the AI Instructions for this semantic model to tell Copilot what data is and isn’t present:

##What data this semantic model contains
This semantic model only contains data for England and Wales. If the user asks for data outside England and Wales, tell them that this model does not contain the data they are looking for.
This model only contains data for the date range 1st January 2025 to 30th April 2025. Tell the user this if they ask for data outside this date range.

With these instructions in place let’s rerun some of the prompts above. First, let’s ask about Edinburgh again:

This is a much more helpful response for the user I think. Copilot knows that Edinburgh is in Scotland and it tells the user that it only has data for England and Wales. Copilot doesn’t know every place name in Scotland but it does pretty well; for example (to take another random place name) it knows that Arrina is in Scotland:

Similarly, here’s what Copilot now says for January 2024:

Again, a much better response than before: it explicitly tells the user that the model only contains data for January 1st 2025 to April 30th 2025.

Giving Power BI Copilot clear instructions on what data the semantic model does and doesn’t contain means that it can set end users’ expectations correctly. This then means that users are more likely to ask questions that give them useful information and therefore builds trust. Conversely, not explaining to end users why their questions are returning data means they are less likely to want to use Copilot in the future.

Grouping And Filtering In Power BI Copilot AI Instructions

Continuing my series on Power BI Copilot AI Instructions (see also my previous post which is the only other one so far), in this post I’d like to show some examples of how you can specify groups of columns and apply row filters in the results returned by Copilot.

Using the same semantic model that I used in my previous post, which contains data for UK real estate sales from the Land Registry, consider the following prompt:

Show sales of luxury houses in the Chalfonts

Even with a well-designed semantic model that follows all best practices I would never expect a good answer from Copilot for this question without supplying extra information in AI Instructions. Indeed, here’s what Copilot responds on a version of the model with no AI Instructions:

Copilot is doing the right thing and asking the end user to clarify several aspects of the question in the prompt. Ideally, though, these clarifications would not be necessary. There are several questions that need to be answered by the semantic model designer before this prompt will return the answer an end user expects:

  • When you ask for “sales”, which columns from the semantic model should be returned?
  • Since there is no property type called “house” in the data, when you say “houses” which property types do you mean exactly?
  • What does “luxury” mean?
  • Since there is no town or locality called “the Chalfonts” in the data, what do you mean by this?

Let’s look at what needs to be added to the AI Instructions for this model to answer these questions.

Grouping columns

Starting with the first question of what columns should be returned when the user asks for “sales”, let’s say that the user expects to see certain columns from the Transactions table in a certain order. Here’s what the data in the Transactions table looks like:

Each row represents a single real estate transaction. There are columns for the price paid in the transaction, the date of the transaction, the type of the property, and several columns that represent the different parts of the address of the property (full details of what the columns mean can be found here). If the user asks for “sales” let’s assume that they want to see the address of the property, the date of the transaction and the price paid.

Here is what I added to the AI Instructions to achieve this:

##Instructions for displaying addresses and what users call sales
The source data for this semantic model comes from the UK Land Registry Price Paid dataset
The transactions table contains one row for each real estate transaction
The address of the property sold in each transaction consists of the following columns in this exact order:
* PAON
* SAON
* Street
* Locality
* Town/City
* County
* Postcode
When a user asks for a list of sales, always show the address of the properties involved plus the Date and Price columns

Filtering rows

The other three questions that need to be answered all involve some kind of filtering of rows.

First of all, let’s define what “houses” are. The Property Types dimension table looks like this:

The types “semi detached”, “detached” and “terraced” are all types of house.

Next: what does “luxury” mean? Let’s say that “luxury” properties are properties that are sold for a price of over £1 million.

Finally, what does “the Chalfonts” mean? Here’s the Wikipedia page that explains it: it’s a collective name for three towns and villages near where I live: Little Chalfont, Chalfont St Giles and Chalfont St Peter. As far as the address data in the Transactions table is concerned only Chalfont St Giles is a town appearing in the Town/City column; Little Chalfont and Chalfont St Peter are villages and therefore their names appear in the Locality column.

Here are the AI Instructions I added for these rules:

##Instructions for types of filter that users may request
Houses are property types detached, semi detached and terraced
Luxury refers to properties with a price of over 1000000
If a user asks for sales in "the Chalfonts" this means sales where either 
* the Town/City column is CHALFONT ST GILES, or
* the Locality column is CHALFONT ST PETER, or
* the Locality column is LITTLE CHALFONT

Results

With both of these sets of rules added on the AI Instructions page, here’s what the original prompt now returns:

The table returns exactly what I want; the text summary below with properly formatted addresses is a nice touch. Here’s the DAX query generated:

// DAX query generated by Fabric Copilot with "Show sales of luxury houses in the Chalfonts"
EVALUATE
  SELECTCOLUMNS(
    // Filter transactions for luxury houses in the Chalfonts.
    FILTER(
      'Transactions',
      'Transactions'[Price] > 1000000 && // Luxury: price over 1,000,000
      (
         'Transactions'[Town/City] = "CHALFONT ST GILES" ||  // Chalfonts filter based on Town/City
         'Transactions'[Locality] = "CHALFONT ST PETER" ||     // or Locality
         'Transactions'[Locality] = "LITTLE CHALFONT"
      ) &&
      // Houses: only Detached, Semi Detached, or Terraced properties
      RELATED('PropertyTypes'[Property Type Name]) IN {"Detached", "Semi Detached", "Terraced"}
    ),
    "PAON", 'Transactions'[PAON],
    "SAON", 'Transactions'[SAON],
    "Street", 'Transactions'[Street],
    "Locality", 'Transactions'[Locality],
    "Town/City", 'Transactions'[Town/City],
    "County", 'Transactions'[County],
    "Postcode", 'Transactions'[Postcode],
    "Date", 'Transactions'[Date],
    "Price", 'Transactions'[Price]
  )
ORDER BY
  [Date] ASC

The DAX query is well-written and returns the correct results; the comments in the code and the explanation of the query underneath it is useful too. The explanation also reveals that Copilot understands terms like PAON, which stands for “primary addressable object name”, definitions that I didn’t give it but which are in the official documentation linked to above.

This example shows how much more control AI Instructions give you over the results Copilot returns now, a lot more than you had a few months ago when the only way to influence results was via the Q&A Linguistic Schema. I’m also coming to realise how much work is involved in preparing a semantic model for Copilot using AI Instructions: it’s probably as much as building the semantic model in the first place. The improvement in the quality of results that this work brings is worth the effort, though, and I’m pretty sure that there’s no way of avoid this. Anyone who tells you that their tool for querying data with natural language “just works” and doesn’t need this amount of prep work is probably selling snake oil.

Requiring Selections On The Date Dimension In Power BI Copilot AI Instructions

Without a doubt the most important new feature in Power BI Copilot is AI Instructions: it opens up an immense number of possibilities but it also starts off as a blank slate, which raises the question of what you should do with this feature as much as how you should do it. The official documentation is very good but I think this is one of those features where the best practices only emerge after everyone has used it in the real world for a while. In an attempt to kick start this process of working out what you should do with AI Instructions I thought I’d start a series of blog posts on this subject. I don’t pretend to know all the answers, I just want to spark some debate and learn in public.

The first thing that I wanted to write about is something it has never been possible to do in Copilot up to now: force end users to always make a selection on the Date dimension.

Consider the following semantic model built from my favourite open dataset, the UK Land Registry Price Paid data, which contains details of of all the real estate transactions in England and Wales:

The Transactions fact table contains one row for each real estate transaction, including the address of the property sold and how much money was paid for it; the Property Types table contains one row for each property type (detached, sem-detached, terraced and other); and the Date dimension table is, well, a date dimension table. There are two explicit measures:

Average Price Paid = AVERAGE('Transactions'[Price])
Count Of Transactions = COUNTROWS('Transactions')

Now consider the following prompt (used with only the “answer questions about the data” skill selected in the Copilot pane in Desktop):

Show count of transactions by property type

It gives you the following result:

It’s correct, it’s exactly what I asked for, but is it useful? No: the Transactions fact table contains all the data that is currently available for 2025, from January 1st 2025 to 30th April 2025. So the visual above shows the count of transactions broken down by property type for a totally arbitrary date range that isn’t obvious to the user.

If you were building a report from this semantic model and wanted to display a visual like the one above you would always have a slicer or filter somewhere that allowed the end user to select a date or date range. Therefore it makes sense to require end users to select something on the Date dimension table if they are querying using Copilot, and remind them to do so if they do not select something.

There’s another rule to consider here. Looking at the Date dimension table you can see that the Month column only contains the name of the month:

If there was data from multiple years in the Transactions table (there isn’t in this case, but there could be if I included older data) then it wouldn’t make sense to show data for, say, “January” if that included Januarys from different years. One way to stop this from happening would be to change the values in the Month column to include the year along with the month name, but this is also something that can be handled with AI Instructions.

Here are some instructions that I came up with to implement these two rules:

## Instructions about which columns must be selected in different scenarios
The user must always specify a filter on either date, month or year in their question. If they do not, you must ask them to specify a filter on either a date, a combination of month and year, or a year. If you suggest a date, month or year filter to the end user you must only suggest selections between January 1st 2025 and April 30th 2025.
If the user asks for data filtered by a month they must always specify a year as well.

With these instructions in place, the prompt above is met with a request to specify a year, month or date filter like so:

Selecting the last of the suggested prompts gives this:

There are two things I would like to improve about this. First, the screenshot above shows the date March 15th 2025 in US date format as “3/15/2025” and I couldn’t find a way to stop this. Sigh, American software. I’ll ask to get this fixed. Second, in the instructions above you can see that I hard coded the date range of January 1st 2025 to April 30th 2025; I wanted to make this data-driven and created some measures to get the minimum and maximum dates from the Transactions table, but I couldn’t get Copilot to use them.

To show how the second rule is handled, the prompt:

Show count of transactions by property type for January

Returns the following:

Unfortunately I couldn’t find a way to reliably stop Copilot suggesting months in 2024 and 2023 in the suggested prompts. Maybe as my prompt engineering skills improve I’ll get better at this.

For more complex semantic models there could be other dimensions where a user should always make a selection – for example, if you’re doing currency conversion in your model you might want end users to always select a currency to convert to. The rules you use in AI Instructions should be similar to these though.

That’s enough for now. I have a list of other ideas for scenarios to handle with AI Instructions for future blog posts but if you have some suggestions, please leave a comment below.

Finding Reports And Semantic Models Easily With Power BI Copilot Search

By the time you read this it’s likely the new, standalone Power BI “Chat with your data” experience will be available in your tenant. I just enabled it in the tenant the Fabric CAT team uses for testing purposes, located in the West Europe region, after the option to do so appeared yesterday. If you’re interested in Power BI Copilot you’re going to want to check it out.

However, there’s something else that this new functionality offers as well as the ability to ask questions about any of your data: it makes it easier to find the reports and semantic models that you have access to. Like a lot of Power BI tenants, the CAT team’s own tenant is a bit of a mess. There are lots of workspaces, lots of reports, and frankly lots of junk (which I’m to blame for creating as much as anyone else). As a result it can be hard to find the reports and semantic models that you want to use. Power BI does have a search feature which works fairly well but the new Power BI Copilot Search works a lot better.

Why is a better search needed for Copilot and “Chat with your data”? Well if you’re starting with an empty screen and asking Copilot a question like “Who sold the most widgets in 2024?” the first problem to be solved is which report, semantic model or data agent should be used to find the answer – there could be a lot of different potential sources. Finding the right source of information is key to making sure the user gets the best possible results from Copilot. But, as I said, even if all you want to do is find and open a report that you know exists somewhere, Power BI Copilot Search is incredibly useful.

There’s a very detailed docs page here that tells you how to enable Copilot Search and what metadata it looks at. The fact that it not only looks at item names and descriptions but also – amongst other things – report page names, visual titles, filter pane titles, the contents of text boxes, whether you have favourited the report and how recently you opened it, explains why it works so well. For example I have a report that I use for demos with the typically unhelpful name of “CopilotDemo_WithQAOptimisations”. Here’s what the first page looks like:

The following search term:

Find the report for Fruity Farms with order units by product

…returns this report as the top result:

Why did the right report come out on top? As you can see from the first screenshot, “Fruity Farms” is mentioned in the textbox at the top of the report page and the visual on the left hand side shows Order Units broken down by Product; the term “Fruity Farms” is also mentioned in the report description. To open the report you just need to click on the link in the results.

Of course now that this feature exists you’ll want to optimise your reports, models and data agents for it. The docs cover what you need to do here but if you don’t want to change the name of the item or anything within it that the user might already be used to seeing, my suggestion is to add or alter the description property (which, to be honest, people very rarely set) on the item’s Settings page and think about using endorsement if you’re not doing so already.

Performance Implications Of Different Ways Of Fully Expanding A Power BI Matrix Visual

If you have a Power BI report with a matrix visual on it it’s quite likely that you’ll want all the levels in the matrix to be fully expanded by default. But did you know that the way you expand all the levels could have performance implications, especially if you’re using DirectQuery mode? Here’s an example.

I have a DirectQuery semantic model built on top of some of the tables from the SQL Server AdventureWorksDW sample database (apologies for the poor naming):

There are four DAX measures defined on it:

Sales Amount = SUM(FactInternetSales[SalesAmount])

Monday Sales = CALCULATE([Sales Amount], 'DimDate'[EnglishDayNameOfWeek]="Monday")

January Sales = CALCULATE([Sales Amount], 'DimDate'[EnglishMonthName]="January")

Class H Sales = CALCULATE([Sales Amount], 'DimProduct'[Class]="H")

I wrote these measures specifically to exacerbate the problem I’m going to show (by reducing the amount of fusion that is possible) but they are pretty normal, reasonable measures that you might find in any semantic model.

Now let’s say you add a matrix visual to a report page, put these four measures onto the columns axis of the matrix, and drop the CalendarYear column (from the DimDate table), the Color column and the Style column (both from the DimProduct table) onto the rows axis of the matrix. At this point it looks like this:

…but what you want to do now is show all the styles and colours too.

One way to do it – not the most efficient way, but some people like me just love to click – is to expand every year and style individually:

It doesn’t take too long to expand everything and after all you only need to do it once, right? But let’s take the DAX query generated for this visual and paste it into DAX Studio with Server Timings turned on and see what we can see:

There are 14 separate Storage Engine queries – which result in 14 separate SQL queries being sent to SQL Server. The first two Storage Engine/SQL queries get a list of which years and styles have been drilled down on and then there are (4 measures) * (3 levels of granularity) = 12 other Storage Engine queries to get the data shown in the visual. The overall duration of 230ms here is very low but in the real world the SQL queries could be a lot slower, making the DAX query very slow.

The default limits on the number of SQL queries that a DAX query can run in parallel have a big impact on overall performance here as you can see; even though you can increase those limits you may then hit the maximum number of connections that can be opened up to a DirectQuery source, and even though you can increase that limit too if you’re running on a capacity, there are hard limits here. If Power BI needs to open new connections to the data source in order to run these SQL queries, that can also slow things down too because there can sometimes be a noticeable wait when connections are opened. Reducing the number of Storage Engine queries generated by a DAX query is very important when tuning DirectQuery models; the effect is going to be a lot less noticeable on an Import or Direct Lake semantic model but it could still cause problems.

There’s good news though. If you expand the levels in your matrix in a different (and to be honest, much more convenient) way using the “Expand all down one level in the hierarchy” button on the visual header or the “Expand to next level” option on the right-click menu for the rows like so:

…then you get the same result but with a much more efficient DAX query. Here’s what Server Timings shows for the DAX query generated for the fully expanded matrix now:

This time there are only four Storage Engine/SQL queries, one for each measure, and the overall duration is just 50ms. Even though, as you can see from the screenshot, only three Storage Engine/SQL queries can run in parallel and the fourth has to wait for the first query to finish so it can run, that’s less of an issue given the smaller number of queries. I won’t bother showing the DAX for the two versions of the matrix but it’s clear when you look at them the second one is more efficient because it knows it can expand everything on rows rather than just what has been clicked. Of course this type of optimisation is only possible if you are fully expanding your matrix though.

Finding Events Linked To A Specific Power BI Visual In Fabric Workspace Monitoring

Over the last few years one topic I have blogged about several times is how to link the detailed data about DAX query execution that can be found in Log Analytics – and now Workspace Monitoring – to the visual in a Power BI report that generated those DAX queries, something that is extremely useful when you’re performance tuning Power BI reports. My first post here from 2021 showed how write the KQL for Log Analytics but the problem has always been how to get the IDs of the visuals in a Power BI report. You can get the IDs from the definition of the Power BI report, as shown here, and Sandeep Pawar has a great post on some other methods here, but all these methods were superseded in the March release of Power BI with the ability to copy the IDs by right clicking on the visual in a Power BI report when editing it (thank you Rui Romano!).

This made me realise that it’s time to revisit my first post on how to get the query details in KQL since the column names in Workspace Monitoring are slightly different from Log Analytics and, indeed, the KQL needed can be simplified from my original version. Here’s a KQL query that you can run in a KQL Queryset connected to your Monitoring Eventhouse:

let VisualId = "InsertVisualIdHere";
SemanticModelLogs
| search VisualId
| project Timestamp, OperationName, OperationDetailName, DurationMs, EventText, OperationId 
| order by Timestamp asc

Once you’ve copied the visual’s ID from the report (remember you need to specifically enable this feature and that you need to be in Edit mode) by right-clicking on it and selecting “Copy object name”:

…then you just need to paste the ID into the let statement in the first line of the KQL query and run it:

You can find the documentation for the columns in the SemanticModelLogs table here and the documentation for the events here. The events for a specific query all have the same value in the OperationId column.

Bonus fact: you can now run queries against Workspace Monitoring using Semantic Link Labs, as documented here, which makes it much easier to do other fun stuff with this data. For example, I can imagine there are ways to visualise DAX query and semantic model refresh events in Python that would make them much easier to analyse, but that’s something for a future blog post.

Documenting Power BI Semantic Models With Fabric Data Agents

AI is meant to help us automate boring tasks, and what could be more boring than creating documentation for your Power BI semantic models? It’s such a tedious task that most people don’t bother; there’s also an ecosystem of third party tools that do this job for you, and you can also build your own solution for this using DAX DMVs or the new-ish INFO functions (see here for a good example). That got me wondering: can you use Fabric Data Agents to generate documentation for you? And what’s more, why even generate documentation when you can just ask a Data Agent the questions that you’d need to generate documentation to answer?

To test this idea out, I dug up a semantic model I’ve used for blog posts in the past with a simple star schema and a few measures:

Here are the definitions of the measures:

Units = SUM(Sales[SalesUnits])
Value = SUM(Sales[SalesValue])
Apples Value = 
CALCULATE([Value], KEEPFILTERS('Product'[ProductName]="Apples"), USERELATIONSHIP('Product'[ProductID], 'Sales'[ProductID]))
Apples Value Forecast = [Apples Value] * 1.1 

After publishing this model I created a Data Agent directly connected to it and found I couldn’t access the metadata I needed from it (I didn’t try too hard, but I don’t think it was possible) so I then created a second Import-mode semantic model, used the first as a data source and created a single table bound to the following DAX query using the INFO.CALCDEPENDENCY function:

EVALUATE INFO.CALCDEPENDENCY()

This second semantic model therefore contained a list of all the DAX calculations, their definitions and the dependencies between them, from the first semantic model. Here’s that table:

I then published this second semantic model, built a Data Agent on it, and used it for all my tests. I also added some instructions to the Data Agent – I freely admit I have no idea whatsoever about prompt engineering but after a bit of trial and error the following instructions did the trick:

You are a Power BI semantic model developer.
The CalcDependencies table contains the output of the DAX INFO.CALCDEPENDENCY() function for another Power BI semantic model. It contains information about the dependencies between the columns, tables, measures, calculated columns, calculated tables and relationships in that model. 
Each row in the CalcDependencies table represents a dependency between two objects. 
The combination of values in the OBJECT_TYPE, TABLE and OBJECT column uniquely identifies the object which is the source of the dependency.
The combination of values in the REFERENCED_OBJECT_TYPE, REFERENCED_TABLE and _REFERENCED_OBJECT columns uniquely identifies the object which is the target of the identity.
You can join the table to itself multiple times to find chains of dependencies between objects.
When I ask about dependencies, please include direct dependencies and also indirect dependencies, for example where one object has a dependency on another object which in turn has a dependency on the object I am asking about.
The EXPRESSION column contains the DAX definition of an object.
A value of CALC_COLUMN in the OBJECT_TYPE table indicates that the object is a calculated column.

First test: create some documentation. The following prompt:

Generate some documentation for the measures listed in the CalcDependencies table. Describe what each calculation does and how it works.

…generated the following output:

Here is the documentation for the measures listed in the CalcDependencies table, describing what each calculation does and how it works:

Measures

1. Units

  • Definition: SUM(Sales[SalesUnits])
  • Description: The Units measure calculates the total number of sales units. It does this by summing the values in the SalesUnits column in the Sales table. This measure is useful for tracking the total quantity of items sold.

2. Value

  • Definition: SUM(Sales[SalesValue])
  • Description: The Value measure calculates the total sales value. It achieves this by summing up the values in the SalesValue column in the Sales table. This measure is essential for understanding the total revenue generated from sales.

3. Apples Value

  • Definition: CALCULATE([Value], KEEPFILTERS('Product'[ProductName]="Apples"), USERELATIONSHIP('Product'[ProductID], 'Sales'[ProductID]))
  • Description: The Apples Value measure calculates the total sales value specifically for the product “Apples.” It works by applying a filter to include only rows where the ProductName in the Product table is “Apples.” Additionally, it uses a specific relationship between the Product and Sales tables defined by the ProductID columns. This measure helps in analyzing the revenue generated from the sale of apples.

4. Apples Value Forecast

  • Definition: [Apples Value] * 1.1
  • Description: The Apples Value Forecast measure estimates the future sales value for the product “Apples” based on a forecasted increase. It multiplies the current Apples Value by 1.1 (indicating a 10% increase). This measure is useful for predicting future revenue from apple sales.

These descriptions provide a clear understanding of what each measure does and how it works, helping anyone who looks at the documentation to comprehend the purpose and functionality of each calculation.

The verdict? More than good enough. To be honest, I was expecting the Data Agent to do a good job here though.

Second test: understand the dependencies between calculations. This is an interesting challenge because you can have chained relationships (ie measures that depend on other measures that depend on other measures etc) which makes the DAX harder to write. Also, you can’t easily understand dependencies by just looking at the output of INFO.CALCDEPENDENCIES – most of the third party tools mentioned above solve this by visualising the data with some kind of network diagram.

The following prompt:

Which measures would be affected if I deleted the SalesValue column on the Sales table?

…generated the following output:

If you delete the SalesValue column from the Sales table, the following measures will be affected:

  1. Measure Name: Value
    • Definition: SUM(Sales[SalesValue])

Since the Value measure directly depends on the SalesValue column, deleting the SalesValue column will cause the Value measure to break. Additionally, any measures that depend on the Value measure, such as Apples Value and Apples Value Forecast, will also be affected indirectly:

  1. Measure Name: Apples Value
    • Definition: CALCULATE([Value], KEEPFILTERS('Product'[ProductName]="Apples"), USERELATIONSHIP('Product'[ProductID], 'Sales'[ProductID]))
  2. Measure Name: Apples Value Forecast
    • Definition: [Apples Value] * 1.1

Therefore, deleting the SalesValue column will break the Value, Apples Value, and Apples Value Forecast measures.

The verdict? I’m impressed, this is useful.

Final test: can we generate a visualisation of the dependencies? A few months ago one of my colleagues showed me the Mermaid language for defining diagrams like this and the rather cool Markdown & Mermaid custom visual that allows you to display Mermaid diagrams in Power BI. The following prompt:

Generate an expression in the Mermaid diagramming language that shows all the measures in the model and the dependencies between them

…generated the following response:

Here is an expression in the Mermaid diagramming language that shows all the measures in the model and the dependencies between them:

graph TD
  Units --> Units
  Value --> Value
  Apples_Value --> Value
  Apples_Value_Forecast --> Apples_Value

  Units["Units"]
  Value["Value"]
  Apples_Value["Apples Value"]
  Apples_Value_Forecast["Apples Value Forecast"]

Explanation:

  • Units is a measure with no dependencies on other measures.
  • Value is a base measure.
  • Apples Value depends on the Value measure.
  • Apples Value Forecast depends on the Apples Value measure, which in turn depends on the Value measure.

This Mermaid diagram visually represents all the measures in the model and the dependencies between them using arrows.

Pasting the expression in the response into the Markdown & Mermaid custom visual gave me the following:

I wasn’t particularly happy with those messy lines linking the Value and Units measures to themselves, but overall it does the job pretty well.

This blog post is already long enough and while these are simple tests, I think I’ve proved my point that Fabric Data Agents can be very useful when it comes to understanding the structure of a semantic model and the calculations in it. Bringing in data from other DAX INFO functions and tuning the AI Instructions would allow you to answer even more complex questions. I can also imagine a solution using Semantic Link Labs that not only dumps the data from DAX INFO functions for all your semantic models into a table in OneLake for consumption by a Data Agent, but also then uses the Data Agent SDK to automatically generate markdown documentation for those models, complete with diagrams. The possibilities are endless!