Grouping And Filtering In Power BI Copilot AI Instructions

Continuing my series on Power BI Copilot AI Instructions (see also my previous post which is the only other one so far), in this post I’d like to show some examples of how you can specify groups of columns and apply row filters in the results returned by Copilot.

Using the same semantic model that I used in my previous post, which contains data for UK real estate sales from the Land Registry, consider the following prompt:

Show sales of luxury houses in the Chalfonts

Even with a well-designed semantic model that follows all best practices I would never expect a good answer from Copilot for this question without supplying extra information in AI Instructions. Indeed, here’s what Copilot responds on a version of the model with no AI Instructions:

Copilot is doing the right thing and asking the end user to clarify several aspects of the question in the prompt. Ideally, though, these clarifications would not be necessary. There are several questions that need to be answered by the semantic model designer before this prompt will return the answer an end user expects:

  • When you ask for “sales”, which columns from the semantic model should be returned?
  • Since there is no property type called “house” in the data, when you say “houses” which property types do you mean exactly?
  • What does “luxury” mean?
  • Since there is no town or locality called “the Chalfonts” in the data, what do you mean by this?

Let’s look at what needs to be added to the AI Instructions for this model to answer these questions.

Grouping columns

Starting with the first question of what columns should be returned when the user asks for “sales”, let’s say that the user expects to see certain columns from the Transactions table in a certain order. Here’s what the data in the Transactions table looks like:

Each row represents a single real estate transaction. There are columns for the price paid in the transaction, the date of the transaction, the type of the property, and several columns that represent the different parts of the address of the property (full details of what the columns mean can be found here). If the user asks for “sales” let’s assume that they want to see the address of the property, the date of the transaction and the price paid.

Here is what I added to the AI Instructions to achieve this:

##Instructions for displaying addresses and what users call sales
The source data for this semantic model comes from the UK Land Registry Price Paid dataset
The transactions table contains one row for each real estate transaction
The address of the property sold in each transaction consists of the following columns in this exact order:
* PAON
* SAON
* Street
* Locality
* Town/City
* County
* Postcode
When a user asks for a list of sales, always show the address of the properties involved plus the Date and Price columns

Filtering rows

The other three questions that need to be answered all involve some kind of filtering of rows.

First of all, let’s define what “houses” are. The Property Types dimension table looks like this:

The types “semi detached”, “detached” and “terraced” are all types of house.

Next: what does “luxury” mean? Let’s say that “luxury” properties are properties that are sold for a price of over £1 million.

Finally, what does “the Chalfonts” mean? Here’s the Wikipedia page that explains it: it’s a collective name for three towns and villages near where I live: Little Chalfont, Chalfont St Giles and Chalfont St Peter. As far as the address data in the Transactions table is concerned only Chalfont St Giles is a town appearing in the Town/City column; Little Chalfont and Chalfont St Peter are villages and therefore their names appear in the Locality column.

Here are the AI Instructions I added for these rules:

##Instructions for types of filter that users may request
Houses are property types detached, semi detached and terraced
Luxury refers to properties with a price of over 1000000
If a user asks for sales in "the Chalfonts" this means sales where either 
* the Town/City column is CHALFONT ST GILES, or
* the Locality column is CHALFONT ST PETER, or
* the Locality column is LITTLE CHALFONT

Results

With both of these sets of rules added on the AI Instructions page, here’s what the original prompt now returns:

The table returns exactly what I want; the text summary below with properly formatted addresses is a nice touch. Here’s the DAX query generated:

// DAX query generated by Fabric Copilot with "Show sales of luxury houses in the Chalfonts"
EVALUATE
  SELECTCOLUMNS(
    // Filter transactions for luxury houses in the Chalfonts.
    FILTER(
      'Transactions',
      'Transactions'[Price] > 1000000 && // Luxury: price over 1,000,000
      (
         'Transactions'[Town/City] = "CHALFONT ST GILES" ||  // Chalfonts filter based on Town/City
         'Transactions'[Locality] = "CHALFONT ST PETER" ||     // or Locality
         'Transactions'[Locality] = "LITTLE CHALFONT"
      ) &&
      // Houses: only Detached, Semi Detached, or Terraced properties
      RELATED('PropertyTypes'[Property Type Name]) IN {"Detached", "Semi Detached", "Terraced"}
    ),
    "PAON", 'Transactions'[PAON],
    "SAON", 'Transactions'[SAON],
    "Street", 'Transactions'[Street],
    "Locality", 'Transactions'[Locality],
    "Town/City", 'Transactions'[Town/City],
    "County", 'Transactions'[County],
    "Postcode", 'Transactions'[Postcode],
    "Date", 'Transactions'[Date],
    "Price", 'Transactions'[Price]
  )
ORDER BY
  [Date] ASC

The DAX query is well-written and returns the correct results; the comments in the code and the explanation of the query underneath it is useful too. The explanation also reveals that Copilot understands terms like PAON, which stands for “primary addressable object name”, definitions that I didn’t give it but which are in the official documentation linked to above.

This example shows how much more control AI Instructions give you over the results Copilot returns now, a lot more than you had a few months ago when the only way to influence results was via the Q&A Linguistic Schema. I’m also coming to realise how much work is involved in preparing a semantic model for Copilot using AI Instructions: it’s probably as much as building the semantic model in the first place. The improvement in the quality of results that this work brings is worth the effort, though, and I’m pretty sure that there’s no way of avoid this. Anyone who tells you that their tool for querying data with natural language “just works” and doesn’t need this amount of prep work is probably selling snake oil.

Requiring Selections On The Date Dimension In Power BI Copilot AI Instructions

Without a doubt the most important new feature in Power BI Copilot is AI Instructions: it opens up an immense number of possibilities but it also starts off as a blank slate, which raises the question of what you should do with this feature as much as how you should do it. The official documentation is very good but I think this is one of those features where the best practices only emerge after everyone has used it in the real world for a while. In an attempt to kick start this process of working out what you should do with AI Instructions I thought I’d start a series of blog posts on this subject. I don’t pretend to know all the answers, I just want to spark some debate and learn in public.

The first thing that I wanted to write about is something it has never been possible to do in Copilot up to now: force end users to always make a selection on the Date dimension.

Consider the following semantic model built from my favourite open dataset, the UK Land Registry Price Paid data, which contains details of of all the real estate transactions in England and Wales:

The Transactions fact table contains one row for each real estate transaction, including the address of the property sold and how much money was paid for it; the Property Types table contains one row for each property type (detached, sem-detached, terraced and other); and the Date dimension table is, well, a date dimension table. There are two explicit measures:

Average Price Paid = AVERAGE('Transactions'[Price])
Count Of Transactions = COUNTROWS('Transactions')

Now consider the following prompt (used with only the “answer questions about the data” skill selected in the Copilot pane in Desktop):

Show count of transactions by property type

It gives you the following result:

It’s correct, it’s exactly what I asked for, but is it useful? No: the Transactions fact table contains all the data that is currently available for 2025, from January 1st 2025 to 30th April 2025. So the visual above shows the count of transactions broken down by property type for a totally arbitrary date range that isn’t obvious to the user.

If you were building a report from this semantic model and wanted to display a visual like the one above you would always have a slicer or filter somewhere that allowed the end user to select a date or date range. Therefore it makes sense to require end users to select something on the Date dimension table if they are querying using Copilot, and remind them to do so if they do not select something.

There’s another rule to consider here. Looking at the Date dimension table you can see that the Month column only contains the name of the month:

If there was data from multiple years in the Transactions table (there isn’t in this case, but there could be if I included older data) then it wouldn’t make sense to show data for, say, “January” if that included Januarys from different years. One way to stop this from happening would be to change the values in the Month column to include the year along with the month name, but this is also something that can be handled with AI Instructions.

Here are some instructions that I came up with to implement these two rules:

## Instructions about which columns must be selected in different scenarios
The user must always specify a filter on either date, month or year in their question. If they do not, you must ask them to specify a filter on either a date, a combination of month and year, or a year. If you suggest a date, month or year filter to the end user you must only suggest selections between January 1st 2025 and April 30th 2025.
If the user asks for data filtered by a month they must always specify a year as well.

With these instructions in place, the prompt above is met with a request to specify a year, month or date filter like so:

Selecting the last of the suggested prompts gives this:

There are two things I would like to improve about this. First, the screenshot above shows the date March 15th 2025 in US date format as “3/15/2025” and I couldn’t find a way to stop this. Sigh, American software. I’ll ask to get this fixed. Second, in the instructions above you can see that I hard coded the date range of January 1st 2025 to April 30th 2025; I wanted to make this data-driven and created some measures to get the minimum and maximum dates from the Transactions table, but I couldn’t get Copilot to use them.

To show how the second rule is handled, the prompt:

Show count of transactions by property type for January

Returns the following:

Unfortunately I couldn’t find a way to reliably stop Copilot suggesting months in 2024 and 2023 in the suggested prompts. Maybe as my prompt engineering skills improve I’ll get better at this.

For more complex semantic models there could be other dimensions where a user should always make a selection – for example, if you’re doing currency conversion in your model you might want end users to always select a currency to convert to. The rules you use in AI Instructions should be similar to these though.

That’s enough for now. I have a list of other ideas for scenarios to handle with AI Instructions for future blog posts but if you have some suggestions, please leave a comment below.