Chris Webb's BI Blog: AI Instructions

Continuing my series on Power BI Copilot AI Instructions (see also my previous post which is the only other one so far), in this post I’d like to show some examples of how you can specify groups of columns and apply row filters in the results returned by Copilot.

Using the same semantic model that I used in my previous post, which contains data for UK real estate sales from the Land Registry, consider the following prompt:

Show sales of luxury houses in the Chalfonts

Even with a well-designed semantic model that follows all best practices I would never expect a good answer from Copilot for this question without supplying extra information in AI Instructions. Indeed, here’s what Copilot responds on a version of the model with no AI Instructions:

Copilot is doing the right thing and asking the end user to clarify several aspects of the question in the prompt. Ideally, though, these clarifications would not be necessary. There are several questions that need to be answered by the semantic model designer before this prompt will return the answer an end user expects:

When you ask for “sales”, which columns from the semantic model should be returned?
Since there is no property type called “house” in the data, when you say “houses” which property types do you mean exactly?
What does “luxury” mean?
Since there is no town or locality called “the Chalfonts” in the data, what do you mean by this?

Let’s look at what needs to be added to the AI Instructions for this model to answer these questions.

Grouping columns

Starting with the first question of what columns should be returned when the user asks for “sales”, let’s say that the user expects to see certain columns from the Transactions table in a certain order. Here’s what the data in the Transactions table looks like:

Each row represents a single real estate transaction. There are columns for the price paid in the transaction, the date of the transaction, the type of the property, and several columns that represent the different parts of the address of the property (full details of what the columns mean can be found here). If the user asks for “sales” let’s assume that they want to see the address of the property, the date of the transaction and the price paid.

Here is what I added to the AI Instructions to achieve this:

##Instructions for displaying addresses and what users call sales
The source data for this semantic model comes from the UK Land Registry Price Paid dataset
The transactions table contains one row for each real estate transaction
The address of the property sold in each transaction consists of the following columns in this exact order:
* PAON
* SAON
* Street
* Locality
* Town/City
* County
* Postcode
When a user asks for a list of sales, always show the address of the properties involved plus the Date and Price columns

Filtering rows

The other three questions that need to be answered all involve some kind of filtering of rows.

First of all, let’s define what “houses” are. The Property Types dimension table looks like this:

The types “semi detached”, “detached” and “terraced” are all types of house.

Next: what does “luxury” mean? Let’s say that “luxury” properties are properties that are sold for a price of over £1 million.

Finally, what does “the Chalfonts” mean? Here’s the Wikipedia page that explains it: it’s a collective name for three towns and villages near where I live: Little Chalfont, Chalfont St Giles and Chalfont St Peter. As far as the address data in the Transactions table is concerned only Chalfont St Giles is a town appearing in the Town/City column; Little Chalfont and Chalfont St Peter are villages and therefore their names appear in the Locality column.

Here are the AI Instructions I added for these rules:

##Instructions for types of filter that users may request
Houses are property types detached, semi detached and terraced
Luxury refers to properties with a price of over 1000000
If a user asks for sales in "the Chalfonts" this means sales where either 
* the Town/City column is CHALFONT ST GILES, or
* the Locality column is CHALFONT ST PETER, or
* the Locality column is LITTLE CHALFONT

Results

With both of these sets of rules added on the AI Instructions page, here’s what the original prompt now returns:

The table returns exactly what I want; the text summary below with properly formatted addresses is a nice touch. Here’s the DAX query generated:

// DAX query generated by Fabric Copilot with "Show sales of luxury houses in the Chalfonts"
EVALUATE
  SELECTCOLUMNS(
    // Filter transactions for luxury houses in the Chalfonts.
    FILTER(
      'Transactions',
      'Transactions'[Price] > 1000000 && // Luxury: price over 1,000,000
      (
         'Transactions'[Town/City] = "CHALFONT ST GILES" ||  // Chalfonts filter based on Town/City
         'Transactions'[Locality] = "CHALFONT ST PETER" ||     // or Locality
         'Transactions'[Locality] = "LITTLE CHALFONT"
      ) &&
      // Houses: only Detached, Semi Detached, or Terraced properties
      RELATED('PropertyTypes'[Property Type Name]) IN {"Detached", "Semi Detached", "Terraced"}
    ),
    "PAON", 'Transactions'[PAON],
    "SAON", 'Transactions'[SAON],
    "Street", 'Transactions'[Street],
    "Locality", 'Transactions'[Locality],
    "Town/City", 'Transactions'[Town/City],
    "County", 'Transactions'[County],
    "Postcode", 'Transactions'[Postcode],
    "Date", 'Transactions'[Date],
    "Price", 'Transactions'[Price]
  )
ORDER BY
  [Date] ASC

The DAX query is well-written and returns the correct results; the comments in the code and the explanation of the query underneath it is useful too. The explanation also reveals that Copilot understands terms like PAON, which stands for “primary addressable object name”, definitions that I didn’t give it but which are in the official documentation linked to above.

This example shows how much more control AI Instructions give you over the results Copilot returns now, a lot more than you had a few months ago when the only way to influence results was via the Q&A Linguistic Schema. I’m also coming to realise how much work is involved in preparing a semantic model for Copilot using AI Instructions: it’s probably as much as building the semantic model in the first place. The improvement in the quality of results that this work brings is worth the effort, though, and I’m pretty sure that there’s no way of avoid this. Anyone who tells you that their tool for querying data with natural language “just works” and doesn’t need this amount of prep work is probably selling snake oil.

## Instructions about which columns must be selected in different scenarios The user must always specify a filter on either date, month or year in their question. If they do not, you must ask them to specify a filter on either a date, a combination of month and year, or a year. If you suggest a date, month or year filter to the end user you must only suggest selections between January 1st 2025 and April 30th 2025. If the user asks for data filtered by a month they must always specify a year as well.