Get Better Results From Power BI Copilot With Linguistic Modelling

Everyone is excited about Power BI Copilot, and the newly-announced preview of being able to use Copilot to ask questions about all the data in your semantic model rather than just what is shown in a report is a massive step forward. However amazing the LLMs used behind the scenes are, though, the quality of the results your users get from this new Copilot preview depends on a number of factors that you as a developer control. How you model your data is very important and as the announcement blog mentions, so is the linguistic schema that was originally added for Copilot’s predecessor feature, Q&A. Copilot returns much better results than Q&A ever did but the combination of Copilot and the extra information that Q&A’s linguistic schema provides (information Copilot could not know unless you told it) makes Copilot even more powerful. What’s more, you don’t need to edit a YAML file to use this functionality any more because most of the features of the linguistic schema are now available to edit in Power BI Desktop’s Q&A Setup dialog.

In this blog post I’ll show you a few examples of how adding to a model’s linguistic schema improves the new Power BI Copilot preview’s results when you’re querying your semantic model.

Semantic Model

Let’s say you own a farm where you grow fruit. Customers visit the farm to buy fruit and the fruit is picked for these customers by your employees. You store your sales data in a Power BI semantic model that looks like this:

The Orders table is a fact table with one row for each order line. The Order ID column identifies the order that each line item is linked to, the Amount column contains the sales value and the Units column contains the number of fruit delivered for each line. The Employee dimension gives the name of the employee who picked the fruit; the Customer dimension gives the name of the customer who ordered the fruit; the Product dimension gives the name of the fruit picked. Products are associated with Product Groups via a many-to-many relationship.

Here’s the data in the dimension tables:

To get the best results from Copilot note that:

  • The data is modelled as a classic star schema.
  • The table and column names are in human-readable English with no abbreviations and with spaces between the words. I talked about my opinions on Power BI naming conventions in this blog post.
  • All tables and columns that should not be shown in a report have been hidden.
  • The fact table measure columns have been hidden and three explicit measures – Order Amount, Order Units and Order Count (which is a distinct count on the Order ID column) – have been created.

Synonyms

While Copilot performs well on this model, let’s look at a simple question where it doesn’t return the expected result:

Show the number of orders by employee

The prompt returns a visual, but on closer inspection it’s not the result you want. It shows the count of rows in the Orders table which is the number of line items, not a count of orders:

To get the correct result you need to tell Copilot that the Order Count measure returns the number of orders by defining a Synonym. You can do this in the Q&A setup dialog in Power BI Desktop on the Synonyms tab:

Setting “number of orders” as a synonym for the Order Count measure means that the prompt now returns the following visual with the results you want:

Verbs

The next prompt to look at is:

Who picked lemons?

You know that on our farm it’s the employees who pick the fruit but there’s nothing in the model to tell Copilot that. As a result the prompt above results in Copilot saying that it doesn’t know what “picked” means in this context:

On the relationships tab of the Q&A Setup dialog you can fix this by defining a Verb relationship:

The relationship tells Copilot that “Employee names” pick “Product names” with the Orders table connecting the two columns.

With this relationship in place, Copilot correctly answers that Gabi was the only employee who picked lemons:

Nouns

The customer Chris is also widely referred to by employees as “Mr Webb”, but that name isn’t stored anywhere in the model. As a result the prompt

How much money did we make from Mr Webb?

results in the following, fairly reasonable, response:

However with a noun relationship set up to tell Copilot that “Mr Webb” is a kind of customer name where customer name equals “Chris”:

Then the result is what you would expect:

Dynamic Nouns

Copilot does a good job with the many-to-many relationship between Product and Product Group without any optimisation. For example the prompt:

show all citrus fruit and their order amounts

Returns the correct result:

But let’s say that in this case you want to show the individual products rather than the product group “citrus fruit”. You can do this by setting up a dynamic noun relationship:

The relationship is that”Product group names” define kinds of “product names” with the Product To Product Group table linking the two. With this in place the prompt now returns the desired result:

Conclusion

These examples barely scratch the surface of what’s possible with the linguistic schema and Copilot. Apart from the documentation, I found the videos on the (fairly old) “Natural Language for Power BI” YouTube channel which were created when Q&A was launched useful for understanding the concepts here too. There’s a lot to learn here but with some trial and error, as well as listening to feedback from your end users, you should be able to tune Copilot so it returns high quality results almost all the time.

16 thoughts on “Get Better Results From Power BI Copilot With Linguistic Modelling

  1. This is really interesting! Any suggestion as to when will that feature come where based on the data in the datawarehouse which feeds the Power BI , where user can ask the question, the co-pilot generates the query and gives the response.

      1. Chris, thank you for the helpful article! Could you please clarify whether Copilot or its prompts can be customized similarly to how QA can be customized using a .yaml file? I would appreciate detailed steps on how to achieve this.

        For example, we would like to restrict Copilot to work with a limited set of tables from the model or customize its prompts so that if users ask irrelevant questions, it responds with a specific message.

      2. Copilot is (partially) built on top of Q&A and it uses the same Linguistic Schema, which you can export to a yaml file, that Q&A uses. All the docs relevant to Q&A are relevant to Copilot. You can make sure Copilot only considers certain table using the “include in Q&A” property

  2. Nice blog Chris, with some great hints. I’ve found it very hard to get people to engage with good naming conventions and synonyms. I like your reference to “Number of” instead of count, it reminds me of the blank looks I got last time I proposed that. Maybe AI is the leverage to changing the way people think about naming things and natural language query.

  3. Gracias Cris; veía a Copilot como algo de un mundo fuera de este mundo, lo veía muy sofisticado y pensaba que no podría obtener un beneficio real; a partir de ahora Copilot es parte esencial de mis informes

  4. This is great, Chris, and fits with thinking we have been doing.
    A few thoughts:
    Firstly. Embedding this kind of intelligence into Power BI is a great first step however it seems to us that the best place to store it would be a more centralised and accessible business glossary. We have been exploring Purview with mixed results but ideally a business glossary should be in a repository familiar to and queryable by the business (sharepoint, confluence, clickup etc).
    Secondly, this whole thinking aligns to the very exciting developments occurring around LLM Retrieval Augmented Generation (RAG). This involves integrating LLMs and local data into your queries to optimise results as you have done here. We are looking out for an accessible implementation of ‘entity recognition and linking’ which identifies our own entities and can understand them in context.
    Finally, unless you know otherwise, can we just take a moment to bemoan that there is a somewhat prohibitive minimum SKU for this. Given that Microsoft seem to be betting the ranch on Copilot added (cheaply!) to almost all applications it seems curious that in such a key area the bar is still quite high.

  5. This is great, Chris, and completely aligns with thinking we have been doing. Couple of thoughts:
    1. We have been looking at Retrieval Augmented Generation (RAG) of integrating local data with LLM capabilities to optimise the answer of questions in a your own business context. We are waiting for a really accessible implementation of this to carry out ‘named entity recognition and linking’ to identify our own business entities.
    2. Though defining this schema in PowerBI is a great first start, ultimately we think it should be stored in an accessible business-focused database. We have been looking at Purview with mixed results, but really we were thinking about general business repositories like sharepoint, confluence etc.

    1. A lot of things will change with the Q&A deprecation; I’ve been told that only Synonyms will still work out of the Q&A options. Also some of the tips I’ve blogged about for Copilot have stopped working in the last few months although I have hopes that they will start working again at some point. So it’s all in flux.

Leave a Reply to Michael MertensCancel reply