Get Better Results From Power BI Copilot With Linguistic Modelling

Everyone is excited about Power BI Copilot, and the newly-announced preview of being able to use Copilot to ask questions about all the data in your semantic model rather than just what is shown in a report is a massive step forward. However amazing the LLMs used behind the scenes are, though, the quality of the results your users get from this new Copilot preview depends on a number of factors that you as a developer control. How you model your data is very important and as the announcement blog mentions, so is the linguistic schema that was originally added for Copilot’s predecessor feature, Q&A. Copilot returns much better results than Q&A ever did but the combination of Copilot and the extra information that Q&A’s linguistic schema provides (information Copilot could not know unless you told it) makes Copilot even more powerful. What’s more, you don’t need to edit a YAML file to use this functionality any more because most of the features of the linguistic schema are now available to edit in Power BI Desktop’s Q&A Setup dialog.

In this blog post I’ll show you a few examples of how adding to a model’s linguistic schema improves the new Power BI Copilot preview’s results when you’re querying your semantic model.

Semantic Model

Let’s say you own a farm where you grow fruit. Customers visit the farm to buy fruit and the fruit is picked for these customers by your employees. You store your sales data in a Power BI semantic model that looks like this:

The Orders table is a fact table with one row for each order line. The Order ID column identifies the order that each line item is linked to, the Amount column contains the sales value and the Units column contains the number of fruit delivered for each line. The Employee dimension gives the name of the employee who picked the fruit; the Customer dimension gives the name of the customer who ordered the fruit; the Product dimension gives the name of the fruit picked. Products are associated with Product Groups via a many-to-many relationship.

Here’s the data in the dimension tables:

To get the best results from Copilot note that:

  • The data is modelled as a classic star schema.
  • The table and column names are in human-readable English with no abbreviations and with spaces between the words. I talked about my opinions on Power BI naming conventions in this blog post.
  • All tables and columns that should not be shown in a report have been hidden.
  • The fact table measure columns have been hidden and three explicit measures – Order Amount, Order Units and Order Count (which is a distinct count on the Order ID column) – have been created.

Synonyms

While Copilot performs well on this model, let’s look at a simple question where it doesn’t return the expected result:

Show the number of orders by employee

The prompt returns a visual, but on closer inspection it’s not the result you want. It shows the count of rows in the Orders table which is the number of line items, not a count of orders:

To get the correct result you need to tell Copilot that the Order Count measure returns the number of orders by defining a Synonym. You can do this in the Q&A setup dialog in Power BI Desktop on the Synonyms tab:

Setting “number of orders” as a synonym for the Order Count measure means that the prompt now returns the following visual with the results you want:

Verbs

The next prompt to look at is:

Who picked lemons?

You know that on our farm it’s the employees who pick the fruit but there’s nothing in the model to tell Copilot that. As a result the prompt above results in Copilot saying that it doesn’t know what “picked” means in this context:

On the relationships tab of the Q&A Setup dialog you can fix this by defining a Verb relationship:

The relationship tells Copilot that “Employee names” pick “Product names” with the Orders table connecting the two columns.

With this relationship in place, Copilot correctly answers that Gabi was the only employee who picked lemons:

Nouns

The customer Chris is also widely referred to by employees as “Mr Webb”, but that name isn’t stored anywhere in the model. As a result the prompt

How much money did we make from Mr Webb?

results in the following, fairly reasonable, response:

However with a noun relationship set up to tell Copilot that “Mr Webb” is a kind of customer name where customer name equals “Chris”:

Then the result is what you would expect:

Dynamic Nouns

Copilot does a good job with the many-to-many relationship between Product and Product Group without any optimisation. For example the prompt:

show all citrus fruit and their order amounts

Returns the correct result:

But let’s say that in this case you want to show the individual products rather than the product group “citrus fruit”. You can do this by setting up a dynamic noun relationship:

The relationship is that”Product group names” define kinds of “product names” with the Product To Product Group table linking the two. With this in place the prompt now returns the desired result:

Conclusion

These examples barely scratch the surface of what’s possible with the linguistic schema and Copilot. Apart from the documentation, I found the videos on the (fairly old) “Natural Language for Power BI” YouTube channel which were created when Q&A was launched useful for understanding the concepts here too. There’s a lot to learn here but with some trial and error, as well as listening to feedback from your end users, you should be able to tune Copilot so it returns high quality results almost all the time.

Q&A Buttons And Power BI Service Live Connections

A quick Q&A-related tip. If you create a report in Power BI Desktop and use a Power BI service live connection:

image

…and then you put a Q&A button on the report, clicking on the button gives the following error message:

image

Q&A isn’t supported for reports that use DirectQuery data sources, live connections to older Analysis Services data sources, or non-English language models

However, this message only means that Q&A buttons do not work in Power BI Desktop with Power BI service live connections. If you publish the report to the Power BI Service you will find that the Q&A button does work correctly.

image

[Thanks to Seth Bauer and Aaron Meyers for the information]

New Power BI Q&A Functionality Released: Optimisation In The Browser

Seems like another new bit of Power BI functionality got released today: the ability to optimize your data model for Q&A in the browser. Here’s the link to the docs:

http://office.microsoft.com/en-us/office-365-business/power-bi-q-a-optimize-a-power-bi-workbook-cloud-modeling-HA104226408.aspx?redir=0

Previously, the ability to add synonyms to your model to improve the results you got from Q&A was only available in Excel on the desktop, inside the Power Pivot window. Now you can do this, as well as new stuff like add phrasings (described here) and view usage reports, in your Power BI site.

I won’t repeat what the docs say about the actual functionality, but this seems to be yet more evidence that Excel on the desktop is no longer the central hub for Power BI. If this is the case, this is a massive strategic change, and I can understand why it has happened: the need for the ‘right’ version of Excel on the desktop is a massive roadblock for Power BI adoption, especially in enterprise accounts (see also Jen Underwood’s comments on this from yesterday). Maybe now it’s BI in the browser instead?

Power BI Review, Part 2.5: Q&A (Now I’ve Used It With My Data)

A few months ago I posted a review of Q&A, the natural language query functionality in Power BI, based on the sample data sets that were then available. Last week, finally, we got the news that we could enable Q&A on our own Power Pivot models, and having played with this new release I thought it was a good idea to post an update to my original thoughts.

The first thing to point out is that even if you’ve got a Power BI Preview tenant you will need the latest version of Power Pivot for Excel to be able to get the best out of Q&A. This latest release contains some new functionality to add ‘Synonyms’ to the model – what this means is that it allows you, as a model creator, to tell Power BI about other names that end users might use when querying your model. For example on a Geography dimension you might have a column called State but if you are a multinational company you may find that while your State column contains the names of states in the USA, it might contain the names of cantons in Switzerland, counties in the UK, departments in France and so on. As a result you will want Power BI to know that if a user asks for sales by county in the UK that it should actually look in the State column. Devin Knight has already written a good post showing how synonyms work with Q&A which you can see here.

Another complication is that, at the time of writing, the Synonym functionality is only available to users who have installed the streamed version of Office 2013 from Office 365. I have an Office 365 subscription but I had installed Office from an msi before that, so I had to uninstall Office and reinstall the streamed version to be able to see Synonyms – I assume that support for Synonyms in the non-streamed version of Excel will come at some point soon in the future, but in general I would expect that new Power BI functionality will appear first in the streamed version of Office first so if you’re serious about BI you should change over to it as soon as you can. Melissa Coates has a lot more detail on this issue here.

But enough about setup, what about Q&A? The data that I tested it on was a model I’ve been using for user group and conference demos for about six months now, which contains data from the UK’s Land Registry and details all residential property transactions in England and Wales in 2013. It’s fairly simple – two tables, a date table and a transactions table containing around 0.5 million rows – so probably a lot simpler than the average Power Pivot model, but nonetheless real data and one which had been polished for demo purposes. The Excel file holding it is around 25MB so I was well within the Power BI file size limits.

My initial impression after I had added my existing model (with no synonyms etc) to Q&A was that while it worked reasonably well, it worked nowhere near as well as the demo models I had seen. I then set about making changes to the model and re-uploading it, and these changes made all the difference. Some examples of the things I did are:

  • Changed table and column names. In my model I had already taken the trouble to make them human readable, but this did not necessarily mean they were suitable for Q&A. For example, my main fact table was called ‘Land Registry’, so at first Q&A kept suggesting questions like “How many land registries were there in June…” which clearly makes no sense. Renaming the fact table to ‘Sales’ fixed this.
  • Setting synonyms. Unsurprisingly, this had a big impact on usability in the same way that changing the table and column names did. I found that I had to go through several iterations of uploading the data, writing questions, seeing what worked and what didn’t, and adding more synonyms before I had a set that I was happy with; I can imagine that in the real world you’d need to round up several end users and lock them in a room to see how they phrased their questions so as to get a really good list of synonyms for them.
  • Setting Power View-related properties. This included setting the Default Field Set on a table, so I only saw a few important fields in a meaningful order when Q&A returned a table result; and also Summarize By so that Q&A didn’t try to aggregate year values. All of this makes sense given how closely-related Q&A and Power View are, but even though I had a reasonably ‘finished’ model to start off with I still hadn’t set all of these properties because I knew I was never going to try to sum up a year column.
  • Adding new columns. There were a number of cases where I realised that I, as a human user, was able to make assumptions about the data that Q&A could not. For example the source data records sales of four different types of residential property: terraced, detached, semi-detached and flat. The first three are types of house, but the source data doesn’t actually state that they are types houses anywhere so in order to see the total number of sales of houses I had to add another column to explicitly define which property types were houses.
  • Disambiguation. Probably the most irritating thing about the Bing geocoding service that Power View and Q&A use is the way it always chooses a US location when you give it an ambiguous place name. Therefore when looking at sales by town I would see the town name “Bristol” show up on the map as Bristol, Tennessee (population 24,821) rather than Bristol in England (population 416,400). Creating a new column with town name and country concatenated stopped this happening.

The Microsoft blog post I referenced above announcing Q&A promises that a more detailed guide to configuring models for Q&A will be published soon, which is good news. The important point to take away from this, though, is that even the most polished Power Pivot models will need additional tweaks and improvements in order to get the best out of Q&A.

The big question remains, though, whether Q&A will be something that end users actually get some value from. As a not-very-scientific test of this I handed my laptop over to my wife (who has no experience of BI tools but who has a healthy interest in property prices) to see how easy it was for her to use, and straight away she was able to write queries and find the information she was looking for, more or less. There were a still few cases where Q&A and/or my model failed, such as when she searched for “average house price in Amersham” – the model has a measure for “average price”, it knows about the property type “house” and the town “Amersham”, but “average house price” confused it and the query had to be rewritten as “average price of a house in Amersham”. Overall, though, I was pleasantly surprised and as a result I’m rather less sceptical than I was about Q&A’s usefulness, even if I’m still not 100% convinced yet.