Continuing my series of posts on Power BI Copilot and the type of things you should be including in AI Instructions, today I want to talk about helping the end user understand the scope of the data: what data your model does contain, what data it could contain but doesn’t, and what data it could never contain.
Let me show you some examples to explain what I mean, using the semantic model I introduced in the first post in this series containing real estate transaction data from the Land Registry in the UK. On a model with no AI Instructions added, if I give Power BI Copilot the prompt:
show average price paid for the city manchester
I get the correct answer back:
That’s good and as you would expect. Now consider the following two prompts which both return no data:
show average price paid for the city edinburgh
show average price paid for the locality Boode
A user asking a question and not getting an answer is unavoidable in many cases but it still represents a failure: the user wanted information and the model couldn’t help. As a semantic model developer you need to minimise the impact of this and help the user understand why they didn’t get an answer. In both cases Copilot does the right thing and tells the user that it can’t find data for these places. However, there’s an important difference between these two places.
Boode is a very small village (somewhere I found at random) in England. There’s no data for it in the semantic model because there happen to be no real estate sales for it. Edinburgh on the other hand is a large city, so why is there no data? It’s because this semantic model only contains data for England and Wales, not Scotland, Northern Ireland or anywhere else. An end user might expect that a dataset from a UK government agency would contain data for the whole of the UK but this one doesn’t. If the user was looking at a report displaying this data then this information would probably be displayed somewhere in a textbox on an information page, or be obvious from the values visible in slicers or filters, but with Copilot we have to tell the user this in other ways.
Similarly, consider the following two prompts:
show average price paid for January 2025
show average price paid for January 2024
Why is a blank displayed for January 2024? Were there no sales in January 2024? The answer is that the semantic model only contains data between January 1st 2025 and April 30th 2025, so there’s no point asking for data before or after that – and we need to tell the user so.
Here is some text that I added to the AI Instructions for this semantic model to tell Copilot what data is and isn’t present:
## What data this semantic model contains
This semantic model only contains data for England and Wales. If the user asks for data outside England and Wales, tell them that this model does not contain the data they are looking for.
This model only contains data for the date range 1st January 2025 to 30th April 2025. Tell the user this if they ask for data outside this date range.
With these instructions in place let’s rerun some of the prompts above. First, let’s ask about Edinburgh again:
This is a much more helpful response for the user I think. Copilot knows that Edinburgh is in Scotland and it tells the user that it only has data for England and Wales. Copilot doesn’t know every place name in Scotland but it does pretty well; for example (to take another random place name) it knows that Arrina is in Scotland:
Similarly, here’s what Copilot now says for January 2024:
Again, a much better response than before: it explicitly tells the user that the model only contains data for January 1st 2025 to April 30th 2025.
Giving Power BI Copilot clear instructions on what data the semantic model does and doesn’t contain means that it can set end users’ expectations correctly. This in turn means that users are more likely to ask questions that give them useful information, which builds trust. Conversely, not explaining to end users why their questions aren’t returning data means they are less likely to want to use Copilot in the future.
As you probably know, I work on the Fabric Customer Advisory Team at Microsoft. Apart from advising customers a lot of our team’s time is spent collecting feedback about Fabric and sending it back to the relevant people in the Fabric product group so they can fix the problems that need fixing and build the features that need building. We have always worked with different communities of customers to collect this feedback but now we’re launching a much larger-scale programme: the Fabric User Panel. If you’re the type of person who reads my blog then you’re the type of person we want to recruit for it.
When you join the Fabric User Panel you will be able to:
Meet with Fabric product managers, designers, researchers, and engineering teams
Share your real-world experiences to help improve Fabric
Joining the panel is free, and you can choose to leave the panel at any time.
Continuing my series on Power BI Copilot AI Instructions (my previous post is the only other one so far), in this post I’d like to show some examples of how you can specify groups of columns and apply row filters in the results returned by Copilot.
Using the same semantic model that I used in my previous post, which contains data for UK real estate sales from the Land Registry, consider the following prompt:
Show sales of luxury houses in the Chalfonts
Even with a well-designed semantic model that follows all best practices I would never expect a good answer from Copilot for this question without supplying extra information in AI Instructions. Indeed, here’s what Copilot responds on a version of the model with no AI Instructions:
Copilot is doing the right thing and asking the end user to clarify several aspects of the question in the prompt. Ideally, though, these clarifications would not be necessary. There are several questions that need to be answered by the semantic model designer before this prompt will return the answer an end user expects:
When you ask for “sales”, which columns from the semantic model should be returned?
Since there is no property type called “house” in the data, when you say “houses” which property types do you mean exactly?
What does “luxury” mean?
Since there is no town or locality called “the Chalfonts” in the data, what do you mean by this?
Let’s look at what needs to be added to the AI Instructions for this model to answer these questions.
Grouping columns
Starting with the first question of what columns should be returned when the user asks for “sales”, let’s say that the user expects to see certain columns from the Transactions table in a certain order. Here’s what the data in the Transactions table looks like:
Each row represents a single real estate transaction. There are columns for the price paid in the transaction, the date of the transaction, the type of the property, and several columns that represent the different parts of the address of the property (full details of what the columns mean can be found here). If the user asks for “sales” let’s assume that they want to see the address of the property, the date of the transaction and the price paid.
Here is what I added to the AI Instructions to achieve this:
## Instructions for displaying addresses and what users call sales
The source data for this semantic model comes from the UK Land Registry Price Paid dataset
The Transactions table contains one row for each real estate transaction
The address of the property sold in each transaction consists of the following columns in this exact order:
* PAON
* SAON
* Street
* Locality
* Town/City
* County
* Postcode
When a user asks for a list of sales, always show the address of the properties involved plus the Date and Price columns
Filtering rows
The other three questions that need to be answered all involve some kind of filtering of rows.
First of all, let’s define what “houses” are. The Property Types dimension table looks like this:
The types “semi detached”, “detached” and “terraced” are all types of house.
Next: what does “luxury” mean? Let’s say that “luxury” properties are properties that are sold for a price of over £1 million.
Finally, what does “the Chalfonts” mean? Here’s the Wikipedia page that explains it: it’s a collective name for three towns and villages near where I live: Little Chalfont, Chalfont St Giles and Chalfont St Peter. As far as the address data in the Transactions table is concerned only Chalfont St Giles is a town appearing in the Town/City column; Little Chalfont and Chalfont St Peter are villages and therefore their names appear in the Locality column.
Here are the AI Instructions I added for these rules:
## Instructions for types of filter that users may request
Houses are property types detached, semi detached and terraced
Luxury refers to properties with a price of over 1000000
If a user asks for sales in "the Chalfonts" this means sales where either
* the Town/City column is CHALFONT ST GILES, or
* the Locality column is CHALFONT ST PETER, or
* the Locality column is LITTLE CHALFONT
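These rules are easy to state in plain language, but it’s worth being precise about the boolean logic they imply. Here’s a rough Python sketch of the combined predicate (the dict-based row is just an illustration of the Transactions table’s columns, not anything Copilot actually runs):

```python
# A sketch of the row filter the AI Instructions above describe.
# Column names mirror the Transactions table; values are assumptions.

HOUSE_TYPES = {"Detached", "Semi Detached", "Terraced"}
CHALFONTS_TOWNS = {"CHALFONT ST GILES"}
CHALFONTS_LOCALITIES = {"CHALFONT ST PETER", "LITTLE CHALFONT"}

def is_luxury_house_in_the_chalfonts(row: dict) -> bool:
    """Return True when a transaction row matches all three filters."""
    return (
        row["Property Type"] in HOUSE_TYPES          # "houses"
        and row["Price"] > 1_000_000                 # "luxury"
        and (
            row["Town/City"] in CHALFONTS_TOWNS      # "the Chalfonts":
            or row["Locality"] in CHALFONTS_LOCALITIES  # town OR locality match
        )
    )
```

Note that the three conditions combine with AND, while the place-name conditions inside “the Chalfonts” combine with OR, which is exactly the structure Copilot needs to reproduce in its DAX filter.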
Results
With both of these sets of rules added on the AI Instructions page, here’s what the original prompt now returns:
The table returns exactly what I want; the text summary below with properly formatted addresses is a nice touch. Here’s the DAX query generated:
// DAX query generated by Fabric Copilot with "Show sales of luxury houses in the Chalfonts"
EVALUATE
SELECTCOLUMNS(
    // Filter transactions for luxury houses in the Chalfonts.
    FILTER(
        'Transactions',
        'Transactions'[Price] > 1000000 && // Luxury: price over 1,000,000
        (
            'Transactions'[Town/City] = "CHALFONT ST GILES" || // Chalfonts filter based on Town/City
            'Transactions'[Locality] = "CHALFONT ST PETER" || // or Locality
            'Transactions'[Locality] = "LITTLE CHALFONT"
        ) &&
        // Houses: only Detached, Semi Detached, or Terraced properties
        RELATED('PropertyTypes'[Property Type Name]) IN {"Detached", "Semi Detached", "Terraced"}
    ),
    "PAON", 'Transactions'[PAON],
    "SAON", 'Transactions'[SAON],
    "Street", 'Transactions'[Street],
    "Locality", 'Transactions'[Locality],
    "Town/City", 'Transactions'[Town/City],
    "County", 'Transactions'[County],
    "Postcode", 'Transactions'[Postcode],
    "Date", 'Transactions'[Date],
    "Price", 'Transactions'[Price]
)
ORDER BY
    [Date] ASC
The DAX query is well-written and returns the correct results; the comments in the code and the explanation of the query underneath it are useful too. The explanation also reveals that Copilot understands terms like PAON, which stands for “primary addressable object name”, definitions that I didn’t give it but which are in the official documentation linked to above.
This example shows how much more control AI Instructions now give you over the results Copilot returns, a lot more than you had a few months ago when the only way to influence results was via the Q&A Linguistic Schema. I’m also coming to realise how much work is involved in preparing a semantic model for Copilot using AI Instructions: it’s probably as much as building the semantic model in the first place. The improvement in the quality of results that this work brings is worth the effort, though, and I’m pretty sure there’s no way of avoiding it. Anyone who tells you that their tool for querying data with natural language “just works” and doesn’t need this amount of prep work is probably selling snake oil.
By the time you read this it’s likely the new, standalone Power BI “Chat with your data” experience will be available in your tenant. I just enabled it in the tenant the Fabric CAT team uses for testing purposes, located in the West Europe region, after the option to do so appeared yesterday. If you’re interested in Power BI Copilot you’re going to want to check it out.
However, there’s something else that this new functionality offers as well as the ability to ask questions about any of your data: it makes it easier to find the reports and semantic models that you have access to. Like a lot of Power BI tenants, the CAT team’s own tenant is a bit of a mess. There are lots of workspaces, lots of reports, and frankly lots of junk (which I’m to blame for creating as much as anyone else). As a result it can be hard to find the reports and semantic models that you want to use. Power BI does have a search feature which works fairly well but the new Power BI Copilot Search works a lot better.
Why is a better search needed for Copilot and “Chat with your data”? Well if you’re starting with an empty screen and asking Copilot a question like “Who sold the most widgets in 2024?” the first problem to be solved is which report, semantic model or data agent should be used to find the answer – there could be a lot of different potential sources. Finding the right source of information is key to making sure the user gets the best possible results from Copilot. But, as I said, even if all you want to do is find and open a report that you know exists somewhere, Power BI Copilot Search is incredibly useful.
There’s a very detailed docs page here that tells you how to enable Copilot Search and what metadata it looks at. The fact that it looks not only at item names and descriptions but also at – amongst other things – report page names, visual titles, filter pane titles, the contents of text boxes, whether you have favourited the report and how recently you opened it explains why it works so well. For example, I have a report that I use for demos with the typically unhelpful name of “CopilotDemo_WithQAOptimisations”. Here’s what the first page looks like:
The following search term:
Find the report for Fruity Farms with order units by product
…returns this report as the top result:
Why did the right report come out on top? As you can see from the first screenshot, “Fruity Farms” is mentioned in the textbox at the top of the report page and the visual on the left hand side shows Order Units broken down by Product; the term “Fruity Farms” is also mentioned in the report description. To open the report you just need to click on the link in the results.
Of course now that this feature exists you’ll want to optimise your reports, models and data agents for it. The docs cover what you need to do here but if you don’t want to change the name of the item or anything within it that the user might already be used to seeing, my suggestion is to add or alter the description property (which, to be honest, people very rarely set) on the item’s Settings page and think about using endorsement if you’re not doing so already.
As a Power Query fan I’m naturally interested in Copilot in Dataflows Gen2. But since Power Query is already very easy to use and since I’m pretty experienced at using it, can Copilot really do anything to help me be more productive? Well one thing I need to do a lot for blog posts, demos and testing is to generate tables of sample data. Typing data into the Enter Data source can be quite tedious and writing the M code to generate sample data isn’t straightforward, so I was wondering if Copilot in Dataflows Gen2 could do it for me. It turns out it can do so really well.
After a little bit of experimentation I’ve come up with an example prompt that can be easily modified to return a table of random values with columns of different data types and a given number of rows, and which works fairly reliably:
Return a table consisting of the following five columns:
Country, containing one of 10 random country names;
Product, containing one of 15 random fruit names;
Person, containing one of 20 random first names that are common in the United Kingdom;
Date, containing random dates between January 1st 2019 and December 31st 2024;
Sales, containing random integers between 0 and 500.
The table should have 1000 rows. Set an appropriate data type for each column.
Here’s an example of the M code generated:
let
    // Generate lists of random values
    Countries = {"United States", "Canada", "United Kingdom", "Germany", "France", "Italy", "Spain", "Australia", "India", "China"},
    Products = {"Apple", "Banana", "Cherry", "Date", "Elderberry", "Fig", "Grape", "Honeydew", "Kiwi", "Lemon", "Mango", "Nectarine", "Orange", "Papaya", "Quince"},
    People = {"James", "Mary", "John", "Patricia", "Robert", "Jennifer", "Michael", "Linda", "William", "Elizabeth", "David", "Barbara", "Richard", "Susan", "Joseph", "Jessica", "Thomas", "Sarah", "Charles", "Karen"},
    StartDate = #date(2019, 1, 1),
    EndDate = #date(2024, 12, 31),
    DaysBetween = Duration.Days(EndDate - StartDate) + 1,
    // Function to generate random dates
    RandomDate = () => Date.From(StartDate + #duration(Number.RoundDown(Number.RandomBetween(0, DaysBetween)), 0, 0, 0)),
    // Generate table records
    Records = List.Transform({1..1000}, each [
        Country = Countries{Number.Mod(_, List.Count(Countries))},
        Product = Products{Number.Mod(_, List.Count(Products))},
        Person = People{Number.Mod(_, List.Count(People))},
        Date = RandomDate(),
        Sales = Number.RoundDown(Number.RandomBetween(0, 500))
    ]),
    // Convert list of records to table and set data types
    Source = Table.FromRecords(Records),
    TypedTable = Table.TransformColumnTypes(Source, {
        {"Country", type text},
        {"Product", type text},
        {"Person", type text},
        {"Date", type date},
        {"Sales", Int32.Type}
    })
in
    TypedTable
And here’s an example of the output:
Definitely a time saver as far as I’m concerned. Is it totally reliable? No: it occasionally produces code that errors or which doesn’t contain genuinely random values, but it’s good enough and it’s faster to try the prompt once or twice than write the code myself. I know there are other, more sophisticated ways of generating sample data like this in Fabric, for example in Python, but as I said I’m a Power Query person.
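For comparison, here’s roughly what the same sample-data recipe looks like in Python using only the standard library. This is a hand-written sketch, not Copilot output, and the shorter value lists are my own; a pandas DataFrame could be built straight from these rows:

```python
import random
import datetime

random.seed(42)  # fixed seed so the "random" sample is reproducible

countries = ["United States", "Canada", "United Kingdom", "Germany", "France",
             "Italy", "Spain", "Australia", "India", "China"]
products = ["Apple", "Banana", "Cherry", "Fig", "Grape",
            "Kiwi", "Lemon", "Mango", "Orange", "Papaya"]
people = ["James", "Mary", "John", "Patricia", "Robert",
          "Jennifer", "Michael", "Linda", "William", "Elizabeth"]

start = datetime.date(2019, 1, 1)
end = datetime.date(2024, 12, 31)
max_offset = (end - start).days

# One dict per row; every column is genuinely random, unlike the
# cyclic Number.Mod pattern in the generated M code above
rows = [
    {
        "Country": random.choice(countries),
        "Product": random.choice(products),
        "Person": random.choice(people),
        "Date": start + datetime.timedelta(days=random.randint(0, max_offset)),
        "Sales": random.randint(0, 500),
    }
    for _ in range(1000)
]
```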
Over the last few years one topic I have blogged about several times is how to link the detailed data about DAX query execution that can be found in Log Analytics – and now Workspace Monitoring – to the visual in a Power BI report that generated those DAX queries, something that is extremely useful when you’re performance tuning Power BI reports. My first post here from 2021 showed how to write the KQL for Log Analytics, but the problem has always been how to get the IDs of the visuals in a Power BI report. You can get the IDs from the definition of the Power BI report, as shown here, and Sandeep Pawar has a great post on some other methods here, but all these methods were superseded in the March release of Power BI with the ability to copy the IDs by right-clicking on a visual in a Power BI report when editing it (thank you Rui Romano!).
This made me realise that it’s time to revisit my first post on how to get the query details in KQL since the column names in Workspace Monitoring are slightly different from Log Analytics and, indeed, the KQL needed can be simplified from my original version. Here’s a KQL query that you can run in a KQL Queryset connected to your Monitoring Eventhouse:
let VisualId = "InsertVisualIdHere";
SemanticModelLogs
| search VisualId
| project Timestamp, OperationName, OperationDetailName, DurationMs, EventText, OperationId
| order by Timestamp asc
Once you’ve copied the visual’s ID from the report (remember you need to specifically enable this feature and that you need to be in Edit mode) by right-clicking on it and selecting “Copy object name”:
…then you just need to paste the ID into the let statement in the first line of the KQL query and run it:
You can find the documentation for the columns in the SemanticModelLogs table here and the documentation for the events here. The events for a specific query all have the same value in the OperationId column.
Bonus fact: you can now run queries against Workspace Monitoring using Semantic Link Labs, as documented here, which makes it much easier to do other fun stuff with this data. For example, I can imagine there are ways to visualise DAX query and semantic model refresh events in Python that would make them much easier to analyse, but that’s something for a future blog post.
AI is meant to help us automate boring tasks, and what could be more boring than creating documentation for your Power BI semantic models? It’s such a tedious task that most people don’t bother; there’s also an ecosystem of third party tools that do this job for you, and you can also build your own solution for this using DAX DMVs or the new-ish INFO functions (see here for a good example). That got me wondering: can you use Fabric Data Agents to generate documentation for you? And what’s more, why even generate documentation when you can just ask a Data Agent the questions that you’d need to generate documentation to answer?
To test this idea out, I dug up a semantic model I’ve used for blog posts in the past with a simple star schema and a few measures:
Here are the definitions of the measures:
Units = SUM(Sales[SalesUnits])
Value = SUM(Sales[SalesValue])
Apples Value =
CALCULATE([Value], KEEPFILTERS('Product'[ProductName]="Apples"), USERELATIONSHIP('Product'[ProductID], 'Sales'[ProductID]))
Apples Value Forecast = [Apples Value] * 1.1
After publishing this model I created a Data Agent connected directly to it and found I couldn’t access the metadata I needed from it (I didn’t try too hard, but I don’t think it was possible). So I then created a second Import-mode semantic model that used the first as a data source, containing a single table bound to the following DAX query using the INFO.CALCDEPENDENCY function:
EVALUATE INFO.CALCDEPENDENCY()
This second semantic model therefore contained a list of all the DAX calculations, their definitions and the dependencies between them, from the first semantic model. Here’s that table:
I then published this second semantic model, built a Data Agent on it, and used it for all my tests. I also added some instructions to the Data Agent – I freely admit I have no idea whatsoever about prompt engineering but after a bit of trial and error the following instructions did the trick:
You are a Power BI semantic model developer.
The CalcDependencies table contains the output of the DAX INFO.CALCDEPENDENCY() function for another Power BI semantic model. It contains information about the dependencies between the columns, tables, measures, calculated columns, calculated tables and relationships in that model.
Each row in the CalcDependencies table represents a dependency between two objects.
The combination of values in the OBJECT_TYPE, TABLE and OBJECT columns uniquely identifies the object which is the source of the dependency.
The combination of values in the REFERENCED_OBJECT_TYPE, REFERENCED_TABLE and REFERENCED_OBJECT columns uniquely identifies the object which is the target of the dependency.
You can join the table to itself multiple times to find chains of dependencies between objects.
When I ask about dependencies, please include direct dependencies and also indirect dependencies, for example where one object has a dependency on another object which in turn has a dependency on the object I am asking about.
The EXPRESSION column contains the DAX definition of an object.
A value of CALC_COLUMN in the OBJECT_TYPE column indicates that the object is a calculated column.
First test: create some documentation. The following prompt:
Generate some documentation for the measures listed in the CalcDependencies table. Describe what each calculation does and how it works.
…generated the following output:
Here is the documentation for the measures listed in the CalcDependencies table, describing what each calculation does and how it works:
Measures
1. Units
Definition: SUM(Sales[SalesUnits])
Description: The Units measure calculates the total number of sales units. It does this by summing the values in the SalesUnits column in the Sales table. This measure is useful for tracking the total quantity of items sold.
2. Value
Definition: SUM(Sales[SalesValue])
Description: The Value measure calculates the total sales value. It achieves this by summing up the values in the SalesValue column in the Sales table. This measure is essential for understanding the total revenue generated from sales.
3. Apples Value
Definition: CALCULATE([Value], KEEPFILTERS('Product'[ProductName]="Apples"), USERELATIONSHIP('Product'[ProductID], 'Sales'[ProductID]))
Description: The Apples Value measure calculates the total sales value specifically for the product “Apples.” It works by applying a filter to include only rows where the ProductName in the Product table is “Apples.” Additionally, it uses a specific relationship between the Product and Sales tables defined by the ProductID columns. This measure helps in analyzing the revenue generated from the sale of apples.
4. Apples Value Forecast
Definition: [Apples Value] * 1.1
Description: The Apples Value Forecast measure estimates the future sales value for the product “Apples” based on a forecasted increase. It multiplies the current Apples Value by 1.1 (indicating a 10% increase). This measure is useful for predicting future revenue from apple sales.
These descriptions provide a clear understanding of what each measure does and how it works, helping anyone who looks at the documentation to comprehend the purpose and functionality of each calculation.
The verdict? More than good enough. To be honest, I was expecting the Data Agent to do a good job here though.
Second test: understand the dependencies between calculations. This is an interesting challenge because you can have chained relationships (i.e. measures that depend on other measures that depend on other measures, and so on) which makes the DAX harder to write. Also, you can’t easily understand dependencies just by looking at the output of INFO.CALCDEPENDENCY – most of the third party tools mentioned above solve this by visualising the data with some kind of network diagram.
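To make the chaining concrete, the “which objects are affected” question is a transitive closure over the dependency pairs. Here’s a small Python sketch of that logic; the edges mirror the measures in my model, but this is an illustration, not what the Data Agent actually executes:

```python
from collections import defaultdict

# (object, referenced_object) pairs, mimicking rows of INFO.CALCDEPENDENCY()
dependencies = [
    ("Units", "Sales[SalesUnits]"),
    ("Value", "Sales[SalesValue]"),
    ("Apples Value", "Value"),
    ("Apples Value Forecast", "Apples Value"),
]

def affected_by(target, deps=dependencies):
    """Return every object that directly or indirectly depends on target."""
    dependents = defaultdict(set)
    for obj, ref in deps:
        dependents[ref].add(obj)  # invert the edges: ref -> its dependents
    affected = set()
    frontier = {target}
    while frontier:  # breadth-first walk up the dependency chains
        new = set()
        for t in frontier:
            new |= dependents[t] - affected
        affected |= new
        frontier = new
    return affected
```

Deleting `Sales[SalesValue]` breaks Value directly, and then Apples Value and Apples Value Forecast indirectly, which is exactly the chain the Data Agent has to reason through.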
The following prompt:
Which measures would be affected if I deleted the SalesValue column on the Sales table?
…generated the following output:
If you delete the SalesValue column from the Sales table, the following measures will be affected:
Measure Name: Value
Definition: SUM(Sales[SalesValue])
Since the Value measure directly depends on the SalesValue column, deleting the SalesValue column will cause the Value measure to break. Additionally, any measures that depend on the Value measure, such as Apples Value and Apples Value Forecast, will also be affected indirectly:
Therefore, deleting the SalesValue column will break the Value, Apples Value, and Apples Value Forecast measures.
The verdict? I’m impressed, this is useful.
Final test: can we generate a visualisation of the dependencies? A few months ago one of my colleagues showed me the Mermaid language for defining diagrams like this and the rather cool Markdown & Mermaid custom visual that allows you to display Mermaid diagrams in Power BI. The following prompt:
Generate an expression in the Mermaid diagramming language that shows all the measures in the model and the dependencies between them
…generated the following response:
Here is an expression in the Mermaid diagramming language that shows all the measures in the model and the dependencies between them:
graph TD
    Units --> Units
    Value --> Value
    Apples_Value --> Value
    Apples_Value_Forecast --> Apples_Value
    Units["Units"]
    Value["Value"]
    Apples_Value["Apples Value"]
    Apples_Value_Forecast["Apples Value Forecast"]
Explanation:
Units is a measure with no dependencies on other measures.
Value is a base measure.
Apples Value depends on the Value measure.
Apples Value Forecast depends on the Apples Value measure, which in turn depends on the Value measure.
This Mermaid diagram visually represents all the measures in the model and the dependencies between them using arrows.
Pasting the expression in the response into the Markdown & Mermaid custom visual gave me the following:
I wasn’t particularly happy with those messy lines linking the Value and Units measures to themselves, but overall it does the job pretty well.
This blog post is already long enough and while these are simple tests, I think I’ve proved my point that Fabric Data Agents can be very useful when it comes to understanding the structure of a semantic model and the calculations in it. Bringing in data from other DAX INFO functions and tuning the AI Instructions would allow you to answer even more complex questions. I can also imagine a solution using Semantic Link Labs that not only dumps the data from DAX INFO functions for all your semantic models into a table in OneLake for consumption by a Data Agent, but also then uses the Data Agent SDK to automatically generate markdown documentation for those models, complete with diagrams. The possibilities are endless!
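The Mermaid-generation step in that imagined pipeline is straightforward to sketch in Python. The function name and (measure, depends_on) edge format here are my own invention, not part of any SDK:

```python
def _node_id(name: str) -> str:
    """Mermaid node IDs can't contain spaces, so replace them."""
    return name.replace(" ", "_")

def measures_to_mermaid(edges):
    """Build Mermaid 'graph TD' source from (measure, depends_on) pairs."""
    lines = ["graph TD"]
    nodes = set()
    for measure, dep in edges:
        lines.append(f"    {_node_id(measure)} --> {_node_id(dep)}")
        nodes.update((measure, dep))
    # declare each node with its original display label
    for name in sorted(nodes):
        lines.append(f'    {_node_id(name)}["{name}"]')
    return "\n".join(lines)
```

Feeding in dependency pairs pulled from INFO.CALCDEPENDENCY output would give you Mermaid source ready to paste into the Markdown & Mermaid custom visual or a markdown document.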
Now that Fabric Data Agents (what used to be called AI Skills) can use Power BI semantic models as a data source I’ve been spending some time playing around with them, and while I was doing that I realised something – maybe something obvious, but I think still worth writing about. It’s that there are a lot of amazing things you can do in DAX that rarely get done because of the constraints of exposing semantic models through a Power BI report, and because Data Agents generate DAX queries they unlock that hitherto untapped potential for the first time. Up until now I’ve assumed that natural language querying of data in Power BI was something only relatively low-skilled end users (the kind of people who can’t build their own Power BI reports and who struggle with Excel PivotTables) would benefit from; now I think it’s something that will also benefit highly-skilled Power BI data analysts. That’s a somewhat vague statement, I know, so let me explain what I mean with an example.
Consider the following semantic model:
There are two dimension tables, Customer and Product, and a fact table called Sales with one measure defined as follows:
Count Of Sales = COUNTROWS('Sales')
There’s one row in the fact table for each sale of a Product to a Customer. Here’s all the data dumped to a table:
So, very simple indeed. Even so, there are some common questions that an analyst might want to ask about this data that aren’t easy to answer without some extra measures or modelling – and if you don’t have the skills or time to do this, you’re in trouble. One example is a basket analysis question like this: which customers bought Apples and also bought Lemons? You can’t easily answer this question with the model as it is in a Power BI report. What you’d need to do is create a disconnected copy of the Product dimension table, so that a user can select Apples on the original Product dimension and Lemons on the new one, and then write some DAX to find the customers who bought both. All very doable but, like I said, it needs changes to the model and strong DAX skills.
I published my semantic model to the Service and created a Data Agent that used that model as a source. I added two instructions to the Data Agent:
Always show results as a table, never as bullet points
You can tell customers have bought a product when the Count of Sales measure is greater than 0
The first instruction I added because I got irritated by the way Data Agent shows the results with bullet points rather than as a table. The second probably wasn’t necessary because in most cases Data Agent knew that the Sales table represented a sale of a Product to a Customer, but I added it after one incorrect response just to make that completely clear.
I then asked the Data Agent the following question:
Show me customers who bought apples and who also bought lemons
And I got the correct response:
In this case it solved the problem in two steps: writing one DAX query to get the customers who bought lemons, writing another to get the customers who bought apples, and then finding the intersection itself:
At other times I’ve seen it solve the problem more elegantly, finding the customers who bought both apples and lemons in a single query using the DAX INTERSECT() function.
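The set logic behind that single-query, INTERSECT-style approach is easy to see in Python. Here’s a sketch with invented (customer, product) rows shaped like my Sales fact table:

```python
# Invented sample rows: one tuple per sale of a product to a customer
sales = [
    ("Alice", "Apples"), ("Alice", "Lemons"), ("Alice", "Pears"),
    ("Bob", "Apples"),
    ("Carol", "Lemons"), ("Carol", "Apples"),
    ("Dave", "Lemons"),
]

def buyers_of(product: str) -> set:
    """Customers with at least one sale of the given product."""
    return {customer for customer, p in sales if p == product}

# The single-query approach is just a set intersection,
# like DAX's INTERSECT over two filtered customer tables
bought_both = buyers_of("Apples") & buyers_of("Lemons")
```

The two-step approach the Data Agent took is the same thing with the intersection done last, outside the queries.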
I then asked a similar question:
For customers who bought apples, which other products did they buy?
And again, I got the correct answer:
In this case it ran five separate DAX queries, one for each customer, which I’m not thrilled about, but again, at other times it solved the problem more elegantly in a single DAX query.
Next I tried to do some ABC analysis:
Group customers into two categories: one that contains all the customers with just one sale, and one that contains all the customers with more than one sale. Show the total count of sales for both categories but do not show individual customer names.
And again I got the correct answer:
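A DAX query that returns this kind of grouping might look like the following – again a sketch with assumed names, not what the Data Agent actually generated:

```dax
EVALUATE
VAR CustomerSales =
    // Count of sales per customer
    ADDCOLUMNS ( VALUES ( Sales[CustomerKey] ), "@SalesCount", [Count of Sales] )
VAR Categorised =
    // Assign each customer to one of the two categories
    ADDCOLUMNS (
        CustomerSales,
        "@Category", IF ( [@SalesCount] = 1, "One sale", "More than one sale" )
    )
RETURN
    // GROUPBY is used here because SUMMARIZE can't group by an
    // extension column like [@Category]
    GROUPBY (
        Categorised,
        [@Category],
        "Total Count of Sales", SUMX ( CURRENTGROUP (), [@SalesCount] )
    )
```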
I could go on but this post is long enough already. I did get incorrect answers for some prompts, and there were also some cases where the Data Agent asked for more details or for a simpler question – but that’s what you’d expect. I was pleasantly surprised at how well it worked, especially since I don’t have any previous experience with using AI for data analysis, crafting prompts or anything like that. No complex configuration was required and I didn’t supply any example DAX queries (in fact Data Agents don’t yet allow you to provide example queries for semantic models). What does this all mean though?
I’m not going to argue that your average end user is going to start doing advanced data analysis with semantic models using Data Agents. The results were impressive, and while I think Data Agents (and Copilot for that matter) do a pretty good job with simpler problems, I wouldn’t want anyone to blindly trust the results for more advanced problems like these. However, if you’re a data analyst who is already competent with DAX, and who knows you always need to verify the results you get from a Data Agent, I think this kind of DAX vibe-coding has a lot of value. Imagine you’re a data analyst and you’re asked that question about which other products customers who bought apples also bought. You could search the web, probably find this article by the Italians, get scared, spend a few hours digesting it, create a new semantic model with all the extra tables and measures you need, and then finally get the answer you want. Maybe you could try to write a DAX query from scratch to run in DAX Studio or DAX Query View, but that requires more skill because no-one blogs about solving problems like this by writing DAX queries. Or you could ask a Data Agent, check the DAX query it spits out to make sure it does what you want, and get your answer much, much faster and more easily. I know which option I’d choose.
To finish, let me answer a few likely questions:
Why are you doing this with Fabric Data Agents and not Power BI Copilot?
At the time of writing, Data Agents, the Power BI Copilot that you access via the side pane in a report, and Power BI Copilot in DAX Query View all have slightly different capabilities. Power BI Copilot in the side pane (what most people think of as Power BI Copilot) couldn’t answer any of these questions when I asked them, but I didn’t expect it to: even though it can now create calculations, it can still only answer questions that can be answered with a Power BI visual. Copilot in DAX Query View is actually very closely related to the Data Agent’s natural language-to-DAX functionality (in fact, at the moment, it can see and use more model metadata than a Data Agent) and unsurprisingly it did a lot better, but the results were still not as good as the Data Agent’s. Expect these differences to go away over time, and everything I say here about Data Agents to become equally applicable to Power BI Copilot.
This isn’t anything new or exciting – I see people posting about using AI for data analysis all the time on LinkedIn, Twitter etc. What’s different?
Fair point. I see this type of content all the time too (for example, in the Microsoft data community, Brian Julius and my colleague Mim always have interesting things to say on this subject) and I was excited to read the recent announcement about the Analyst agent in M365 Copilot. But typically people are talking about taking raw data and analysing it in Python, or about generating SQL queries. What if your data is already in Power BI? If so, then DAX is the natural way of analysing it. More importantly, there are many advantages to using AI to analyse data via a semantic model: all the joins are predefined, there’s a lot of other rich metadata to improve results, plus all those handy DAX calculations (and, one day, DAX UDFs) that you’ve defined. You’re much more likely to get reliable results using AI on top of a semantic model than with something that generates Python or SQL, because a lot more of the hard work has been done in advance.
Is this going to replace Power BI reports?
No, I don’t think this kind of conversational BI is going to replace Power BI reports, paginated reports, Analyze in Excel or any of the other existing ways of interacting with data in Power BI. I think it will be a new way of analysing data in Power BI. And to restate the point I’ve been trying to make in this post: conversational BI will not only empower low-skilled end users, it will also empower data analysts, who may not feel they are true “data scientists” but who do have strong Power BI and DAX skills, to solve more advanced problems like basket analysis or ABC analysis much more easily.
If you’re working with slow data sources in Power BI/Fabric dataflows then you’re probably aware that validating them (for Gen1 dataflows) or publishing them (for Gen2 dataflows) can sometimes take a long time. If you’re working with very slow data sources then you may run into the 10-minute timeout on validation/publishing that is documented here. For a Gen1 dataflow you’ll see the following error message if you try to save your dataflow and validation takes more than 10 minutes:
Failed to analyze issues in the query
For a Gen2 Dataflow, where you can save the Dataflow and publishing takes place in the background, you’ll see the following error in your workspace:
Dataflow publish failed
Apart from tuning your data source and your queries, what can you do about this? Well, one of the things that happens when you publish a dataflow is that it works out the columns returned, and the data types of those columns, for all of the queries in the dataflow. It does this by applying a top 0 row filter to each query and running it until it returns data; if you can make that faster then validation/publishing will be faster. Obviously query folding is important here, because that top 0 filter should fold, as are more obscure, source-specific settings like this one for ODBC sources. However, there is another trick you can use if you’re happy writing some moderately complicated M code – the trick I blogged about here for making Power Query in Power BI Desktop faster.
Let’s see an example with Dataflows Gen2. Consider the following M code, which returns a table with three columns and is deliberately written to take 11 minutes and 1 second to return (see this post for more details on how to create artificially slow Power Query queries):
let
    Source = Function.InvokeAfter(
        () =>
            #table(
                type table [
                    #"Number Column" = number,
                    #"Text Column" = text,
                    #"Date Column" = date
                ],
                {
                    {1, "Hello", #date(2016, 1, 1)},
                    {2, "World", #date(2017, 12, 12)}
                }
            ),
        #duration(0, 0, 11, 1)
    )
in
    Source
As you would expect, trying to publish a Gen1 or Gen2 dataflow that uses this query will fail, because it takes more than 10 minutes before it returns any rows. However, in this case – as in most cases – you know what columns the query returns, so it’s possible to use the Table.View M function to intercept the zero-row filter applied during validation/publishing and return a table with no rows and the same columns as the query above. You can do this by adding two extra steps to the M code like so:
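Putting those two extra steps together with the slow query above gives something like this – a sketch reconstructed from the description of the technique, not necessarily the exact code from the original post:

```m
let
    // The original slow step from the query above
    Source =
        Function.InvokeAfter(
            () =>
                #table(
                    type table [
                        #"Number Column" = number,
                        #"Text Column" = text,
                        #"Date Column" = date
                    ],
                    {
                        {1, "Hello", #date(2016, 1, 1)},
                        {2, "World", #date(2017, 12, 12)}
                    }
                ),
            #duration(0, 0, 11, 1)
        ),
    // Extra step 1: declare the columns and data types the query returns
    TableTypeToReturn =
        type table [
            #"Number Column" = number,
            #"Text Column" = text,
            #"Date Column" = date
        ],
    // Extra step 2: when a top 0 row filter is applied, return an empty
    // table of the declared type immediately instead of evaluating Source
    OverrideZeroRowFilter =
        Table.View(
            null,
            [
                GetType = () => TableTypeToReturn,
                GetRows = () => Source,
                OnTake = (count as number) =>
                    if count = 0 then
                        #table(TableTypeToReturn, {})
                    else
                        Table.FirstN(Source, count)
            ]
        )
in
    OverrideZeroRowFilter
```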
The first step added here, called TableTypeToReturn, defines the columns and data types of the table returned by the query; if you use this technique yourself, you will need to alter it so it returns the columns and data types of your query. You can read more about #table and table types here and I have a function that will automatically generate this code from an existing query for you here. The second step, called OverrideZeroRowFilter, looks for situations where a Top N filter is being applied and if N=0 returns a table of the type defined in the previous step with zero rows. For a more detailed explanation see that original blog post.
This new version of the query validates/publishes immediately, although it still takes 11 minutes and 1 second to refresh. Of course, if you use this technique and then change your query so that it returns different columns or data types, you’ll have to update the extra code every time, which can be fiddly; but if you’re running into the timeout you don’t have any choice, and even if validation/publishing is merely slow it’s probably worth the extra effort.
To demonstrate how to do this, I created a semantic model with two tables: one visible, called VisibleTable, and one hidden, called HiddenTable.
I then published the semantic model, created a Data Pipeline and added a semantic model refresh activity; selected the connection, workspace and semantic model; and waited for the Table(s) dropdown to populate (yes, I know it’s slow – we’re working on it):
…and then, when it loaded, noted that only the visible table was shown in the dropdown:
I didn’t select anything and instead clicked “Add dynamic content” to use an expression to select the table instead:
Then in the Pipeline expression builder I entered the following:
@json('
[
    {
        "table": "HiddenTable"
    }
]
')
Having done this I ran the Pipeline and just the hidden table was refreshed. Easy!
The expression needs to be a JSON array of table and partition names. Here’s an example showing how to refresh the table called HiddenTable and the sole partition of the table called VisibleTable (which also happens to be called VisibleTable) in the same refresh:
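Based on that format, the expression would be something like this (the "partition" property name here is an assumption based on the description above):

```text
@json('
[
    {
        "table": "HiddenTable"
    },
    {
        "table": "VisibleTable",
        "partition": "VisibleTable"
    }
]
')
```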
It’s useful to know how to construct the expression even if you don’t need to refresh hidden tables – for example, you might want to dynamically generate the list of tables or partitions to refresh with an expression.