Sending Alerts From Fabric Workspace Monitoring Using KQL Querysets And Activator

I’ve always been a big fan of using Log Analytics to analyse Power BI engine activity (I’ve blogged about it many times) and so, naturally, I was very happy when the public preview of Fabric Workspace Monitoring was announced – it gives you everything you get from Log Analytics and more, all from the comfort of your own Fabric workspace. Apart from my blog there are lots of example KQL queries out there that you can use with Log Analytics and Workspace Monitoring, for example in this repo or Sandeep Pawar’s recent post. However, what is new with Workspace Monitoring is that if you store these queries in a KQL Queryset you can create alerts on them in Activator, so when something important happens you can be notified of it.

What type of things should you, as an admin, be notified of? I can think of lots, but to demonstrate what’s possible let’s take the example of DAX query errors. I created a simple semantic model containing the following measure:

Measure Returning Error = ERROR("Demo Error!") 

It uses the DAX ERROR() function, so every time a report visual contains this measure it will return a custom error:

This makes it very easy to generate errors for testing; in the real world the kind of errors you would want to look out for are ones to do with broken DAX measures (maybe referring to other measures or columns that no longer exist) or ones where the query memory limit has been exceeded.

I published this semantic model to a workspace with Workspace Monitoring enabled and then created a report and deliberately created visuals that generated this error.

I then created a KQL Queryset and used the following KQL query to get all the Error events in the last hour for this exact error:

SemanticModelLogs
| where Timestamp > ago(1hr)
| where OperationName == "Error"
//filter for exactly the custom error number generated by my DAX
| where StatusCode == "-1053163166"
| extend app = tostring(parse_json(ApplicationContext))
| project 
Timestamp,
EventText, 
ExecutingUser, 
modelid = extract_json("$.DatasetId", app), 
reportId = extract_json("$.Sources[0].ReportId", app),
visualId = extract_json("$.Sources[0].VisualId", app),
consumptionMethod = extract_json("$.Sources[0].HostProperties.ConsumptionMethod", app)

A few things to note about this query:

  • The filter on StatusCode allows me to only return the errors generated by the Error function – different errors will have different error numbers but there is no single place where these error numbers are documented, unfortunately.
  • The second half of the query parses the ApplicationContext column to get the IDs of the semantic model, report and visual that generated the error (something I blogged about here), as well as where the user was when the error occurred, for example in the Power BI web application (something I blogged about here).
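If you want to check the parsing logic outside KQL, the same extractions can be sketched in Python against a sample ApplicationContext value – note that the JSON below is an illustrative payload I’ve constructed from the fields referenced in the query, not a real captured event:

```python
import json

# Illustrative ApplicationContext payload: the structure mirrors the paths
# used by the extract_json() calls in the KQL query above, the IDs are made up
application_context = json.dumps({
    "DatasetId": "11111111-2222-3333-4444-555555555555",
    "Sources": [{
        "ReportId": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
        "VisualId": "abc123",
        "HostProperties": {"ConsumptionMethod": "Power BI Web App"}
    }]
})

# Equivalent of the KQL extract_json() calls on the parsed column
app = json.loads(application_context)
model_id = app["DatasetId"]
report_id = app["Sources"][0]["ReportId"]
visual_id = app["Sources"][0]["VisualId"]
consumption_method = app["Sources"][0]["HostProperties"]["ConsumptionMethod"]
print(model_id, report_id, visual_id, consumption_method)
```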

Finally, I created an Activator alert from this KQL Queryset by clicking the “Set alert” button shown in the top right-hand corner of the screenshot above to send me an email every time this error occurred, checking (by running the KQL query) every hour:

I customised the contents of the email alert inside Activator:

And sure enough, after an hour, I started getting email alerts for each error:

The great thing about the combination of Workspace Monitoring, KQL and Activator is the flexibility you get. For example, generating one alert per error would probably result in too many alerts to keep track of; instead you could write your KQL query to aggregate the data and only get one alert per user and error type, or error type and visual. As more and more sources of data are added to Workspace Monitoring and more functionality is added – see the public roadmap for details – then being a Fabric admin will get easier and easier.

Refreshing A Power BI Semantic Model With Eventstreams, Pipelines And Activator

Following on from my last post where I showed how to send data from Power Automate to a Fabric Eventstream, in this post I’m going to show how to use it to solve one of my favourite problems: refreshing a Power BI semantic model that uses an Excel workbook stored in OneDrive as a source when that Excel workbook is modified.

Now before I go on I want to be clear that I know this is a ridiculously over-engineered and expensive (in terms of CUs) solution, and that you can do almost the same thing just using Power Automate or in several other different ways – see my colleague Mark Pryce-Maher’s recent videos on using Fabric Open Mirroring with Excel for example. I’m doing this to teach myself Fabric Eventstreams and Activator and see what’s possible with them. Please excuse any mistakes or bad practices.

To start off, I built a Power BI semantic model containing a single table, connected to an Excel workbook stored in OneDrive for Business:

I then built a Fabric Pipeline to refresh the semantic model, using the newly-enhanced Semantic Model Refresh activity:

Next, I created a new Fabric Eventstream and built a slightly more sophisticated version of the Power Automate flow from my previous post: it ran every 30 seconds, checked to see if the Excel file had been modified and, if it had, sent an event to the Eventstream:

The Compose action in this flow generates some JSON that is sent to the Eventstream and contains the name of the file that has been altered and a dummy value called EventCounter that always contains 1:
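The payload itself is just a small JSON document. Here’s a Python sketch of what the Compose action emits – the field names come from the description above, the file name is illustrative:

```python
import json

# Event payload sent to the Eventstream: the modified file's name plus a
# dummy counter that is always 1, so events can be summed per time window
event = {"FileName": "Sales.xlsx", "EventCounter": 1}
payload = json.dumps(event)
print(payload)
```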

I then connected the Eventstream to a new Activator:

In the Activator, I created a new object called ExcelWorkbook. I used the FileName field as the unique identifier and added the EventCounter field as a property:

Finally I created a new rule on the EventCounter property that aggregated the events over 1 minute windows and then fired when the total number of events in each window changed to the value 0 and stayed at 0 for 1 minute:

This is where the real value of Activator’s rules comes in. As I said, it’s easy using other approaches to refresh a semantic model when an Excel file changes. The problem with doing this, however, is that Excel files in OneDrive for Business autosave on a regular basis and if you make several changes to the file over the space of several minutes, you’ll end up trying to refresh the semantic model all the time. This can lead to excessive CU usage on your capacity because you’re refreshing too often as well as situations where one refresh fails because another is already in progress. What the rule above does is only trigger a refresh when changes to the Excel workbook have been detected (so the Total is greater than 0), then there has been a minute where no changes have been detected (so the Total changes to 0) and there has been a further minute where no changes have been detected (so the Total stays at 0). Since Activator rules can trigger Fabric items, I hooked my rule up to the Pipeline shown above to run the refresh:

This screenshot of the Definition tab of the rule shows a pattern of events where there were four one-minute windows where the Excel file was edited and the rule fired three times:

The problem with this rule is that if the Excel file is edited constantly for a long period, so that there is no period of inactivity, the semantic model would not get refreshed until that period of activity is over, which might be too long to wait. I guess the solution would be another rule that detects periods of, say, 10 minutes when the file has been edited constantly and refreshes the model even if there is no period of inactivity.
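To make the rule’s behaviour concrete, here’s a minimal Python simulation of the windowed logic described above – this is my own sketch of the semantics, not how Activator implements it internally:

```python
def refresh_windows(window_totals):
    """Given per-minute event totals, return the (0-based) indexes of the
    windows where a refresh would fire: the total must have been non-zero,
    then change to 0, then stay at 0 for one further window."""
    fires = []
    for i in range(2, len(window_totals)):
        edited, went_quiet, stayed_quiet = (
            window_totals[i - 2], window_totals[i - 1], window_totals[i])
        if edited > 0 and went_quiet == 0 and stayed_quiet == 0:
            fires.append(i)
    return fires

# Four one-minute windows with edits, separated by quiet gaps
totals = [3, 0, 0, 2, 1, 0, 0, 4, 0, 0]
print(refresh_windows(totals))
```

Note that constant editing (e.g. totals of `[5, 5, 5]`) never fires, which is exactly the limitation described above.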

All in all I think this shows how the combination of Power BI/Power Platform and new features in Fabric can be used to solve old problems in new, interesting ways. I’m looking forward to learning more about Fabric Real-Time Intelligence and Activator!

[Thanks to James Hutton from the Activator team for his help with this post]

Send Data From Power Automate To Fabric Using Eventstreams

Fabric’s Real-Time Intelligence features are, for me, the most interesting things to learn about in the platform. I’m not going to pretend to be an expert in them – far from it – but they are quite easy to use and they open up some interesting possibilities for low-code/no-code people like me. The other day I was wondering if it was possible to send events and data from Power Automate to Fabric using Eventstreams and it turns out it is quite easy to do.

Here’s a really simple illustration. I created a new Eventstream in a Fabric workspace, then clicked the “Use custom endpoint” option and published the Eventstream so it was ready to send data to:

Then, in Power Automate, I built a super-simple flow with a “Manually trigger a flow” trigger and then the “Send event” action from the Event Hubs connector:

This works because you can send data from Power Automate to the Eventstream custom endpoint using the Event Hub protocol. To set up the connection to the Eventstream in Power Automate I entered a name for the connection, chose the “Access Key” authentication type and then copied the “Connection string-primary key” value from the Eventstream custom endpoint, shown in the first screenshot above (you need to click the eye icon next to it in order to copy it to the clipboard) and pasted it into the “Connection String” property in the Power Automate connector:

Then I configured the “Send event” action like so:

I had to manually paste the Event Hub Name, shown in the first screenshot above, as a custom value because I got an error when I tried to use Power Automate’s dropdown box to find the name. I put the Timestamp from the “Manually trigger a flow” trigger into the Content property.
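As an aside, because the custom endpoint speaks the Event Hubs protocol, anything that can use an Event Hubs client can send events to it, not just Power Automate. Here’s a hedged Python sketch – the payload construction runs as-is, while the actual send (which needs the azure-eventhub package plus the connection string and Event Hub name copied from the Eventstream) is shown in comments so the snippet works without credentials:

```python
import json
from datetime import datetime, timezone

# Build the same kind of payload the flow sends: the trigger's timestamp
body = json.dumps({"Timestamp": datetime.now(timezone.utc).isoformat()})

# With the azure-eventhub package installed, sending to the Eventstream's
# custom endpoint uses the standard Event Hubs client, e.g.:
#
#   from azure.eventhub import EventHubProducerClient, EventData
#   producer = EventHubProducerClient.from_connection_string(
#       conn_str="<Connection string-primary key from the Eventstream>",
#       eventhub_name="<Event Hub Name from the Eventstream>")
#   batch = producer.create_batch()
#   batch.add(EventData(body))
#   producer.send_batch(batch)

print(body)
```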

After saving the flow I was then able to trigger it from my browser or the Power Automate mobile app, and every time the flow ran it sent the time of the run to my Eventstream:

Very basic but it proves the point. What could you use this for? Power Automate has a huge number of connectors for Microsoft and third party services: for example you can trigger a flow when a file stored in OneDrive for Business is modified or when new content is published to an RSS feed. Sending these events and their related data on to a Fabric Eventstream not only means you can capture them for analysis in OneLake or an EventHouse but also send the events on to Fabric Activator, which can apply sophisticated rules so you can be alerted when something happens and specific conditions are met in Teams or via email, run Fabric items such as notebooks or even trigger Power Automate flows. I’ll explore some of these scenarios in more detail in future blog posts.

Power BI Copilot: Where Are The CUs Consumed?

If you’ve looked into using Copilot with Power BI, you’ll know that you need a capacity to use it: a P1/F64 or greater. Something I learned recently, though, is that the current Power BI Copilot capabilities are only accessible via a side pane in a report, and it’s the report that needs to be stored in a workspace on a P1/F64+ capacity; that’s where the CUs for Copilot are consumed. If the report has a live connection to a semantic model stored in a workspace not on a capacity (which we refer to at Microsoft as “Shared”, but is widely known as “Pro”) then Copilot still works!

For example, I published a semantic model to a workspace stored on Shared/Pro:

I then built a report with a Live Connection to that semantic model and saved it to a workspace on a P1 capacity:

I then opened this report, used Power BI Copilot, and saw that a) it worked and b) the CUs were attributed to the workspace where the report was stored when I looked in the Capacity Metrics App:

This is particularly interesting if you have a large number of semantic models that are stored in Shared/Pro workspaces today but still want to use Power BI Copilot. It means you don’t have to move your semantic models into a workspace on a capacity, which in turn means that refreshing those models or querying them won’t consume CUs on a capacity, which in turn means that you’ll be able to support more users for Power BI Copilot on your capacities.

In the future you’ll have even more flexibility. In the sessions “Microsoft Fabric: What’s New And What’s Next” and “Boost productivity with Microsoft Fabric, Copilot, and AI” at Ignite last week we announced a number of new Power BI Copilot capabilities coming soon, such as integration with AI Skills and a new “immersive” Copilot experience. We also announced Fabric AI Capacities, coming in early 2025, which will allow you to direct all Copilot activity to certain capacities regardless of whether your reports or models are stored on Shared/Pro or Premium, or which capacity they are stored on (see more details from the 16:00 mark in the recording of the “Boost productivity with Microsoft Fabric, Copilot and AI” session).

Finding The Size Of All Of The Columns Touched By A DAX Query

I had meant to follow up my recent post on how to find the columns touched by a DAX query by writing one on how to use this technique to find the size of these columns in memory, so you can find the total size of the columns that need to be paged into memory when a DAX query runs on a Direct Lake semantic model. Before I could do that, though, my colleague Michael Kovalsky messaged me to say that not only had he taken the query from that first post and incorporated it in Semantic Link Labs, he’d done the work to get column sizes too. All that’s left for me to do, then, is give you some simple examples of how to use it.

To use Semantic Link Labs you just need to create a new Fabric notebook and install the library:

%pip install semantic-link-labs

After that you can use sempy_labs.get_dax_query_dependencies to get the columns touched by any DAX query, for example:

import sempy_labs as labs
labs.get_dax_query_dependencies(
        dataset = 'InsertSemanticModelName',
        workspace = 'InsertWorkspaceName',
        dax_string = "InsertDAXQuery",
    )

This returns a dataframe with one row for each column touched by the query, plus various statistics about the size of each column in memory.
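Once you have the dataframe you can do the usual kinds of analysis on it. Here’s a sketch of finding the largest columns and the total size – note that I’m using made-up rows here and assuming a size column called “Total Size”; the real column names returned by Semantic Link Labs may differ:

```python
# Stand-in rows for the result of get_dax_query_dependencies();
# the column name "Total Size" is my assumption, sizes are made up
deps = [
    {"Column Name": "Sales[SalesAmount]", "Total Size": 1_200_000},
    {"Column Name": "Sales[OrderDateKey]", "Total Size": 800_000},
    {"Column Name": "Product[EnglishProductName]", "Total Size": 150_000},
]

# Largest columns first, plus the total memory the query would touch
by_size = sorted(deps, key=lambda r: r["Total Size"], reverse=True)
total_bytes = sum(r["Total Size"] for r in deps)
print(by_size[0]["Column Name"], total_bytes)
```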

If you’re working with a Direct Lake semantic model, though, in order to get the correct sizes of each column in memory the query itself will need to have been run beforehand; you can ensure that this happens by setting the optional parameter put_in_memory to True:

import sempy_labs as labs
labs.get_dax_query_dependencies(
        dataset = 'InsertSemanticModelName',
        workspace = 'InsertWorkspaceName',
        dax_string = "InsertDAXQuery",
        put_in_memory = True
    )

Last of all, if you don’t want a dataframe but just want a single number representing the total memory needed by all columns touched by a query, you can use sempy_labs.get_dax_query_memory_size, for example like this:

import sempy_labs as labs
labs.get_dax_query_memory_size(
        dataset = 'InsertSemanticModelName',
        workspace = 'InsertWorkspaceName',
        dax_string = "InsertDAXQuery"
    )

Yet more evidence that, for any Power BI user, Semantic Link and Semantic Link Labs are the best reasons for you to flip the switch to enable Fabric. To find out more about what they are capable of, check out this user group presentation.

Tuning Power BI Copilot With Summarize By

Continuing my occasional series of ways you can improve the results you get from Power BI Copilot by setting properties on your semantic model, in this post I’ll look at how setting the Summarize By property can have an impact.

Let’s say you have a semantic model that contains the following table of temperature readings at different English seaside resort towns over the summer:

Now let’s say you enter the following prompt in the Copilot pane on a blank report (this forces Copilot to query the semantic model rather than look for answers from visuals in the report):

show temperature by resort

Here’s the response:

As you can see, Power BI Copilot has summed up the temperatures for each resort, which makes no sense. You could argue that the prompt should be more precise and specify the aggregation method to use, but it’s unlikely that most end users would know to do this on their first attempt at prompting, or perhaps ever. You could also argue that it would be better to expose an explicit measure and hide the underlying Temperature (C) column in the model – indeed, this is what I would do – but the downside of this is that it may limit the type of analyses that an end user can do.

Let’s say that what you really wanted to see for this prompt is average temperature by resort, and as a semantic model developer you can ensure this happens by setting the Summarize By property on the Temperature (C) column to Average, either on the Column Tools tab in the ribbon or in the Model View:

After making this change, the same prompt now returns the average temperature for each resort:

As with some of the other properties such as Row Label and Key Column I’ve talked about in this series, the Summarize By property is not new but it’s a lot more important to set it when designing models for Copilot.

One last thing to note is that as well as specifying how a column should be aggregated, the Summarize By property can be used to specify that values in a column should never be aggregated at all. This is particularly useful for numeric columns that contain categorical data such as year or zip codes. You can do this by setting Summarize By to None in the model view:

…or, if you’re setting this property on the Column Tools tab in the ribbon, using the Don’t Summarize option:

Different Ways To Replace Blanks With Zeros In DAX

My post from a few months ago about the dangers of DAX measures that never return blank attracted quite a lot of attention – this is a hot topic on the forums and adding zeros to measures is a common cause of memory errors in Power BI. In that post, though, I didn’t talk about the best way to replace blanks with zeros if you absolutely have no choice but to do so. One of the comments on that post mentioned that visual calculations are an option, which is something I hadn’t thought about before; now, after conversing with the Gods of DAX (no, not the Italians, I mean Akshai, Marius and Jeffrey!) and doing some testing, I can reveal that Visual Calculations can be a good choice sometimes, while more traditional DAX approaches are fine at other times.

Let’s see some examples. I created the following model using the AdventureWorksDW 2017 sample data:

There’s a Product dimension, a Customer dimension and a Date dimension plus a fact table containing sales data. The most important thing to note is that individual customers only buy a few products on a few dates. I also created two measures with the following definitions:

Sales Amount = SUM('FactInternetSales'[SalesAmount])

Mountain-100 Black 38 Sales =
CALCULATE (
    [Sales Amount],
    'DimProduct'[EnglishProductName] = "Mountain-100 Black, 38"
)

The [Sales Amount] measure just sums up the values in the SalesAmount column on the fact table; the [Mountain-100 Black 38 Sales] returns the value of [Sales Amount] for just one product.

Now consider a table visual showing LastName from the DimCustomer table, FullDateAlternateKey from DimDate and the [Sales Amount] and [Mountain-100 Black 38 Sales] measures:

There are a lot of rows here because every combination of LastName and FullDateAlternateKey where there is a value for [Sales Amount] is shown. Connecting Profiler to Power BI Desktop and capturing the Execution Metrics trace event (DAX Studio also shows this now) shows that this query has a peak memory consumption of 2063KB:

{
	"timeStart": "2024-11-03T19:07:26.831Z",
	"timeEnd": "2024-11-03T19:07:26.844Z",

	"durationMs": 13,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 0,
	"totalCpuTimeMs": 0,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 2063,

	"commandType": "Statement",
	"queryDialect": 3,
	"queryResultRows": 502
}

As you can see, the [Mountain-100 Black 38 Sales] column is mostly empty; let’s say you need to replace the blanks in this column with zeros.

Changing the measure definition to add zero to the result of the Calculate(), as follows:

Mountain-100 Black 38 Sales =
CALCULATE (
    [Sales Amount],
    'DimProduct'[EnglishProductName] = "Mountain-100 Black, 38"
) + 0

…doesn’t do what you want, because now you get a row in the table for every combination of LastName and FullDateAlternateKey, which means the rows with non-zero values are hard to find:
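The row explosion here isn’t specific to DAX: any time a measure can never be blank, the result set contains every combination of the grouping columns rather than just the combinations that actually have data. A small Python illustration of the difference (the customer and date values are made up):

```python
from itertools import product

customers = [f"Customer{i}" for i in range(100)]
dates = [f"2024-01-{d:02d}" for d in range(1, 32)]

# Sparse facts: only a handful of customer/date pairs actually have sales
sales = {("Customer1", "2024-01-05"): 250.0, ("Customer7", "2024-01-12"): 99.0}

# A measure that can return blank (None) keeps the result sparse...
sparse_rows = [(c, d, v) for (c, d), v in sales.items()]

# ...while "+ 0" makes the measure non-blank everywhere, so the visual
# gets one row for every combination of customer and date
dense_rows = [(c, d, sales.get((c, d), 0.0)) for c, d in product(customers, dates)]

print(len(sparse_rows), len(dense_rows))
```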

Instead, only adding zero when there is a value for [Sales Amount], something like this:

Mountain-100 Black 38 Sales =
IF (
    NOT ( ISBLANK ( [Sales Amount] ) ),
    CALCULATE (
        [Sales Amount],
        'DimProduct'[EnglishProductName] = "Mountain-100 Black, 38"
    ) + 0
)

…does the trick. What does memory usage look like? Here are the Execution Metrics:

{
	"timeStart": "2024-11-03T19:17:22.470Z",
	"timeEnd": "2024-11-03T19:17:22.500Z",

	"durationMs": 30,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 31,
	"totalCpuTimeMs": 31,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 3585,

	"commandType": "Statement",
	"queryDialect": 3,
	"queryResultRows": 502
}

Memory usage has increased only slightly, to 3585KB.

What about using Visual Calculations instead? Reverting to the original definition of the [Mountain-100 Black 38 Sales] measure and then creating a Visual Calculation like so:

No Blanks = [Mountain-100 Black 38 Sales]+0

…shows that this doesn’t solve the problem because again you get unwanted rows with no sales. Using:

No Blanks = 
IF (
    NOT ( ISBLANK ( [Sales Amount] ) ),
    [Mountain-100 Black 38 Sales] + 0
)

…does solve the problem and you can of course hide the original [Mountain-100 Black 38 Sales] measure column so it doesn’t appear in your table:

But Execution Metrics shows that memory usage is in fact a lot higher, at 11295KB, which is because the resultset now has one extra column in it and Visual Calculations make a copy of the original resultset in memory when they are calculated:

{
	"timeStart": "2024-11-03T19:31:22.858Z",
	"timeEnd": "2024-11-03T19:31:22.980Z",

	"durationMs": 122,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 109,
	"totalCpuTimeMs": 109,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 11295,

	"commandType": "Statement",
	"queryDialect": 3,
	"queryResultRows": 502
}

Does this mean that Visual Calculations should never be used? No, not at all. Consider the following matrix visual which only contains the [Mountain-100 Black 38 Sales] measure and has LastName on rows and FullDateAlternateKey on columns:

Memory usage for this visual is 1091KB:

{
	"timeStart": "2024-11-03T19:51:02.966Z",
	"timeEnd": "2024-11-03T19:51:02.974Z",

	"durationMs": 8,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 16,
	"totalCpuTimeMs": 16,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 1091,

	"commandType": "Statement",
	"queryDialect": 3,
	"queryResultRows": 139
}

The resultset returned for the DAX query used to populate this visual only contains one row for each combination of Last Name and Date that has a value for [Mountain-100 Black 38 Sales], 139 rows in all, but because a matrix visual is used to display the results, this introduces all the blanks you can see in the screenshot. You could try to replace these blanks with some very complex DAX but I’m not even going to try. Instead, Visual Calculations solve this problem very easily:

No Blanks Matrix = [Mountain-100 Black 38 Sales]+0

Here’s the matrix with [Mountain-100 Black 38 Sales] hidden and the Visual Calculation applied:

Execution Metrics reveal that peak memory consumption is just 2054KB, and that the number of rows returned, while higher, is still just 2070:

{
	"timeStart": "2024-11-03T19:45:49.298Z",
	"timeEnd": "2024-11-03T19:45:49.337Z",

	"durationMs": 39,
	"vertipaqJobCpuTimeMs": 0,
	"queryProcessingCpuTimeMs": 31,
	"totalCpuTimeMs": 31,
	"executionDelayMs": 0,

	"approximatePeakMemConsumptionKB": 2054,

	"commandType": "Statement",
	"queryDialect": 3,
	"queryResultRows": 2070
}

Overall, both traditional DAX solutions and Visual Calculations are effective in different scenarios, so I suggest that you test the query performance and memory usage of different solutions yourself.

New Solution For Monitoring Power BI/Fabric Gateways

If you’re the admin for a large Power BI/Fabric tenant you’ll know how important it is to monitor your on-premises data gateways: if they aren’t working then no-one will be able to refresh any semantic models or dataflows connected to on-prem sources. A while ago Rui Romano published a gateway monitoring solution that proved to be very popular, but recently my colleague at Microsoft Edgar Cotte built a new open-source solution that builds on Rui’s work and which is even more powerful, using Fabric Real-Time Intelligence to let you analyse gateway log information in real time.

You can read his announcement on LinkedIn here and download the solution from https://aka.ms/FabricGatewayMonitoring.

This is an amazing piece of work that is already being used by several very large customers. Check it out!

Finding The Columns Touched By A DAX Query

Some time ago I wrote a post about how you can use the DISCOVER_CALC_DEPENDENCY DMV to get a list of all the tables, columns and relationships referenced by a DAX query. That’s really useful and indeed the new INFO.CALCDEPENDENCY DAX function allows you to do the same thing without all the complexity of calling a DMV. However this doesn’t quite give you everything you need to solve one important problem: which columns are touched when a DAX query runs? The INFO.CALCDEPENDENCY function gives you a list of all the columns used directly in a DAX query and it also gives you a list of all the columns needed by all the measures used in a DAX query. However it only gives you a list of the relationships used and these relationships also rely on columns. In this post I’ll show you how to get a list of all the columns that are touched by a DAX query.

For illustration, consider the following model:

Note that the relationship between the Product table and the Sales table is inactive. The definitions of the three measures are as follows:

Units = SUM(Sales[SalesUnits])
Value = SUM(Sales[SalesValue])
Apples Value =
CALCULATE (
    [Value],
    KEEPFILTERS ( 'Product'[ProductName] = "Apples" ),
    USERELATIONSHIP ( 'Product'[ProductID], 'Sales'[ProductID] )
)

There is also a calculated column called Customer ID and Name, defined as follows:

Customer ID and Name = 
"(" & 'Customer'[CustomerID] & ") " & Customer[CustomerName]

Now consider the following visual:

The DAX query for this is as follows:

// DAX Query
DEFINE
	VAR __DS0Core = 
		SUMMARIZECOLUMNS(
			ROLLUPADDISSUBTOTAL('Country'[CountryName], "IsGrandTotalRowTotal"),
			"Apples_Value", 'Sales'[Apples Value]
		)

	VAR __DS0PrimaryWindowed = 
		TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, [Apples_Value], 0, 'Country'[CountryName], 1)

EVALUATE
	__DS0PrimaryWindowed

ORDER BY
	[IsGrandTotalRowTotal] DESC, [Apples_Value] DESC, 'Country'[CountryName]

Passing this query to INFO.CALCDEPENDENCY like so:

EVALUATE
VAR Source_Query =
"
DEFINE
	VAR __DS0Core = 
		SUMMARIZECOLUMNS(
			ROLLUPADDISSUBTOTAL('Country'[CountryName], ""IsGrandTotalRowTotal""),
			""Apples_Value"", 'Sales'[Apples Value]
		)

	VAR __DS0PrimaryWindowed = 
		TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, [Apples_Value], 0, 'Country'[CountryName], 1)

EVALUATE
	__DS0PrimaryWindowed

ORDER BY
	[IsGrandTotalRowTotal] DESC, [Apples_Value] DESC, 'Country'[CountryName]
"
VAR All_Dependencies =
INFO.CALCDEPENDENCY(
	"QUERY",
	Source_Query
)
RETURN All_Dependencies

[Note that all double quotes in the query being passed in have been escaped]

…returns all the measures, tables, columns and relationships used:

It knows that the Apples Value measure needs the SalesValue column and the ProductName column, and it knows that the relationships between the Country table and the Sales table, and between the Product table and the Sales table, are also required, but it doesn’t tell you which columns those relationships in turn need. Luckily the new INFO.VIEW.RELATIONSHIPS() function can be used to get that extra information easily.

Here’s a more complex DAX query that returns a list of all the columns touched by a DAX query, including those needed by all the measures and all the relationships used by that query:

EVALUATE
VAR Source_Query =
"
DEFINE
	VAR __DS0Core = 
		SUMMARIZECOLUMNS(
			ROLLUPADDISSUBTOTAL('Country'[CountryName], ""IsGrandTotalRowTotal""),
			""Apples_Value"", 'Sales'[Apples Value]
		)

	VAR __DS0PrimaryWindowed = 
		TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, [Apples_Value], 0, 'Country'[CountryName], 1)

EVALUATE
	__DS0PrimaryWindowed

ORDER BY
	[IsGrandTotalRowTotal] DESC, [Apples_Value] DESC, 'Country'[CountryName]
"
VAR All_Dependencies =
INFO.CALCDEPENDENCY(
	"QUERY",
	Source_Query
)
VAR Referenced_Cols = 
SELECTCOLUMNS(
	FILTER(
		All_Dependencies, 
		[REFERENCED_OBJECT_TYPE] IN {"COLUMN", "CALC_COLUMN"} 
		),
	"FullyQualifiedName",
	"'" & [REFERENCED_TABLE] & "'[" & [REFERENCED_OBJECT] & "]"
)
VAR Referenced_Relationships =
SELECTCOLUMNS(
	FILTER(
		All_Dependencies, 
		[REFERENCED_OBJECT_TYPE] IN {"ACTIVE_RELATIONSHIP", "RELATIONSHIP"}
	),
	"Name",
	[REFERENCED_OBJECT]
)
VAR Relationship_Details =
FILTER(
	INFO.VIEW.RELATIONSHIPS(),
	[Name] IN Referenced_Relationships
)
VAR Relationship_Columns = 
UNION(
	SELECTCOLUMNS(
		Relationship_Details,
		"FullyQualifiedName",
		"'" & [FromTable] & "'[" & [FromColumn] & "]"
	),
	SELECTCOLUMNS(
		Relationship_Details,
		"FullyQualifiedName",
		"'" & [ToTable] & "'[" & [ToColumn] & "]"
	)
)
VAR All_Referenced_Columns = 
UNION(
	Referenced_Cols,
	Relationship_Columns
)
RETURN 
All_Referenced_Columns

Here’s what it returns:

As you can see, this returns a list of all the columns used directly by the query, any measures referenced by the query, and all the relationships involved too.

You can use this query very easily in DAX Query View: just paste it in, then capture the DAX query generated by your own visuals using Performance Analyzer, escape any double quotes and paste it into the Source_Query variable in the query above.
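The escaping step is purely mechanical: every double quote in the captured query needs to be doubled up before it can sit inside the DAX string literal. A tiny Python helper (my own convenience function, not part of any library) does it:

```python
def escape_for_dax_string(query: str) -> str:
    """Double up quotes so a captured DAX query can be embedded in a
    double-quoted DAX string literal, like the Source_Query variable above."""
    return query.replace('"', '""')

captured = 'SUMMARIZECOLUMNS(\'Country\'[CountryName], "Apples_Value", [Apples Value])'
print(escape_for_dax_string(captured))
```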

Here’s one more example of how it works, a visual that touches a lot more columns:

I won’t post the full DAX query of the visual or my query again, but here’s the list of columns returned by my query:

One thing to notice about this is that while it shows the calculated column [Customer ID and Name] is used, it doesn’t break down this column’s dependencies and therefore doesn’t list the [CustomerName] column. This is a deliberate decision on my part.

Why is all this useful? The obvious answer is that it would allow you to see whether a query would break or return a different result if you deleted a column from your model. More interestingly – and this is something I need to research more – given that in Direct Lake mode columns are paged into memory when needed by a query, and given that it’s possible to use other DMVs/INFO functions to get the size of a column in memory (for example as DAX Studio’s Model Metrics feature does), it should be possible to use this approach to find the total memory used by all the columns needed by a DAX query on a Direct Lake model. This would be very helpful when diagnosing memory errors in Direct Lake mode.

Why Would You Use Power BI Copilot?

I’m a big fan of Kurt Buhler‘s work (indeed we like him so much at Microsoft that we get him to help out with our docs) and I have a lot of respect for his opinion. A few weeks ago, in a thread on Twitter/X, he asked an important question about Power BI Copilot: why would someone use it? When I saw this I realised it was something I hadn’t thought about properly but which deserved a detailed answer. Now, after ruminating for a while and talking to a few people (including Kurt), I will put my case for why someone might want to use Power BI Copilot.

[Just to be clear: in this post I am not going to discuss the related question of whether Copilot works well and delivers useful results – even though I think it is pretty good already – because that’s not what Kurt is asking in the thread]

Which Copilot are we talking about here?

Before I carry on I need to define which Copilot is being referred to in the question. There are a lot of Copilot experiences in Power BI: the one in DAX Query View, the one that can create new report pages, the one that allows you to summarise and ask questions about data that is visible on report pages, the one that allows you to ask questions about data that is in your semantic model but not necessarily visible in a report, as well as others. Since Kurt and I agree on the usefulness of Copilot in DAX Query View for intermediate developers, let’s focus on the last three – which can be summed up as the ability to answer data questions in natural language – and which are what most people think about when they think about Copilot for Power BI anyway.

A solution looking for a problem?

To answer the question of why someone would use Power BI Copilot to answer data questions we need to look at who is asking for this functionality. But is anyone actually asking for it? I’ve heard a few people say that Copilot for Power BI is a solution looking for a problem.

Now I can see why you would think that because, let’s face it, we can all imagine some VP at Microsoft ordering every program manager to add a Copilot to their product whether they think it’s useful or not, because that’s what the market demands. However, if you work in the Microsoft field or for a partner then I’m sure you’ll know how much genuine interest there is in Copilot from Power BI customers and users. In my opinion Power BI Copilot is actually the latest attempt to respond to a very real request, one that I have heard for the whole of my career in BI and which can be summed up as follows:

I want to ask questions in English (or whatever my native language is) about my data rather than have to look at a report someone else has built or build one myself

I’m old enough to remember SQL Server 2000’s English Query feature, which was an attempt to solve the same problem over twenty years ago, and there have been numerous other products and features from Microsoft (including Power BI’s own Q&A) and other BI vendors that have tried to do the same thing in the past. I am not saying that any of them succeeded in their aim because I don’t think any of them did. What I am saying is that they were attempts to do the same thing that Power BI Copilot is trying to do and that therefore Power BI Copilot is not a solution looking for a problem, it’s actually the latest attempt to solve a very real and very old problem.

So what is the problem?

I have never felt the need to ask questions about my data in natural language because I have always found it easy to:

  • Look at an existing report or dashboard and interpret what’s in it
  • Build my own report/dashboard/queries to get answers to questions that are not in existing reports or dashboards

That’s going to be true of anyone working as a data professional or a business analyst. So if the question is “why would I use it?” then the answer is I wouldn’t.

However I believe the people who are asking for natural language querying of data are people who:

  • Find it hard or impossible to interpret data provided in numeric or visual form in a report
  • Find it hard or impossible to build new reports or write queries to get answers that aren’t in the reports they do have access to
  • Have to ask other people to get the answers they need

It’s tempting to say “read what’s in the report” or “learn Power BI” or “learn SQL” when faced with these people and their requests. But, clearly, they keep asking for an alternative and we’re not going to change their minds.

Think back to maths class at school and it shouldn’t be a surprise that some people just don’t enjoy staring at pages of numbers and graphs, or learning database query languages, in the same way you or I do. What’s more, dyscalculia or “maths dyslexia” means that interpreting data – however well designed your visualisations are – doesn’t come easily to a significant number of people. While the imperative to be “data driven” is widespread in every industry and profession today, not being good with data or not enjoying using it is not a barrier to success in most careers. Power BI Copilot provides a different, and perhaps more approachable, way for these people to get answers to data questions.

It seems to me that a lot of the excitement for Copilot for Power BI comes from C-level people who love the challenge of management, who need data to make decisions, but who don’t like working with data directly or don’t have the time to do so. Today they need to ask a business analyst their data questions and this introduces latency: waiting for someone else to do something always takes longer than doing it yourself. What I suspect these C-level folks want is to cut the human out of the loop and get answers to their questions more quickly, while keeping the experience of interacting with a human. Power BI Copilot aims to be that virtual business analyst.

Why is Copilot better or more effective than non-AI approaches?

If the “non-AI” approaches mentioned here involve “reading the report” or “learning Power BI” I believe I’ve already dealt with them above. What about non-AI approaches to natural language querying? Clearly LLMs are vastly better at “understanding” natural language requests than any technology ever seen before but there’s more to say here.

As I already mentioned, while the problem of asking data questions in natural language has been around for a long time, no-one has solved it yet. Why? Thinking back to my experiences with Power BI Q&A, the reason is that to get good answers you have to ask good questions. Half the challenge of learning programming or any database query language is understanding what you want to do before you even learn how to express that request in code. To get good results from Q&A you have to construct the English-language equivalent of a DAX or SQL query, and that’s why it, like all its predecessors, failed: users are bad at constructing clear, unambiguous questions.

One way that Copilot is an improvement on Q&A is that it can interpret a wider range of questions. For example, I have a demo with a table of books and their year of publication; I was pleasantly surprised that when I asked “Show me all 19th century literature” it gave me all the books published between 1800 and 1899. Q&A would never have been able to do this.

More importantly, Copilot is getting better at asking for clarifications. Going back to the previous point, when a C-level person asks their business analyst for the answer to a data question it will be something vague like “what was growth like for widgets in the UK last year?”. Part of the job of the business analyst is to know exactly which product “widgets” refers to and which particular geographies make up the UK without having to ask. Those are things that can be defined in a good Power BI semantic model. However, another part of the job is to ask follow-up questions like “do you mean month-on-month growth or year-on-year growth?” to clarify the original request, and that’s exactly what you’ll see Copilot doing more of in the future.

The ability of LLMs to conduct conversations in near human-quality natural language and use this ability to turn a vague request into a data question that can be answered is, I think, why AI approaches to the problem we’re talking about are better than non-AI approaches.

Conclusion

I hope I have made a good argument that there are people who want to use Power BI Copilot, and that the reason why is that they can’t use, don’t like, or don’t want to learn the existing methods of interacting with data. Will it succeed where so many others have failed? Let’s see where we are in a year or so’s time, but I’m confident that the remarkable advances in generative AI mean that it will.