As a Power BI semantic model developer, one of the first things you learn is to use the Sort By Column property to make sure the values in a column – for example days of the week or months of the year – are sorted correctly. Recently, though, I came across a situation where it was important to set this property somewhere you might not expect to have to, in order to get good results from Copilot.
Consider the following simple semantic model:
It contains a single table containing sales values broken down by the financial year that products were launched in:
You might think that it isn’t necessary to set the Sort By Column property on the Launch column because, even though it is a text value, sorting it alphabetically gives you the sort order you would expect. Clicking on this column in the Data pane in Desktop to automatically create a visual gives a table with the financial years in the correct order (ie in ascending order by year):
But selecting the Sales Amount measure changes the visual used and, crucially, changes the sort order:
Note that, now, the bar chart is sorted in descending order by the Sales Amount measure, rather than by year.
This type of automatic behaviour is important because it’s something Copilot will rely on. So, for example, the prompt:
Show Sales Amount by product launch date
…returns the same bar chart as a response:
While a prompt to create a new report page does something similar:
Show Sales Amount by product launch date as a new page
Note how, on this report page, both the line chart and the column chart are ordered by Sales Amount again. This is almost certainly not what your end users want to see.
So how can you get Copilot to generate visuals with Launch on the x axis sorted in year order? The solution is to add a new column (in this case called Launch YYYY and hidden) and use it to set the Sort By Column property for the Launch column:
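The Launch YYYY column just needs to hold a sortable value derived from each financial year label. As a pure-Python illustration of the logic (the exact label format isn’t shown in this post, so ‘FY2020-21’ style labels are an assumption):

```python
import re

def launch_sort_key(label: str) -> int:
    """Derive a numeric sort key from a financial year label.
    Assumes labels like 'FY2020-21'; the first four-digit year
    found is enough to sort on."""
    match = re.search(r"\d{4}", label)
    if match is None:
        raise ValueError(f"No year found in {label!r}")
    return int(match.group())

# Sorting the labels by the derived key puts them in year order
labels = ["FY2021-22", "FY2019-20", "FY2020-21"]
print(sorted(labels, key=launch_sort_key))
# → ['FY2019-20', 'FY2020-21', 'FY2021-22']
```

In the model itself this would be a calculated column or, better, a column added in Power Query, then hidden and used as the Sort By Column.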
Having done this the two prompts above return visuals sorted by the Launch column and not by the Sales Amount measure. So:
Show Sales Amount by product launch date
…returns
…and the prompt:
Show Sales Amount by product launch date as a new page
…returns:
This is a very simple example – in the real world you should have a separate date dimension table and follow all the other best practices for data modelling. The point is, though, that setting the Sort By Column property on a column means that sort order will always be respected in visuals created by Copilot unless you explicitly override it in your prompt.
I’ve always been a big fan of using Log Analytics to analyse Power BI engine activity (I’ve blogged about it many times) and so, naturally, I was very happy when the public preview of Fabric Workspace Monitoring was announced – it gives you everything you get from Log Analytics and more, all from the comfort of your own Fabric workspace. Apart from my blog there are lots of example KQL queries out there that you can use with Log Analytics and Workspace Monitoring, for example in this repo or Sandeep Pawar’s recent post. However, what is new with Workspace Monitoring is that if you store these queries in a KQL Queryset you can create alerts in Activator, so that when something important happens you are notified of it.
What type of things should you, as an admin, be notified of? I can think of lots, but to demonstrate what’s possible let’s take the example of DAX query errors. I created a simple semantic model and in it I created the following measure:
Measure Returning Error = ERROR("Demo Error!")
It uses the DAX ERROR function, so every time a report visual contains this measure it will return a custom error:
This makes it very easy to generate errors for testing; in the real world the kind of errors you would want to look out for are ones to do with broken DAX measures (maybe referring to other measures or columns that no longer exist) or ones where the query memory limit has been exceeded.
I published this semantic model to a workspace with Workspace Monitoring enabled, then built a report containing visuals that deliberately generated this error.
I then created a KQL Queryset and used the following KQL query to get all the Error events in the last hour for this exact error:
SemanticModelLogs
| where Timestamp > ago(1h)
| where OperationName == "Error"
//filter for exactly the custom error number generated by my DAX
| where StatusCode == "-1053163166"
| extend app = tostring(parse_json(ApplicationContext))
| project
    Timestamp,
    EventText,
    ExecutingUser,
    modelId = extract_json("$.DatasetId", app),
    reportId = extract_json("$.Sources[0].ReportId", app),
    visualId = extract_json("$.Sources[0].VisualId", app),
    consumptionMethod = extract_json("$.Sources[0].HostProperties.ConsumptionMethod", app)
A few things to note about this query:
The filter on StatusCode allows me to only return the errors generated by the Error function – different errors will have different error numbers but there is no single place where these error numbers are documented, unfortunately.
The last half of the query parses the ApplicationContext column to get the IDs of the semantic model, report and visual that generated the error (something I blogged about here), as well as where the user was – for example in the Power BI web application – when the error occurred (something I blogged about here).
Finally, I created an Activator alert from this KQL Queryset by clicking the “Set alert” button shown in the top right-hand corner of the screenshot above to send me an email every time this error occurred, checking (by running the KQL query) every hour:
I customised the contents of the email alert inside Activator:
And sure enough, after an hour, I started getting email alerts for each error:
The great thing about the combination of Workspace Monitoring, KQL and Activator is the flexibility you get. For example, generating one alert per error would probably result in too many alerts to keep track of; instead you could write your KQL query to aggregate the data and only get one alert per user and error type, or error type and visual. As more and more sources of data are added to Workspace Monitoring and more functionality is added – see the public roadmap for details – then being a Fabric admin will get easier and easier.
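A version of the earlier query that would generate one alert row per user, error type and visual might look something like this (a sketch reusing the same SemanticModelLogs columns):

```kql
SemanticModelLogs
| where Timestamp > ago(1h)
| where OperationName == "Error"
| extend app = tostring(parse_json(ApplicationContext))
| extend visualId = extract_json("$.Sources[0].VisualId", app)
// one row per user/error/visual combination instead of one per error
| summarize ErrorCount = count() by ExecutingUser, StatusCode, visualId
```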
Following on from my last post where I showed how to send data from Power Automate to a Fabric Eventstream, in this post I’m going to show how to use it to solve one of my favourite problems: refreshing a Power BI semantic model that uses an Excel workbook stored in OneDrive as a source when that Excel workbook is modified.
Now before I go on I want to be clear that I know this is a ridiculously over-engineered and expensive (in terms of CUs) solution, and that you can do almost the same thing just using Power Automate or in several other different ways – see my colleague Mark Pryce-Maher’s recent videos on using Fabric Open Mirroring with Excel for example. I’m doing this to teach myself Fabric Eventstreams and Activator and see what’s possible with them. Please excuse any mistakes or bad practices.
To start off, I built a Power BI semantic model containing a single table, connected to an Excel workbook stored in OneDrive for Business:
Next, I created a new Fabric Eventstream and built a slightly more sophisticated version of the Power Automate flow from my previous post, one that ran every 30 seconds, checked to see if the Excel file had been modified and, if it had, sent an event to the Eventstream:
The Compose action in this flow generates some JSON that is sent to the Eventstream and contains the name of the file that has been altered and a dummy value called EventCounter that always contains 1:
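The exact JSON isn’t reproduced here, but a payload with the two fields described might be built like this (the field names and example file name are assumptions):

```python
import json

# A sketch of the payload described above: the name of the
# modified file plus a dummy EventCounter field that is always 1
payload = json.dumps({"FileName": "Sales.xlsx", "EventCounter": 1})
print(payload)
```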
I then connected the Eventstream to a new Activator:
In the Activator, I created a new object called ExcelWorkbook. I used the FileName field as the unique identifier and added the EventCounter field as a property:
Finally I created a new rule on the EventCounter property that aggregated the events over 1 minute windows and then fired when the total number of events in each window changed to the value 0 and stayed at 0 for 1 minute:
This is where the real value of Activator’s rules comes in. As I said, it’s easy to refresh a semantic model when an Excel file changes using other approaches. The problem with doing this, however, is that Excel files in OneDrive for Business autosave on a regular basis, so if you make several changes to the file over the space of several minutes you’ll end up trying to refresh the semantic model all the time. This can lead to excessive CU usage on your capacity, because you’re refreshing too often, as well as situations where one refresh fails because another is already in progress. What the rule above does is only trigger a refresh when changes to the Excel workbook have been detected (so the Total is greater than 0), followed by a minute where no changes have been detected (so the Total changes to 0) and then a further minute where no changes have been detected (so the Total stays at 0). Since Activator rules can trigger Fabric items, I hooked my rule up to the Pipeline shown above to run the refresh:
This screenshot of the Definition tab of the rule shows a pattern of events where there were four one-minute windows where the Excel file was edited and the rule fired three times:
The problem with this rule is that if the Excel file is edited constantly for a long period, so that there is no period of inactivity, the semantic model would not get refreshed until that period of activity is over, which might be too long to wait. I guess the solution would be another rule that detects periods of, say, 10 minutes when the file has been edited constantly and refreshes the model even if there is no period of inactivity.
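To make the rule’s windowing logic concrete, here’s a pure-Python simulation of when it fires (an illustration of the behaviour, not Activator code):

```python
def refresh_points(window_totals):
    """Given per-minute event counts, return the window indices where
    a refresh should fire: activity was seen, then a window drops to 0
    and the next window is also 0 (the Total 'changes to 0 and stays
    at 0 for 1 minute')."""
    fire = []
    for i in range(2, len(window_totals)):
        if (window_totals[i - 2] > 0           # changes were detected
                and window_totals[i - 1] == 0  # Total changes to 0
                and window_totals[i] == 0):    # and stays at 0
            fire.append(i)
    return fire

# Four windows of edits followed by inactivity: the refresh fires once,
# two quiet windows after the last edit
print(refresh_points([3, 1, 2, 4, 0, 0, 0]))
# → [5]
```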
All in all I think this shows how the combination of Power BI/Power Platform and new features in Fabric can be used to solve old problems in new, interesting ways. I’m looking forward to learning more about Fabric Real-Time Intelligence and Activator!
[Thanks to James Hutton from the Activator team for his help with this post]
Fabric’s Real-Time Intelligence features are, for me, the most interesting things to learn about in the platform. I’m not going to pretend to be an expert in them – far from it – but they are quite easy to use and they open up some interesting possibilities for low-code/no-code people like me. The other day I was wondering if it was possible to send events and data from Power Automate to Fabric using Eventstreams and it turns out it is quite easy to do.
Here’s a really simple illustration. I created a new Eventstream in a Fabric workspace, then clicked the “Use custom endpoint” option and published the Eventstream so it was ready to send data to:
Then, in Power Automate, I built a super-simple flow with a “Manually trigger a flow” trigger and then the “Send event” action from the Event Hubs connector:
This works because you can send data from Power Automate to the Eventstream custom endpoint using the Event Hub protocol. To set up the connection to the Eventstream in Power Automate I entered a name for the connection, chose the “Access Key” authentication type and then copied the “Connection string-primary key” value from the Eventstream custom endpoint, shown in the first screenshot above (you need to click the eye icon next to it in order to copy it to the clipboard) and pasted it into the “Connection String” property in the Power Automate connector:
Then I configured the “Send event” action like so:
I had to manually paste the Event Hub Name, shown in the first screenshot above, as a custom value because I got an error when I tried to use Power Automate’s dropdown box to find the name. I put the Timestamp from the “Manually trigger a flow” trigger into the Content property.
After saving the flow I was then able to trigger it from my browser or the Power Automate mobile app, and every time the flow ran it sent the time of the run to my Eventstream:
If you’ve looked into using Copilot with Power BI, you’ll know that you need a capacity to use it: a P1/F64 or greater. Something I learned recently, though, is that the current Power BI Copilot capabilities are only accessible via a side pane on a report, and it’s the report that needs to be stored in a workspace on a P1/F64+ capacity; that’s where the CUs for Copilot are consumed. If the report has a live connection to a semantic model stored in a workspace not on a capacity (which we refer to at Microsoft as “Shared”, but which is widely known as “Pro”) then Copilot still works!
For example, I published a semantic model to a workspace stored on Shared/Pro:
I then built a report with a Live Connection to that semantic model and saved it to a workspace on a P1 capacity:
I then opened this report, used Power BI Copilot, and saw that a) it worked and b) the CUs were attributed to the workspace where the report was stored when I looked in the Capacity Metrics App:
This is particularly interesting if you have a large number of semantic models that are stored in Shared/Pro workspaces today but still want to use Power BI Copilot. It means you don’t have to move your semantic models into a workspace on a capacity, which in turn means that refreshing those models or querying them won’t consume CUs on a capacity, which in turn means that you’ll be able to support more users for Power BI Copilot on your capacities.
In the future you’ll have even more flexibility. In the sessions “Microsoft Fabric: What’s New And What’s Next” and “Boost productivity with Microsoft Fabric, Copilot, and AI” at Ignite last week we announced a number of new Power BI Copilot capabilities coming soon, such as integration with AI Skills and a new “immersive” Copilot experience. We also announced Fabric AI Capacities, coming early 2025, which will allow you to direct all Copilot activity to certain capacities regardless of whether your reports or models are stored on Shared/Pro or Premium, or which capacity they are stored on (see more details from the 16:00 minute mark in the recording of the “Boost productivity with Microsoft Fabric, Copilot and AI” session).
I had meant to follow up my recent post on how to find the columns touched by a DAX query by writing one on how to use this technique to find the size of these columns in memory, so you can find the total size of the columns that need to be paged into memory when a DAX query runs on a Direct Lake semantic model. Before I could do that, though, my colleague Michael Kovalsky messaged me to say that not only had he taken the query from that first post and incorporated it in Semantic Link Labs, he’d done the work to get column sizes too. All that’s left for me to do, then, is give you some simple examples of how to use it.
To use Semantic Link Labs you just need to create a new Fabric notebook and install the library:
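The notebook code looks something like this – a sketch based on the Semantic Link Labs documentation, so function and parameter names may differ, and it will only run inside a Fabric notebook:

```python
%pip install semantic-link-labs

import sempy_labs

# Get the columns touched by a DAX query, plus memory statistics
# ("MyModel" and the DAX query here are placeholder values)
df = sempy_labs.get_dax_query_dependencies(
    dataset="MyModel",
    dax_string="EVALUATE VALUES('Country'[CountryName])"
)
display(df)
```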
This returns a dataframe with one row for each column touched by the query, plus various statistics about the size of each column in memory.
If you’re working with a Direct Lake semantic model, though, in order to get the correct sizes of each column in memory the query itself will need to have been run beforehand; you can ensure that this happens by setting the optional parameter put_in_memory to True:
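As a sketch (the put_in_memory parameter name comes from the library; everything else is a placeholder, and this only runs in a Fabric notebook):

```python
import sempy_labs

# put_in_memory=True runs the query first, so the Direct Lake
# column sizes returned are accurate
df = sempy_labs.get_dax_query_dependencies(
    dataset="MyModel",
    dax_string="EVALUATE VALUES('Country'[CountryName])",
    put_in_memory=True
)
```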
Last of all, if you don’t want a dataframe but just want a single number representing the total memory needed by all columns touched by a query, you can use sempy_labs.get_dax_query_memory_size, for example like this:
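A sketch of that call (the dataset name and query are placeholders; again, this is only runnable in a Fabric notebook):

```python
import sempy_labs

# Returns a single number: the total size in memory of all the
# columns touched by the query
total_bytes = sempy_labs.get_dax_query_memory_size(
    dataset="MyModel",
    dax_string="EVALUATE VALUES('Country'[CountryName])"
)
```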
Continuing my occasional series of ways you can improve the results you get from Power BI Copilot by setting properties on your semantic model, in this post I’ll look at how setting the Summarize By property can have an impact.
Let’s say you have a semantic model that contains the following table of temperature readings at different English seaside resort towns over the summer:
Now let’s say you enter the following prompt in the Copilot pane on a blank report (this forces Copilot to query the semantic model rather than look for answers from visuals in the report):
show temperature by resort
Here’s the response:
As you can see, Power BI Copilot has summed up the temperatures for each resort, which does not make sense. You could argue that the prompt should be more precise and specify the aggregation method to use, but it’s unlikely that most end users would know to do this on their first attempt at prompting, or indeed ever. You could also argue that it would be better to expose an explicit measure and hide the underlying Temperature (C) column in the model – indeed, this is what I would do – but the downside of this is that it may limit the type of analyses that an end user can do.
Let’s say that what you really wanted to see for this prompt is average temperature by resort, and as a semantic model developer you can ensure this happens by setting the Summarize By property on the Temperature (C) column to Average, either on the Column Tools tab in the ribbon or in the Model View:
After making this change, the same prompt now returns the average temperature for each resort:
As with some of the other properties such as Row Label and Key Column I’ve talked about in this series, the Summarize By property is not new but it’s a lot more important to set it when designing models for Copilot.
One last thing to note is that as well as specifying how a column should be aggregated, the Summarize By property can be used to specify that values in a column should never be aggregated at all. This is particularly useful for numeric columns that contain categorical data such as year or zip codes. You can do this by setting Summarize By to None in the model view:
…or, if you’re setting this property on the Column Tools tab in the ribbon, using the Don’t Summarize option:
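As an aside, if you script your model with TMDL the same setting shows up as the column’s summarizeBy property; here’s a sketch (the table name is hypothetical):

```tmdl
table Readings

	column 'Temperature (C)'
		dataType: double
		summarizeBy: average
```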
My post from a few months ago about the dangers of DAX measures that never return blank attracted quite a lot of attention – this is a hot topic on the forums, and adding zeros to measures is a common cause of memory errors in Power BI. In that post, though, I didn’t talk about the best way to replace blanks with zeros if you absolutely have no choice but to do so. One of the comments on that post mentioned that Visual Calculations are an option, something I hadn’t thought about before; now, after conversing with the Gods of DAX (no, not the Italians, I mean Akshai, Marius and Jeffrey!) and doing some testing, I can reveal that Visual Calculations can be a good choice sometimes, while more traditional DAX approaches are fine at other times.
Let’s see some examples. I created the following model using the AdventureWorksDW 2017 sample data:
There’s a Product dimension, a Customer dimension and a Date dimension plus a fact table containing sales data. The most important thing to note is that individual customers only buy a few products on a few dates. I also created two measures with the following definitions:
The [Sales Amount] measure just sums up the values in the SalesAmount column on the fact table; the [Mountain-100 Black 38 Sales] measure returns the value of [Sales Amount] for just one product.
Now consider a table visual showing LastName from the DimCustomer table, FullDateAlternateKey from DimDate and the [Sales Amount] and [Mountain-100 Black 38 Sales] measures:
There are a lot of rows here because every combination of LastName and FullDateAlternateKey where there is a value for [Sales Amount] is shown. Connecting Profiler to Power BI Desktop and capturing the Execution Metrics trace event (DAX Studio also shows this now) shows that this query has a peak memory consumption of 2063KB.
The obvious way to replace the blanks in the [Mountain-100 Black 38 Sales] column with zeros is simply to add zero to the measure definition. This doesn’t do what you want, though, because now you get a row in the table for every combination of LastName and FullDateAlternateKey, which means the rows which have non-zero values are hard to find:
Instead, only adding zero when there is a value for [Sales Amount], something like this:
Mountain-100 Black 38 Sales =
IF (
NOT ( ISBLANK ( [Sales Amount] ) ),
CALCULATE (
[Sales Amount],
'DimProduct'[EnglishProductName] = "Mountain-100 Black, 38"
) + 0
)
…does the trick. What does memory usage look like? Here are the Execution Metrics:
Memory usage has increased only slightly, to 3585KB.
What about using Visual Calculations instead? Reverting to the original definition of the [Mountain-100 Black 38 Sales] measure and then creating a Visual Calculation like so:
No Blanks = [Mountain-100 Black 38 Sales]+0
…shows that this doesn’t solve the problem because again you get unwanted rows with no sales. Using:
No Blanks =
IF (
NOT ( ISBLANK ( [Sales Amount] ) ),
[Mountain-100 Black 38 Sales] + 0
)
…does solve the problem and you can of course hide the original [Mountain-100 Black 38 Sales] measure column so it doesn’t appear in your table:
But Execution Metrics shows that memory usage is in fact a lot higher, at 11295KB, which is because the resultset now has one extra column in it and Visual Calculations make a copy of the original resultset in memory when they are calculated:
Does this mean that Visual Calculations should never be used? No, not at all. Consider the following matrix visual which only contains the [Mountain-100 Black 38 Sales] measure and has LastName on rows and FullDateAlternateKey on columns:
The resultset returned for the DAX query used to populate this visual only contains one row for each combination of LastName and FullDateAlternateKey that has a value for [Mountain-100 Black 38 Sales], 139 rows in all, but because a matrix visual is used to display the results this introduces all the blanks you can see in the screenshot. You could try to replace these blanks with some very complex DAX but I’m not even going to try. Instead, Visual Calculations solve this problem very easily:
No Blanks Matrix = [Mountain-100 Black 38 Sales]+0
Here’s the matrix with [Mountain-100 Black 38 Sales] hidden and the Visual Calculation applied:
Execution Metrics reveal that peak memory consumption is just 2054KB; the number of rows returned is higher, at 2070, but still small:
Overall, both traditional DAX solutions and Visual Calculations are effective in different scenarios, so I suggest that you test the query performance and memory usage of different solutions yourself.
If you’re the admin for a large Power BI/Fabric tenant you’ll know how important it is to monitor your on-premises data gateways: if they aren’t working then no-one will be able to refresh any semantic models or dataflows connected to on-prem sources. A while ago Rui Romano published a gateway monitoring solution that proved to be very popular, but recently my colleague at Microsoft Edgar Cotte built a new open-source solution that builds on Rui’s work and which is even more powerful, using Fabric Real-Time Intelligence to let you analyse gateway log information in real time.
Some time ago I wrote a post about how you can use the DISCOVER_CALC_DEPENDENCY DMV to get a list of all the tables, columns and relationships referenced by a DAX query. That’s really useful, and indeed the new INFO.CALCDEPENDENCY DAX function allows you to do the same thing without all the complexity of calling a DMV. However this doesn’t quite give you everything you need to solve one important problem: which columns are touched when a DAX query runs? INFO.CALCDEPENDENCY returns all the columns used directly in a DAX query, plus all the columns needed by the measures that query uses. For relationships, however, it only returns their names – and relationships rely on columns too. In this post I’ll show you how to get a list of all the columns that are touched by a DAX query.
For illustration, consider the following model:
Note that the relationship between the Product table and the Sales table is inactive. The definitions of the three measures are as follows:
Units = SUM(Sales[SalesUnits])
Value = SUM(Sales[SalesValue])
Apples Value =
CALCULATE (
[Value],
KEEPFILTERS ( 'Product'[ProductName] = "Apples" ),
USERELATIONSHIP ( 'Product'[ProductID], 'Sales'[ProductID] )
)
There is also a calculated column called Customer ID and Name, defined as follows:
Customer ID and Name =
"(" & 'Customer'[CustomerID] & ") " & Customer[CustomerName]
Now consider the following visual:
The DAX query for this is as follows:
// DAX Query
DEFINE
VAR __DS0Core =
SUMMARIZECOLUMNS(
ROLLUPADDISSUBTOTAL('Country'[CountryName], "IsGrandTotalRowTotal"),
"Apples_Value", 'Sales'[Apples Value]
)
VAR __DS0PrimaryWindowed =
TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, [Apples_Value], 0, 'Country'[CountryName], 1)
EVALUATE
__DS0PrimaryWindowed
ORDER BY
[IsGrandTotalRowTotal] DESC, [Apples_Value] DESC, 'Country'[CountryName]
Passing this query to INFO.CALCDEPENDENCY like so:
EVALUATE
VAR Source_Query =
"
DEFINE
VAR __DS0Core =
SUMMARIZECOLUMNS(
ROLLUPADDISSUBTOTAL('Country'[CountryName], ""IsGrandTotalRowTotal""),
""Apples_Value"", 'Sales'[Apples Value]
)
VAR __DS0PrimaryWindowed =
TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, [Apples_Value], 0, 'Country'[CountryName], 1)
EVALUATE
__DS0PrimaryWindowed
ORDER BY
[IsGrandTotalRowTotal] DESC, [Apples_Value] DESC, 'Country'[CountryName]
"
VAR All_Dependencies =
INFO.CALCDEPENDENCY(
"QUERY",
Source_Query
)
RETURN All_Dependencies
[Note that all double quotes in the query being passed in have been escaped]
…returns all the measures, tables, columns and relationships used:
It knows that the Apples Value measure needs the SalesValue column and the ProductName column and it knows that the relationships between the Country table and the Sales table, and the Product table and the Sales table, are also required but it doesn’t tell you which columns those relationships in turn need. Luckily the new INFO.VIEW.RELATIONSHIPS() function can be used to get that extra information easily.
Here’s a more complex DAX query that returns a list of all the columns touched by a DAX query, including the columns needed by the measures it uses and by all the relationships needed by that query:
EVALUATE
VAR Source_Query =
"
DEFINE
VAR __DS0Core =
SUMMARIZECOLUMNS(
ROLLUPADDISSUBTOTAL('Country'[CountryName], ""IsGrandTotalRowTotal""),
""Apples_Value"", 'Sales'[Apples Value]
)
VAR __DS0PrimaryWindowed =
TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, [Apples_Value], 0, 'Country'[CountryName], 1)
EVALUATE
__DS0PrimaryWindowed
ORDER BY
[IsGrandTotalRowTotal] DESC, [Apples_Value] DESC, 'Country'[CountryName]
"
VAR All_Dependencies =
INFO.CALCDEPENDENCY(
"QUERY",
Source_Query
)
VAR Referenced_Cols =
SELECTCOLUMNS(
FILTER(
All_Dependencies,
[REFERENCED_OBJECT_TYPE] IN {"COLUMN", "CALC_COLUMN"}
),
"FullyQualifiedName",
"'" & [REFERENCED_TABLE] & "'[" & [REFERENCED_OBJECT] & "]"
)
VAR Referenced_Relationships =
SELECTCOLUMNS(
FILTER(
All_Dependencies,
[REFERENCED_OBJECT_TYPE] IN {"ACTIVE_RELATIONSHIP", "RELATIONSHIP"}
),
"Name",
[REFERENCED_OBJECT]
)
VAR Relationship_Details =
FILTER(
INFO.VIEW.RELATIONSHIPS(),
[Name] IN Referenced_Relationships
)
VAR Relationship_Columns =
UNION(
SELECTCOLUMNS(
Relationship_Details,
"FullyQualifiedName",
"'" & [FromTable] & "'[" & [FromColumn] & "]"
),
SELECTCOLUMNS(
Relationship_Details,
"FullyQualifiedName",
"'" & [ToTable] & "'[" & [ToColumn] & "]"
)
)
VAR All_Referenced_Columns =
UNION(
Referenced_Cols,
Relationship_Columns
)
RETURN
All_Referenced_Columns
Here’s what it returns:
As you can see, this returns a list of all the columns used directly by the query, by any measures referenced by the query, and by all the relationships involved too.
You can use this query very easily in DAX Query View: just paste it in, then capture the DAX query generated by your own visuals using Performance Analyzer, escape any double quotes and paste it into the Source_Query variable in the query above.
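The escaping step is easy to script too; here’s a small Python helper (not part of the original workflow) that doubles the quotes for you:

```python
def escape_for_dax_string(dax_query: str) -> str:
    """Prepare a captured DAX query for embedding in a DAX
    double-quoted string literal: each double quote must be
    escaped by doubling it."""
    return dax_query.replace('"', '""')

captured = 'SUMMARIZECOLUMNS(ROLLUPADDISSUBTOTAL(\'Country\'[CountryName], "IsGrandTotalRowTotal"))'
print(escape_for_dax_string(captured))
```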
Here’s one more example of how it works, a visual that touches a lot more columns:
I won’t post the full DAX query of the visual or my query again, but here’s the list of columns returned by my query:
One thing to notice about this is that while it shows the calculated column [Customer ID and Name] is used, it doesn’t break down this column’s dependencies and therefore doesn’t list the [CustomerName] column. This is a deliberate decision on my part.
Why is all this useful? The obvious answer is that it would allow you to see whether a query would break or return a different result if you deleted a column from your model. More interestingly – and this is something I need to research more – given that in Direct Lake mode columns are paged into memory when needed by a query, and given that it’s possible to use other DMVs/INFO functions to get the size of a column in memory (for example as DAX Studio’s Model Metrics feature does), it should be possible to use this approach to find the total memory used by all the columns needed by a DAX query on a Direct Lake model. This would be very helpful when diagnosing memory errors in Direct Lake mode.