Text Analysis With Power BI Copilot

Power BI reports are usually used to display numeric data. However, it’s reasonably common to have text data in a semantic model too and for it to be displayed on a report, and while visuals like the new text slicer allow you to apply filters to that text, large amounts of text can still be difficult for end users to make sense of. Recently I was wondering whether Power BI Copilot can be used to summarise text data (similar to what is possible with Excel Copilot) and it turns out that the answer is yes – with some limitations.

To test this I created a simple semantic model consisting of a single table containing comments from diners at various restaurants:

I then created a report with (1) a slicer on the Restaurant field, (2) a table visual to display the contents of the Feedback field for the selected restaurant, and (3) a Smart Narrative visual:

For the Smart Narrative visual I used the preview Copilot option, restricted it to look at just the table visual, and used the following prompt:

Under a title, in bold, of "Feedback Summary" provide a brief summary of the comments in the Feedback column in no more than two bullet points. Include some quotes from the Feedback column but do not just repeat what is there.

And guess what, it works:

[Notice that when you change the slicer selection you have to click the Refresh button on the Smart Narrative visual to get it to update]

BUT before you get too excited, there are a few things you need to understand:

  • At the time of writing the flavour of Power BI Copilot used by the Smart Narrative visual can only summarise data that is visible on the report page – if it isn’t visible, even if it is present in the semantic model, then it can’t be summarised.
  • It can only consider the first 30,000 rows from any given table visual, and if it is pointed at multiple visuals then it can only consider 150,000 rows in total.
  • For each table row, it can only consider the first 100 characters of text – anything after that will be truncated. This limit will soon be increased to 500 characters, but even then it could be a big limitation (for example, the text “If you want good food you have to pay. That said, it’s very good value for a Michelin-starred restaurant.” is 105 characters long). You could work around it by splitting the text up into multiple rows where the length is less than the limit (see the sketch after this list), but I don’t know what impact this would have on the quality of the results.
  • For most scenarios cost would not be a worry (see here for my post on the cost of Copilot; it’s the number of words, not the number of characters, that is significant), but even with the character limits, if you had a lot of users trying to analyse a table displaying thousands of rows of text values that is filtered/sliced regularly then it could get expensive.
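To illustrate that splitting workaround, here’s a minimal sketch of how you might chunk the text upstream, before it reaches the semantic model – for example in a Fabric notebook or any other Python step. The column names, the character limit and the example row are assumptions based on the model described above, and a smarter version would split on sentence boundaries rather than at a fixed width:

import pandas as pd

# Assumed 100-character limit and column names ("Restaurant", "Feedback")
CHUNK_SIZE = 100

def split_into_chunks(text: str, size: int = CHUNK_SIZE) -> list[str]:
    # Naive fixed-width split; breaking on sentence boundaries would read better
    return [text[i:i + size] for i in range(0, len(text), size)]

# Made-up example row
feedback = pd.DataFrame({
    "Restaurant": ["Example Restaurant"],
    "Feedback": ["If you want good food you have to pay. That said, it's very good value for a Michelin-starred restaurant."],
})

# One row per chunk, keeping the restaurant name so the report can still be sliced
chunked = (
    feedback.assign(Feedback=feedback["Feedback"].map(split_into_chunks))
            .explode("Feedback")
            .reset_index(drop=True)
)
print(chunked)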

As a result of this you should only consider Power BI Copilot for summarising relatively short pieces of text; you should not store long documents in your semantic model and expect Power BI Copilot to be able to summarise them. If you want to show summaries of longer pieces of text your best bet is to create them upstream before the data reaches Power BI.

20th Blog Birthday

Every year on this date I write a post reflecting on the previous year but, as you’ve probably guessed from the title, this is a special year: it’s twenty years since I started writing this blog. Twenty years ago today on the 30th of December 2004 I was working at IMS Health in London, I was in the office, it was quiet, and since blogging was the cool new thing – and since the great Mosha Pasumansky had just started blogging – I thought I should start a blog of my own. It turned out to be one of the best decisions I ever made career-wise. As a result I thought it would be good to look back on some of my favourite, or some of the most significant, posts from over the years.

My first ever post did a good job of outlining my intentions: back then I spent a lot of time answering questions on the microsoft.public.sqlserver.olap newsgroup and I wanted somewhere to share solutions to common problems, so I could avoid having to explain the same things over and over again. I’ll be honest, though, the posts from that first year are not great quality: the topics are fairly random, the formatting is all over the place and it was a while before I even started to include screenshots in my posts. But everyone has to start somewhere and if I have any advice to offer for people starting out creating content – blogs, videos or whatever – today then it’s to not be self-conscious about what you put out there. The most important thing is to publish regularly and not to give up; the more you do it, the better your content will get. I’ve always said that starting a blog is like going on a diet: most people give up after a few weeks, the majority have given up after a few months, and the few that continue in the long term realise that it’s not something that you can force yourself to do, it’s something that has to become a habit, something you do without thinking. You only realise the benefits after a couple of years.

For a long time my most popular post was OLAP Jokes, from August 2005. If you work (or worked) with MDX and SSAS Multidimensional it’s possible you may still get them, but even then some of them are very obscure. Who remembers arbitrary-shaped sets and the problems they cause? I barely can. I sometimes miss those early days of blogging when everyone was a lot less professional and content was a lot more eclectic; nowadays my content is a lot more uniform and on-topic. This was from the days before I had a consultancy and training business to promote; I wasn’t an MVP at this point. As you might expect, now I’m a Microsoft employee, I have to be a lot more careful about what I say on social media.

Indeed, the popularity of that last post was swiftly eclipsed by what became the most controversial post of my blogging career: this one on the announcement of SSAS Tabular and what this meant for SSAS Multidimensional. I think I had about fifteen thousand hits on that post within a couple of hours of publishing it and apparently customers were calling up Steve Ballmer to complain based on it. Reading it now I think my analysis of the situation was correct and my emotional reaction was understandable given that I’d based the last eleven years of my career on SSAS MD, but I was also guilty of using the following I’d built up by that point to stir up controversy. The feeling of power that you get from leading an online mob, however righteous the cause, can be intoxicating and is something you need to be wary of. I certainly got into a lot of hot water with Microsoft about this and I’m eternally grateful to Amir Netz for stepping in to resolve the situation.

After that I wrote fewer opinion pieces and discovered new technologies. I never engaged as much with SSAS Tabular/Power Pivot and DAX as I had with SSAS MD and MDX (I left all that to Marco and Alberto, which turned out to be a bad business decision on my part) but I fell in love with Power Query and M. There are a few posts from those early days of Power Query which still get a lot of traffic today, for example my post on creating a date dimension table in M. Many other people have blogged about this subject over the years, many of them better than I did, but I was the first, and that’s been my goal in the last few years: to write about subjects that no-one else has written about before rather than provide definitive coverage of common problems. It’s not the way to get a really large readership but at that point in my life, when I was running my own consultancy and training business, I found it was a great way to reach the kind of people who were willing to pay for my services.

Sometimes this approach has led to valuable information being spread over multiple posts, which makes it a lot harder to find. One example of this is my series of posts on memory errors: the first post I wrote on the subject, here in January 2020, was unexpectedly popular and detailed enough to be useful; I returned to the topic in June 2023 when Power BI Desktop started showing the same error and gave you settings to simulate the different limits that might apply in the Power BI Service; finally, this summer, I wrote a series of in-depth posts (starting here) on all the different types of memory errors you might encounter in Power BI. In the past I might have pulled all this information together in a book; nowadays it’s more likely to turn into a user group presentation (as I did here for Power Query data privacy settings). I don’t do this often enough, though, and given the way Google has downgraded blogs in its search results in the last year in favour of forums, I should do it more often so that the information doesn’t get lost.

What about the future? I’ve never had a plan for what I do here so I have no idea what will happen. Part of the reason I have kept blogging is that I do what I enjoy rather than chase traffic, which in turn means writing is never a chore. That said, I doubt I’ll still be writing this blog in another twenty years, or even ten; I don’t intend to retire any time soon but life at Microsoft is quite demanding, so I hope to be doing something different and less stressful by then. I am extremely grateful to everyone who comes here to read what I write and who leaves me comments. Even more than writing and learning, I enjoy sharing what I have learned, so when I meet people who’ve been helped by this blog it motivates me to keep going. Thank you for reading!

Power BI Copilot And The Data Category Property

I have known for a while that setting the Data Category property on a column can influence the results you get from Power BI Copilot but I have struggled to find a scenario where it actually makes a noticeable difference. My sources on the Power BI Copilot team told me that Copilot does consider geographic categories (which are exposed in the Power BI Desktop UI) and date categories (which are available to set in tools like Tabular Editor but aren’t exposed in the Power BI Desktop UI; the full list of categories is documented here) but it’s only one factor among many. However, when writing my last blog post, I finally found a good example to show you all.

In my last blog post I showed how setting the Sort By Column property on the Launch column in this table:

…means that the following Copilot prompt:

Show Sales Amount by launch

…returns a column chart showing sales by launch year sorted by year in ascending order:

The fact that the column on the x axis is just called “Launch” and doesn’t mention anything about financial years, and that its contents are text rather than integers, means that Copilot doesn’t know that the column contains financial years.

However, opening the model in Tabular Editor and setting the Data Category property on the Launch column to “Years”:

…means the same prompt now returns a line chart rather than a column chart:

[Interestingly this doesn’t happen if you set the Data Category to “FiscalYears”]

Although I’ve seen the behaviour change over the last few months, I think setting the Data Category property to one of the geography-related values can also influence, in some cases, whether data is displayed using a map visual.
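Incidentally, if you’d rather script the Data Category change from a Fabric notebook than open Tabular Editor, something like the following Semantic Link Labs sketch might work. This is a hypothetical example rather than anything from the post above – the dataset, workspace, table and column names are placeholders, and you should check the Semantic Link Labs documentation for the exact options of connect_semantic_model before relying on it:

import sempy_labs.tom as tom

# Hypothetical sketch: set the Data Category property via the TOM wrapper in
# Semantic Link Labs instead of Tabular Editor. All names below are placeholders.
with tom.connect_semantic_model(
    dataset='InsertSemanticModelName',
    workspace='InsertWorkspaceName',
    readonly=False,
) as m:
    # DataCategory is a plain string property on a TOM column
    m.model.Tables['InsertTableName'].Columns['Launch'].DataCategory = 'Years'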

Why Setting Sort By Column Is Important For Power BI Copilot

As a Power BI semantic model developer, one of the first things you will learn about is using the Sort By Column property to make sure the values in a column – for example days of the week or months of the year – are sorted correctly. Recently, though, I came across a situation where it was important to set it in order to get good results from Copilot, even when you might not expect to have to.

Consider the following simple semantic model:

It contains a single table containing sales values broken down by the financial year that products were launched in:

You might think that it isn’t necessary to set the Sort By Column property on the Launch column because, even though it contains text values, sorting it alphabetically gives you the sort order you would expect. Clicking on this column in the Data pane in Desktop to automatically create a visual gives a table with the financial years in the correct order (ie in ascending order by year):

But selecting the Sales Amount measure changes the visual used and, crucially, changes the sort order:

Note that, now, the bar chart is sorted in descending order by the Sales Amount measure, rather than by year.

This type of automatic behaviour is important because it’s something Copilot will rely on. So, for example, the prompt:

Show Sales Amount by product launch date

Returns the same bar chart as a response:

While a prompt to create a new report page does something similar:

Show Sales Amount by product launch date as a new page

Note how, on this report page, both the line chart and the column chart are ordered by Sales Amount again. This is almost certainly not what your end users want to see.

So how can you get Copilot to generate visuals with Launch on the x axis sorted in year order? The solution is to add a new column (in this case called Launch YYYY and hidden) and use it to set the Sort By Column property for the Launch column:

Having done this, the two prompts above return visuals sorted by the Launch column and not by the Sales Amount measure. So:

Show Sales Amount by product launch date

…returns

…and the prompt:

Show Sales Amount by product launch date as a new page

…returns:

This is a very simple example – in the real world you should have a separate date dimension table and follow all the other best practices for data modelling. The point is, though, that setting the Sort By Column property on a column means that sort order will always be respected in visuals created by Copilot unless you explicitly override it in your prompt.
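If you’re wondering how to populate a hidden sort column like Launch YYYY, one option is to derive it upstream before the data reaches the model. Here’s a minimal pandas sketch that assumes the Launch values contain a four-digit year – the column names and value format are assumptions, and a DAX calculated column or a Power Query step would do the same job:

import pandas as pd

# Assumes Launch values like "FY2021"; extract the four-digit year as an integer
# to use as the hidden sort column
sales = pd.DataFrame({'Launch': ['FY2021', 'FY2022', 'FY2023']})
sales['Launch YYYY'] = sales['Launch'].str.extract(r'(\d{4})', expand=False).astype(int)
print(sales)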

Sending Alerts From Fabric Workspace Monitoring Using KQL Querysets And Activator

I’ve always been a big fan of using Log Analytics to analyse Power BI engine activity (I’ve blogged about it many times) and so, naturally, I was very happy when the public preview of Fabric Workspace Monitoring was announced – it gives you everything you get from Log Analytics and more, all from the comfort of your own Fabric workspace. Apart from my blog there are lots of example KQL queries out there that you can use with Log Analytics and Workspace Monitoring, for example in this repo or Sandeep Pawar’s recent post. However, what is new with Workspace Monitoring is that if you store these queries in a KQL Queryset you can create alerts in Activator, so when something important happens you can be notified of it.

What type of things should you, as an admin, be notified of? I can think of lots, but to demonstrate what’s possible let’s take the example of DAX query errors. I created a simple semantic model and in it I created the following measure:

Measure Returning Error = ERROR("Demo Error!") 

It uses the DAX Error() function, so every time a report visual contains this measure it will return a custom error:

This makes it very easy to generate errors for testing; in the real world the kind of errors you would want to look out for are ones to do with broken DAX measures (maybe referring to other measures or columns that no longer exist) or ones where the query memory limit has been exceeded.

I published this semantic model to a workspace with Workspace Monitoring enabled, then created a report containing visuals that deliberately generated this error.

I then created a KQL Queryset and used the following KQL query to get all the Error events in the last hour for this exact error:

SemanticModelLogs
| where Timestamp > ago(1hr)
| where OperationName == "Error"
// filter for exactly the custom error number generated by my DAX
| where StatusCode == "-1053163166"
| extend app = tostring(parse_json(ApplicationContext))
| project
    Timestamp,
    EventText,
    ExecutingUser,
    modelid = extract_json("$.DatasetId", app),
    reportId = extract_json("$.Sources[0].ReportId", app),
    visualId = extract_json("$.Sources[0].VisualId", app),
    consumptionMethod = extract_json("$.Sources[0].HostProperties.ConsumptionMethod", app)

A few things to note about this query:

  • The filter on StatusCode allows me to only return the errors generated by the Error function – different errors will have different error numbers but there is no single place where these error numbers are documented, unfortunately.
  • The second half of the query parses the ApplicationContext column to get the IDs of the semantic model, report and visual that generated the error (something I blogged about here) and where the user was – for example in the Power BI web application – when the error occurred (something I blogged about here).

Finally, I created an Activator alert from this KQL Queryset by clicking the “Set alert” button shown in the top right-hand corner of the screenshot above to send me an email every time this error occurred, checking (by running the KQL query) every hour:

I customised the contents of the email alert inside Activator:

And sure enough, after an hour, I started getting email alerts for each error:

The great thing about the combination of Workspace Monitoring, KQL and Activator is the flexibility you get. For example, generating one alert per error would probably result in too many alerts to keep track of; instead you could write your KQL query to aggregate the data and only get one alert per user and error type, or per error type and visual. As more and more sources of data are added to Workspace Monitoring and more functionality is added – see the public roadmap for details – being a Fabric admin will get easier and easier.

Refreshing A Power BI Semantic Model With Eventstreams, Pipelines And Activator

Following on from my last post where I showed how to send data from Power Automate to a Fabric Eventstream, in this post I’m going to show how to use it to solve one of my favourite problems: refreshing a Power BI semantic model that uses an Excel workbook stored in OneDrive as a source when that Excel workbook is modified.

Now before I go on I want to be clear that I know this is a ridiculously over-engineered and expensive (in terms of CUs) solution, and that you can do almost the same thing just using Power Automate or in several other different ways – see my colleague Mark Pryce-Maher’s recent videos on using Fabric Open Mirroring with Excel for example. I’m doing this to teach myself Fabric Eventstreams and Activator and see what’s possible with them. Please excuse any mistakes or bad practices.

To start off, I built a Power BI semantic model containing a single table, connected to an Excel workbook stored in OneDrive for Business:

I then built a Fabric Pipeline to refresh the semantic model, using the newly-enhanced Semantic Model Refresh activity:

Next, I created a new Fabric Eventstream and created a slightly more sophisticated version of the Power Automate flow in my previous post that ran every 30 seconds, checked to see if the Excel file was modified, and if it was, sent an event to the Eventstream:

The Compose action in this flow generates some JSON that is sent to the Eventstream and contains the name of the file that has been altered and a dummy value called EventCounter that always contains 1:

I then connected the Eventstream to a new Activator:

In the Activator, I created a new object called ExcelWorkbook. I used the FileName field as the unique identifier and added the EventCounter field as a property:

Finally, I created a new rule on the EventCounter property that aggregated the events over one-minute windows and then fired when the total number of events in a window changed to the value 0 and stayed at 0 for 1 minute:

This is where the real value of Activator’s rules comes in. As I said, it’s easy using other approaches to refresh a semantic model when an Excel file changes. The problem with doing this, however, is that Excel files in OneDrive for Business autosave on a regular basis, and if you make several changes to the file over the space of several minutes you’ll end up trying to refresh the semantic model all the time. This can lead to excessive CU usage on your capacity because you’re refreshing too often, as well as to situations where one refresh fails because another is already in progress. What the rule above does is only trigger a refresh when changes to the Excel workbook have been detected (so the Total is greater than 0), then there has been a minute where no changes have been detected (so the Total changes to 0) and there has been a further minute where no changes have been detected (so the Total stays at 0). Since Activator rules can trigger Fabric items, I hooked my rule up to the Pipeline shown above to run the refresh:
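To make the windowing logic concrete, here’s a toy Python sketch of the same debounce idea – purely illustrative, and not how Activator evaluates rules internally:

# Each element of counts is the number of change events detected in a one-minute
# window; the rule fires when the total drops to 0 after some activity and stays
# at 0 for a further window.
def refresh_windows(counts: list[int]) -> list[int]:
    fired = []
    for i in range(2, len(counts)):
        had_activity = counts[i - 2] > 0
        went_quiet = counts[i - 1] == 0 and counts[i] == 0
        if had_activity and went_quiet:
            fired.append(i)
    return fired

# Edits during the first four minutes, then silence: the refresh fires once,
# two quiet windows after the last edit
print(refresh_windows([3, 1, 2, 4, 0, 0, 0, 0]))  # [5]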

This screenshot of the Definition tab of the rule shows a pattern of events where there were four one-minute windows where the Excel file was edited and the rule fired three times:

The problem with this rule is that if the Excel file is edited constantly for a long period, so that there is no period of inactivity, the semantic model would not get refreshed until that period of activity is over, which might be too long to wait. I guess the solution would be another rule that detects periods of, say, 10 minutes when the file has been edited constantly and refreshes the model even if there is no period of inactivity.

All in all I think this shows how the combination of Power BI/Power Platform and new features in Fabric can be used to solve old problems in new, interesting ways. I’m looking forward to learning more about Fabric Real-Time Intelligence and Activator!

[Thanks to James Hutton from the Activator team for his help with this post]

Send Data From Power Automate To Fabric Using Eventstreams

Fabric’s Real-Time Intelligence features are, for me, the most interesting things to learn about in the platform. I’m not going to pretend to be an expert in them – far from it – but they are quite easy to use and they open up some interesting possibilities for low-code/no-code people like me. The other day I was wondering if it was possible to send events and data from Power Automate to Fabric using Eventstreams and it turns out it is quite easy to do.

Here’s a really simple illustration. I created a new Eventstream in a Fabric workspace, then clicked the “Use custom endpoint” option and published the Eventstream so it was ready to send data to:

Then, in Power Automate, I built a super-simple flow with a “Manually trigger a flow” trigger and then the “Send event” action from the Event Hubs connector:

This works because you can send data from Power Automate to the Eventstream custom endpoint using the Event Hubs protocol. To set up the connection to the Eventstream in Power Automate I entered a name for the connection, chose the “Access Key” authentication type, copied the “Connection string-primary key” value from the Eventstream custom endpoint shown in the first screenshot above (you need to click the eye icon next to it in order to copy it to the clipboard), and pasted it into the “Connection String” property in the Power Automate connector:

Then I configured the “Send event” action like so:

I had to manually paste the Event Hub Name, shown in the first screenshot above, as a custom value because I got an error when I tried to use Power Automate’s dropdown box to find the name. I put the Timestamp from the “Manually trigger a flow” trigger into the Content property.
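As an aside, because the custom endpoint speaks the Event Hubs protocol, Power Automate isn’t the only possible client: anything that can use an Event Hubs client library should be able to send events too. Here’s a hypothetical sketch using the azure-eventhub Python package, with placeholders for the connection string and event hub name shown on the custom endpoint page:

from azure.eventhub import EventHubProducerClient, EventData

# Hypothetical sketch: send a single JSON event to the Eventstream custom endpoint
# over the Event Hubs protocol. Both values below are placeholders for the ones
# shown on the Eventstream's custom endpoint page.
producer = EventHubProducerClient.from_connection_string(
    conn_str='InsertConnectionStringPrimaryKey',
    eventhub_name='InsertEventHubName',
)

with producer:
    batch = producer.create_batch()
    batch.add(EventData('{"TriggerTime": "2024-12-01T09:00:00Z"}'))
    producer.send_batch(batch)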

After saving the flow I was then able to trigger it from my browser or the Power Automate mobile app, and every time the flow ran it sent the time of the run to my Eventstream:

Very basic but it proves the point. What could you use this for? Power Automate has a huge number of connectors for Microsoft and third-party services: for example, you can trigger a flow when a file stored in OneDrive for Business is modified or when new content is published to an RSS feed. Sending these events and their related data on to a Fabric Eventstream not only means you can capture them for analysis in OneLake or an Eventhouse, it also means you can send them on to Fabric Activator, which can apply sophisticated rules so that, when something happens and specific conditions are met, you can be alerted in Teams or via email, run Fabric items such as notebooks or even trigger Power Automate flows. I’ll explore some of these scenarios in more detail in future blog posts.

Power BI Copilot: Where Are The CUs Consumed?

If you’ve looked into using Copilot with Power BI, you’ll know that you need a capacity to use it: a P1/F64 or greater. Something I learned recently, though, is that the current Power BI Copilot capabilities are only accessible via a side pane on a report; it’s the report that needs to be stored in a workspace on a P1/F64+ capacity, and that’s where the CUs for Copilot are consumed. If the report has a live connection to a semantic model stored in a workspace that is not on a capacity (which we refer to at Microsoft as “Shared”, but which is widely known as “Pro”) then Copilot still works!

For example, I published a semantic model to a workspace stored on Shared/Pro:

I then built a report with a Live Connection to that semantic model and saved it to a workspace on a P1 capacity:

I then opened this report, used Power BI Copilot, and saw that a) it worked and b) the CUs were attributed to the workspace where the report was stored when I looked in the Capacity Metrics App:

This is particularly interesting if you have a large number of semantic models stored in Shared/Pro workspaces today but still want to use Power BI Copilot. It means you don’t have to move your semantic models into a workspace on a capacity, which in turn means that refreshing or querying those models won’t consume CUs on a capacity, so you’ll be able to support more users of Power BI Copilot on your capacities.

In the future you’ll have even more flexibility. In the sessions “Microsoft Fabric: What’s New And What’s Next” and “Boost productivity with Microsoft Fabric, Copilot, and AI” at Ignite last week we announced a number of new Power BI Copilot capabilities coming soon, such as integration with AI Skills and a new “immersive” Copilot experience. We also announced Fabric AI Capacities, coming in early 2025, which will allow you to direct all Copilot activity to certain capacities regardless of whether your reports or models are stored on Shared/Pro or Premium, or which capacity they are stored on (see more details from the 16:00 mark in the recording of the “Boost productivity with Microsoft Fabric, Copilot and AI” session).

Finding The Size Of All Of The Columns Touched By A DAX Query

I had meant to follow up my recent post on how to find the columns touched by a DAX query by writing one on how to use this technique to find the size of those columns in memory, so you can work out the total size of the columns that need to be paged into memory when a DAX query runs on a Direct Lake semantic model. Before I could do that, though, my colleague Michael Kovalsky messaged me to say that not only had he taken the query from that first post and incorporated it into Semantic Link Labs, he’d done the work to get column sizes too. All that’s left for me to do, then, is give you some simple examples of how to use it.

To use Semantic Link Labs you just need to create a new Fabric notebook and install the library:

%pip install semantic-link-labs

After that you can use sempy_labs.get_dax_query_dependencies to get the columns touched by any DAX query, for example:

import sempy_labs as labs
labs.get_dax_query_dependencies(
    dataset='InsertSemanticModelName',
    workspace='InsertWorkspaceName',
    dax_string='InsertDAXQuery',
)

This returns a dataframe with one row for each column touched by the query, plus various statistics about the size of each column in memory.
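Since the result is a regular dataframe you can manipulate it like any other – for example, something like this would show the largest columns first. Note that the name of the size column used here is an assumption and may differ, so check the dataframe’s columns in your version of the library:

import sempy_labs as labs

df = labs.get_dax_query_dependencies(
    dataset='InsertSemanticModelName',
    workspace='InsertWorkspaceName',
    dax_string='InsertDAXQuery'
)
# 'Total Size' is an assumed column name - inspect df.columns to find the actual
# name of the size statistic returned by your version of Semantic Link Labs
print(df.sort_values('Total Size', ascending=False).head(10))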

If you’re working with a Direct Lake semantic model, though, in order to get the correct sizes of each column in memory the query itself will need to have been run beforehand; you can ensure that this happens by setting the optional parameter put_in_memory to True:

import sempy_labs as labs
labs.get_dax_query_dependencies(
    dataset='InsertSemanticModelName',
    workspace='InsertWorkspaceName',
    dax_string='InsertDAXQuery',
    put_in_memory=True
)

Last of all, if you don’t want a dataframe but just want a single number representing the total memory needed by all columns touched by a query, you can use sempy_labs.get_dax_query_memory_size, for example like this:

import sempy_labs as labs
labs.get_dax_query_memory_size(
    dataset='InsertSemanticModelName',
    workspace='InsertWorkspaceName',
    dax_string='InsertDAXQuery'
)

Yet more evidence that, for any Power BI user, Semantic Link and Semantic Link Labs are the best reasons for you to flip the switch to enable Fabric. To find out more about what they are capable of, check out this user group presentation.

Tuning Power BI Copilot With Summarize By

Continuing my occasional series of ways you can improve the results you get from Power BI Copilot by setting properties on your semantic model, in this post I’ll look at how setting the Summarize By property can have an impact.

Let’s say you have a semantic model that contains the following table of temperature readings at different English seaside resort towns over the summer:

Now let’s say you enter the following prompt in the Copilot pane on a blank report (this forces Copilot to query the semantic model rather than look for answers from visuals in the report):

show temperature by resort

Here’s the response:

As you can see, Power BI Copilot has summed up the temperatures for each resort, which does not make sense. You could argue that the prompt should be more precise and specify the aggregation method to use, but it’s unlikely that most end users would know to do this on their first attempt at prompting – or ever. You could also argue that it would be better to expose an explicit measure and hide the underlying Temperature (C) column in the model – indeed, this is what I would do – but the downside is that it may limit the type of analysis an end user can do.

Let’s say that what you really wanted to see for this prompt is average temperature by resort, and as a semantic model developer you can ensure this happens by setting the Summarize By property on the Temperature (C) column to Average, either on the Column Tools tab in the ribbon or in the Model View:

After making this change, the same prompt now returns the average temperature for each resort:

As with some of the other properties I’ve talked about in this series, such as Row Label and Key Column, the Summarize By property is not new, but it’s a lot more important to set it correctly when designing models for Copilot.

One last thing to note is that, as well as specifying how a column should be aggregated, the Summarize By property can be used to specify that the values in a column should never be aggregated at all. This is particularly useful for numeric columns that contain categorical data, such as years or zip codes. You can do this by setting Summarize By to None in the Model view:

…or, if you’re setting this property on the Column Tools tab in the ribbon, using the Don’t Summarize option: