Reading Delta Metadata In Power Query

There’s a new M function rolling out now that allows you to read metadata from Delta tables (at the time of writing it’s available in Dataflows Gen2 and will be available soon in Desktop). It builds on the DeltaLake.Table M function that allows you to read data from Delta tables and is similar to the Parquet.Metadata function that was released last year. Here’s an example of how to use it to get metadata from a Delta table in OneLake:

let
  Source = AzureStorage.DataLake(
    "https://onelake.dfs.fabric.microsoft.com/insertworkspaceidhere/insertlakehouseidhere/Tables/inserttablenamehere", 
    [HierarchicalNavigation = true]
  ), 
  ToDelta = DeltaLake.Metadata(
    DeltaLake.Table(Source)
  )
in
  ToDelta

The function returns a table of records containing the metadata from the Delta table such as the schema, how the table is partitioned, and whether the table is V-Ordered or not:

Improve Power Query Performance On CSV Files Containing Date Columns

A few weeks ago I replied to a question on reddit where someone was experiencing extremely slow performance when importing data from a CSV file using Power Query. The original poster worked out the cause of the problem and the solution themselves: they saw that removing all date columns from their query made their Power Query query much faster and that using the Date.FromText function and specifying the date format solved the problem. While I couldn’t reproduce the extreme slowness that was reported I was able to reproduce a performance difference between the two approaches and Curt Hagenlocher of the Power Query team confirmed that this was expected behaviour.

Let’s see an example. I created a CSV file with five date columns and one million rows, then created a Power Query query to import this data into Power BI Desktop using the default M code generated by the Power Query Editor:

let
  Source = Csv.Document(
    File.Contents("C:\GenerateDates.csv"),
    [
      Delimiter  = ",",
      Columns    = 5,
      Encoding   = 65001,
      QuoteStyle = QuoteStyle.None
    ]
  ),
  #"Promoted Headers" = Table.PromoteHeaders(
    Source,
    [PromoteAllScalars = true]
  ),
  #"Changed Type" = Table.TransformColumnTypes(
    #"Promoted Headers",
    {
      {"Extra Spaces", type date},
      {"Extra Spaces - 2", type date},
      {"Extra Spaces - 3", type date},
      {"Extra Spaces - 4", type date},
      {"Extra Spaces - 5", type date}
    }
  )
in
  #"Changed Type"

The dates in the CSV file were in the following format:

02  Jan   1901

…and this is important: there are two spaces between the day and the month name and three spaces between the month name and the year.

Using SQL Server Profiler I found that this query took around 14 seconds to run.

I then created a second query that, instead of using Table.TransformColumnTypes to set the data type on the columns, used Date.FromText and the Format option:

let
  Source = Csv.Document(
    File.Contents("C:\GenerateDates.csv"),
    [
      Delimiter  = ",",
      Columns    = 5,
      Encoding   = 65001,
      QuoteStyle = QuoteStyle.None
    ]
  ),
  #"Promoted Headers" = Table.PromoteHeaders(
    Source,
    [PromoteAllScalars = true]
  ),
  DateConversionFunction = (inputDate) as date =>
    Date.FromText(
      inputDate,
      [Format = "dd  MMM   yyyy"]
    ),
  ChangeDate = Table.TransformColumns(
    #"Promoted Headers",
    {
      {
        "Extra Spaces",
        DateConversionFunction,
        Date.Type
      },
      {
        "Extra Spaces - 2",
        DateConversionFunction,
        Date.Type
      },
      {
        "Extra Spaces - 3",
        DateConversionFunction,
        Date.Type
      },
      {
        "Extra Spaces - 4",
        DateConversionFunction,
        Date.Type
      },
      {
        "Extra Spaces - 5",
        DateConversionFunction,
        Date.Type
      }
    }
  )
in
  ChangeDate

This version of the query took around 10.5 seconds to run, so not a huge improvement but a noticeable one. It’s certainly not the 6/7x performance improvement seen on the reddit post but I’m sure different data, different date formats and different hardware might result in bigger differences.

I was told by Curt that when Power Query uses Table.TransformColumnTypes to parse date data from CSV files it tries a series of different date formats in order: first it tries ISO-8601 (for example 9th February 2025 would be “2025-02-09”), then a long date format, then a short date format, and finally it uses a generic .NET date parsing function which is slower than the others. It does this to make sure date parsing “just works” as often as possible. The dates in the example above, with the extra spaces, were deliberately designed to be slow for Table.TransformColumnTypes. When I tested on CSV files that contained dates in IS-8601 format I found that Table.TransformColumnTypes performed the same as Date.FromText.

So, to sum up, if you’re using CSV files containing date columns as a source for Power Query and you’re experiencing performance problems, try changing your M code to use Date.FromText instead of Table.TransformColumnTypes to set the data types on the date columns.

Limit The Impact Of Expensive Power BI Queries On Your Capacity By Reducing The Query Timeout

The recent announcement of Surge Protection gives Fabric/Power BI capacity admins a way to restrict the impact of background operations on a capacity, preventing them from causing throttling. However, at the time of writing, Surge Protection does not prevent users that are running expensive DAX or MDX queries – which are interactive operations – from causing problems on your capacity. Indeed, right now, there is no direct way to stop runaway queries from consuming a lot of CUs, although there is something you can do which will help a lot: reducing the query timeout.

Surge Protection doesn’t address the problem of expensive queries yet because Power BI only knows the CU usage of a DAX or MDX query when it has finished running – by which time it’s too late to do anything about it. In many cases, though, DAX or MDX queries that consume a lot of CUs are also slow. Therefore reducing the query timeout, which will kill any query that runs longer than a specified duration, will stop these queries from consuming so many CUs.

There are two default query timeouts that you should be aware of in Power BI. First, all DAX queries generated by a Power BI report have a 225 second timeout applied by the report itself. This timeout can be changed in Power BI Desktop but it cannot be changed on a published report in the Power BI Service. Second, you can set a timeout at the capacity level by changing the Query Timeout property in the admin portal. The default setting here is 3600 seconds (one hour). Unlike the first timeout, which only applies to the DAX queries generated by a Power BI report, this timeout applies to all queries run on any semantic model associated with the capacity, including the MDX queries generated by Excel PivotTables via Analyze In Excel. Setting this second timeout to less than 225 seconds means that it will take precedence over the first timeout. Therefore it’s the Query Timeout property on your capacity that you should set.

Hitting a timeout in a Power BI report will give the user a “Query has exceeded the available resources” error; clicking See Details/More Details will give you a message like this:

The XML for Analysis request timed out before it was completed. Timeout value: 10 sec.

Hitting the query timeout in an Excel PivotTable will give you the same message:

What value should you set the Query Timeout to? In my opinion no query should ever run for more than 30 seconds because anything slower will result in a poor experience for your end users – no-one wants to sit around for ages waiting for a report to render. I also think it should be possible to tune any semantic model so all queries run under 30 seconds if you know what you’re doing. That said, in the real world, setting a timeout of 30 seconds may be unrealistic: developers may not have the skills to tune their semantic models. As a result I find a timeout of 100 seconds is often a good compromise but you should experiment with different timeouts to see what the minimum value you can get away with is.

It’s important to note that reducing the query timeout will not stop every expensive query. This is because it’s perfectly possible to have very fast queries that consume a lot of CUs – for example when distinct count measures are used, and/or when there are very large data volumes and/or when there are complex but highly-optimised measures. Also there relatively rare cases where a query will carry on running beyond the duration specified by the timeout, because the Vertipaq engine only checks if the timeout has been exceeded at certain points in the code and depending on the query there could be several seconds (sometimes more) between these checks. Equally, some very slow queries may not use a lot of CUs and having them time out might cause unnecessary disruption. Overall, though, in my experience setting a timeout will stop enough expensive queries to make doing so worthwhile.

[Update: my colleague Akshai Mirchandani has just reminded me that you can also set the Query Timeout at the workspace level as a Server Property using SQL Server Management Studio, as detailed here. The property is called ServerTimeout. This gives you more flexibility than setting it for the whole capacity.]

Why DAX Is Better Than MDX For Bulk Extracts Of Data From Power BI

This is a post I’ve avoided writing for many years, and before I carry on let me make one thing clear:

Doing bulk extracts of data from a Power BI semantic model is a **really** bad idea

My colleague Matthew Roche wrote a great post on this topic a couple of years ago that is still relevant: using Power BI (or Analysis Services) as a data source for other systems, including other Power BI Import mode semantic models, is an anti-pattern. Power BI is optimised for small, analytical queries that return the amount of data that can be visualised on a single page. It is not optimised for queries that return millions of rows. Running this kind of query on a Power BI semantic model will be slow, is likely to run into timeouts and memory errors, and is also likely to cause CU spikes – and perhaps throttling – on a Premium capacity. If you want the data from a semantic model it’s much better to go back to the original data sources that the semantic model uses.

But

People still use Power BI semantic models as data sources all the time. This is either because they don’t know any better, because they can’t get access to the underlying data sources, or because they want to get the result of any DAX calculations on the model.

If you do need to extract large amounts of data from a semantic model I have one important piece of advice: write a DAX query to get the data and not an MDX query. There are two reasons for this:

Writing a DAX query to get granular data is usually a lot simpler than writing an MDX query
DAX queries that return large amounts of data are typically faster (and so less likely to hit timeouts), more CPU efficient (and therefore less likely to cause throttling on a capacity) and more memory efficient (and so less likely to cause memory errors)

The bad news is that the two client tools most often used to bulk extract data from Power BI, Excel PivotTables and Power Query using the Analysis Services connector and its query builder, generate MDX queries. What’s more, they don’t always generate the most efficient MDX queries either.

Let’s see an example. I have a semantic model in a Premium workspace with a table called Property Transactions with around a million rows in it. I connected to the model via the XMLA Endpoint using the “From SQL Server Analysis Services Database (Import)” option in Power Query in Excel:

…and then created a query to get the data from all the columns on the Property Transactions table plus one measure, called Count of Sales, using Power Query’s query builder:

While the query builder generated the MDX for me, you can see that it was not a simple query:

I ran a Profiler trace while this query ran and from the Execution Metrics I saw that:

The query took 54 seconds to complete
CPU Time was also 54 seconds
The approximate peak memory usage of the query was 626292KB

I then created a second Power Query query that used the following DAX query to get the same data, which I think you’ll agree is much more straightforward:

EVALUATE 
ADDCOLUMNS('Property Transactions', "Count of Sales", [Count of Sales])

[You have the option of entering a customer MDX or DAX query when you create your Power Query query]

This time, Execution Metrics showed me that:

The query took 6 seconds to complete
CPU Time was 6 seconds too
The approximate peak memory usage was 142493KB

So the DAX query was simple to write and maintain, took 11% of the time that the MDX query to run, used 11% of the CPU and 22% of the memory. That’s a big improvement. Even though I might be able to rewrite the MDX generated by Power Query to be more efficient there’s no way it would be as simple or as efficient as the DAX query.

[Thanks to Akshai Mirchandani for the information in this post]

TMDL View And Power BI Developer Productivity: An Example Using The Detail Rows Definition Property

For me the biggest new feature in the January 2025 release of Power BI Desktop is the new TMDL View; many other people like Marco are excited about it too. For more advanced Power BI developers (and honestly, I don’t think you need to be that advanced to get value out of it) it makes certain editing tasks for semantic models much simpler, and while I won’t be abandoning the main Power BI Desktop UI completely or stopping using external tools like Tabular Editor it is something I see myself using on a regular basis from now on.

One of the things it allows is the editing of semantic model properties and features that are not exposed by the Power BI Desktop UI but which are nonetheless supported by the engine, and which up to now you’d have had to use Tabular Editor to set. The announcement blog post mentions a few of these – perspectives (useful for the Personalize Visual feature for example) and the isAvailableInMdx property – but my favourite underused property is the Detail Rows Definition property of a measure, also known as Detail Rows Expression. If you have end users querying your model using Analyze In Excel it allows you to customise the results returned by an Excel PivotTable’s Show Details feature; as you might expect Marco and Alberto have an excellent, detailed article on it here. Setting this property allows you to control which columns and rows (so the correct rows are shown for non-trivial measures, as I described here) are returned, and if you can educate your users to use Show Details it can perform a lot better than a gigantic PivotTable to show detail-level data from your model.

What does the workflow for setting this property on a model in Power BI Desktop look like now? What benefits do TMDL View and all the other recent pro developer enhancements in Desktop bring for someone like me? Let’s say I have a measure called Count Of Sales in my semantic model and that I want to customise the columns and their names that are returned by Show Details for this measure. The Detail Rows Definition property takes a DAX expression that returns a table so the first step is to write that expression; now that we have DAX Query View I can do that without leaving Power BI Desktop and because I’m lucky enough to have access to Power BI Copilot (one of the perks of working for Microsoft) I can use that to write my DAX expression easily. I gave Copilot the prompt:

Write a query that gives me the Date, County, Town and Postcode columns columns from the Property Transactions table along with the Count of Sales measure. Rename the Date column to be "Sales Date".

…and it immediately gave me the DAX query I wanted without needing to faff around looking up the syntax to the SELECTCOLUMNS() function:

EVALUATE
  SELECTCOLUMNS(
    'Property Transactions',
    "Sales Date", 'Property Transactions'[Date], // Renaming Date column to Sales Date
    "County", 'Property Transactions'[County],
    "Town", 'Property Transactions'[Town],
    "Postcode", 'Property Transactions'[Postcode],
    "Count of Sales", [Count Of Sales] // Including the Count of Sales measure
  )

Next, I copied the DAX query minus the EVALUATE statement, switched over to the TMDL View pane, dragged and dropped the Count of Sales measure into a Script pane:

And then, underneath the existing measure definition, started typing detailRowsDefinition – the intellisense picked up what I was typing before I even had to finish:

I then tried to paste the DAX query into TMDL View and realised it didn’t like line breaks. Rather than trying to look up how to do this with the editor – which I’m sure is possible – I just switched back to DAX Query View, highlighted the query, entered the prompt:

format this query so there are no line breaks

…and it did the job for me! I copied the DAX table expression again and pasted it into TMDL View after the detailRowsDefinition property:

[I was pleasantly surprised that I didn’t have to escape any characters or mess around with double quotes]

Next I hit the Apply button and tested Analyze in Excel with a PivotTable and Show Details:

And bingo, I got the results I wanted:

Was all of this possible before? Absolutely. Is it much quicker and easier now with TMDL View, DAX Query View and Copilot? Absolutely. I’m a pretty experienced Power BI developer and I could certainly have written the DAX expression without Copilot, I have DAX Studio installed (which has pretty good query builder too) to write and test the query, and I have Tabular Editor to set the property. But being able to do all this just using Power BI Desktop makes me so much more productive.

Changing The Source Lakehouse Of Power BI Direct Lake Models In Deployment Pipelines

If you’re using deployment pipelines with Direct Lake semantic models in Power BI you’ll have found that when you deploy your model from one stage to another by default the model still points to the Lakehouse it was originally bound to. So, for example, if you deploy your model from your Development stage to your test stage, the model in the Test stage still points to the Lakehouse in the Development stage. The good news is that you can use the deployment rules feature of deployment pipelines to make sure the model in the Test stage points to a Lakehouse in the Test stage and in this post I’ll show you how.

To illustrate how to do this I created a workspace to represent a dev environment with a Lakehouse containing one table, called MyTable, containing the following data:

I then created a second workspace to represent a test environment with a Lakehouse containing a table with the same name and the same schema but with different data:

Then, back in the dev workspace I created a custom semantic model pointing to the mydata table and built a report on it to show the data:

Here’s what the dev workspace looked like:

I then created a deployment pipeline, assigned the two workspaces to the Development and Test stages of it, and configured it so that the semantic model and report would be deployed from Development to Test:

I then did a deployment and, as mentioned, although the report and the semantic model were moved to the Test stage workspace, the model still pointed to the Lakehouse in the Development stage workspace so the report showed the same data as before:

To fix this I created a deployment rule using the “Data source rules” option. This is where things get tricky. When you create the rule you can see that the custom model on the From side has a Database and a Server property to know which Lakehouse it is pointing to:

This means that on the To side you need to choose the “Other” option in the dropdown box and enter values for the Database and Server:

What do these properties mean?

The Database property is the ID of the SQL Endpoint of the Lakehouse the model is pointing to and you can find it by opening the SQL Endpoint UI of the Lakehouse in the browser and copying it from the part of the URL that immediately follows “lakehouses”:

The Server property is the SQL connection string of the SQL Endpoint, which can be found by clicking the Gear icon on the menu in the SQL Endpoint UI to open the Settings pane:

I changed the settings in the data source rule to the values from the Lakehouse in the Test workspace and then redeployed. After doing that the report in my Test workspace showed the data from the Lakehouse in the Test workspace (note: I had to manually refresh the report first to do this to flush out stale data from the cache):

Text Analysis With Power BI Copilot

Power BI reports are usually used to display numeric data. However it is reasonably common to have text data in a semantic model too and for this to be displayed on a report, and while visuals like the new text slicer allow you apply filters to the text, large amounts of text can still be difficult for end users to make sense of. Recently I was wondering if Power BI Copilot can be used to summarise text data (similar to what is possible with Excel Copilot) and it turns out that the answer is yes – with some limitations.

To test this I created a simple semantic model consisting of a single table containing comments from diners at various restaurants:

I then created a report with (1) a slicer on the Restaurant field, (2) a table visual to display the contents of the Feedback field for the selected restaurant, and (3) a Smart Narrative visual:

For the Smart Narrative visual I used the preview Copilot option, restricted it to look at just the table visual, and used the following prompt:

Under a title, in bold, of "Feedback Summary" provide a brief summary of the comments in the Feedback column in no more than two bullet points. Include some quotes from the Feedback column but do not just repeat what is there.

And guess what, it works:

[Notice that when you change the slicer selection you have to click the Refresh button on the Smart Narrative visual to get it to update]

BUT before you get too excited, there are a few things you need to understand:

At the time of writing the flavour of Power BI Copilot used by the Smart Narrative visual can only summarise data that is visible on the report page – if it isn’t visible, even if it is present in the semantic model, then it can’t be summarised.
It can only consider the first 30000 rows from any given table visual, and if it is pointed at multiple visuals then it can only consider 150000 rows in total.
For each table row, it can only consider the first 100 characters of text – anything after that will be truncated. This will be increased to 500 characters of text soon but even then this could be a big limitation (for example the text “If you want good food you have to pay. That said, it’s very good value for a Michelin-starred restaurant.” is 105 characters long). You could work around it by splitting the text up into multiple rows where the length is less than the limit, but I don’t know what impact this would have on the quality of the results.
For most scenarios cost would not be a worry (see here for my post on the cost of Copilot; it’s the number of words, not the number of characters, that is significant), but even with the character limits if you had a lot of users trying to analyse a table displaying thousands rows of text values that is filtered/sliced regularly then it could get expensive.

As a result of this you should only consider Power BI Copilot for summarising relatively short pieces of text; you should not store long documents in your semantic model and expect Power BI Copilot to be able to summarise them. If you want to show summaries of longer pieces of text your best bet is to create them upstream before the data reaches Power BI.

20th Blog Birthday

Every year on this date I write a post reflecting on the previous year but, as you’ve probably guessed from the title, this is a special year: it’s twenty years since I started writing this blog. Twenty years ago today on the 30th of December 2004 I was working at IMS Health in London, I was in the office, it was quiet, and since blogging was the cool new thing – and since the great Mosha Pasumansky had just started blogging – I thought I should start a blog of my own. It turned out to be one of the best decisions I ever made career-wise. As a result I thought it would be good to look back on some of my favourite, or some of the most significant, posts from over the years.

My first ever post did a good job of outlining my intentions: back then I spent a lot of time answering questions on the microsoft.public.sqlserver.olap newsgroup and I wanted somewhere to share solutions to common problems, so I could avoid having to explain the same things over and over again. I’ll be honest, though, the posts from that first year are not great quality: the topics are fairly random, the formatting is all over the place and it was a while before I even started to include screenshots in my posts. But everyone has to start somewhere and if I have any advice to offer for people starting out creating content – blogs, videos or whatever – today then it’s to not be self-conscious about what you put out there. The most important thing is to publish regularly and not to give up; the more you do it, the better your content will get. I’ve always said that starting a blog is like going on a diet: most people give up after a few weeks, the majority have given up after a few months, and the few that continue in the long term realise that it’s not something that you can force yourself to do, it’s something that has to become a habit, something you do without thinking. You only realise the benefits after a couple of years.

For a long time my most popular post was OLAP Jokes, from August 2005. If you work (or worked) with MDX and SSAS Multidimensional it’s possible you may still get them, but even then some of them are very obscure. Who remembers arbitrary-shaped sets and the problems they cause? I barely can. I sometimes miss those early days of blogging when everyone was a lot less professional and content was a lot more eclectic; nowadays my content is a lot more uniform and on-topic. This was from the days before I had a consultancy and training business to promote; I wasn’t an MVP at this point. As you might expect, now I’m a Microsoft employee, I have to be a lot more careful about what I say on social media.

Indeed, the popularity of that last post was swiftly eclipsed by what became the most controversial post of my blogging career: this one on the announcement of SSAS Tabular and what this meant for SSAS Multidimensional. I think I had about fifteen thousand hits on that post within a couple of hours of publishing it and apparently customers were calling up Steve Ballmer to complain based on it. Reading it now I think my analysis of the situation was correct and my emotional reaction was understandable given that I’d based the last eleven years of my career on SSAS MD, but I was also guilty of using the following I’d built up by that point to stir up controversy. The feeling of power that you get from leading an online mob, however righteous the cause, can be intoxicating and is something you need to be wary of. I certainly got into a lot of hot water with Microsoft about this and I’m eternally grateful to Amir Netz for stepping in to resolve the situation.

After that I wrote fewer opinion pieces and discovered new technologies. I never really engaged as much with SSAS Tabular/Power Pivot and DAX as much as I had with SSAS MD and MDX (I left all that to Marco and Alberto, which turned out to be a bad business decision on my part) but I fell in love with Power Query and M. There are a few posts from those early days of Power Query which still get a lot of traffic today, for example my post on creating a date dimension table in M. Many other people have blogged about this subject over the years, many of them better than I did, but I was the first and that’s been my goal in the last few years: write about subjects that no-one else has written about before rather than provide definitive coverage of common problems. It’s not the way to get a really large readership but at that point in my life, when I was running my own consultancy and training business, I found it was a great way to reach the kind of people who were willing to pay for my services.

Sometimes this approach has led to valuable information being spread over multiple posts which makes it a lot harder to find. One example of this is my series of posts on memory errors: the first time I wrote about this, here in January 2020, was unexpectedly popular and detailed enough to be useful; I returned to this topic in June 2023 when Power BI Desktop started showing the same error and gave you settings to simulate different limits that might apply in the Power BI Service; finally, this summer, I wrote a series of in-depth posts (starting here) on all the different types of memory errors you might encounter in Power BI. In the past I might have pulled all this information together in a book; nowadays it’s more likely to turn into a user group presentation (in the way I did here for Power Query data privacy settings). I don’t do this often enough, though, and given the way Google has downgraded blogs in its search results in the last year in favour of forums, I should do it more often so that the information doesn’t get lost.

What about the future? I’ve never had a plan for what I do here so I have no idea what will happen. Part of the reason I have kept blogging is that I do what I enjoy rather than chase traffic, which in turn means writing is never a chore. That said I doubt I’ll be still writing this blog in another twenty years or even ten years; I don’t intend to retire any time soon but life at Microsoft is quite demanding so I hope to be doing something different and less stressful by then. I would like to say that I am extremely grateful to everyone who comes here to read what I write and who leaves me comments. Even more than writing and learning, I enjoy sharing what I have learned about, so when I meet people who’ve been helped by this blog it helps to motivate me to keep going. Thank you for reading!

Power BI Copilot And The Data Category Property

I have known for a while that setting the Data Category property on a column can influence the results you get from Power BI Copilot but I have struggled to find a scenario where it actually makes a noticeable difference. My sources on the Power BI Copilot team told me that Copilot does consider geographic categories (which are exposed in the Power BI Desktop UI) and date categories (which are available to set in tools like Tabular Editor but aren’t exposed in the Power BI Desktop UI; the full list of categories is documented here) but it’s only one factor among many. However, when writing my last blog post, I finally found a good example to show you all.

In my last blog post I showed how setting the Sort By Column property on the Launch column in this table:

…means that the following Copilot prompt:

Show Sales Amount by launch

…returns a column chart showing sales by launch year sorted by year in ascending order:

The fact that the column on the x axis is called just “Launch” and doesn’t mention anything about financial years, and that the contents of the column are text and not integers, means that Copilot doesn’t know that the contents of the column are financial years.

However, opening the model in Tabular Editor and setting the Data Category property on the Launch column to “Years”:

…means the same prompt now returns a line chart rather than a column chart:

[Interestingly this doesn’t happen if you set the Data Category to “FiscalYears”]

Although I’ve seen the behaviour change over the last few months, I think it’s the case that setting the Data Category property to one of the geography-related values can influence whether data is displayed using a map visual or not in some cases.

Why Setting Sort By Column Is Important For Power BI Copilot

As a Power BI semantic model developer, one of the first things you will learn about is using the Sort By Column property to make sure the values in a column – for example days of the week or months of the year – are sorted correctly. Recently, though, I came across a situation where it was important to set it when you might not expect to have to in order to get good results from Copilot.

Consider the following simple semantic model:

It contains a single table containing sales values broken down by the financial year that products were launched in:

You might think that is isn’t necessary to set the Sort By Column property on the Launch column because, even though it is a text value, sorting it alphabetically gives you the sort order you would expect. Clicking on this column in the Data pane in Desktop to automatically create a visual gives a table with the financial years in the correct order (ie in ascending order by year):

But selecting the Sales Amount measure changes the visual used and, crucially, changes the sort order:

Note that, now, the bar chart is sorted in descending order by the Sales Amount measure, rather than by year.

This type of automatic behaviour is important because it’s something Copilot will rely on. So, for example, the prompt:

Show Sales Amount by product launch date

Returns the same bar chart as a response:

While a prompt to create a new report page does something similar:

Show Sales Amount by product launch date as a new page

Note how, on this report page, both the line chart and the column chart are ordered by Sales Amount again. This is almost certainly not what your end users want to see.

So how can you get Copilot to generate visuals with Launch on the x axis sorted in year order? The solution is to add a new column (in this case called Launch YYYY and hidden) and use it to set the Sort By Column property for the Launch column:

Having done this the two prompts above return visuals sorted by the Launch column and not by the Sales Amount measure. So:

Show Sales Amount by product launch date

…returns

…and the prompt:

Show Sales Amount by product launch date as a new page

…returns:

This is a very simple example – in the real world you should have a separate date dimension table and follow all the other best practices for data modelling. The point is, though, that setting the Sort By Column property on a column means that sort order will always be respected in visuals created by Copilot unless you explicitly override it in your prompt.

Chris Webb's BI Blog

Reading Delta Metadata In Power Query

Like this:

Improve Power Query Performance On CSV Files Containing Date Columns

Like this:

Limit The Impact Of Expensive Power BI Queries On Your Capacity By Reducing The Query Timeout

Like this:

Why DAX Is Better Than MDX For Bulk Extracts Of Data From Power BI

Like this:

TMDL View And Power BI Developer Productivity: An Example Using The Detail Rows Definition Property

Like this:

Changing The Source Lakehouse Of Power BI Direct Lake Models In Deployment Pipelines

Like this:

Text Analysis With Power BI Copilot

Like this:

20th Blog Birthday

Like this:

Power BI Copilot And The Data Category Property

Like this:

Why Setting Sort By Column Is Important For Power BI Copilot

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this: