To demonstrate how to do this, I created a semantic model with two tables: one visible, called VisibleTable, and one hidden, calledHiddenTable.
I then published the semantic model, created a Data Pipeline and added a semantic model refresh activity; selected the connection, workspace and semantic model; waited for the Table(s) dropdown to populate (yes I know it’s slow, we’re working on it):
…and then, when it loaded, noted that only the visible table was shown in the dropdown:
I didn’t select anything and instead clicked “Add dynamic content” to use an expression to select the table instead:
Then in the Pipeline expression builder I entered the following:
@json('
[
{
"table":"HiddenTable"
}
]
')
Having done this I ran the Pipeline and just the hidden table was refreshed. Easy!
The expression needs to be a JSON array of table and partition names. Here’s an example showing how to refresh the table called HiddenTable and the sole partition of the table called VisibleTable (which also happens to be called VisibleTable) in the same refresh:
It’s useful to know how to construct the expression even if you don’t need to refresh hidden tables – for example, you might want to dynamically generate the list of tables or partitions to refresh with an expression.
I’m a big fan of Phil Seamark’s “visualise your refresh” pbix file, which allows you to analyse the different operations of an Import mode semantic model refresh operation (similar functionality has now been built into Semantic Link Labs now too). In particular I like how it allows you to visualise operations inside a refresh as a Gantt chart, which makes it easy to see how much parallelism you’re getting when you refresh – and as I discussed here, the amount of parallelism can have a massive effect on how long a refresh takes. However Gantt charts are just one way to visualise this data and after one of my colleagues showed me the KQL range() function I realised there was a different way to approach this problem using KQL and Fabric Workspace Monitoring.
Here’s the KQL query I came up with to find the number of partitions that are refreshing in parallel at any given point in time when a semantic model is refreshed:
SemanticModelLogs
//Filter by Timestamp for performance
| where Timestamp > ago(1h)
//Filter by the OperationId of the refresh
| where OperationId == "insert OperationId of refresh here"
//Only get the events fired when partition refresh finishes
| where OperationDetailName =="Process" and OperationName =="ProgressReportEnd"
//Find the start time of the partition refresh
| extend StartTime = datetime_add("Millisecond", -1 * DurationMs, Timestamp)
//Create a list of all the seconds where the refresh was in progress
| extend StartSecond = range(bin(StartTime,1s), bin(Timestamp,1s), 1s)
| mv-expand StartSecond
//Aggregate the data and get the number of partitions being refreshed in each one-second bin
| summarize PartitionsBeingRefreshed=count() by todatetime(StartSecond)
//Sort in ascending order by one-second bins
| order by StartSecond asc
What this query does is:
Filter the events in the SemanticModelLogs table to just those for the refresh whose OperationId you enter
Filter again so you only get the ProgressReportEnd events that are fired when a partition refresh finishes
Calculates the start time of the partition refresh by taking the finish time and subtracting the duration of the refresh
Creates a table with one row for each second that each partition refresh was in progress
Aggregates this table so you get the number of partition refresh operations that were in progress for each one second bin
All events associated with a single semantic model refresh can be identified by the value in the OperationId column in the SemanticModelLogs table in Workspace Monitoring. To find it, just run a KQL query on the SemanticModelLogs table to get all the events from the time period when your refresh took place; it will be easy to spot the events associated with the refresh by looking at the contents of the EventText column, so all you need to do is copy the value from the OperationId column from one of these events.
I refreshed a semantic model with six dimension tables and one fact table containing ten partitions and here’s what the query above showed when visualised as a line chart in a KQL Queryset:
As you can see, at first four partitions are refreshing in parallel; this quickly drops and then rises, and for the first half of the refresh six partitions are refreshing in parallel. At about the two-thirds point this drops to four partitions refreshing in parallel and as they complete the line chart drops to one. Since six partitions refreshing in parallel is the maximum I allowed for this refresh you can see it’s reasonably efficient.
To get more detail on what is being refreshed, here’s a variation of the KQL query above that doesn’t perform the final aggregation:
This data can then be used in a Power BI report and visualised as a stacked bar chart:
The advantage of this is that you can see which partitions are refreshing when: in this case you can see that it was the partitions from the dimension tables that refreshed first, followed by the partitions from the fact table.
If you’re not using Fabric Workspace Monitoring it should be easy to adapt the queries above to work with Power BI Log Analytics integration instead – it’s just a matter of changing a few column names.
There’s one problem I can think of with this approach: if there are multiple small partitions being refreshed that take under a second there’s a chance that you’ll see inflated parallelism numbers in the results at some points. I don’t think that’s a major problem though and overall I think this is quite a useful way to understand how much parallelism you’re getting during a refresh.
[Thanks to Matthew Farrow for the inspiration here – check out his excellent, detailed series of articles on LinkedIn starting here about understanding Fabric capacities and billing]
I’m sure you already know how to configure scheduled refresh for your semantic models in Power BI. While the options you have for controlling when refresh takes place are generally good enough – you can configure daily or weekly refreshes and set up to eight times a day for refreshes to take place – there are some scenarios it doesn’t work for, such as monthly refreshes. Up to now the workaround has been to use Power Automate to trigger refreshes (see here for an example) or to call the refresh API from another application. Now, with Fabric, you have a much better option for scheduling refreshes: Data Pipelines.
The semantic model refresh activity in Fabric Data Pipelines was released last year and at the time all the focus was on the extra control it gives you over what happens in a refresh: it allows you to refresh individual tables or partitions and control the amount of parallelism, for example; it also allows you to schedule your refresh after other ETL operations, which in Fabric will probably also be scheduled using Pipelines, have completed. What I want to draw your attention to is the fact that Fabric Data Pipelines use the new Fabric scheduler which offers more flexibility for controlling when they run.
There’s some documentation here on how to schedule a Data Pipeline run but it’s very straightforward to use. I created a Fabric Pipeline consisting of a single semantic model refresh activity like so:
..then hit the Schedule button on the toolbar, selected “Monthly” on the Repeat dropdown and configured it to run on the first Sunday of the month:
Apart from the option to run on the Nth instance of a given day of the week you can also run the Data Pipeline on a given day number of the month; you can also run every N months, add up to 10 times to run per day, and set a start and end date.
There are two other options for scheduling that aren’t available in the scheduler for semantic models: the ability to run the Data Pipeline every N hours or minutes.
Be warned though: refreshing your semantic model every few minutes is dangerous because it can result in excessive CU consumption on your capacity and maybe even throttling if you’re not careful.
The same options to run a Data Pipeline daily and weekly that exist in the scheduler for semantic models with one notable limitation: the semantic model scheduler allows you to specify up to 48 times to refresh every day for models stored on a Premium/Fabric capacity, whereas the Fabric scheduler used by Pipelines only allows you to specify 10 times per day.
Of course you need a capacity to be able to use Fabric Data Pipelines but orchestration activities only cost 0.0056CU hours per activity run, so using Pipelines to refresh a semantic model in this way will only use a tiny fraction of even the smallest capacity’s resources.
Even if you don’t think you’re interested in anything that Fabric offers beyond Power BI, it’s features like this that, in my opinion, still make it worthwhile to flip the switch to enable Fabric and make your life as a Power BI developer easier.
There’s a new M function rolling out now that allows you to read metadata from Delta tables (at the time of writing it’s available in Dataflows Gen2 and will be available soon in Desktop). It builds on the DeltaLake.Table M function that allows you to read data from Delta tables and is similar to the Parquet.Metadata function that was released last year. Here’s an example of how to use it to get metadata from a Delta table in OneLake:
let
Source = AzureStorage.DataLake(
"https://onelake.dfs.fabric.microsoft.com/insertworkspaceidhere/insertlakehouseidhere/Tables/inserttablenamehere",
[HierarchicalNavigation = true]
),
ToDelta = DeltaLake.Metadata(
DeltaLake.Table(Source)
)
in
ToDelta
The function returns a table of records containing the metadata from the Delta table such as the schema, how the table is partitioned, and whether the table is V-Ordered or not:
A few weeks ago I replied to a question on reddit where someone was experiencing extremely slow performance when importing data from a CSV file using Power Query. The original poster worked out the cause of the problem and the solution themselves: they saw that removing all date columns from their query made their Power Query query much faster and that using the Date.FromText function and specifying the date format solved the problem. While I couldn’t reproduce the extreme slowness that was reported I was able to reproduce a performance difference between the two approaches and Curt Hagenlocher of the Power Query team confirmed that this was expected behaviour.
Let’s see an example. I created a CSV file with five date columns and one million rows, then created a Power Query query to import this data into Power BI Desktop using the default M code generated by the Power Query Editor:
let
Source = Csv.Document(
File.Contents("C:\GenerateDates.csv"),
[
Delimiter = ",",
Columns = 5,
Encoding = 65001,
QuoteStyle = QuoteStyle.None
]
),
#"Promoted Headers" = Table.PromoteHeaders(
Source,
[PromoteAllScalars = true]
),
#"Changed Type" = Table.TransformColumnTypes(
#"Promoted Headers",
{
{"Extra Spaces", type date},
{"Extra Spaces - 2", type date},
{"Extra Spaces - 3", type date},
{"Extra Spaces - 4", type date},
{"Extra Spaces - 5", type date}
}
)
in
#"Changed Type"
The dates in the CSV file were in the following format:
02 Jan 1901
…and this is important: there are two spaces between the day and the month name and three spaces between the month name and the year.
Using SQL Server Profiler I found that this query took around 14 seconds to run.
I then created a second query that, instead of using Table.TransformColumnTypes to set the data type on the columns, used Date.FromText and the Format option:
This version of the query took around 10.5 seconds to run, so not a huge improvement but a noticeable one. It’s certainly not the 6/7x performance improvement seen on the reddit post but I’m sure different data, different date formats and different hardware might result in bigger differences.
I was told by Curt that when Power Query uses Table.TransformColumnTypes to parse date data from CSV files it tries a series of different date formats in order: first it tries ISO-8601 (for example 9th February 2025 would be “2025-02-09”), then a long date format, then a short date format, and finally it uses a generic .NET date parsing function which is slower than the others. It does this to make sure date parsing “just works” as often as possible. The dates in the example above, with the extra spaces, were deliberately designed to be slow for Table.TransformColumnTypes. When I tested on CSV files that contained dates in IS-8601 format I found that Table.TransformColumnTypes performed the same as Date.FromText.
So, to sum up, if you’re using CSV files containing date columns as a source for Power Query and you’re experiencing performance problems, try changing your M code to use Date.FromText instead of Table.TransformColumnTypes to set the data types on the date columns.
The recent announcement of Surge Protection gives Fabric/Power BI capacity admins a way to restrict the impact of background operations on a capacity, preventing them from causing throttling. However, at the time of writing, Surge Protection does not prevent users that are running expensive DAX or MDX queries – which are interactive operations – from causing problems on your capacity. Indeed, right now, there is no direct way to stop runaway queries from consuming a lot of CUs, although there is something you can do which will help a lot: reducing the query timeout.
Surge Protection doesn’t address the problem of expensive queries yet because Power BI only knows the CU usage of a DAX or MDX query when it has finished running – by which time it’s too late to do anything about it. In many cases, though, DAX or MDX queries that consume a lot of CUs are also slow. Therefore reducing the query timeout, which will kill any query that runs longer than a specified duration, will stop these queries from consuming so many CUs.
There are two default query timeouts that you should be aware of in Power BI. First, all DAX queries generated by a Power BI report have a 225 second timeout applied by the report itself. This timeout can be changed in Power BI Desktop but it cannot be changed on a published report in the Power BI Service. Second, you can set a timeout at the capacity level by changing the Query Timeout property in the admin portal. The default setting here is 3600 seconds (one hour). Unlike the first timeout, which only applies to the DAX queries generated by a Power BI report, this timeout applies to all queries run on any semantic model associated with the capacity, including the MDX queries generated by Excel PivotTables via Analyze In Excel. Setting this second timeout to less than 225 seconds means that it will take precedence over the first timeout. Therefore it’s the Query Timeout property on your capacity that you should set.
Hitting a timeout in a Power BI report will give the user a “Query has exceeded the available resources” error; clicking See Details/More Details will give you a message like this:
The XML for Analysis request timed out before it was completed. Timeout value: 10 sec.
Hitting the query timeout in an Excel PivotTable will give you the same message:
What value should you set the Query Timeout to? In my opinion no query should ever run for more than 30 seconds because anything slower will result in a poor experience for your end users – no-one wants to sit around for ages waiting for a report to render. I also think it should be possible to tune any semantic model so all queries run under 30 seconds if you know what you’re doing. That said, in the real world, setting a timeout of 30 seconds may be unrealistic: developers may not have the skills to tune their semantic models. As a result I find a timeout of 100 seconds is often a good compromise but you should experiment with different timeouts to see what the minimum value you can get away with is.
It’s important to note that reducing the query timeout will not stop every expensive query. This is because it’s perfectly possible to have very fast queries that consume a lot of CUs – for example when distinct count measures are used, and/or when there are very large data volumes and/or when there are complex but highly-optimised measures. Also there relatively rare cases where a query will carry on running beyond the duration specified by the timeout, because the Vertipaq engine only checks if the timeout has been exceeded at certain points in the code and depending on the query there could be several seconds (sometimes more) between these checks. Equally, some very slow queries may not use a lot of CUs and having them time out might cause unnecessary disruption. Overall, though, in my experience setting a timeout will stop enough expensive queries to make doing so worthwhile.
[Update: my colleague Akshai Mirchandani has just reminded me that you can also set the Query Timeout at the workspace level as a Server Property using SQL Server Management Studio, as detailed here. The property is called ServerTimeout. This gives you more flexibility than setting it for the whole capacity.]
This is a post I’ve avoided writing for many years, and before I carry on let me make one thing clear:
Doing bulk extracts of data from a Power BI semantic model is a **really** bad idea
My colleague Matthew Roche wrote a great post on this topic a couple of years ago that is still relevant: using Power BI (or Analysis Services) as a data source for other systems, including other Power BI Import mode semantic models, is an anti-pattern. Power BI is optimised for small, analytical queries that return the amount of data that can be visualised on a single page. It is not optimised for queries that return millions of rows. Running this kind of query on a Power BI semantic model will be slow, is likely to run into timeouts and memory errors, and is also likely to cause CU spikes – and perhaps throttling – on a Premium capacity. If you want the data from a semantic model it’s much better to go back to the original data sources that the semantic model uses.
But
People still use Power BI semantic models as data sources all the time. This is either because they don’t know any better, because they can’t get access to the underlying data sources, or because they want to get the result of any DAX calculations on the model.
So
If you do need to extract large amounts of data from a semantic model I have one important piece of advice: write a DAX query to get the data and not an MDX query. There are two reasons for this:
Writing a DAX query to get granular data is usually a lot simpler than writing an MDX query
DAX queries that return large amounts of data are typically faster (and so less likely to hit timeouts), more CPU efficient (and therefore less likely to cause throttling on a capacity) and more memory efficient (and so less likely to cause memory errors)
The bad news is that the two client tools most often used to bulk extract data from Power BI, Excel PivotTables and Power Query using the Analysis Services connector and its query builder, generate MDX queries. What’s more, they don’t always generate the most efficient MDX queries either.
Let’s see an example. I have a semantic model in a Premium workspace with a table called Property Transactions with around a million rows in it. I connected to the model via the XMLA Endpoint using the “From SQL Server Analysis Services Database (Import)” option in Power Query in Excel:
…and then created a query to get the data from all the columns on the Property Transactions table plus one measure, called Count of Sales, using Power Query’s query builder:
While the query builder generated the MDX for me, you can see that it was not a simple query:
I ran a Profiler trace while this query ran and from the Execution Metrics I saw that:
The query took 54 seconds to complete
CPU Time was also 54 seconds
The approximate peak memory usage of the query was 626292KB
I then created a second Power Query query that used the following DAX query to get the same data, which I think you’ll agree is much more straightforward:
EVALUATE
ADDCOLUMNS('Property Transactions', "Count of Sales", [Count of Sales])
[You have the option of entering a customer MDX or DAX query when you create your Power Query query]
This time, Execution Metrics showed me that:
The query took 6 seconds to complete
CPU Time was 6 seconds too
The approximate peak memory usage was 142493KB
So the DAX query was simple to write and maintain, took 11% of the time that the MDX query to run, used 11% of the CPU and 22% of the memory. That’s a big improvement. Even though I might be able to rewrite the MDX generated by Power Query to be more efficient there’s no way it would be as simple or as efficient as the DAX query.
[Thanks to Akshai Mirchandani for the information in this post]
For me the biggest new feature in the January 2025 release of Power BI Desktop is the new TMDL View; many other people like Marco are excited about it too. For more advanced Power BI developers (and honestly, I don’t think you need to be that advanced to get value out of it) it makes certain editing tasks for semantic models much simpler, and while I won’t be abandoning the main Power BI Desktop UI completely or stopping using external tools like Tabular Editor it is something I see myself using on a regular basis from now on.
One of the things it allows is the editing of semantic model properties and features that are not exposed by the Power BI Desktop UI but which are nonetheless supported by the engine, and which up to now you’d have had to use Tabular Editor to set. The announcement blog post mentions a few of these – perspectives (useful for the Personalize Visual feature for example) and the isAvailableInMdx property – but my favourite underused property is the Detail Rows Definition property of a measure, also known as Detail Rows Expression. If you have end users querying your model using Analyze In Excel it allows you to customise the results returned by an Excel PivotTable’s Show Details feature; as you might expect Marco and Alberto have an excellent, detailed article on it here. Setting this property allows you to control which columns and rows (so the correct rows are shown for non-trivial measures, as I described here) are returned, and if you can educate your users to use Show Details it can perform a lot better than a gigantic PivotTable to show detail-level data from your model.
What does the workflow for setting this property on a model in Power BI Desktop look like now? What benefits do TMDL View and all the other recent pro developer enhancements in Desktop bring for someone like me? Let’s say I have a measure called Count Of Sales in my semantic model and that I want to customise the columns and their names that are returned by Show Details for this measure. The Detail Rows Definition property takes a DAX expression that returns a table so the first step is to write that expression; now that we have DAX Query View I can do that without leaving Power BI Desktop and because I’m lucky enough to have access to Power BI Copilot (one of the perks of working for Microsoft) I can use that to write my DAX expression easily. I gave Copilot the prompt:
Write a query that gives me the Date, County, Town and Postcode columns columns from the Property Transactions table along with the Count of Sales measure. Rename the Date column to be "Sales Date".
…and it immediately gave me the DAX query I wanted without needing to faff around looking up the syntax to the SELECTCOLUMNS() function:
EVALUATE
SELECTCOLUMNS(
'Property Transactions',
"Sales Date", 'Property Transactions'[Date], // Renaming Date column to Sales Date
"County", 'Property Transactions'[County],
"Town", 'Property Transactions'[Town],
"Postcode", 'Property Transactions'[Postcode],
"Count of Sales", [Count Of Sales] // Including the Count of Sales measure
)
Next, I copied the DAX query minus the EVALUATE statement, switched over to the TMDL View pane, dragged and dropped the Count of Sales measure into a Script pane:
And then, underneath the existing measure definition, started typing detailRowsDefinition – the intellisense picked up what I was typing before I even had to finish:
I then tried to paste the DAX query into TMDL View and realised it didn’t like line breaks. Rather than trying to look up how to do this with the editor – which I’m sure is possible – I just switched back to DAX Query View, highlighted the query, entered the prompt:
format this query so there are no line breaks
…and it did the job for me! I copied the DAX table expression again and pasted it into TMDL View after the detailRowsDefinition property:
[I was pleasantly surprised that I didn’t have to escape any characters or mess around with double quotes]
Next I hit the Apply button and tested Analyze in Excel with a PivotTable and Show Details:
And bingo, I got the results I wanted:
Was all of this possible before? Absolutely. Is it much quicker and easier now with TMDL View, DAX Query View and Copilot? Absolutely. I’m a pretty experienced Power BI developer and I could certainly have written the DAX expression without Copilot, I have DAX Studio installed (which has pretty good query builder too) to write and test the query, and I have Tabular Editor to set the property. But being able to do all this just using Power BI Desktop makes me so much more productive.
If you’re using deployment pipelines with Direct Lake semantic models in Power BI you’ll have found that when you deploy your model from one stage to another by default the model still points to the Lakehouse it was originally bound to. So, for example, if you deploy your model from your Development stage to your test stage, the model in the Test stage still points to the Lakehouse in the Development stage. The good news is that you can use the deployment rules feature of deployment pipelines to make sure the model in the Test stage points to a Lakehouse in the Test stage and in this post I’ll show you how.
To illustrate how to do this I created a workspace to represent a dev environment with a Lakehouse containing one table, called MyTable, containing the following data:
I then created a second workspace to represent a test environment with a Lakehouse containing a table with the same name and the same schema but with different data:
Then, back in the dev workspace I created a custom semantic model pointing to the mydata table and built a report on it to show the data:
Here’s what the dev workspace looked like:
I then created a deployment pipeline, assigned the two workspaces to the Development and Test stages of it, and configured it so that the semantic model and report would be deployed from Development to Test:
I then did a deployment and, as mentioned, although the report and the semantic model were moved to the Test stage workspace, the model still pointed to the Lakehouse in the Development stage workspace so the report showed the same data as before:
To fix this I created a deployment rule using the “Data source rules” option. This is where things get tricky. When you create the rule you can see that the custom model on the From side has a Database and a Server property to know which Lakehouse it is pointing to:
This means that on the To side you need to choose the “Other” option in the dropdown box and enter values for the Database and Server:
What do these properties mean?
The Database property is the ID of the SQL Endpoint of the Lakehouse the model is pointing to and you can find it by opening the SQL Endpoint UI of the Lakehouse in the browser and copying it from the part of the URL that immediately follows “lakehouses”:
The Server property is the SQL connection string of the SQL Endpoint, which can be found by clicking the Gear icon on the menu in the SQL Endpoint UI to open the Settings pane:
I changed the settings in the data source rule to the values from the Lakehouse in the Test workspace and then redeployed. After doing that the report in my Test workspace showed the data from the Lakehouse in the Test workspace (note: I had to manually refresh the report first to do this to flush out stale data from the cache):
Power BI reports are usually used to display numeric data. However it is reasonably common to have text data in a semantic model too and for this to be displayed on a report, and while visuals like the new text slicer allow you apply filters to the text, large amounts of text can still be difficult for end users to make sense of. Recently I was wondering if Power BI Copilot can be used to summarise text data (similar to what is possible with Excel Copilot) and it turns out that the answer is yes – with some limitations.
To test this I created a simple semantic model consisting of a single table containing comments from diners at various restaurants:
I then created a report with (1) a slicer on the Restaurant field, (2) a table visual to display the contents of the Feedback field for the selected restaurant, and (3) a Smart Narrative visual:
For the Smart Narrative visual I used the preview Copilot option, restricted it to look at just the table visual, and used the following prompt:
Under a title, in bold, of "Feedback Summary" provide a brief summary of the comments in the Feedback column in no more than two bullet points. Include some quotes from the Feedback column but do not just repeat what is there.
And guess what, it works:
[Notice that when you change the slicer selection you have to click the Refresh button on the Smart Narrative visual to get it to update]
BUT before you get too excited, there are a few things you need to understand:
At the time of writing the flavour of Power BI Copilot used by the Smart Narrative visual can only summarise data that is visible on the report page – if it isn’t visible, even if it is present in the semantic model, then it can’t be summarised.
It can only consider the first 30000 rows from any given table visual, and if it is pointed at multiple visuals then it can only consider 150000 rows in total.
For each table row, it can only consider the first 100 characters of text – anything after that will be truncated. This will be increased to 500 characters of text soon but even then this could be a big limitation (for example the text “If you want good food you have to pay. That said, it’s very good value for a Michelin-starred restaurant.” is 105 characters long). You could work around it by splitting the text up into multiple rows where the length is less than the limit, but I don’t know what impact this would have on the quality of the results.
For most scenarios cost would not be a worry (see here for my post on the cost of Copilot; it’s the number of words, not the number of characters, that is significant), but even with the character limits if you had a lot of users trying to analyse a table displaying thousands rows of text values that is filtered/sliced regularly then it could get expensive.
As a result of this you should only consider Power BI Copilot for summarising relatively short pieces of text; you should not store long documents in your semantic model and expect Power BI Copilot to be able to summarise them. If you want to show summaries of longer pieces of text your best bet is to create them upstream before the data reaches Power BI.