Two New Power BI/Power Query Books

It’s time for another one of my occasional posts about free books that I’ve been given that you might be interested to check out. The usual disclaimers apply: these aren’t impartial, detailed reviews and the links contain an Amazon UK affiliate code so I get a kickback if you buy a copy.

Power Query Beyond The User Interface, by Chandeep Chhabra

Chandeep is of course famous on YouTube from the Goodly Power BI channel; I’ve met him at conferences and he’s just as friendly and passionate in real life. That passion shows through in this book. It’s aimed at people who already know Power Query and want to learn M and while it does a great job of that, I think anyone who writes M on a regular basis would also benefit from reading it. It’s packed with practical examples, well-written, everything is clearly explained and it covers more recent additions to the language that older books might not talk about. It’s focused on the M language and doesn’t cover topics like performance tuning but I think that focus is a good thing. Highly recommended for anyone serious about Power Query.

Architecting Power BI Solutions In Microsoft Fabric, by Nagaraj Venkatesan

It looks like the Packt marketing machine has gone into overdrive for this title because I see both Greg Lowe and Sandeep Pawar have already published their reviews, and I agree with their sentiments. Power BI (and even more so Fabric) is complicated and so there’s a huge demand for guidance around what all of the components do and how to put them together to create a solution. The team I work on at Microsoft, the Fabric CAT team, has a published guidance documentation here and other people have written books, blog posts and white papers addressing the same problem. This book is certainly a very useful addition to the existing literature. It covers newer topics like Power BI Copilot and some topics that are rarely if ever mentioned elsewhere, such as Power BI’s integration with Purview. As the other reviewers have mentioned, books like this always suffer from changes to the product making them out of date very quickly but that’s unavoidable. Also, being written by a Microsoft employee (and this is something I can relate to), it’s not very opinionated and doesn’t tell you which features of the product are good and which ones should be avoided. All in all, pretty good though.

Memory Overhead Of Distinct Count Measures In Power BI

The series of blog posts I wrote last year on semantic model memory usage, in particular this post on the query memory limit and the “This query uses more memory than the configured limit” error in Power BI, gets a lot of traffic. Since writing that post on the query memory limit I’ve written a few follow-ups on common mistakes that lead to increased query memory usage, such as this one on measures that never return a blank. Today’s post is sort of in that series but it isn’t about a design mistake – it’s just to point out that distinct count measures can be surprisingly memory-hungry.

To illustrate this I built a semantic model consisting of a single table with two columns and 99,999,000 rows, published it and ensured the Large Semantic Model format was enabled:

I created two measures:

Number Of Rows = COUNTROWS('MyTable')
Distinct Customers = DISTINCTCOUNT(MyTable[CustomerID])

Here’s what the model metrics looked like in DAX Studio:

The total model size in memory was 255MB.

I then ran the following DAX query to get the number of rows in the table for each of the 1800 dates in the Date column:

EVALUATE
SUMMARIZECOLUMNS(
    MyTable[Date],
    "Number Of Rows", [Number Of Rows]
)

The approximatePeakMemConsumptionKB from the Execution Metrics Profiler event showed that this query consumed 2109KB – so not much.

Then I ran a similar query to get the number of distinct customer IDs by date:

EVALUATE
SUMMARIZECOLUMNS(
    MyTable[Date],
    "Distinct Customers", [Distinct Customers]
)

The approximatePeakMemConsumptionKB metric for this query was 800325KB – so a lot more than the previous query. In fact even though this model was well under the 1GB size limit for a model not in Premium capacity, the query here used a lot more memory (782MB) than the size of the model itself in memory and it came close to the 1GB limit on the amount of memory a query can consume when the model is not in Premium capacity.

Is there something wrong here? Can the query or model be tuned to reduce memory usage? Not really, no – distinct count queries are almost always more memory intensive than other types of measures. I tested a number of different things such as forcing the use of hash encoding on the CustomerID column, partitioning (Phil Seamark suggested creating one partition for each of the 1800 dates and actually that did reduce memory consumption but it also made the queries extremely slow), changing the ordering of the source data to change how well each column was compressed, calculating the distinct count using the SUMX method, and nothing resulted in lower query memory usage.

What I did find for the model above was that the number of rows returned by the query influenced the memory consumption of the query. So reducing the number of dates returned on rows in my DAX query from 1800 to 366 resulted in approximatePeakMemConsumptionKB going down to 200278KB. So if you’re running into memory errors when running queries with distinct count measures the first thing you should ask yourself is whether you need to show so many distinct counts: I recently ran into this problem with a customer that wanted to plot a line chart of distinct values with dates on the x axis, and we solved the problem by only plotting one day per week for the time period shown on the chart instead of every date. The chart looked almost identical, the DAX query was a lot faster and the memory usage of the DAX query was a lot lower. Distinct count measures combined with table visuals with lots of rows can be dangerous.

The other thing you can do is see if you can remodel your data to turn a distinct count into a count because, as shown above, counts are a lot faster and memory efficient than distinct counts. For example, if you have a fact table containing line items for orders and you need to find the distinct count of order ids, then consider creating a second fact table at the order granularity so you can count the number of rows in it to find the number of distinct orders. This may increase the size of your model but it should certainly reduce your query memory consumption for many queries because you won’t need to do a distinct count.

Current Status Of Snowflake Query Tags In Power BI

Since the November 2024 Power BI release blog post announced that queries sent to Snowflake by Power BI include a query tag I’ve had a lot of questions from people who couldn’t see this happening or wanted to know what the query tags contained, so in this blog I thought I would outline the current status.

The query tagging feature for the Power BI Snowflake connector actually didn’t get released in November 2024 and even now, in April 2025, it’s only available for DirectQuery connections and Import mode refreshes that use the V1.0 connector (the V2.0 connector will support query tags soon). Here’s an example of what a query tag looks like for a SQL query generated by Power BI from a DirectQuery semantic model:

{"PowerQuery":true,"Host":"PBI_SemanticModel_MWC","HostContext":"PowerBIPremium-DirectQuery"}

And here’s what a query tag for a SQL query generated for a semantic model refresh:

{"PowerQuery":true,"Host":"PBI_SemanticModel_MWC","HostContext":"PowerBIPremium-Import"}

At the time of writing only SQL queries sent from the Power BI Service contain query tags, not those sent from Power BI Desktop. Also there is no way to customise the contents and unlike SQL queries sent to SQL Server-related sources there is no information on the report or visual that generated the SQL query. In the future some of these limitations may go away.

Fabric Data Agents: Unlocking The Full Power Of DAX For Data Analysis

Now that Fabric Data Agents (what used to be called AI Skills) can use Power BI semantic models as a data source I’ve been spending some time playing around with them, and while I was doing that I realised something – maybe something obvious, but I think still worth writing about. It’s that there are a lot of amazing things you can do in DAX that rarely get done because of the constraints of exposing semantic models through a Power BI report, and because Data Agents generate DAX queries they unlock that hitherto untapped potential for the first time. Up until now I’ve assumed that natural language querying of data in Power BI was something only relatively low-skilled end users (the kind of people who can’t build their own Power BI reports and who struggle with Excel PivotTables) would benefit from; now I think it’s something that will also benefit highly-skilled Power BI data analysts as well. That’s a somewhat vague statement, I know, so let me explain what I mean with an example.

Consider the following semantic model:

There are two dimension tables, Customer and Product, and a fact table called Sales with one measure defined as follows:

Count Of Sales = COUNTROWS('Sales')

There’s one row in the fact table for each sale of a Product to a Customer. Here’s all the data dumped to a table:

So, very simple indeed. Even so there are some common questions that an analyst might want to ask about this data that aren’t easy to answer without some extra measures or modelling – and if you don’t have the skills or time to do this, you’re in trouble. One example is basket analysis type questions like this: which customers bought Apples and also bought Lemons? You can’t easily answer this question with the model as it is in a Power BI report; what you’d need to do is create a disconnected copy of the Product dimension table so that a user can select Apples on the original Product dimension table and select Lemons on this new dimension, and then you’d need to write some DAX to find the customers who bought Apples and Lemons. All very doable but, like I said, needing changes to the model and strong DAX skills.

I published my semantic model to the Service and created a Data Agent that used that model as a source. I added two instructions to the Data Agent:

Always show results as a table, never as bullet points
You can tell customers have bought a product when the Count of Sales measure is greater than 0

The first instruction I added because I got irritated by the way Data Agent shows the results with bullet points rather than as a table. The second probably wasn’t necessary because in most cases Data Agent knew that the Sales table represented a sale of a Product to a Customer, but I added it after one incorrect response just to make that completely clear.

I then asked the Data Agent the following question:

Show me customers who bought apples and who also bought lemons

And I got the correct response:

In this case it solved the problem in two steps, writing a DAX query to get the customers who bought lemons and writing another DAX query to get the customers who bought apples and finding the intersection itself:

At other times I’ve seen it solve the problem more elegantly in a single query and finding the customers who bought apples and lemons using the DAX Intersect() function.

I then asked a similar question:

For customers who bought apples, which other products did they buy?

And again, I got the correct answer:

In this case it ran five separate DAX queries, one for each customer, which I’m not thrilled about but again at other times it solved the problem in a single DAX query more elegantly.

Next I tried to do some ABC analysis:

Group customers into two categories: one that contains all the customers with just one sale, and one that contains all the customers with more than one sale. Show the total count of sales for both categories but do not show individual customer names.

And again I got the correct answer:

I could go on but this post is long enough already. I did get incorrect answers for some prompts and also there were some cases where the Data Agent asked for more details or a simpler question – but that’s what you’d expect. I was pleasantly surprised at how well it worked, especially since I don’t have any previous experience with using AI for data analysis, crafting prompts or anything like that. No complex configuration was required and I didn’t supply any example DAX queries (in fact Data Agents don’t allow you to provide example queries for semantic models yet) or anything like that. What does this all mean though?

I’m not going to argue that your average end user is going to start doing advanced data analysis with semantic models using Data Agents. The results were impressive and while I think Data Agents (and Copilot for that matter) do a pretty good job with simpler problems, I wouldn’t want anyone to blindly trust the results for more advanced problems like these. However if you’re a data analyst who is already competent with DAX and is aware that they always need to verify the results they get from Data Agent, I think this kind of DAX vibe-coding has a lot of value. Imagine you’re a data analyst and you’re asked that question about which products customers who bought apples also bought. You could search the web, probably find this article by the Italians, get scared, spend a few hours digesting it, create a new semantic model with all the extra tables and measures you need, and then finally get the answer you want. Maybe you could try to write a DAX query from scratch that you can run in DAX Studio or DAX Query View, but that requires more skill because no-one blogs about solving problems like this by writing DAX queries. Or you could ask a Data Agent, check the DAX query it spits out to make sure it does what you want, and get your answer much, much faster and easier. I know which option I’d choose.

To finish, let me answer a few likely questions:

Why are you doing this with Fabric Data Agents and not Power BI Copilot?

At the time of writing Data Agents, the Power BI Copilot that you access via the side pane in a report and Power BI Copilot in DAX Query View all have slightly different capabilities. Power BI Copilot in the side pane (what most people think of as Power BI Copilot) couldn’t answer any of these questions when I asked them but I didn’t expect it to because even though it can now create calculations it can still only answer questions that can be answered as a Power BI visual. Copilot in DAX Query View is actually very closely related to the Data Agent’s natural language-to-DAX functionality (in fact at the moment it can see and use more model metadata than Data Agent) and unsurprisingly it did a lot better but the results were still not as good as Data Agent. Expect these differences to go away over time and everything I say here about Data Agents to be equally applicable to Power BI Copilot.

This isn’t anything new or exciting – I see people posting about using AI for data analysis all the time on LinkedIn, Twitter etc. What’s different?

Fair point. I see this type of content all the time too (for example in the Microsoft data community Brian Julius and my colleague Mim always have interesting things to say on this subject) and I was excited to read the recent announcement about Analyst agent in M365 Copilot. But typically people are talking about taking raw data and analysing it in Python or generating SQL queries. What if your data is already in Power BI? If so then DAX is the natural way of analysing it. More importantly there are many advantages to using AI to analyse data via a semantic model: all the joins are predefined, there’s a lot of other rich metadata to improve results, plus all those handy DAX calculations (and one day DAX UDFs) that you’ve defined. You’re much more likely to get reliable results when using AI on top of a semantic model compared to something that generates Python or SQL because a lot more of the hard work has been done in advance.

Is this going to replace Power BI reports?

No, I don’t think this kind of conversational BI is going to replace Power BI reports, paginated reports, Analyze in Excel or any of the other existing ways of interacting with data in Power BI. I think it will be a new way of analysing data in Power BI. And to restate the point I’ve been trying to make in this post: conversational BI will not only empower low-skilled end users, it will also empower data analysts, who may not feel they are true “data scientists” but who do have strong Power BI and DAX skills, to solve more advanced problems like basket analysis or ABC analysis much more easily.

Using Excel Copilot To Import Data With Power Query

Although it was announced in this blog post on the Microsoft 365 Insider blog recently, you might have missed the news that Excel Copilot can now generate Power Query queries. There are limitations for now: it can only be used to connect to other Excel files stored in OneDrive or SharePoint and it can’t do any transformations in the queries it creates, but it’s still exciting news nonetheless. Well the kind of news I get excited by at least.

Since the announcement blog post didn’t give many details of how it works let’s see an example of it in action. Let’s say you have an Excel workbook called SprocketsWidgetsSales.xlsx that contains a table of data showing sales of sprockets and widgets – the products your company sells – by country:

Now let’s say you create a new, blank workbook and open the Copilot pane. Entering the prompt:

Search for data on sales of sprockets and widgets

…gives you the data from the first workbook in the response:

At the bottom you can see a citation reference pointing to the workbook containing the source data and clicking that reference opens that workbook in Excel Online, but we don’t want to do that, we want to load the data into the current workbook using Power Query. Clicking on “Show tables to import” shows a preview of all the Excel tables (in this case there’s only one) in the workbook:

Expanding “Show import query” shows the M code for the Power Query query it can generate:

And clicking “Import to new sheet” creates that Power Query query and runs it:

You can see the Power Query query it creates in the Queries & Connections pane and edit it in the Power Query Editor like any other query:

Here’s the output of the query in a table on a new worksheet:

Of course now you have the table of data on your worksheet you can do other things like:

chart this data by country and product

…or ask questions like:

which country had the lowest sales of sprockets?

…and other things that you’d expect Copilot to be able to do. But the key thing is that Copilot is can now generate Power Query queries! I’m looking forward to see how this feature improves in the future.

Speed Up Dataflow Publishing/ Validation Times In Power BI And Fabric

If you’re working with slow data sources in Power BI/Fabric dataflows then you’re probably aware that validation (for Gen1 dataflows) or publishing (for Gen2 dataflows) them can sometimes take a long time. If you’re working with very slow data sources then you may run into the 10 minute timeout on validation/publishing that is documented here. For a Gen1 dataflow you’ll see the following error message if you try to save your dataflow and validation takes more than 10 minutes:

Failed to analyze issues in the query

For a Gen2 Dataflow, where you can save the Dataflow and publishing takes place in the background, you’ll see the following error in your workspace:

Dataflow publish failed

Apart from tuning your data source and tuning your queries, what can you do about this? Well one of the things that happens when you publish a dataflow is that it works out the columns returned, and the data types of those columns, for all of the queries in the dataflow. It does this by trying to run the queries until they return data by applying a top 0 row filter to them; if you can make that faster then validation/publishing will be faster. Obviously query folding is important here because that top 0 filter should fold, as are more obscure, source-specific settings like this one for ODBC sources. However, there is another trick that you can use if you are happy writing some moderately complicated M code – the trick I blogged about here for making Power Query in Power BI Desktop faster.

Let’s see an example with Dataflows Gen2. Conside the following M code which returns a table with three columns and is deliberately written to take 11 minutes and 1 second to return (see this post for more details on how to create artificially slow Power Query queries).

let
  Source = Function.InvokeAfter(
    () => 
    #table(
      type table
      [
        #"Number Column"=number,
        #"Text Column"=text,
        #"Date Column"=date
      ],
      {
        {1,"Hello",#date(2016,1,1)},
        {2,"World",#date(2017,12,12)}
      }
      )
    
    , 
    #duration(0, 0, 11, 1)
  )
in
  Source

As you would expect, trying to publish a Gen1 or Gen2 dataflow that uses this query will fail because it takes more than 10 minutes before it returns any rows. However in this case – as in most cases – you know what columns the query returns so it’s possible to use the Table.View M function to intercept the zero-row filter applied during validation/publishing and return a table with no rows in and the columns that the query above returns. You can do this by adding two extra steps in the M code like so:

let
  Source = Function.InvokeAfter(
    () => 
    #table(
      type table
      [
        #"Number Column"=number,
        #"Text Column"=text,
        #"Date Column"=date
      ],
      {
        {1,"Hello",#date(2016,1,1)},
        {2,"World",#date(2017,12,12)}
      }
      )
    
    , 
    #duration(0, 0, 11, 1)
  ),
  TableTypeToReturn = 
    type table
      [
        #"Number Column"=number,
        #"Text Column"=text,
        #"Date Column"=date
      ],
  OverrideZeroRowFilter = Table.View(
    null, 
    [
    GetType = () => 
      TableTypeToReturn, 
    GetRows = () => 
      Source, 
    OnTake = (count as number) => 
      if count = 0 then 
      #table(
        TableTypeToReturn, 
        {}
      ) 
    else 
     Table.FirstN(Source, count)]
  )
in
  OverrideZeroRowFilter

The first step added here, called TableTypeToReturn, defines the columns and data types of the table returned by the query; if you use this technique yourself, you will need to alter it so it returns the columns and data types of your query. You can read more about #table and table types here and I have a function that will automatically generate this code from an existing query for you here. The second step, called OverrideZeroRowFilter, looks for situations where a Top N filter is being applied and if N=0 returns a table of the type defined in the previous step with zero rows. For a more detailed explanation see that original blog post.

This new version of the query validates/publishes immediately, although it still takes 11 minutes and 1 second to refresh. Of course if you use this technique and then change your query so that different columns or data types are returned you have to update the extra code every time, which can be fiddly, but if you’re running into a timeout then you don’t have any choice and even if validation/publishing is slow it’s probably worth the extra effort.

Using Excel Solver With Power Pivot

After years of meaning to learn how to use Excel Solver, this week I’ve finally made a start: I want to use it to work out the optimal way of distributing workspaces across capacities in a Fabric tenant from the point of view of CU usage. I’m a long way from knowing how to do this properly (I’ll blog about it when I’m ready) but one of the first things I found is that while there are lots of resources on the internet showing how to use Solver, there are no examples of how to use Solver when your source data is stored in the Excel Data Model, aka Power Pivot. Getting that data onto the worksheet is fairly straightforward, but what if you also need Solver to change how that data is sliced and diced? It turns out that not hard to do if you know how to use cube functions.

To work out how to do this, I loaded the following table into the Excel Data Model:

I then created a simple measure called Total Profit that sums the values in the Profit column:

Total Profit:=SUM(Sales[Profit])

I then created a worksheet that looked like this:

Here are the cell formulas:

…and here’s a brief explanation of what’s happening here:

Cell C2 contains the name of the connection that all the cube functions use, which is always “ThisWorkbookDataModel”
The range B5:B7 contains CubeSet functions that return the sets of all countries, products and years. For example the formula in B5 is
CUBESET($C$2, “[Sales].[Country].[Country].MEMBERS”, “Set of Countries”)
The range C5:C7 contains CubeSetCount functions that return the number of members in the sets in the range B5:B7. For example the formula in C5 is
CUBESETCOUNT(B5)
The range C10:C12 contains integers that represent the ordinal of one of the members in each set. For example there are three members in the set of all countries, France, Germany and UK, and the integer 3 represents the third item in that set, ie the UK.
The range D10:D12 contains CubeRankedMember functions that take these ordinals and returns the members in each set at the given ordinal. For example the formula in D10 is
CUBERANKEDMEMBER($C$2, B5, C10)
Cell C14 contains a single CubeValue function that returns the value for the Total Profit measure for the country, product and year returned in D10:D12. It is
IFERROR(CUBEVALUE($C$2, “[Measures].[Total Profit]”, $D$10, $D$11, $D$12),””)

The problem to solve is this: what combination of country, product and year returns the maximum value for Total Profit? This is a pretty stupid problem to use Solver for: you can see from the source data that obviously it’s Germany, Pears and 2024, and even if you couldn’t see it from looking at the data you could write some DAX to do this quite easily. The formulas and techniques shown here, though, will be applicable to other real-life scenarios.

Here’s how I configured Solver:

The objective is to maximise the value in cell C14, the cell containing the value of the Total Profit measure for the selected country, product and year.
The values to change are the values in C10:C12, the cells that contain the ordinals of the selected country, product and year. There are three types of constraints on the allowed inputs in these cells
- These cells must contain integers
- The minimum allowed value is 1
- The maximum allowed value is the value in the related cell in C5:C7 which tells you the number of items in the set of countries, products and years
The solving method is Evolutionary

After clicking Solve and waiting a bit, I got the right answer back, 18:

One more time before someone leaves a comment: the point of this post is not to show how to use Solver to solve a realistic problem, the point is to show how you can use Solver to change integer values in cells which in turn change how values from the Excel Data Model/Power Pivot are sliced when using Excel cube functions.

Also, if you’re using cube functions with Solver in this way, you’re probably going to need to use cube functions with dynamic arrays in the way I showed in this series to make them more PivotTable-like; once you’ve got the data onto the worksheet you’ll be able to use Solver in the normal way.

Finally – if this works for Power Pivot, wouldn’t it also work with cube functions connected to a published Power BI semantic model, Azure Analysis Services or on-prem Analysis Services? Technically yes, but I would be very worried about the number of MDX queries generated when Solver runs. If you’re using Power Pivot the worst thing that could happen is that you overload your PC; if you send too many queries to Power BI, AAS or SSAS you could cause problems for other users.

Refreshing Hidden Tables In Power BI Semantic Models With Fabric Data Pipelines

Following on from my recent post about refreshing semantic models with Fabric Data Pipelines and the semantic model refresh activity, a few people asked me how to refresh hidden tables because they are not displayed in the Pipeline configuration UI. I got the answer from my colleague Alex Powers (aka reddit celebrity u/itsnotaboutthecell) who kindly allowed me to blog about it.

To demonstrate how to do this, I created a semantic model with two tables: one visible, called VisibleTable, and one hidden, calledHiddenTable.

I then published the semantic model, created a Data Pipeline and added a semantic model refresh activity; selected the connection, workspace and semantic model; waited for the Table(s) dropdown to populate (yes I know it’s slow, we’re working on it):

…and then, when it loaded, noted that only the visible table was shown in the dropdown:

I didn’t select anything and instead clicked “Add dynamic content” to use an expression to select the table instead:

Then in the Pipeline expression builder I entered the following:

@json('
    [
        {
            "table":"HiddenTable"
        }
    ]
')

Having done this I ran the Pipeline and just the hidden table was refreshed. Easy!

The expression needs to be a JSON array of table and partition names. Here’s an example showing how to refresh the table called HiddenTable and the sole partition of the table called VisibleTable (which also happens to be called VisibleTable) in the same refresh:

@json('
    [
        {
            "table":"HiddenTable"
        },
        {
            "table": "VisibleTable",
            "partition": "VisibleTable"
        }
    ]
')

It’s useful to know how to construct the expression even if you don’t need to refresh hidden tables – for example, you might want to dynamically generate the list of tables or partitions to refresh with an expression.

Another Way To Visualise Import Mode Semantic Model Partition Refresh Parallelism

I’m a big fan of Phil Seamark’s “visualise your refresh” pbix file, which allows you to analyse the different operations of an Import mode semantic model refresh operation (similar functionality has now been built into Semantic Link Labs now too). In particular I like how it allows you to visualise operations inside a refresh as a Gantt chart, which makes it easy to see how much parallelism you’re getting when you refresh – and as I discussed here, the amount of parallelism can have a massive effect on how long a refresh takes. However Gantt charts are just one way to visualise this data and after one of my colleagues showed me the KQL range() function I realised there was a different way to approach this problem using KQL and Fabric Workspace Monitoring.

Here’s the KQL query I came up with to find the number of partitions that are refreshing in parallel at any given point in time when a semantic model is refreshed:

SemanticModelLogs
//Filter by Timestamp for performance
| where Timestamp > ago(1h)
//Filter by the OperationId of the refresh
| where OperationId == "insert OperationId of refresh here"
//Only get the events fired when partition refresh finishes
| where OperationDetailName =="Process" and OperationName =="ProgressReportEnd"
//Find the start time of the partition refresh
| extend StartTime = datetime_add("Millisecond", -1 * DurationMs, Timestamp)
//Create a list of all the seconds where the refresh was in progress
| extend StartSecond = range(bin(StartTime,1s), bin(Timestamp,1s), 1s)
| mv-expand StartSecond
//Aggregate the data and get the number of partitions being refreshed in each one-second bin
| summarize PartitionsBeingRefreshed=count() by todatetime(StartSecond)
//Sort in ascending order by one-second bins
| order by StartSecond asc

What this query does is:

Filter the events in the SemanticModelLogs table to just those for the refresh whose OperationId you enter
Filter again so you only get the ProgressReportEnd events that are fired when a partition refresh finishes
Calculates the start time of the partition refresh by taking the finish time and subtracting the duration of the refresh
Creates a table with one row for each second that each partition refresh was in progress
Aggregates this table so you get the number of partition refresh operations that were in progress for each one second bin

All events associated with a single semantic model refresh can be identified by the value in the OperationId column in the SemanticModelLogs table in Workspace Monitoring. To find it, just run a KQL query on the SemanticModelLogs table to get all the events from the time period when your refresh took place; it will be easy to spot the events associated with the refresh by looking at the contents of the EventText column, so all you need to do is copy the value from the OperationId column from one of these events.

I refreshed a semantic model with six dimension tables and one fact table containing ten partitions and here’s what the query above showed when visualised as a line chart in a KQL Queryset:

As you can see, at first four partitions are refreshing in parallel; this quickly drops and then rises, and for the first half of the refresh six partitions are refreshing in parallel. At about the two-thirds point this drops to four partitions refreshing in parallel and as they complete the line chart drops to one. Since six partitions refreshing in parallel is the maximum I allowed for this refresh you can see it’s reasonably efficient.

To get more detail on what is being refreshed, here’s a variation of the KQL query above that doesn’t perform the final aggregation:

SemanticModelLogs
| where Timestamp > ago(1h)
| where OperationId == "54d42645-9672-409a-844c-42403526b596"
| where OperationDetailName =="Process" and OperationName =="ProgressReportEnd"
| extend StartTime = datetime_add("Millisecond", -1 * DurationMs, Timestamp)
| extend StartSecond = range(bin(StartTime,1s), bin(Timestamp,1s), 1s)
| mv-expand StartSecond
| project StartSecond = todatetime(StartSecond), Partition = replace_strings(substring(EventText, 24), dynamic(["<oii>", "</oii>"]),dynamic(["",""]))

This data can then be used in a Power BI report and visualised as a stacked bar chart:

The advantage of this is that you can see which partitions are refreshing when: in this case you can see that it was the partitions from the dimension tables that refreshed first, followed by the partitions from the fact table.

If you’re not using Fabric Workspace Monitoring it should be easy to adapt the queries above to work with Power BI Log Analytics integration instead – it’s just a matter of changing a few column names.

There’s one problem I can think of with this approach: if there are multiple small partitions being refreshed that take under a second there’s a chance that you’ll see inflated parallelism numbers in the results at some points. I don’t think that’s a major problem though and overall I think this is quite a useful way to understand how much parallelism you’re getting during a refresh.

[Thanks to Matthew Farrow for the inspiration here – check out his excellent, detailed series of articles on LinkedIn starting here about understanding Fabric capacities and billing]

Monthly Power BI Semantic Model Refreshes With Fabric Data Pipelines

I’m sure you already know how to configure scheduled refresh for your semantic models in Power BI. While the options you have for controlling when refresh takes place are generally good enough – you can configure daily or weekly refreshes and set up to eight times a day for refreshes to take place – there are some scenarios it doesn’t work for, such as monthly refreshes. Up to now the workaround has been to use Power Automate to trigger refreshes (see here for an example) or to call the refresh API from another application. Now, with Fabric, you have a much better option for scheduling refreshes: Data Pipelines.

The semantic model refresh activity in Fabric Data Pipelines was released last year and at the time all the focus was on the extra control it gives you over what happens in a refresh: it allows you to refresh individual tables or partitions and control the amount of parallelism, for example; it also allows you to schedule your refresh after other ETL operations, which in Fabric will probably also be scheduled using Pipelines, have completed. What I want to draw your attention to is the fact that Fabric Data Pipelines use the new Fabric scheduler which offers more flexibility for controlling when they run.

There’s some documentation here on how to schedule a Data Pipeline run but it’s very straightforward to use. I created a Fabric Pipeline consisting of a single semantic model refresh activity like so:

..then hit the Schedule button on the toolbar, selected “Monthly” on the Repeat dropdown and configured it to run on the first Sunday of the month:

Apart from the option to run on the Nth instance of a given day of the week you can also run the Data Pipeline on a given day number of the month; you can also run every N months, add up to 10 times to run per day, and set a start and end date.

There are two other options for scheduling that aren’t available in the scheduler for semantic models: the ability to run the Data Pipeline every N hours or minutes.

Be warned though: refreshing your semantic model every few minutes is dangerous because it can result in excessive CU consumption on your capacity and maybe even throttling if you’re not careful.

The same options to run a Data Pipeline daily and weekly that exist in the scheduler for semantic models with one notable limitation: the semantic model scheduler allows you to specify up to 48 times to refresh every day for models stored on a Premium/Fabric capacity, whereas the Fabric scheduler used by Pipelines only allows you to specify 10 times per day.

Of course you need a capacity to be able to use Fabric Data Pipelines but orchestration activities only cost 0.0056CU hours per activity run, so using Pipelines to refresh a semantic model in this way will only use a tiny fraction of even the smallest capacity’s resources.

Even if you don’t think you’re interested in anything that Fabric offers beyond Power BI, it’s features like this that, in my opinion, still make it worthwhile to flip the switch to enable Fabric and make your life as a Power BI developer easier.

Author: Chris Webb

Two New Power BI/Power Query Books

Like this:

Memory Overhead Of Distinct Count Measures In Power BI

Like this:

Current Status Of Snowflake Query Tags In Power BI

Like this:

Fabric Data Agents: Unlocking The Full Power Of DAX For Data Analysis

Like this:

Using Excel Copilot To Import Data With Power Query

Like this:

Speed Up Dataflow Publishing/ Validation Times In Power BI And Fabric

Like this:

Using Excel Solver With Power Pivot

Like this:

Refreshing Hidden Tables In Power BI Semantic Models With Fabric Data Pipelines

Like this:

Another Way To Visualise Import Mode Semantic Model Partition Refresh Parallelism

Like this:

Monthly Power BI Semantic Model Refreshes With Fabric Data Pipelines

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this: