Another Way To Visualise Import Mode Semantic Model Partition Refresh Parallelism

I’m a big fan of Phil Seamark’s “visualise your refresh” pbix file, which allows you to analyse the different operations of an Import mode semantic model refresh (similar functionality has now been built into Semantic Link Labs too). In particular I like how it allows you to visualise operations inside a refresh as a Gantt chart, which makes it easy to see how much parallelism you’re getting when you refresh and, as I discussed here, the amount of parallelism can have a massive effect on how long a refresh takes. However, Gantt charts are just one way to visualise this data, and after one of my colleagues showed me the KQL range() function I realised there was a different way to approach this problem using KQL and Fabric Workspace Monitoring.

Here’s the KQL query I came up with to find the number of partitions that are refreshing in parallel at any given point in time when a semantic model is refreshed:

SemanticModelLogs
//Filter by Timestamp for performance
| where Timestamp > ago(1h)
//Filter by the OperationId of the refresh
| where OperationId == "insert OperationId of refresh here"
//Only get the events fired when partition refresh finishes
| where OperationDetailName =="Process" and OperationName =="ProgressReportEnd"
//Find the start time of the partition refresh
| extend StartTime = datetime_add("Millisecond", -1 * DurationMs, Timestamp)
//Create a list of all the seconds where the refresh was in progress
| extend StartSecond = range(bin(StartTime,1s), bin(Timestamp,1s), 1s)
| mv-expand StartSecond
//Aggregate the data and get the number of partitions being refreshed in each one-second bin
| summarize PartitionsBeingRefreshed=count() by todatetime(StartSecond)
//Sort in ascending order by one-second bins
| order by StartSecond asc

What this query does is:

Filters the events in the SemanticModelLogs table to just those for the refresh whose OperationId you enter
Filters again so you only get the ProgressReportEnd events that are fired when a partition refresh finishes
Calculates the start time of each partition refresh by taking the finish time and subtracting the duration of the refresh
Creates a table with one row for each second that each partition refresh was in progress
Aggregates this table so you get the number of partition refresh operations that were in progress for each one-second bin
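
To make the steps above concrete outside of a KQL Queryset, here’s a minimal Python sketch of the same logic. It is only an illustration: the function name parallelism_per_second and the sample events are made up, and the input is assumed to be a list of (finish time, duration in milliseconds) pairs like the Timestamp and DurationMs values of the ProgressReportEnd events.

```python
from collections import Counter
from datetime import datetime, timedelta


def parallelism_per_second(end_events):
    """Count how many partition refreshes touch each one-second bin.

    end_events: iterable of (finish_time, duration_ms) pairs, one per partition
    (hypothetical stand-ins for Timestamp and DurationMs).
    """
    counts = Counter()
    for finish, duration_ms in end_events:
        # Start time = finish time minus duration, as in the KQL datetime_add() step
        start = finish - timedelta(milliseconds=duration_ms)
        # Truncate to whole seconds, like bin(..., 1s)
        second = start.replace(microsecond=0)
        last = finish.replace(microsecond=0)
        # Emit one "row" per second in progress, like range() plus mv-expand
        while second <= last:
            counts[second] += 1
            second += timedelta(seconds=1)
    # Sort ascending by one-second bin, like the final order by
    return sorted(counts.items())


# Made-up sample data: three partition refreshes
t0 = datetime(2025, 1, 1, 12, 0, 0)
events = [
    (t0 + timedelta(seconds=3, milliseconds=200), 3000),  # runs 0.2s to 3.2s
    (t0 + timedelta(seconds=2, milliseconds=500), 1400),  # runs 1.1s to 2.5s
    (t0 + timedelta(seconds=4, milliseconds=900), 1000),  # runs 3.9s to 4.9s
]
result = parallelism_per_second(events)
for second, n in result:
    print(second.time(), n)
# 12:00:00 1
# 12:00:01 2
# 12:00:02 2
# 12:00:03 2
# 12:00:04 1
```

Note that the end of the range is inclusive, so a refresh is counted in the bin where it finishes as well as the bin where it starts.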

All events associated with a single semantic model refresh can be identified by the value in the OperationId column in the SemanticModelLogs table in Workspace Monitoring. To find it, just run a KQL query on the SemanticModelLogs table to get all the events from the time period when your refresh took place; it will be easy to spot the events associated with the refresh by looking at the contents of the EventText column, so all you need to do is copy the value from the OperationId column from one of these events.

I refreshed a semantic model with six dimension tables and one fact table containing ten partitions and here’s what the query above showed when visualised as a line chart in a KQL Queryset:

As you can see, at first four partitions are refreshing in parallel; this quickly drops and then rises, and for the first half of the refresh six partitions are refreshing in parallel. At about the two-thirds point this drops to four partitions refreshing in parallel and as they complete the line chart drops to one. Since six partitions refreshing in parallel is the maximum I allowed for this refresh you can see it’s reasonably efficient.

To get more detail on what is being refreshed, here’s a variation of the KQL query above that doesn’t perform the final aggregation:

SemanticModelLogs
| where Timestamp > ago(1h)
| where OperationId == "54d42645-9672-409a-844c-42403526b596"
| where OperationDetailName =="Process" and OperationName =="ProgressReportEnd"
| extend StartTime = datetime_add("Millisecond", -1 * DurationMs, Timestamp)
| extend StartSecond = range(bin(StartTime,1s), bin(Timestamp,1s), 1s)
| mv-expand StartSecond
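//Keep one row per partition per second, removing the leading characters of EventText and the <oii> tags to leave the partition details
| project StartSecond = todatetime(StartSecond), Partition = replace_strings(substring(EventText, 24), dynamic(["<oii>", "</oii>"]),dynamic(["",""]))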

This data can then be used in a Power BI report and visualised as a stacked bar chart:

The advantage of this is that you can see which partitions are refreshing when: in this case you can see that it was the partitions from the dimension tables that refreshed first, followed by the partitions from the fact table.
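
As a rough illustration of the shape of the data this unaggregated query returns, here’s a small Python sketch that builds one (second, partition) row per second in progress and then groups the rows by second. The partition names, times and durations are all invented for the example; they are not taken from my refresh.

```python
from collections import defaultdict
from datetime import datetime, timedelta

t0 = datetime(2025, 1, 1, 12, 0, 0)
# Hypothetical (partition name, finish time, duration in ms) values
events = [
    ("DimDate", t0 + timedelta(milliseconds=1500), 1200),     # runs 0.3s to 1.5s
    ("DimProduct", t0 + timedelta(milliseconds=800), 600),    # runs 0.2s to 0.8s
    ("FactSales-P1", t0 + timedelta(milliseconds=3100), 1500),  # runs 1.6s to 3.1s
]

# One row per partition per second, like the output of the project step
rows = []
for name, finish, duration_ms in events:
    second = (finish - timedelta(milliseconds=duration_ms)).replace(microsecond=0)
    while second <= finish.replace(microsecond=0):
        rows.append((second, name))
        second += timedelta(seconds=1)

# Group by second: each list is one stacked bar in the chart
by_second = defaultdict(list)
for second, name in rows:
    by_second[second].append(name)

for second in sorted(by_second):
    print(second.time(), by_second[second])
# 12:00:00 ['DimDate', 'DimProduct']
# 12:00:01 ['DimDate', 'FactSales-P1']
# 12:00:02 ['FactSales-P1']
# 12:00:03 ['FactSales-P1']
```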

If you’re not using Fabric Workspace Monitoring it should be easy to adapt the queries above to work with Power BI Log Analytics integration instead – it’s just a matter of changing a few column names.

There’s one problem I can think of with this approach: because a partition is counted in every one-second bin that its refresh touches, multiple small partitions that each take under a second and refresh one after another within the same second will be counted as if they were refreshing in parallel, so there’s a chance that you’ll see inflated parallelism numbers in the results at some points. I don’t think that’s a major problem though and overall I think this is quite a useful way to understand how much parallelism you’re getting during a refresh.
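
Here’s a tiny Python sketch of how that inflation can happen, using two invented partition refreshes that run one after the other inside the same second and never actually overlap:

```python
from collections import Counter
from datetime import datetime, timedelta

t0 = datetime(2025, 1, 1, 12, 0, 0)
# Hypothetical (start, finish) times: strictly sequential, both under a second
intervals = [
    (t0 + timedelta(milliseconds=100), t0 + timedelta(milliseconds=400)),
    (t0 + timedelta(milliseconds=500), t0 + timedelta(milliseconds=900)),
]

# Count each refresh in every one-second bin it touches
bins = Counter()
for start, finish in intervals:
    second = start.replace(microsecond=0)
    while second <= finish.replace(microsecond=0):
        bins[second] += 1
        second += timedelta(seconds=1)

# Did the first refresh really finish after the second one started?
truly_overlapping = intervals[0][1] > intervals[1][0]
print(bins[t0], truly_overlapping)  # 2 False
```

The one-second bin reports two partitions even though only one was refreshing at any instant. A smaller bin size would reduce the effect at the cost of more rows.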

[Thanks to Matthew Farrow for the inspiration here – check out his excellent, detailed series of articles on LinkedIn starting here about understanding Fabric capacities and billing]

Published by Chris Webb
