Two years ago I wrote a detailed post on how to do performance testing for Direct Lake semantic models. In that post I talked about how important it is to run worst-case scenario tests to see how your model performs when there is no model data present in memory, and how it was possible to clear all the data held in memory by doing a full refresh of the semantic model. Recently, however, a long-awaited performance improvement for Direct Lake has been released which means a full semantic model refresh may no longer page all data out of memory – which is great, but which also makes running performance tests a bit more complicated.
First of all, what is this new improvement? It’s called Incremental Framing and you can read about it in the docs here. Basically, instead of clearing all data out of memory when you do a full refresh of a Direct Lake model, the model now checks each Delta table it uses to see whether the data in it has actually changed. If it hasn’t changed then there’s no need to clear any data from that table out of memory. Since there’s a performance overhead to loading data into memory when a query runs, this means you’re less likely to encounter that overhead, and queries (especially for models where the data in some tables changes frequently) will be faster overall. I strongly recommend you read the entire docs page carefully though, not only because it contains a lot of other useful information, but also because you might be loading data into your lakehouses in a way that prevents this optimisation from working.
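For example (and this is just my illustration based on my reading of the docs, not an exhaustive list), appending rows to a Delta table only adds new Parquet files, whereas overwriting the table replaces all of its files, so a full table overwrite means the next refresh sees every file as changed and the whole table gets paged out of memory anyway. Here’s a minimal PySpark sketch of the two load patterns, assuming a Fabric notebook with its built-in spark session, a lakehouse attached and a hypothetical table called MyTable:

# Some hypothetical new rows to load; the table name and schema are just for illustration
new_rows = spark.createDataFrame([(21, 0.5)], ["RowId", "Value"])

# Append: only new Parquet files are added and existing files are untouched, so
# (as I understand the docs) incremental framing can keep existing column data in memory
new_rows.write.format("delta").mode("append").saveAsTable("MyTable")

# Overwrite: all of the table's existing files are replaced, so the next refresh
# treats the whole table as changed and its data is paged out of memory anyway
new_rows.write.format("delta").mode("overwrite").saveAsTable("MyTable")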
Let me show you an example of the new behaviour by revisiting a demo from a session I’ve done at several user groups and conferences on Power BI model memory usage (there are several recordings of it available, such as this one). I’m using a Direct Lake semantic model consisting of a single large table with 20 columns containing random numbers, and DAX Studio with the Direct Lake Behaviour setting in its Options dialog set to ResidentOnly (to stop Model Metrics from loading data from all columns into memory when it runs):

If you run Model Metrics when there is no data held in memory, the size of each column in the semantic model is negligible and the Temperature and Last Accessed values for all model columns are blank:

Then, if I run a query that asks for data from just one column of this table (in this case the column called “1”), like this:
EVALUATE ROW("Test", DISTINCTCOUNT('SourceData'[1]))
…and then rerun Model Metrics, the size in memory for that column changes, because of course the column has been loaded into memory in order to run the query:

Zooming in on the Model Metrics table columns from the previous screenshot that show the size in memory:

And here are the Temperature and Last Accessed columns from the same screenshot which are no longer blank:

Since the query had to bring the column into memory before it could run, the DAX query took around 5.3 seconds. Running the same query after that, even after using the Clear Cache button in DAX Studio, took only about 0.8 seconds because the data needed for the query was already resident in memory.
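If you want to capture these timings from a notebook rather than DAX Studio, here’s a minimal sketch using Semantic Link’s evaluate_dax function. The workspace and model names are placeholders, the timings include some client-side overhead, and unlike DAX Studio’s Clear Cache button this doesn’t clear the query result cache between runs, so treat the numbers as indicative rather than exact:

import time
import sempy.fabric as fabric

workspace_name = "Insert Workspace Name Here"
semantic_model_name = "Insert Semantic Model Name Here"
test_query = """EVALUATE ROW("Test", DISTINCTCOUNT('SourceData'[1]))"""

# First run: worst case, the column has to be paged into memory before the query can run
start = time.time()
fabric.evaluate_dax(dataset=semantic_model_name, workspace=workspace_name, dax_string=test_query)
print(f"First (cold) run: {time.time() - start:.1f} seconds")

# Second run: the column is now resident, so this should be much faster
start = time.time()
fabric.evaluate_dax(dataset=semantic_model_name, workspace=workspace_name, dax_string=test_query)
print(f"Second (warm) run: {time.time() - start:.1f} seconds")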
OK, so far nothing has changed in terms of behaviour. However, if you do a full refresh from the Power BI UI without making any changes to the underlying Delta tables:

And then rerun Model Metrics, nothing changes and the data is still in memory! As a result the DAX query above still only takes about 0.8 seconds.
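By the way, you don’t have to use DAX Studio to check what’s resident: the INFO DAX functions expose the same segment-level data that Model Metrics reads, so a sketch like the following should work from a notebook too. I’m assuming here that INFO.STORAGETABLECOLUMNSEGMENTS() is available on your model and that it returns an ISRESIDENT column like the underlying DMV does; the workspace and model names are placeholders.

import sempy.fabric as fabric

workspace_name = "Insert Workspace Name Here"
semantic_model_name = "Insert Semantic Model Name Here"

# INFO.STORAGETABLECOLUMNSEGMENTS() returns one row per column segment with the
# same residency, temperature and last accessed data that Model Metrics displays
segments = fabric.evaluate_dax(
    dataset=semantic_model_name,
    workspace=workspace_name,
    dax_string="EVALUATE INFO.STORAGETABLECOLUMNSEGMENTS()"
)

# Find the residency column by name rather than hard-coding it, since the exact
# column naming in the returned dataframe is an assumption on my part
resident_col = [c for c in segments.columns if "ISRESIDENT" in c.upper()][0]
print(segments[resident_col].value_counts())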
So how do you get that worst-case performance again? As mentioned in the docs here, you now need to do a refresh of type clearValues followed by a full refresh. You can’t do a refresh of type clearValues in the Power BI UI though, so the easiest way to do one is to use a Fabric notebook and Semantic Link Labs. Here’s how. First, install Semantic Link Labs:
%pip install semantic-link-labs
Then use the following code in a notebook cell to do a refresh of type clearValues followed by a full refresh:
import sempy_labs as labs
WorkspaceName = "Insert Workspace Name Here"
SemanticModelName = "Insert Semantic Model Name Here"
# run a refresh of type clearValues first
labs.refresh_semantic_model(dataset=SemanticModelName, workspace=WorkspaceName, refresh_type="clearValues")
# then a refresh of type full
labs.refresh_semantic_model(dataset=SemanticModelName, workspace=WorkspaceName, refresh_type="full")

After doing this on my model, Model Metrics shows that the column called “1” that was previously in memory is no longer resident:


…and the query above once again takes around 5 seconds to run.
So, as you can see, if you’re doing performance testing of a Direct Lake model you now need to make sure you do a refresh of type clearValues followed by a full refresh of your model before each test, to ensure no data is resident in memory and you get worst-case performance readings, in addition to testing performance on a cold cache and a warm cache.
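Putting all of this together, here’s a rough sketch of what a single worst-case test run could look like in a notebook, combining the clearValues and full refreshes above with a timed run of the test query. The names and the query are placeholders, and I’m assuming refresh_semantic_model only returns once the refresh has completed:

import time
import sempy.fabric as fabric
import sempy_labs as labs

workspace_name = "Insert Workspace Name Here"
semantic_model_name = "Insert Semantic Model Name Here"
test_query = """EVALUATE ROW("Test", DISTINCTCOUNT('SourceData'[1]))"""

def run_worst_case_test(query: str) -> float:
    # Page all model data out of memory, then do a full refresh to re-frame the model
    labs.refresh_semantic_model(dataset=semantic_model_name, workspace=workspace_name, refresh_type="clearValues")
    labs.refresh_semantic_model(dataset=semantic_model_name, workspace=workspace_name, refresh_type="full")
    # Time the query with no data resident in memory
    start = time.time()
    fabric.evaluate_dax(dataset=semantic_model_name, workspace=workspace_name, dax_string=query)
    return time.time() - start

print(f"Worst-case query duration: {run_worst_case_test(test_query):.1f} seconds")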
Thanks for the blog. I’m assuming it is relevant to both DirectLake on SQL and DirectLake on OL.
Do you happen to know if there is an easy way to omit some of the delta partitions from a semantic model in a given table? Would love to manually flag some partitions as being unnecessary when the data is framed and transcoded into memory. (eg. fiscal years older than five years can be omitted for most day-to-day user queries).
I asked on reddit once and it sounds like the PG has no plans to support partitioning at the semantic model layer, since the partitioning is implemented in the Delta table. But Delta table partitioning is NOT very flexible. As far as I’m aware there is not any flexibility to allow us to selectively pick the data that we want to expose via the model. I.e. there is no “predicate pushdown” that would inspect user queries and selectively determine what data will participate in framing/transcoding (because those are things that happen at an earlier stage).
Side question for you about Excel pivot tables against a DL-on-OL model. These models don’t ever send direct queries back to another query engine. From an Excel standpoint they should be very similar to an import model. Yet the PG has disallowed the most valuable of the “PivotTable Analyze” features in Excel. I don’t think they understand the value in allowing users to define custom measures and sets. As we start converting our import models to DL-on-OL it will be very unfortunate to sacrifice pivot table functionality in this way. It is also not clear why we have to make the compromise, given the fact that there is no direct query fallback. Hopefully you can help convince that team of the value in having a powerful pivot table experience.
Thanks again.
Yes, it is relevant to all types of Direct Lake: DL/SQL and DL/OL.
Regarding partitioning, we’ve had some discussions about ways to achieve what you’re talking about but there are no explicit plans yet.
Regarding measures and sets, I didn’t test on DL/OL but I’ve just created a PivotTable on a DL/SQL model and those features were available…?
They appear to be available, but after you get to the very end and try to confirm the creation of the calculated measure (or set), it will fail.
There was a longer discussion about this on reddit with dbrownems (SQL CAT?) and datazoems (ASWL PM?). Search for “DirectLake on OneLake – another unexpected gotcha in Excel”.
I was hoping to get some transparency from that team into why the functionality is disallowed for DL-on-OL. My hunch is that they just needed to remove the error message itself – and things would work fine (i.e. because sets and calc measures work fine when used in an isolated MDX query).
… It is definitely not clear to us why DL-on-OL is limited in these additional ways that don’t affect import models. The two MS folks on reddit seemed to want to leave it as a mystery. I was hoping you would be more inclined than they were to help me solve this mystery.
I am perfectly willing to wait for a resolution, since I realize that DL-on-OL is still in preview. But it would be helpful to understand the source of the mystery as well. I was already blazing ahead and deploying these types of models to production, but now I’m really going to start tapping the brakes. I don’t know if I should tell the users that we will push forward again in the next three months or three years. If this wasn’t so mysterious then I’d be able to give them a better answer.
FYI, I am a little bothered whenever the Excel experiences for OLAP are degraded in any way. Excel has ALWAYS given us a very high-quality and productive place to interact with OLAP data (heck, even the pivot table experience in Excel 2003 still seems pretty impressive, and was a large part of Microsoft’s success in the BI space).
Ah yes, I see. I think the point to make is that this is not a deliberate decision on our part – reading the thread and knowing the people involved, I can see that no-one expected calculated members and sets not to work in Excel, and even if someone probably knew that session-scoped calculations wouldn’t work for Direct Lake models they didn’t realise the implications. I don’t know how much work would be needed to get this working but it will definitely take more than a month and the work will have to get prioritised relative to all the other things the team is working on.
OK, I’ll leave it in your capable hands to fight for Excel pivot table users. BTW, I really appreciate the concept of DL-on-OL. Conceptually it is like they are giving us the ability to control the internal storage structure of a cube, and almost reminds me of the multidimensional days where we were able to define our own aggregation designs, or whatever. The other nice thing about DL-on-OL is that we now have a “back-door” to send data into a model, without having to use an expensive Power Query ETL. … Taking a step back even further, I’m getting the impression that some of these product enhancements are focusing on the needs of pro-code developers once again. I’ve been waiting a long time for Microsoft to give pro-code developers a bit more TLC!