Role-Playing Dimensions In Fabric Direct Lake Semantic Models Revisited

Back in September 2024 I wrote a blog post on how to create multiple copies of the same dimension in a Direct Lake semantic model without creating copies of the underlying Delta table. Not long after that I started getting comments that people who tried following my instructions were getting errors, and while some bugs were fixed others remained. After asking around I have a workaround (thank you Kevin Moore) that will avoid all those errors, so while we’re waiting for the remaining fixes here are the details of the workaround.

Let’s say you have a Direct Lake on OneLake semantic model with two tables, a fact table called Conversation and a dimension table called Person. The Conversation fact table has one row for a conversation between two people, but at this point there is only one Person dimension in the model with a relationship from the FromPersonId column on Conversation to the PersonId column on Person:

How can you add a second copy of the Person dimension table without duplicating the data in OneLake?

In Power BI Desktop, while editing the semantic model, go to TMDL View and in the Data pane on the right hand side switch to the Model pane:

Expand Expressions and drag the expression into the TMDL pane to script it out. It should look something like this:

Then you need to make two changes:

  • On the line that starts “expression”, line 3 in the screenshot above, change the name of the expression to something new and unique
  • Delete the line that contains the lineage tag, line 8 in the screenshot above

Here’s what it should look like after:

Click Apply and this will create a duplicate Expression in the model. This is the trick that makes everything else work.

Next, create a new script in TMDL View and drag the Person dimension into it to script it out.

Then make the following changes to the script:

  • On the line that starts “table”, line 3 in the screenshot above, change the name of the table to something new and unique
  • On the line that starts “expressionSource”, line 31 in the screenshot above, change the name of the source expression to that of the new Expression created earlier
  • Delete all lines with lineage tags, ie those that start “lineageTag”
  • Add one line at the end with the text “changedProperty = Name”

Here’s what it should look like after:

Click Apply and this will create a copy of the dimension in the semantic model.
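Putting those edits together, the script for the new table should end up looking something like this sketch – the table name ToPerson, the column details and the expression name DatabaseQuery2 here are illustrative, and your script will contain everything else from the original Person table unchanged:

```tmdl
createOrReplace

	table ToPerson

		column PersonId
			dataType: int64
			summarizeBy: none
			sourceColumn: PersonId

		partition ToPerson = entity
			mode: directLake
			source
				entityName: Person
				expressionSource: DatabaseQuery2

		changedProperty = Name
```

Note that entityName still points at the original Person Delta table, which is why no data gets duplicated in OneLake.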

Then, back in Diagram View, you’ll see the new dimension table but with a warning saying that it hasn’t been refreshed. The next step is to refresh the model using the Schema and Data option:

At this point the new dimension table can be used like any other table, so you can create the relationship between the ToPersonId column on Conversation and the PersonId column on the new ToPerson dimension:

Generating Excel Reports With Fabric Dataflows Gen2

So many cool Fabric features get announced at Fabcon that it’s easy to miss some of them. The fact that you can now not only generate Excel files from Fabric Dataflows Gen2, but that you have so much control over the format that you can use this feature to build simple reports rather than plain old data dumps, is a great example: it was only mentioned halfway through this blog post on new stuff in Dataflows Gen2. Nonetheless it was the Fabcon feature announcement that got me most excited. This is because it shows how Fabric Dataflows Gen2 have gone beyond being just a way to bring data into Fabric and are now a proper self-service ETL tool where you can extract data from a lot of different sources, transform it using Power Query, and load it to a variety of destinations both inside Fabric and outside it (such as CSV files, Snowflake and yes, Excel).

The documentation for the new Excel destination, which you can find here, is extremely detailed indeed, so I thought it would be useful to show a simple example of how you can now use Dataflows Gen2 to build an Excel report. First of all I created a query called FruitSales using the Enter Data source that returned a table with some sales data in it:

let
    Source = Table.FromRows(
        Json.Document(
            Binary.Decompress(
                Binary.FromText(
                    "i45WciwoyEktVtJRMlWK1YlW8i9KzEsH8w0NwAIBqYlFYK4RmOtelFgAkbZUio0FAA==",
                    BinaryEncoding.Base64
                ),
                Compression.Deflate
            )
        ),
        let
            _t = ((type nullable text) meta [Serialized.Text = true])
        in
            type table [Product = _t, Sales = _t]
    ),
    #"Changed column type" = Table.TransformColumnTypes(
        Source,
        {{"Product", type text}, {"Sales", Int64.Type}}
    )
in
    #"Changed column type"
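The compressed blob in the Source step is just what the Enter Data feature generates; a more readable equivalent query (with made-up product and sales values, since the real values are encoded in the blob above) would be something like:

```m
let
    // #table lets you define the same table inline without the Base64 blob
    Source = #table(
        type table [Product = text, Sales = Int64.Type],
        {{"Apples", 10}, {"Oranges", 20}, {"Pears", 30}}
    )
in
    Source
```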

I then created a query called ReportTitle that contained the text for my report’s title:

let
    Source = #table({"Title"}, {{"My Sales Report"}})
in
    Source

…and a query called FruitSalesOverview that passes the data from the FruitSales query to the FabricAI.Prompt M function to generate a text summary of it:

let
    Source = FabricAI.Prompt("Summarise the fruit sales data in 20 words or less", FruitSales),
    ToTable = #table({"Summary"}, {{Source}})
in
    ToTable

The last query I created, called Output, generated a navigation table in the format described in the docs to describe the Excel output: the report title in a range starting in cell B1, the report summary in a range starting in cell B3, the sales data in a table starting in cell B5 and a bar chart showing the sales data.

let
    excelDocument = #table(
        type table [
            Sheet = nullable text,
            Name = nullable text,
            PartType = nullable text,
            Properties = nullable record,
            Data = any
        ],
        {
            // Report title
            {"Sales", "Title", "Range", [StartCell = "B1", SkipHeader = true], ReportTitle},
            // Copilot-generated summary
            {"Sales", "Summary", "Range", [StartCell = "B3", SkipHeader = true], FruitSalesOverview},
            // Table containing sales data
            {
                "Sales",
                "SalesTable",
                "Table",
                [StartCell = "B5", TableStyle = "TableStyleMedium9"],
                FruitSales
            },
            // Column chart containing sales data
            {
                "Sales",
                "SalesChart",
                "Chart",
                [
                    ChartType = "Column",
                    ChartTitle = "Fruit Sales",
                    DataSeries = [AxisColumns = {"Product"}, ValueColumns = {"Sales"}]
                ],
                #table({}, {}) meta [Name = "SalesTable"]
            }
        }
    )
in
    excelDocument

I then set the Data Destination of the Output query to use the New File option to create an Excel file and save it to the Files section of a Fabric Lakehouse. The use of the Advanced format option meant that the navigation table returned by the Output query was used to determine the structure of the resulting Excel file.

After refreshing the Dataflow I downloaded the resulting Excel workbook from my Lakehouse. Here’s what it looked like:

Pretty fun. Does it give you full control over the format of the Excel file? No, not quite. Is it a somewhat code-heavy approach? Yes, but I suppose in the age of AI that doesn’t matter so much since you’re unlikely to write your own code (although, being old-school, I adapted the code above from the docs manually). Most importantly: is this a better way of dumping data to Excel and/or generating simple Excel reports in Fabric than paginated reports? Good question, especially since Power Query is now available in paginated reports. Paginated reports are harder to build (though you can generate RDL with AI too – I’ve done it) and it’s harder to control what a paginated report rendered as an Excel file looks like, but they may still have the edge if you want a proper Excel report; I’m not sure, because factors like CU cost and how long it takes to generate an Excel file using each approach would also need to be taken into account. If you just want to dump data to Excel, however, Dataflows Gen2 are probably the better option now.

Query Folding And Staging In Fabric Dataflows Gen2

A few years ago I wrote this post on the subject of staging in Fabric Dataflows Gen2. In it I explained what staging is, how you can enable it for a query inside a Dataflow, and discussed the pros and cons of using it. However one thing I never got round to doing until this week is looking at how you can tell if query folding is happening on staged data inside a Dataflow – which turns out to be harder to do than you might think.

Consider the following simple Dataflow consisting of two queries:

The first query, called StageData, reads data from a 6GB, 17 million row CSV file containing open data from the English Prescribing Data dataset. It returns two columns called PRACTICE_NAME and TOTAL_QUANTITY from that CSV file:

Staging is enabled on this query:

The second query, called GroupBy, takes the data returned by the StageData query and does a Group By operation to get the sum of the values in the TOTAL_QUANTITY column for each distinct value in PRACTICE_NAME:
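In M, the two queries look something along these lines – the file path and CSV options here are hypothetical, and your own column selection steps may differ:

```m
// StageData: read just the two columns needed from the source CSV
// (the file path and CSV options are illustrative)
let
    Source = Csv.Document(
        File.Contents("C:\Data\EnglishPrescribingData.csv"),
        [Delimiter = ",", Encoding = 65001]
    ),
    PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    SelectedColumns = Table.SelectColumns(
        PromotedHeaders,
        {"PRACTICE_NAME", "TOTAL_QUANTITY"}
    )
in
    SelectedColumns

// GroupBy: sum TOTAL_QUANTITY for each distinct PRACTICE_NAME
let
    Source = StageData,
    Grouped = Table.Group(
        Source,
        {"PRACTICE_NAME"},
        {{"TOTAL_QUANTITY", each List.Sum([TOTAL_QUANTITY]), type number}}
    )
in
    Grouped
```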

The output of this query was loaded to a Fabric Warehouse:

The scenario is basically the same as the one from my previous post but with a much larger data volume, and the idea was to test again whether it was faster to stage the data and do the Group By on the staged data or to not stage the data and do the Group By while reading the data direct from the source.

It turned out that, once again, staging made performance worse (don’t worry, I have other tests that show it can help performance). But the point of staging is that by loading the data from a query into a hidden Fabric Lakehouse managed by the Dataflow (which is what is meant by “staging”), any subsequent operations on this data can be faster because query folding can take place against the SQL Endpoint of this hidden Lakehouse – and at the time of writing this post there’s no way of knowing from the Dataflows Editor whether query folding is taking place. Right-clicking on the step that does the Group By operation shows that the “View data source query” option is greyed out, but this only tells you that the Editor doesn’t know if folding is taking place:

In fact other things in the UI, such as the query plan and the query folding indicators, suggest incorrectly that folding is not taking place:

So I thought: if query folding is taking place then the Group By will result in a SQL query run against the hidden Lakehouse, so maybe I can see this SQL query somewhere? Unfortunately since the Lakehouse is hidden you can’t get to it through the Fabric web interface. But then I remembered that you can connect to a Fabric workspace using good old SQL Server Management Studio (instructions on how to do this can be found here). And when I connected using SSMS I could see two hidden objects created by my Dataflow called StagingLakehouseForDataflows and StagingWarehouseForDataflows:

I was then able to run a SQL query using the queryinsights.exec_requests_history DMV against StagingWarehouseForDataflows, filtered for the time range when my Dataflow refresh was taking place:

SELECT start_time, statement_type, command, total_elapsed_time_ms
FROM queryinsights.exec_requests_history
WHERE start_time > 'insert DF refresh start time here'
AND end_time < 'insert DF refresh end time here'
ORDER BY start_time DESC

…and saw the following INSERT INTO statement that did the Group By operation I was expecting to see along with how long it took:

insert into
[StagingWarehouseForDataflows_20260223144925].[dbo].[1847c1263c7d4318a91dd6cd73ce48c6_2930fd3c_002D2a62_002D4518_002Dafbf_002D249e7af54403]
([Column1], [Column2])
select [_].[Column1] as [Column1],
convert(float, [_].[Total Quantity]) as [Column2]
from (
select [rows].[Column1] as [Column1],
sum([rows].[Column2]) as [Total Quantity]
from [StagingLakehouseForDataflows_20260223144911].[dbo].[1847c1263c7d4318a91dd6cd73ce48c6_179186be_002D367d_002D4924_002Da8ba_002Dd1f220415e3a]
as [rows]
group by [Column1] )
as [_]

So, a useful tip if you’re performance tuning a Dataflow even if it’s a bit of a pain to do. Hopefully in the future we’ll be able to see the SQL generated when query folding takes place against a staged table.

[Thanks to Miguel Escobar for his help with this]

A Closer Look At Preview-Only Steps In Fabric Dataflows

I have been spending a lot of time recently investigating the new performance-related features that have rolled out in Fabric Dataflows over the last few months, so expect a lot of blog posts on this subject in the near future. Probably my favourite of these features is Preview-Only steps: they make such a big difference to my quality of life as a Dataflows developer.

The basic idea (which you can read about in the very detailed docs here) is that you can add steps to a query inside a Dataflow that are only executed when you are editing the query and looking at data in the preview pane; when the Dataflow is refreshed these steps are ignored. This means you can do things like add filters, remove columns or summarise data while you’re editing the Dataflow in order to make the performance of the editor faster or debug data problems. It’s all very straightforward and works well.

The more I thought about this feature, though, the more I wondered about how it actually works and how it can be used in more complex queries that involve hand-written M code – so I decided to do a few tests. First of all, consider the following M query:

let
    Step1 = 1,
    Step2 = Step1 - 1
in
    Step2

This query returns a numeric value: 0.

At this point it will return the same value in the editor when viewing the preview of the output and when the Dataflow refreshes. But if you right-click on Step2 and select “Enable only in previews” then the query still returns 0 in the preview but will return 1 when the Dataflow refreshes. You will also see this message displayed in the preview pane:

This data preview uses preview-only steps.
Results may differ when running the dataflow.

This makes sense because the query follows a linear pattern: the output references Step2 which in turn references Step1, so if you disable Step2 then the output simply skips it and returns the value of Step1.

But what if your M code is more complex? For example consider this query:

let
    x = 5,
    y = 8,
    xtimesy = x * y,
    outputtable = #table(type table [xy = number], {{xtimesy}})
in
    outputtable

Here’s what this returns in the editor: a table that contains the value 40.

What if you disable the step called “y”?

Here’s what I saw in my Warehouse when I loaded the output of the query to it:

I had no idea what the output would be before I saw it but, thinking about it, I assume what happened is that since the step “y” has been disabled, “y” simply returns the value of the previous step in the query, “x”, and therefore the output becomes 5*5=25. This understanding of how preview-only steps work is backed up by the fact that it is not possible to set step “x” to be preview-only. The option to do so is greyed out:
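This behaviour can be illustrated with a little Python sketch – to be clear, this is my own analogy for what seems to happen when a step is disabled, not how the M engine is actually implemented:

```python
# Toy model of how a disabled (preview-only) step appears to behave on refresh:
# any reference to the disabled step resolves to the value of the step
# immediately before it in the query.

def evaluate(steps, output, disabled=()):
    """steps: ordered list of (name, fn); fn takes a dict of prior step values."""
    values = {}
    order = [name for name, _ in steps]
    for i, (name, fn) in enumerate(steps):
        if name in disabled:
            # Disabled step: pass through the value of the previous step
            values[name] = values[order[i - 1]]
        else:
            values[name] = fn(values)
    return values[output]

steps = [
    ("x", lambda v: 5),
    ("y", lambda v: 8),
    ("xtimesy", lambda v: v["x"] * v["y"]),
]

print(evaluate(steps, "xtimesy"))                  # 40 with nothing disabled
print(evaluate(steps, "xtimesy", disabled={"y"}))  # 25: y falls back to x's value
```

The sketch also suggests why the first step can’t be disabled: there is no previous step for it to fall back to, which lines up with the option being greyed out for “x”.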

This all leads on to a pattern I used in a Dataflow this week and which I can see myself using a lot more in the future. While being able to disable steps when the Dataflow refreshes is very useful, sometimes what you need is to write some conditional logic in your M code that does one thing if you’re in the editor and another thing if the Dataflow is refreshing. Here’s a slightly modified version of the first query above which I have called IsRefresh in my Dataflow:

let
    RefreshValue = 1,
    PreviewValue = RefreshValue - 1
in
    PreviewValue

With the PreviewValue step set to be preview-only:

…then what we have here is a query that returns 1 when the Dataflow is refreshing and 0 when you’re viewing its output in the editor. Here’s an example of how it can be used in a query:

let
    Source = #table(
        type table [
            NumericValue = number,
            TextValue = text
        ],
        {
            {
                IsRefresh,
                if IsRefresh = 1 then
                    "This is a refresh"
                else
                    "This is a preview"
            }
        }
    )
in
    Source

This query returns the following when you preview the output in the editor:

But when the Dataflow refreshes, the output in the destination is this:

Report On SAP And Salesforce Data In Fabric With Business Process Solutions

If you want to build a reporting solution on SAP (S/4HANA or ECC) or Salesforce data in Fabric and don’t want to build everything from scratch then you should check out Business Process Solutions. It’s a free, Microsoft-developed solution currently in public preview; the announcement blog post from last year is here and you can find all the docs here. It’s implemented as a Fabric custom workload which means that you can deploy it to a new workspace easily with just a few clicks, although there is of course a bit of configuration needed so it can connect to your data sources.

After you’ve done that it will generate all the Fabric items (pipelines, semantic models, Power BI reports etc) needed and you can concentrate on analysing your data. I know the team that is building Business Process Solutions and they are very smart, so I’m sure what they’ve built is well designed. Check it out!

Monitor Fabric Costs With Fabric Cost Analysis

Following on from my blog post a few months ago about cool stuff in the Fabric Toolbox, there is now another really useful solution available there that anyone with Fabric capacities should check out: Fabric Cost Analysis (or FCA). If you have Fabric capacities it’s important to be able to monitor your Azure costs relating to them, so why not monitor your Fabric costs using a solution built using Fabric itself? This is what the folks behind FCA (who include Romain Casteres, author of this very useful blog post on FinOps for Fabric, plus Cédric Dupui, Manel Omani and Antoine Richet) decided to build and share freely with the community.

A lot of enhancements are planned for the future – I’m told there’s another big release planned for the end of October 2025 – but it’s already quite mature. Not only do you get all the ETL to extract cost data from Azure and Power BI reports you need but there’s also a Fabric Data Agent so you can query your data in natural language!

Aren’t Fabric costs straightforward? Don’t you just pay a flat fee for each capacity? In a lot of cases yes, but there can be complications: you might be pausing and resuming capacities or scaling them up or down. There can also be other costs you might not realise you’re incurring. For example if you’ve ever read any of Matthew Farrow’s excellent content on Fabric costs, such as this post on the cost implications of pausing a capacity, and wondered whether you’ve been charged extra for pausing a throttled capacity, then FCA can answer that question:

Like FUAM, this is not an official Microsoft product but a solution accelerator with no official support; even so, it’s definitely a much better option than building something yourself or not monitoring your costs at all.

First Look At Fabric Graph: Analysing Power BI Import Mode Refresh Job Graphs

The new Fabric Graph database is now rolling out and should be available to everyone within the next few weeks if you can’t see it already. The key to learning a new data-related technology is, I think, to have some sample data that you’re interested in analysing. But if you’re a Power BI person why would a graph database be useful or interesting? Actually I can think of two scenarios: analysing dependencies between DAX calculations and the tables and columns they reference using the data returned by the INFO.CALCDEPENDENCY function (see here for more details on what this function does); and the subject of this blog post, namely analysing Import mode refresh job graphs.

I’m sure even some of the most experienced Power BI developers reading this are now wondering what an Import mode refresh job graph is, so let me remind you of the series of three posts I wrote early in 2024 on extracting the job graph events from a refresh using Semantic Link Labs and visualising them, understanding the concepts of blocking and waiting in an Import mode refresh, and, crucially for this post, how to save job graph information to a table in OneLake. Here’s a quick explanation of what a refresh job graph is, though, using the model from the second post. Let’s say you have an Import mode semantic model consisting of the following three tables:

X and Y are tables that contain a single numeric column. XYUnion is a calculated table which unions the tables X and Y and has the following DAX definition:

XYUnion = UNION(X,Y)

If you refresh this semantic model the following happens:

  1. The Power BI engine creates a job to refresh the semantic model
  2. This in turn kicks off two jobs to refresh the tables X and Y
  3. Refreshing table X kicks off jobs to refresh the partitions in table X and the attribute hierarchies in table X
  4. Refreshing table Y kicks off jobs to refresh the partitions in table Y and the attribute hierarchies in table Y
  5. Once both table X and Y have been refreshed the calculated table XYUnion can be refreshed, which in turn kicks off jobs to refresh the attribute hierarchies in table XYUnion

So you can see that refreshing an Import mode model results in the creation of refresh jobs for individual objects which have a complex chain of dependencies between them. If you want to tune an Import mode model refresh then understanding this chain of dependencies can be really useful. Running a Profiler trace while a refresh is happening and capturing the Job Graph trace events gives you all the data needed to do this.

I refreshed the model shown above and saved the Job Graph data for it to two tables in OneLake using the code in this post. The first table was called RefreshJobs and contained one row for each job created during the refresh:

The second table contained all the dependencies between the jobs and was called Links:

I then created a new Graph model in my workspace, clicked Get Data, selected the lakehouse where the two tables above were stored and then selected those two tables:

…and clicked Load. Then in the model editor I clicked Add Node and created a node called RefreshJobs from the RefreshJobs table:

And then I clicked Add Edge and created an edge called DependsOn from the Links table:

Next I clicked Save to load all the data into the graph model (this is very similar to refreshing a semantic model).

This resulted in a simple graph model which represented the recursive relationship between the jobs in a refresh:

I was then able to use the query builder to create a diagram showing all the dependencies between the jobs:

It’s not as pretty as the tools for viewing DGML files I showed in this post last year but it has the advantage of allowing you to filter on the properties of nodes and edges. If you know GQL (and I wrote my first GQL query all of two hours ago…) you can write queries to do much more advanced types of analysis. Here’s a GQL query I managed to write which returns all the jobs that depend on the job with the JobId 1, and all the jobs that depend on those jobs:

MATCH 
(source_Job:Job)-[DependsOn:DependsOn]->{1,2}(target_Job:Job)
WHERE target_Job.jobId=1
RETURN 
target_Job.jobId,
target_Job.description,
source_Job.jobId,
source_Job.description
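To get a feel for what this GQL query computes, here’s a Python sketch that finds the jobs depending on a target job within one or two hops along the dependency edges – the sample Links rows are made up for illustration:

```python
# Each Links row is (source job, job it depends on), mirroring the DependsOn edges.
def dependents_within(links, target, max_hops):
    """Return the set of jobs that depend on `target` within `max_hops` hops."""
    found = set()
    frontier = {target}
    for _ in range(max_hops):
        # Walk one hop backwards along the DependsOn edges
        frontier = {src for src, dep in links if dep in frontier}
        found |= frontier
    return found

# Hypothetical Links rows: jobs 2 and 3 depend on 1, 4 depends on 2, 5 depends on 4
links = [(2, 1), (3, 1), (4, 2), (5, 4)]

print(sorted(dependents_within(links, target=1, max_hops=2)))  # [2, 3, 4]
```

With these sample edges, job 5 is excluded because it only reaches job 1 in three hops, which is outside the {1,2} quantifier in the GQL above.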

This is really cool stuff and in particular I would love to learn a bit more GQL to understand how the dependencies between objects and the amount of parallelism possible during a refresh affect refresh performance. If I get the time to do so I’ll write more blog posts!

Linking Fabric Warehouse SQL Queries And Spark Jobs To The Capacity Metrics App

Following on from my post two weeks ago about how to get the details of Power BI operations seen in the Capacity Metrics App using the OperationId column on the Timepoint Detail page, I thought it was important to point out that you can do the same thing with TSQL queries against a Fabric Warehouse/SQL Endpoint and with Spark jobs. These two areas of Fabric are outside my area of expertise so please excuse any mistakes or simplifications, but I know a lot of you are Fabric capacity admins so I hope you’ll find this useful.

First of all, how do you find the details of TSQL queries run against a Fabric Warehouse or SQL Endpoint that you see in the Capacity Metrics App? This is actually documented here, but let’s see a simple example. For TSQL queries run on a Fabric Warehouse or SQL Endpoint, the contents of the OperationId column represent the Distributed Statement Id of a query. In the Timepoint Detail page, in the Background Operations table with the OperationId column selected in the Optional Columns dropdown, you can take a single SQL Endpoint Query operation and copy the OperationId value (in this case 5BE63832-C0C7-457D-943B-C44FD49E5145):

…and then paste it into a TSQL query against the queryinsights.exec_requests_history DMV like so:

SELECT distributed_statement_id, start_time, command
FROM queryinsights.exec_requests_history 
WHERE distributed_statement_id = '5BE63832-C0C7-457D-943B-C44FD49E5145';

…and you can get the actual SQL query that was run plus lots of other useful information:

For Spark jobs, the OperationId of an operation in the Capacity Metrics App represents the Livy Id of the job. Here’s an operation of type Notebook Run seen in the Background Operations table on the Timepoint Detail page:

In this case the OperationId value of 41c5dc84-534d-4d21-b3fd-7640705df092 matches the Livy Id of the job seen on the Run Details tab in the Fabric Monitoring Hub:

Unfortunately at the time of writing other Fabric workloads do not yet emit an OperationId or, if they do, may not emit an OperationId that can be linked back to other monitoring data. But as always, if things change or I find out more, I’ll let you know.

How To Get The Details Of Power BI Operations Seen In The Capacity Metrics App

It’s the week of Fabcon Europe and you’re about to be overwhelmed with new Fabric feature announcements. However there is a new blink-and-you’ll-miss-it feature that appeared in the latest version of the Fabric Capacity Metrics App (released on 11th September 2025, version 47) that won’t get any fanfare but which I think is incredibly useful – it allows you to link the Power BI operations (such as queries or refreshes) you see in the Capacity Metrics App back to Workspace Monitoring, Log Analytics or Profiler so you can get details such as the query text.

Let’s say you’re in the Capacity Metrics App in the existing Timepoint Detail page. On the top right hand corner of both the “Interactive operations for time range” and “Background operations for time range” tables there is a dropdown box that allows you to display additional columns. This box now contains an option to display the OperationId column:

After you’ve added this column you’ll see it contains a GUID:

There is also, incidentally, a new page in preview called Timepoint Item Detail (preview) which is reached through the new Timepoint Summary (preview) page and which will eventually replace the Timepoint Detail page. If you haven’t seen this you should check it out: I think it’s a big improvement. This also has a dropdown box that allows you to show the OperationId column in its versions of the “Interactive operations for time range” and “Background operations for time range” tables.

This page also has a dropdown box at the top that allows you to filter operations by OperationId.

The OperationId is a unique identifier for each Power BI operation. Right-clicking on this value and selecting Copy/Copy value to copy it:

…means that you can use this value to cross-reference with the log data you find in Workspace Monitoring, Log Analytics or Profiler. For example I have a workspace with Workspace Monitoring enabled and found the following OperationId in the Capacity Metrics App: a7a2d4d4-2a9b-4535-b65a-a0cc0389d821. The following KQL query run on Workspace Monitoring:

let OperationIdFromCapacityMetrics = "a7a2d4d4-2a9b-4535-b65a-a0cc0389d821";
SemanticModelLogs
| where OperationId == OperationIdFromCapacityMetrics
| where OperationName == "QueryEnd"
| project Timestamp, ItemName, EventText, DurationMs, OperationDetailName
| order by Timestamp

…returns the query text for the DAX query associated with this interactive operation:

There’s a lot of other information you can get by writing KQL queries from Workspace Monitoring (for some examples see here) such as the IDs of the visual and the report that generated the query. If you’re using Log Analytics or Profiler there is no OperationId column – it’s called XmlaRequestId in Log Analytics and RequestId in Profiler – but the same information is available there too.
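For example, if you’re using Log Analytics, a roughly equivalent lookup might look like this – I’m assuming the standard PowerBIDatasetsWorkspace table from the Azure Log Analytics integration here, so adjust the table and column names for your setup:

```kql
PowerBIDatasetsWorkspace
| where XmlaRequestId == "a7a2d4d4-2a9b-4535-b65a-a0cc0389d821"
| where OperationName == "QueryEnd"
| project TimeGenerated, ArtifactName, EventText, DurationMs
```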

This is very useful for admins trying to identify why a capacity is overloaded: it means that you can now see the details of expensive queries or refresh operations and understand why they are causing problems. Make sure you upgrade your Capacity Metrics App to the latest version and enable Workspace Monitoring on all important workspaces so you can do this!

[Thanks to Tim Bindas and Lukasz Pawlowski for letting me know about this]

What Happens When Power BI Direct Lake Semantic Models Hit Guardrails?

Direct Lake mode in Power BI allows you to build semantic models on very large volumes of data, but because it is still an in-memory database engine there are limits on how much data it can work with. As a result it has rules – called guardrails – that it uses to check whether you are trying to build a semantic model that is too large. But what happens when you hit those guardrails? This week one of my colleagues, Gaurav Agarwal, showed me the results of some tests that he did which I thought I would share here.

Before I do that though, a bit more detail about what these guardrails are. They are documented in the table here, they vary by Fabric capacity SKU size and there are four of them which are limits on:

  • The number of Parquet files per Delta table
  • The number of rowgroups per Delta table
  • The number of rows per table
  • The total size of the data used by the semantic model on disk

There is also a limit on the amount of memory that can be used by a semantic model, something I have blogged about extensively, but technically that’s not a guardrail.

Remember also that there are two types of Direct Lake mode (documented here): the original Direct Lake mode called Direct Lake on SQL Endpoints that I will refer to as DL/SQL and which has the ability to fall back to DirectQuery mode, and the newer version called Direct Lake on OneLake that I will refer to as DL/OL and which cannot fall back to DirectQuery.

For his tests, Gaurav built a Fabric Warehouse containing a single table. He then added more and more rows to this table to see how a Direct Lake semantic model built on this Warehouse behaved. Here’s what he found.

If you build a DL/SQL model that exceeds one of the guardrails for the capacity SKU that you are using then, when you refresh that model, the refresh will succeed and you will see the following warning message in the model Refresh History:

We noticed that the source Delta tables exceed the resource limits of the Premium or Fabric capacity requiring queries to fallback to DirectQuery mode. Ensure that the Delta tables do not exceed the capacity's guardrails for best query perf.

This means that even though the refresh has succeeded, because the model has exceeded one of the guardrails then it will always fall back to DirectQuery mode – with all the associated performance implications.

If your DL/SQL model exceeds one of the guardrails for the largest Fabric SKU, an F2048, then refresh will fail but you will still be able to query the model and the model will again fall back to DirectQuery mode. For his tests, Gaurav loaded 52 billion rows into a table; the guardrail for the maximum number of rows in a table for an F2048 is 24 billion rows. The top-level message you get when you refresh in this case is simply:

An error occurred while processing the semantic model.

Although if you look at the details you’ll see a more helpful message:

We cannot refresh the semantic model because of a Delta table issue that causes framing to fail. The source Delta table '<oii>billionrows</oii>' has too many parquet files, which exceeds the maximum guardrails. Please optimize the Delta table. See https://go.microsoft.com/fwlink/?linkid=2316800 for guardrail details.

The DAX TableTraits() function, which another colleague, Sandeep Pawar, blogged about here, can also tell you the reason why a DL/SQL semantic model is falling back to DirectQuery mode. Running the following DAX query on the 52 billion row model:

EVALUATE TABLETRAITS()

…returned the following results:

This shows that the table called billionrows actually exceeds the guardrails for the number of files, the number of rowgroups and the number of rows.

What about DL/OL models though? Since they cannot fall back to DirectQuery mode, when you try to build or refresh a DL/OL semantic model that exceeds a guardrail you’ll get an error and you won’t be able to query your semantic model at all. For example here’s what I saw in Power BI Desktop when I tried to use the 52 billion row table in a DL/OL model:

Something went wrong connecting to this item. You can open the item in your browser to see if there is an issue or try connecting again.
We cannot refresh the semantic model because of a Delta table issue that causes framing to fail. The source Delta table 'billionrows' has too many parquet files, which exceeds the maximum guardrails. Please optimize the Delta table. 

All of this behaviour makes sense if you think about it, although I couldn’t have predicted exactly how things would work until I saw it. Some behaviour may change in the future to make it more intuitive; if that happens I will update this post.

[Thanks to Gaurav for showing me all this – check out his podcast and the India Fabric Analytics and AI user group that he helps run on LinkedIn. Thanks also to Akshai Mirchandani and Phil Seamark for their help]