For as long as I’ve been using Power BI – which has been from the beginning – the advice about which storage mode to choose has been the same: use Import mode unless you have a really, really good reason to use DirectQuery mode and even then you’re probably wrong and should use Import mode. Import mode was always a lot faster and a lot easier to tune. Marco’s advice in this LinkedIn post from last year pretty much summed up my attitude and that of every other Power BI expert out there:

DirectQuery was only for situations where you either:
- Had more data than you could fit in memory, which was very, very rare, or
- Where you needed to see truly real-time data in your reports, which business stakeholders always claim they want but they rarely ever need
The addition of Direct Lake as a third storage mode in Fabric didn’t change the situation. While there are some really good reasons to use Direct Lake instead of Import mode (which I should really cover in a separate blog post because I don’t think they are explained properly anywhere), it didn’t change the advice around DirectQuery and I still didn’t think DirectQuery was a good option for most people.
I have now changed my mind. Import or Direct Lake should still be the default for most projects but DirectQuery is now the best choice when you’re working with larger data volumes that could still be handled by Import or Direct Lake. This is a controversial statement, I know, and it’s based on experiences with customers that I can’t talk about directly, and as always there are some important details, so let me explain myself.
My new advice is this. If:
- You are starting a new project that is 100% on Fabric and all of your data will be loaded into a Fabric Lakehouse or Warehouse, and
- You have the following:
- Fact tables with more than a couple of billion rows in them, and
- Dimension tables with more than a couple of million rows in them, and/or
- Distinct count measures on columns with more than a couple of million distinct values in
Then you should start with a composite model design that uses:
- DirectQuery mode fact tables on the Warehouse or SQL Endpoint of your Lakehouse
- Dual mode dimension tables (which means they can swap between Import mode and DirectQuery mode when necessary)
- Aggregation tables based on your fact tables that are either in Import mode or DirectQuery mode
Why am I recommending this? Well let me take Marco’s objections to DirectQuery mode one by one:
Is Import mode or Direct Lake mode always faster than DirectQuery? For the composite model scenario described above, no, not always. Even if your Import mode or Direct Lake mode model fits within the memory limits of the Fabric capacity that you’re using, then a composite model and DirectQuery fact tables can be almost as fast and may be faster. This is because:
- If your query hits an aggregation table, which is potentially a lot smaller than the fact table, then it will be faster than if it hits an Import mode fact table. Of course if your aggregation tables are in Import mode then naturally your query will be faster than if it hits an Import mode fact table, so you could argue that this is an Import mode scenario anyway. But the native aggregation functionality is not available in Import mode or Direct Lake mode at the time of writing, it’s only available if the underlying table is in DirectQuery mode. And yes I know you can simulate aggregations in Import mode with some clever DAX but that clever DAX can also carry an overhead.
- The Fabric Warehouse engine, which not only powers Fabric Warehouse but also the SQL Endpoint of a Lakehouse, has improved a lot in the last year and is now incredibly fast. When the new GPU acceleration and other upcoming performance features land then it will be even faster. For example, it’s already faster at doing distinct counts than the Vertipaq engine used by Import or Direct Lake models.
- The Warehouse engine also has some architectural advantages over Vertipaq when it comes to concurrency. Now Vertipaq is already really, really good at concurrency and features like semantic model scale-out make it even better, but for the large data volumes mentioned above and when you have more than 5-10 concurrent users then Warehouse is better. It’s actually quite rare to have more than 5-10 genuinely concurrent users (ie users that are running queries at exactly the same time) in a BI solution but it does happen, for example at month-ends.
Is DirectQuery for the rich? Is DirectQuery always more expensive than Import mode or Direct Lake mode? Actually no, not if you’re using a composite model in Fabric for the scenarios we’re talking about.
Let’s say you’re building a large Import mode or Direct Lake mode model in Fabric and you hit the limits of the capacity SKU that you’re using. It could be that you’re hitting the memory limit for your SKU or one of the other Direct Lake guardrails. Or it could be that during your load testing (and you should always do load testing) you hit the CU limits of your capacity. What do you do? Well you can and should do some tuning to see if you can avoid hitting those limits. Or you can scale up to the next capacity size, although that might be expensive. The third option is to use a composite model with DirectQuery fact tables in the way I’ve described.
The first advantage of a composite model with DirectQuery fact tables here is that because only your dimension tables and any Import mode aggregation tables you have are subject to the memory limits for Power BI models that each capacity SKU enforces, you’re much less likely to hit those limits.
The second advantage of a composite model with DirectQuery is that is that if you hit your fact table then Power BI will generate SQL queries against the Warehouse engine and the SQL queries generated are counted as background operations which are then smoothed over 24 hours. DAX queries on Import mode or Direct Lake mode models are counted as interactive operations and are smoothed over 5 to 64 minutes. Queries against a DirectQuery table will still consume some interactive CUs and of course if you hit a Dual mode dimension table or an Import mode aggregation that will also consume interactive CUs, but the Storage Engine is where most of the CUs get burned in an Import or Direct Lake mode model. This is the main reason why a composite model approach can handle more concurrent users: you’re less likely to run into the CU limits for the capacity SKU you’re using because of the difference in how smoothing works.
What’s more, these advantages mean that even if you don’t hit the limits for the capacity SKU you’re using, you could still save money by using a composite model because it could allow you to use a smaller capacity.
You could argue that both of these advantages are purely accounting tricks, results of the rules that we at Microsoft have imposed on how Fabric capacities work, and you’d have a point. But it’s unlikely these rules will change anytime soon.
Is DirectQuery still difficult to manage? Yes, this objection still stands in my opinion. If you need to tune an Import model you need an expert in tuning Import models. They’re rare but they exist – you can call Marco for example or take one of his courses. If you need to tune a composite model like the one I’ve described you need someone who can tune an Import model, someone who can tune a DirectQuery model (which is really rare) and someone who can tune your Fabric Warehouse or SQL Endpoint. You might not have that combination of skills in your team, and if you do then it would make maintenance and development more expensive – which undermines the “DirectQuery is cheaper” argument a bit.
Finally, there are a couple of other important questions that need to be addressed.
I have Databricks, Snowflake or some other database and I’d like to use that instead of Fabric Warehouse or the Lakehouse SQL Endpoint. Is that a good idea? No, and I’m not just saying that because I work on the Fabric team and I want you to believe that Warehouse/SQL Endpoint is faster or better than them. It’s because the underlying architecture of Fabric and the connector that Power BI uses to connect to Fabric Warehouse/SQL Endpoint means that DirectQuery on Warehouse/SQL Endpoint is significantly faster and more scalable, even apart from the performance of the SQL queries themselves. More optimisations are planned to make the “better together” story even more compelling. And no, before anyone suggests it, we’re not deliberately trying to hobble the performance of other, non-Fabric databases – many other Microsoft data sources share the same architectural disadvantages when it comes to DirectQuery as Snowflake and Databricks. And while it is possible to make DirectQuery perform well for any data source, my point is that DirectQuery on Fabric Warehouse is a special case.
Also, much anecdotal evidence suggests that using Power BI in DirectQuery mode on non-Fabric sources can be more expensive than Import mode because you need to pay to use those other sources as well as pay for your Fabric capacity. DirectQuery mode on non-Fabric databases is more expensive in CU terms than DirectQuery on Fabric Warehouse/SQL Endpoint because of those architectural differences I mentioned and I’ve seen it even be more expensive than an Import model in terms of CU consumption. Meanwhile, as I’ve said, for larger volumes DirectQuery on Warehouse/SQL Endpoint can be cheaper than Import mode or Direct Lake.
Are there any other advantages to DirectQuery mode? Yes but maybe not ones you care about. Do you want genuinely dynamic calculated columns? It’s been possible for years with DirectQuery (see here – the example uses KQL but similar things are possible in SQL). One day I’ll get my demo showing how to do proper time zone conversion, handling daylight savings time, in DirectQuery. My colleague Mark Pryce-Maher has a nice demo of calling Fabric AI functions in a DirectQuery model. But these are all niche use cases.
Your size recommendations for when to use DirectQuery above are very vague. Can’t you be more precise? No, because so much depends on the design of your semantic model and reports and the nature of your data. In an ideal world you would still always choose Import mode or Direct Lake mode as your default and only shift to a DirectQuery composite model when necessary but doing that halfway through a project is quite disruptive. If you want to test the performance of Import mode, Direct Lake mode and DirectQuery mode for your project then go ahead, but if you don’t then for the volumes I’m talking about there’s a very good chance DirectQuery is the best option.
Last of all, I need to stress that this recommendation could change in the future. While Fabric Warehouse is getting a lot of improvements and optimisations, so is Direct Lake mode, so the price and performance characteristics of both will change a lot and that means the decision about which one is the best choice for larger data volumes may change too. I promise to update this post if and when that happens.
I’m gonna save this post preciously!
Let me share a funny experience.
I tried this exact approach, except that I was using BigQuery instead of Fabric Warehouse / Lakehouse as my source, with a composite model and a large fact table in DirectQuery.
I built a matrix.
And I blew through my Bigquery daily quotas within 5 minutes of loading.
Why? Because the BigQuery connector generates one query per matrix row instead of a simple “SELECT col_1, SUM()… GROUP BY…”. So it ended up scanning my entire table multiple times.
My conclusion: you need to trust your connector before applying this approach. And obviously, staying within Fabric goes a long way toward avoiding these kinds of behaviors.
I tried DirectQuery a few days ago, but it didn’t fully support the new enhanced time intelligence feature when the source wasn’t in Fabric.
If my source is a Lakehouse/Warehouse, are there any Power BI features that only work in Import mode, making DirectQuery a disqualifying factor?
There is still one additional aspect for using direct query – security. If you have the need to utilize and leverage the security setup you’ve already done in the source systems – direct query + passing through credentials is the way to go.
I agree (!!) with what you wrote! 🙂
Just one caveat: I’d say that for your baseline, this point is critical: “Distinct count measures on columns with more than a couple of million distinct values…”
Without Distinct Count, I wouldn’t consider DirectQuery for fact tables with fewer than 10 billion rows. Much should be said about the data model, too. In complex models with m-to-m relationships and/or bidirectional filters, it becomes a huge “it depends” because of the need to materialize an intermediate table that must be processed by the formula engine.
I wouldn’t change my mind, though: start with Import, consider Direct Lake, evaluate DirectQuery if you have strong near-real-time requirements (below 5min latency between events in large tables and report updates) or if the dataset is too big to fit in the capacity you have available.
Anecdotical experience says 70-80% of DirectQuery models I’ve seen didn’t have a good reason to be there.
In fact it’s in that 1-10 billion row range that I have seen that change where DQ is the right choice now – I was very surprised but it’s definitely true. It’s 75% the difference between smoothing for interactive vs background operations and 25% new performance features for Fabric Warehouse/SQL Endpoint that are the reason for this. We should talk offline 🙂
Great post, and the BigQuery disclaimer is spot on!
I’d add one more horror to the “trust your connector” list: beyond the one-query-per-row matrix issue, the BigQuery connector also loves sending SELECT * with 100+ columns when you only asked for 3. Full table scan, every time, for every visual.
The composite model pattern is powerful — but it lives or dies by how smart the connector is at translating DAX to SQL. On Fabric Warehouse that translation is battle-tested. On BigQuery, you’re basically hoping for the best. 🐕
Test your generated SQL in Query Diagnostics before blowing through your daily quota in 5 minutes. Ask me how I know. 😅
Power BI sends wide queries for all sources – but only for queries inside a subselect. The outer query only asks for the columns it needs. Any good database query optimiser should be able to handle this – is it a problem for BigQuery?
Fair point on the subselect structure, Chris.
But in BigQuery’s columnar engine, this:
SELECT a, b, c FROM (SELECT a, b, f, g, h, j, y… FROM toto)
…isn’t the same slot-time as SELECT a, b, c FROM toto.
The inner query explicitly lists 15 columns — BigQuery reads all of them from storage first. The outer projection doesn’t undo that scan. As Google puts it: “Only select the columns necessary, especially in inner queries” (
BigQuery Query Optimization guide
). That’s precisely the pattern Power BI’s connector generates. 🙂
It would be great to see benchmarking on datasets of different sizes.
I am evaluating using fabric in my org using f64 trial. Not done a scientific comparison, but found DirectQuery on Lakehouse to be as quick as DirectLake. This is based on a fact with just over a billion rows with simple star schema.
I tried as a bit of a stress test a semantic model with two tables of a billion rows and a one to one join. DirectLake fails, but DirectQuery is “usable” depending on performance expectations. How it does the join on a billion rows so fast, I do not understand – often <3 seconds, for example aggregating measure on table A and filtering on table B. I had a quick look at the SQL generated and yes, it looks "clever".
I have not tried dual mode and wonder if that is even necessary. I have a fairly low tolerance for too much performance tweaking in semantic models!
Dual mode is necessary for two reasons. First, because some queries (eg ones to populate slicers) only hit dimension tables then answering these queries from an Import mode table is the quickest way to do so (because there are no joins and the tables are small – max a few million rows). Second, and most importantly, in DQ one of the main bottlenecks is the number of connections Power BI can open back to the data source. If you can answer a query from an Import mode table then that’s one less SQL query that needs to be run and so one less connection that is busy.
Thanks. Had not considered the number of SQL connections
Hi Chris, did you run any tests when comparing DQ/DL against Mirroring artifacts routing to Unity Catalog?
Yes, but you will have to wait a few weeks before I can talk about this
What a joke.. Kyle Hale at Databricks was spot on
https://www.linkedin.com/posts/kylehale_another-month-another-absolute-bs-post-by-share-7477729834161332225-j2OD/
Looks like Kyle deleted his post before I could read it unfortunately