How Query Folding And The New Power BI Dataflows Connector Can Help Dataset Refresh Performance

You may have noticed that a new dataflows connector was announced in the August 2021 release of Power BI Desktop, and that it now supports query folding between a dataset and a dataflow – which you may be surprised to learn was not possible before. In this post I thought I’d take a look at how much of an improvement this can make to dataset refresh performance.

For my tests I created a new PPU workspace and a dataflow, and made sure the Enhanced Compute Engine was turned on for the dataflow on the Settings page:

Query folding will only happen if the Enhanced Compute Engine is set to “On”, and won’t happen with the “Optimized” setting. The Enhanced Compute Engine is only available with PPU and Premium.

For my data source I used a CSV file with a million rows and seven integer columns. I then created two tables in my dataflow like so:

The Source table simply connects to the CSV file, uses the first row as the headers, then sets the data type on each column. The second table, called Output – which contains no transformations at all – is needed for the data to be stored in the Enhanced Compute Engine; the lightning icon in the top-left corner of the table in the diagram shows this is the case.
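For reference, the M behind these two dataflow queries might look something like the following sketch – the file path is an assumption, and only the name of column A is confirmed by the filter used later in this post; the names B to G are made up:

```m
// Source: connect to the CSV, promote the first row to headers,
// then set every column to a whole number type
// (the file path and column names B-G are assumptions)
let
    Csv = Csv.Document(File.Contents("C:\Data\MillionRows.csv"), [Delimiter = ",", Encoding = 65001]),
    PromotedHeaders = Table.PromoteHeaders(Csv, [PromoteAllScalars = true]),
    ChangedType = Table.TransformColumnTypes(
        PromotedHeaders,
        {{"A", Int64.Type}, {"B", Int64.Type}, {"C", Int64.Type}, {"D", Int64.Type},
         {"E", Int64.Type}, {"F", Int64.Type}, {"G", Int64.Type}}
    )
in
    ChangedType
```

The Output query is then nothing more than a reference to Source with no extra steps, which is what allows its data to be cached by the Enhanced Compute Engine.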

Next, in Power BI Desktop, I created a Power Query query that used the old Power BI dataflows connector:

If you have any existing datasets that connect to dataflows, this is the connector you will have used – it is based on the PowerBI.Dataflows function. My query connected to the Output table and filtered the rows to where column A is less than 100. Here’s the M code, slightly edited to remove all the ugly GUIDs:

let
    Source = PowerBI.Dataflows(null),
    ws = Source{[workspaceId="xxxx"]}[Data],
    df = ws{[dataflowId="yyyy"]}[Data],
    Output1 = df{[entity="Output"]}[Data],
    #"Filtered Rows" = Table.SelectRows(Output1, each [A] < 100)
in
    #"Filtered Rows"

Remember, this connector does not support query folding. Using this technique to measure how long the query ran when the results from the query were loaded into the dataset, I could see it took almost 12.5 seconds to get the data for this query:

In fact the performance in Power BI Desktop was even worse: while the refresh was taking place, I could see Power BI downloading 108MB of data even though the original source file is only 54MB.

Why is the data downloaded twice? I strongly suspect it’s because of this issue – because, of course, no query folding is happening – which makes the experience in Desktop worse still.

I then created the same query with the new dataflows connector:

This connector uses the PowerPlatform.Dataflows function; it’s not new, but what is new is that you can now access Power BI dataflows using it.

Here’s the M code, again cleaned up to remove GUIDs:

let
    Source = PowerPlatform.Dataflows(null),
    Workspaces = Source{[Id="Workspaces"]}[Data],
    ws = Workspaces{[workspaceId="xxxx"]}[Data],
    df = ws{[dataflowId="yyyy"]}[Data],
    Output_ = df{[entity="Output",version=""]}[Data],
    #"Filtered Rows" = Table.SelectRows(Output_, each [A] < 100)
in
    #"Filtered Rows"

When this query was loaded into the dataset, it only took 4 seconds:

This is a lot faster, and Power BI Desktop was a lot more responsive during development too.

It’s reasonable to assume that query folding is happening here and that the filter on [A]<100 now takes place inside the Enhanced Compute Engine rather than in Power BI Desktop. But how can you be sure? The “View Native Query” option is greyed out, but of course that does not prove query folding is not happening. If you run Query Diagnostics, however, you can see – hidden away in the Data Source Query column of the detailed diagnostics query – a SQL query with the WHERE clause you would expect:
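To give a sense of what to look for, the folded query captured in the Data Source Query column looks something like the following – this is an illustration based on the example in this post, not the exact SQL generated by the Enhanced Compute Engine:

```sql
-- Hypothetical folded query; the entity name and column names
-- are assumed from the example in this post
SELECT [A], [B], [C], [D], [E], [F], [G]
FROM [Output]
WHERE [A] < 100
```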

In conclusion, the new dataflows connector can give you some big improvements in dataset refresh performance and a much better development experience in Power BI Desktop. Query folding support also means that you can now use dataset incremental refresh when using a dataflow as a source. However, you will need Premium or PPU, you may need to make some changes to your dataflow to make sure it can take advantage of the Enhanced Compute Engine, and you will need to update any existing Power Query queries to use the new connector. I think the potential performance gains are worth these changes though. If you do make these changes in your dataflows and find that they help, please leave a comment!
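As a sketch of that incremental refresh scenario, the query below filters the dataflow on the RangeStart and RangeEnd parameters that Power BI requires for incremental refresh. The [Date] column is an assumption – the example table in this post only has integer columns – and the filter needs to fold to the Enhanced Compute Engine for the refresh to be efficient:

```m
let
    Source = PowerPlatform.Dataflows(null),
    Workspaces = Source{[Id = "Workspaces"]}[Data],
    ws = Workspaces{[workspaceId = "xxxx"]}[Data],
    df = ws{[dataflowId = "yyyy"]}[Data],
    Output_ = df{[entity = "Output", version = ""]}[Data],
    // RangeStart and RangeEnd are the reserved datetime parameters
    // used by dataset incremental refresh; [Date] is an assumed column
    #"Filtered Rows" = Table.SelectRows(Output_, each [Date] >= RangeStart and [Date] < RangeEnd)
in
    #"Filtered Rows"
```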

14 responses

  1. Pingback: Improving Dataset Refresh with Query Folding and the Dataflows Connector – Curated SQL

  2. hi Chris, thanks for sharing your knowledge,

    Today some companies face a big issue with Premium capacity overhead and utilization.

    A large part of the impact comes from refresh times and from many datasets refreshing at the same time.

    As far as I know, a normal dataflow doesn’t use memory on the capacity over time. If I leave many dataflows with the Enhanced Compute Engine enabled, is there constant memory consumption on the capacity even when no dataset refresh is using them as a source (not considering DirectQuery to dataflows)?

  3. Thanks for explaining the advantages of the new connector, Chris! Great stuff as always. I’m curious, is updating a dataset that’s using the older connector just as simple as replacing “PowerBI.Dataflows” in the Source section with “PowerPlatform.Dataflows”, then close and apply?… or is the process different than that? Cheers!

  4. Hi Chris,
    It’s nice to have query folding available for dataflows, but dataset incremental refresh does not work when some of the old existing records get updated.

  5. On my side, swapping the legacy connector with this one has some limitations:
    1. The connection from PBI Desktop is super slow;
    2. Loading Date-only fields from a dataflow converts them to a DateTime field, forcing you to run an extra datatype conversion (I don’t remember though if such conversions break query folding)

    For now, I have reverted to the legacy connector and will consider migrating again once the code base is stable enough.

    • Okay, we’ve found a performance problem mostly limited to the editor where the “top N” operator isn’t being folded to the compute back-end. This can result in a significantly worse editing experience. The fix will be in the October release, and we’re going to try to port it to the September release as a QFE as well.

  6. Hi Chris,

    Well, I noticed at least one “regression” compared to the legacy PBI dataflow connector: Date-only values are imported as DateTime (i.e. they are all suffixed with 12:00:00 AM).

    While adding an extra data type conversion at the M level within PBI desktop solves it, what’s the point? Also, I don’t remember if such conversion breaks folding or not.

    I am therefore reverting all my dataflows to the legacy connector, which does not have this issue. I will wait for a more stable code base.

    • We should be able to fix this for the October release. That said, if you’re just loading the data into some destination — whether a data model or another data flow — it should load correctly as a date because the column type is correct and loading is driven by the column type and not the cell types.

    • Does the new connector show loading progress after clicking Close & Apply (i.e. how many rows or MB have been loaded)?

  7. Pingback: Connecting to a Dataflow by Name - Greyskull Analytics

  8. I agree with Fred S., the new connector is horribly slow. In my test case with about 1.2 million rows and about 50 columns it’s slow to the point of being unusable. Even the preview in Power Query takes ages to complete.

  9. Thanks for the article.

    I’ve experienced similar issues to some of the commenters: loading data with the new dataflows connector in Power BI Desktop does not even complete for me.

    I have about 20 million rows, and after 20 minutes of attempting to connect via PowerPlatform.Dataflows I receive “An error happened while reading data from the provider: ‘Cannot write more bytes to the buffer than the configured maximum buffer size. 2147483647’”. I attempted restricting the data with RangeStart and RangeEnd parameters, but even one day of data times out.

    Any help is much appreciated. Thank you

  10. The error “Cannot write more bytes to the buffer than the configured maximum buffer size. 2147483647” is a known issue in the August release of Power BI Desktop and the fix will be available in the September release.
