Speed Up Power BI Refresh By Increasing The Amount Of Memory On Your On-Premises Data Gateway Machine

If your Power BI dataset connects to an on-premises data source it must do so via an On-Premises Data Gateway; what’s more, if you have a Power Query query that combines data from cloud and on-premises data sources, then Power BI needs to connect to all the data sources used (even the cloud ones) via an On-Premises Data Gateway. And when Power BI connects to a data source via a gateway, all the transformation work done by the Power Query engine takes place on the machine where the gateway is installed.

As a result of all this, the specification of the machine where the gateway is installed has an impact on the performance of any dataset refresh that uses it. So how powerful does the gateway machine need to be? That’s a tough question because, as you can probably guess, it depends on a lot of different factors: how many datasets get refreshed in parallel, how often, how complex the transformations used are, whether you’re using Import mode or DirectQuery, and so on. There’s a great docs article describing how to go about sizing your gateway machine here. Unsurprisingly, the more memory and CPU cores you have available, the better refresh performance is likely to be and the more refreshes can take place in parallel.

There is one important thing to point out that is not obvious though: increasing the amount of memory on your gateway machine can improve refresh performance even if it doesn’t look like the machine is under memory or CPU pressure. This is because the total amount of memory made available for a single refresh is calculated relative to the overall amount of memory available on the gateway machine. I’ve written about how the Power Query engine uses memory a few times: this post describes how each refresh can use a fixed maximum amount of memory and how performance suffers if your refresh needs to use more; this post shows how increasing the amount of memory Power Query can use for a single refresh can increase refresh performance dramatically. In short, the more memory on your gateway machine the more memory is available for each individual refresh and – if the Power Query engine needs it, for example if you’re sorting, merging, pivoting/unpivoting, buffering or doing group-bys on large tables and query folding is not taking place – the faster each refresh will be.
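To make this concrete, here is a minimal Power Query (M) sketch of the kind of query that puts pressure on gateway memory. The server, database and table names are hypothetical; the point is that `Table.Buffer` and the subsequent sort stop query folding, so the work happens in the Power Query engine on the gateway machine rather than at the source:

```m
let
    // Hypothetical on-premises source - server/database names are placeholders
    Source = Sql.Database("myserver", "mydb"),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // Table.Buffer loads the whole table into the gateway's memory
    // and prevents query folding from this point onwards
    Buffered = Table.Buffer(FactSales),
    // The sort now runs in the Power Query engine on the gateway,
    // not on the SQL Server - this is where extra memory helps
    Sorted = Table.Sort(Buffered, {{"SalesAmount", Order.Descending}})
in
    Sorted
```

If the buffered table is larger than the memory available to the refresh, the engine starts paging to disk and performance drops off sharply, which is exactly the scenario where more memory on the gateway machine pays off.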

You do have the option of changing some properties (listed here) on the gateway to try to influence this behaviour. However, since the algorithms involved are not documented and may change at any time, not all the relevant properties are documented, and working out the optimal settings yourself is very difficult, I don’t recommend doing this. It’s a lot easier just to increase the amount of memory and CPU on the gateway machine and let the gateway work out how these resources should be used. I’m not saying that you should blindly increase your memory and CPU as much as you can, though – you should test to see what the impact on refresh performance is (the gateway logs will be useful here) and whether that impact is worth the extra cost.

Bonus tip: another easy way to improve gateway refresh performance is to enable the StreamBeforeRequestCompletes property on the gateway. It’s documented here and a few people (see here and here for example) have already blogged about how much this has helped them.
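For reference, enabling StreamBeforeRequestCompletes means editing the gateway’s configuration file (typically Microsoft.PowerBI.DataMovement.Pipeline.GatewayCore.dll.config in the gateway’s installation folder) and then restarting the gateway service. A sketch of the relevant fragment, assuming the standard .NET settings layout the gateway config file uses:

```xml
<!-- Fragment of Microsoft.PowerBI.DataMovement.Pipeline.GatewayCore.dll.config -->
<!-- True = stream data back to the service as it is read, instead of -->
<!-- spooling it all to disk first; restart the gateway service afterwards -->
<setting name="StreamBeforeRequestCompletes" serializeAs="String">
    <value>True</value>
</setting>
```

As always with undocumented-by-default config changes, test the refresh performance impact before and after turning it on.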

22 responses

  1. Hi Chris,

    When you say: “in fact, if any of the data sources your dataset uses is on-premises then Power BI will need to connect to all of the data sources (even if they are cloud-based sources) via an On-Premises Data Gateway”.

    Are you sure? We routinely have datasets that connect through multiple sources, both on-prem and cloud, and it doesn’t force us to configure the cloud-based data sources on the gateway.

    E.g.: Azure Explorer, where we configure the credentials directly on the dataset, and SQL Server, which is on our gateway. It works.

  2. Hey Chris, this is a great post! On a slightly tangential topic, if we use an on-premises gateway to refresh datasets stored in a Premium (Gen2) capacity, does this reduce the load on the capacity in terms of utilization compared to not using a gateway?

    Thanks very much!

    -Alex

  3. Chris, have you ever interacted with the Microsoft VNET (“managed”) gateway? We use that to refresh some of our data that is hosted in Azure SQL. I would love to see a blog from you on that.

    It is still a “preview” feature as I understand. The feature is very (very, very) challenging to use.

    There are no dials or knobs on it. The gateway is always hosted in your PBI tenant (even if your capacity and data are elsewhere). They don’t give you visibility to see your own gateway logs. The error messages that are exposed in the portal are virtually meaningless. And the gateway frequently fails for random reasons (network and memory bugs) that we can’t investigate on our own without calling for PBI support.

    The upshot is that it is still “free” (and that is one less server that we have to manage on-premises). I’m guessing that after GA they will try to recoup some of the investment… but at this rate, that might not happen for a year or two.

    Recently I heard they had increased the available memory on the VNet managed gateways, but they didn’t elaborate… the memory issues in this version of the gateway are unpredictable. One week our refresh operations are fine and the next week a lot of them will fail for no apparent reason.

  4. Hi Chris,

    The way I understand it, using a gateway for a data source always means that the data is transmitted through the gateway, regardless of whether it is needed or not.

    Is there a way to set up a gateway so that it keeps the credentials of the connection, but the data itself doesn’t go through the gateway infrastructure? We tested the VNet gateway, but we saw a drop in performance versus setting the connection directly on the dataset.

    We want to be able to store the credentials somewhere so that the Power BI developers won’t need to have them, but we still want the data to go directly from an Azure data source straight to Power BI without routing it through another infrastructure.

    Thanks,
    Antony

    • That’s a reasonable thing to want to do, but sharing credentials for a data source is a security minefield for us. Even if you were happy with using a VNet gateway it wouldn’t solve the problem for developers in Power BI Desktop.

      • Power BI developers could have the credentials for the DEV and QA environments, but not for the production env. Switching a parameter on the dataset in the Power BI service would allow them to use the production data source without knowing the password.

        Can you explain what you mean by “sharing credentials for a data source is a security minefield for us”?

      • But Chris, surely the ‘sharing credentials for a data source’ part of this is a solved problem. Gateway does it today. Can’t we just divorce that bit from the ‘doing all the transformation work on the gateway’ bit?

    • That’s *exactly* what I want Antony. 99% of my data sources are cloud, so I really don’t need the ‘punch down to our Data Centre’ part of gateway, but I love the ‘store credentials securely’ part of gateway.

      • Happy to see that I’m not alone with this need, Sam! I don’t understand why this is not available.

  5. Would it be possible to get a separate blog post from you about “StreamBeforeRequestCompletes”?
    I’m interested in how this changes the behavior of queries (i.e. possible unintended consequences), and whether we can expect to see a change in every query that currently spools to the “spooler” directory.

    I am a bit nervous about using it. The config file lists this as an “internal setting” (do not change).

    That said, it sounds like a very welcome feature, and it seems like one that all PQ developers should understand. I asked the question here as well:

    https://community.powerbi.com/t5/Power-Query/StreamBeforeRequestCompletes-Guidance-for-Mashup-Container/m-p/2518807#M76355
