Speed Up Power BI Refresh By Increasing The Amount Of Memory On Your On-Premises Data Gateway Machine

If your Power BI dataset needs to connect to an on-premises data source it will have to do so via an On-Premises Data Gateway; what’s more, if you have a Power Query query that combines data from cloud and on-premises data sources, then Power BI needs to connect to all the data sources used (even if they are cloud sources) via an On-Premises Data Gateway. And when Power BI connects to a data source via a gateway, all the transformation work done by the Power Query engine takes place on the machine where the gateway is installed.

As a result of all this, the specification of the machine where the gateway is installed has an impact on the performance of any dataset refreshes that use it. So how powerful does the machine with the gateway installed on it need to be? That’s a tough question because, as you can probably guess, it depends on a lot of different factors: how many datasets get refreshed in parallel, how often they get refreshed, how complex the transformations used are, whether you’re using Import mode or DirectQuery, and so on. There’s a great docs article describing how to go about sizing your gateway machine here. Unsurprisingly, the more memory and CPU cores you have available, the better refresh performance is likely to be and the more refreshes can take place in parallel.

There is one important thing to point out that is not obvious though: increasing the amount of memory on your gateway machine can improve refresh performance even if it doesn’t look like the machine is under memory or CPU pressure. This is because the total amount of memory made available for a single refresh is calculated relative to the overall amount of memory available on the gateway machine. I’ve written about how the Power Query engine uses memory a few times: this post describes how each refresh can use a fixed maximum amount of memory and how performance suffers if your refresh needs to use more; this post shows how increasing the amount of memory Power Query can use for a single refresh can increase refresh performance dramatically. In short, the more memory on your gateway machine the more memory is available for each individual refresh and – if the Power Query engine needs it, for example if you’re sorting, merging, pivoting/unpivoting, buffering or doing group-bys on large tables and query folding is not taking place – the faster each refresh will be.
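To make this concrete, here’s a minimal, hypothetical M query showing the kind of work that ends up consuming the gateway’s memory. The server, database, table and column names below are invented for illustration; the point is that once Table.Buffer stops query folding, the group-by runs in the Power Query engine on the gateway machine rather than in the source database, so the memory available to that refresh matters.

```m
// Hypothetical illustration: all names below are made up
let
    // Assumed on-premises SQL Server source reached via the gateway
    Source = Sql.Database("onprem-sql01", "SalesDB"),
    Sales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // Table.Buffer loads the whole table into the mashup engine's memory
    // on the gateway machine and prevents query folding from this point on
    BufferedSales = Table.Buffer(Sales),
    // With folding stopped, this group-by is evaluated entirely by the
    // Power Query engine on the gateway, not by SQL Server
    SalesByCustomer = Table.Group(
        BufferedSales,
        {"CustomerKey"},
        {{"TotalSales", each List.Sum([SalesAmount]), type number}}
    )
in
    SalesByCustomer
```

If the same logic can be written so that it folds back to the source, or the buffering is removed, the refresh needs far less memory on the gateway in the first place.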

You do have the option of changing some properties (listed here) on the gateway to try to influence this behaviour. However, I don’t recommend doing this: the algorithms involved are undocumented and may change at any time, not all of the relevant properties are documented, and working out the optimal settings yourself is very difficult. It’s a lot easier just to increase the amount of memory and CPU on the gateway machine and let the gateway work out how these resources should be used. I’m not saying that you should blindly increase your memory and CPU as much as you can, though – you should test to see what the impact on refresh performance is (the gateway logs will be useful here) and whether that impact is worth the extra cost.

Bonus tip: another easy way to improve gateway refresh performance is to enable the StreamBeforeRequestCompletes property on the gateway. It’s documented here and a few people (see here and here for example) have already blogged about how much this has helped them.
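For reference, this is a gateway configuration change rather than a setting in the Power BI Service: you edit the gateway’s configuration file and then restart the gateway service. The snippet below is only a sketch of what the relevant entry looks like in a .NET-style settings file (typically Microsoft.PowerBI.DataMovement.Pipeline.GatewayCore.dll.config); check the documentation mentioned above for the exact file name, location and default value for your gateway version before changing anything.

```xml
<!-- Sketch only: the exact file, location and surrounding elements may vary
     by gateway version, so follow the official documentation when editing.
     Restart the gateway service afterwards for the change to take effect. -->
<setting name="StreamBeforeRequestCompletes" serializeAs="String">
    <value>True</value>
</setting>
```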

27 thoughts on “Speed Up Power BI Refresh By Increasing The Amount Of Memory On Your On-Premises Data Gateway Machine”

  1. Hi Chris,

    When you say: “in fact, if any of the data sources your dataset uses is on-premises then Power BI will need to connect to all of the data sources (even if they are cloud-based sources) via an On-Premises Data Gateway”.

    Are you sure? We routinely have datasets that connect through multiple sources, both on-prem and cloud, and it doesn’t force us to configure the cloud-based data sources on the gateway.

    E.g.: Azure explorer, where we configure the credentials directly on the dataset, and SQL Server, which is on our gateway. It works.

    1. You’re correct – sorry, I wasn’t clear, I meant when you have queries that combine data from cloud and on-prem sources. I’ve updated the post.

  2. Hey Chris, this is a great post! On a slightly tangential topic, if we use an on-premises gateway to refresh datasets stored in a Premium (Gen2) capacity, does this reduce the load on the capacity in terms of utilization compared to not using a gateway?

    Thanks very much!

    -Alex

    1. Don’t forget that you have already paid for the capacity. Might as well use it to the fullest extent. Transferring load to a gateway is like paying for a gym membership…

  3. Chris, have you ever interacted with the Microsoft VNET (“managed”) gateway? We use that to refresh some of our data that is hosted in Azure SQL. I would love to see a blog from you on that.

    It is still a “preview” feature as I understand it. The feature is very (very, very) challenging to use.

    There are no dials or knobs on it. The gateway is always hosted in your PBI tenant (even if your capacity and data are elsewhere). They don’t give you visibility to see your own gateway logs. The error messages that are exposed in the portal are virtually meaningless. And the gateway frequently fails for random reasons (network and memory bugs) that we can’t investigate on our own without calling for PBI support.

    The upshot is that it is still “free”, (and that is one less server that we have to manage on-premises). I’m guessing that after GA they will try to recoup some of the investment… but at this rate, that might not happen for a year or two.

    Recently I heard they had increased the available memory on the VNET managed gateways, but they didn’t elaborate… The memory issues in this version of the gateway are unpredictable. One week our refresh operations are fine and the next week a lot of them will fail for no apparent reason.

      1. I think a lot of the things you describe about the VNet gateway are related to it being a preview feature still – it needs some more work before it’s ready for GA. I’m not an expert in Azure networking though so I can’t really comment on it or what it does, unfortunately.

  4. Hi Chris,

    The way I understand it, using a gateway on a data source always means that the data is transmitted through the gateway, regardless of whether it is needed or not.

    Is there a way to set up a gateway so that it keeps the credentials of the connection, but the data itself doesn’t go through the gateway infrastructure? We tested the VNet gateway, but we saw a drop in performance compared to setting the connection directly on the dataset.

    We want to be able to store the credentials somewhere so that the Power BI developers won’t need to have them, but we still want the data to go directly from an Azure data source straight to Power BI without routing it through another piece of infrastructure.

    Thanks,
    Antony

    1. That’s a reasonable thing to want to do, but sharing credentials for a data source is a security minefield for us. Even if you were happy with using a VNet gateway it wouldn’t solve the problem for developers in Power BI Desktop.

      1. Power BI developers could have the credentials for the DEV and QA environments, but not for the production environment. Switching a parameter on the dataset in the Power BI service would allow them to use the production data source without knowing the password.

        Can you explain what you mean by “sharing credentials for a data source is a security minefield for us”?

      2. But Chris, surely the ‘sharing credentials for a data source’ part of this is a solved problem. Gateway does it today. Can’t we just divorce that bit from the ‘doing all the transformation work on the gateway’ bit?

    2. That’s *exactly* what I want Antony. 99% of my data sources are cloud, so I really don’t need the ‘punch down to our Data Centre’ part of gateway, but I love the ‘store credentials securely’ part of gateway.

      1. Happy to see that I’m not alone with this need, Sam! I don’t understand why this is not available.

  5. Would it be possible to get a separate blog from you about “StreamBeforeRequestCompletes”?
    I’m interested in how this changes the behavior of queries (i.e. possible unintended consequences), and whether we can expect to see a change in every query that currently spools to the “spooler” directory.

    I am a bit nervous about using it. The config file lists this as an “internal setting” (do not change).

    That said, it sounds like a very welcome feature, and it seems like one that all PQ developers should understand. I asked the question here as well:

    https://community.powerbi.com/t5/Power-Query/StreamBeforeRequestCompletes-Guidance-for-Mashup-Container/m-p/2518807#M76355

  6. Hi Chris, thanks a lot for this amazing article. I think most people don’t give much attention to the gateway hardware configuration, which is crucial for data refresh and query performance. It would be great if you wrote an article about hardware sizing for the gateway server. For instance, our IT team just wants to follow the base recommendation on the Microsoft website (for example, 8 GB of memory), but we have more than 50 Import models on our P3 capacity, with an overall size of 130 GB, that refresh 3 to 5 times a week.

    1. You will have to check the traffic on your gateways. The requirements for Direct Query and Import Mode are very different. Import Mode requires lots of memory, and sometimes lots of disk space. Direct Query can live with much lower specced setups (like what MS recommends).

      Whatever you do, don’t skimp on the network interface. Make that as fast as you can. 10 Gbps or better. It seems to make the biggest difference in both scenarios.

      1. Thanks for the recommendation, I will request a better network interface. Also, I am pushing for at least 32 GB of RAM since all our models are Import mode.

  7. Good evening Mr Webb,
    my name is Leonard, and I wanted to know if you have any news regarding the mysterious parameter to disable spooling and activate streaming via the gateway.
    We are indeed having frequent problems refreshing datasets and dataflows via the gateway.
    We are often receiving disk space shortage errors from our system administrators and have tried setting the StreamBeforeRequestCompletes parameter in the gateway configuration file.
    Unfortunately, after restarting the gateway we noticed that its behaviour did not change and it continued to create temporary files in the spooler folder, filling up the disk.

    Could we have more information about this? How come this setting is not enabled by default? Why does the gateway still spool in addition to creating other temporary files for local transformations?
    Could it be a bug due to our gateway running a somewhat outdated version (December 2021)?
    Did we do something wrong with the settings and the restart, even though we followed the documentation? https://learn.microsoft.com/en-us/data-integration/gateway/service-gateway-configure-disk-space

    thanks
