Speed Up Power BI Refresh By Increasing The Amount Of Memory On Your On-Premises Data Gateway Machine

If your Power BI dataset needs to connect to an on-premises data source it will need to connect via an On-Premises Data Gateway; what’s more, if you have a Power Query query that combines data from cloud and on-premises data sources, then Power BI needs to connect to all data sources used (even if they are cloud sources) via an On-Premises Data Gateway. And when Power BI connects to a data source via a gateway all the transformation work done by the Power Query engine takes place on the machine where the gateway is installed.

As a result of all this the specification of the machine where the gateway is installed has an impact on the performance of any dataset refreshes that use it. So how powerful does the machine with the gateway installed on it need to be? That’s a tough question because, as you can probably guess, it depends on a lot of different factors: how many datasets get refreshed in parallel, how often, how complex the transformations used are, if you’re using Import mode or DirectQuery, and so on. There’s a great docs article describing how to go about sizing your gateway machine here. Unsurprisingly, the more memory and CPU cores you have available the better refresh performance is likely to be and the more refreshes can take place in parallel.

There is one important thing to point out that is not obvious though: increasing the amount of memory on your gateway machine can improve refresh performance even if it doesn’t look like the machine is under memory or CPU pressure. This is because the total amount of memory made available for a single refresh is calculated relative to the overall amount of memory available on the gateway machine. I’ve written about how the Power Query engine uses memory a few times: this post describes how each refresh can use a fixed maximum amount of memory and how performance suffers if your refresh needs to use more; this post shows how increasing the amount of memory Power Query can use for a single refresh can increase refresh performance dramatically. In short, the more memory on your gateway machine the more memory is available for each individual refresh and – if the Power Query engine needs it, for example if you’re sorting, merging, pivoting/unpivoting, buffering or doing group-bys on large tables and query folding is not taking place – the faster each refresh will be.

You do have the option of changing some properties (listed here) on the gateway to try to influence this behaviour. However since the algorithms involved are not documented and may change at any time, not all the relevant properties are documented, and working out what the optimal settings are yourself is very difficult, I don’t recommend doing this. It’s a lot easier just to increase the amount of memory and CPU on the gateway machine and let the gateway work out how these resources should be used. I’m not saying that you should blindly increase your memory and CPU as much as you can, though – you should test to see what the impact on refresh performance is (the gateway logs will be useful here) and whether that impact is worth the extra cost.

Bonus tip: another easy way to improve gateway refresh performance is to enable the StreamBeforeRequestCompletes property on the gateway. It’s documented here and a few people (see here and here for example) have already blogged about how much this has helped them.

22 responses

  1. Hi Chris,

    When you say: “in fact, if any of the data sources your dataset uses is on-premises then Power BI will need to connect to all of the data sources (even if they are cloud-based sources) via an On-Premises Data Gateway”.

    Are you sure? We routinely have datasets that connect trough multiple sources: both on-prem and cloud and it doesn’t force us to configure the cloud based data sources on the gateway.

    E.g. : azure explorer where we configure the credentials directly on the dataset and SQL server which is on our gateway. It works.

  2. Hey Chris, this is a great post! On a slightly tangential topic, if we use an on-premise gateway to refresh datasets stored in a premium (gen2) capacity, does this reduce the load on the capacity in terms of utilization compared to not using a gateway?

    Thanks very much!


  3. Chris, have you ever interacted with the Microsoft VNET (“managed”) gateway? We use that to refresh some of our data that is hosted in Azure SQL. I would love to see a blog from you on that.

    It is still a “preview” feature as I understand. The feature is very (very, very) challenging to use.

    There are no dials or knobs on it. The gateway is always hosted in your PBI tenant (even if your capacity and data are elsewhere). They don’t give you visibility to see your own gateway logs. The error messages that are exposed in the portal are virtually meaningless. And the gateway frequently fails for random reasons (network and memory bugs) that we can’t investigate on our own without calling for PBI support.

    The upshot is that it is still “free”, (and that is one less server that we have to manage on-premises). I’m guessing that after GA they will try to recoup some of the investment… but at this rate, that might not happen for a year or two.

    Recently I heard they had increased the available memory on the VNET managed gateways, but they didn’t elaborate…. the memory issues in this version of the gateway are unpredictable. One week our refresh operations are fine and the next week a lot of them will fail for no apparent reason.

  4. Hi Chris,

    The way I understand it, using a gateway on a datasource always means that the data is transmitted through the gateway, regardless of the fact that it is needed or not.

    Is there a way to set up a gateway so that it keeps the credentials of the connection, but the data itself doesn’t go through the gateway infrastructure? We tested the vnet gateway, but we saw a drop of performance vs setting the connection directly on the dataset.

    We want to be able to stock the credentials somewhere so that the PowerBI Developers won’t need to have them, but we still want the data to go directly from an Azure Datasource straight to PowerBI without routing it trough another infrastructure.


    • That’s a reasonable thing to want to do, but sharing credentials for a data source is a security minefield for us. Even if you were happy with using a VNet gateway it wouldn’t solve the problem for developers in Power BI Desktop.

      • PowerBI developers could have the credentials for the DEV and QA environment, but not for the production env. Switching a parameter on the dataset in PowerBI service would allow them to use the production datasource without knowing the password.

        Can you explain what you mean by “sharing credneitlas for a data source is a security minefield for us.” ?

      • But Chris, surely the ‘sharing credentials for a data source’ part of this is a solved problem. Gateway does it today. Can’t we just divorce that bit from the ‘doing all the transformation work on the gateway’ bit?

    • That’s *exactly* what I want Antony. 99% of my data sources are cloud, so I really don’t need the ‘punch down to our Data Centre’ part of gateway, but I love the ‘store credentials securely’ part of gateway.

      • Happy to see that I’m not alone with this needm Sam! I don’t understand why this is not available.

  5. Would it be possible to get a separate blog from you about “StreamBeforeRequestComplete”?
    I’m interested in how this changes the behavior of queries (ie. possible unintended consequences), and whether we can expect to see a change in every query that currently spools to the “spooler” directory.

    I am a bit nervous about using it. The config file lists this as an “internal setting” (do not change).

    That said, it sounds like a very welcome feature, and it seems like one that all PQ developers should understand. I asked the question here as well:


Leave a Reply to antonyzblog Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: