Gateways And Measuring Power Query CPU Usage During Power BI Dataset Refresh

After last week’s post on measuring Power Query CPU usage during dataset refresh, someone asked an obvious question that I should have addressed: does using a gateway change anything? After all, if you’re using a gateway to connect to an on-premises data source then all the Power Query queries transforming the data from that source will be executed on the gateway machine and not in the Power BI Service.

Let’s do a quick test to find out. I couldn’t use the same Power Query query I used in last week’s post (it turns out you can’t force the use of a gateway when there isn’t an external data source) so instead I used another dataset that connects to a large CSV stored in ADLSgen2 storage and does a group by operation – something which is guaranteed to be very expensive in terms of CPU for Power Query.

Here’s what Profiler shows for the refresh operation when no gateway is used:

The refresh took around 30 seconds and used around 44 seconds of CPU time.

Here’s what Profiler shows when the refresh does use a gateway:

The refresh takes a lot longer, around 103 seconds (as you would expect – instead of loading the data from ADLSgen2 storage in the cloud to the Power BI Service, it has to take a round trip via the gateway on my PC) but the important thing is that the CPU time is now very low – 141 milliseconds.

So, as you might expect, the CPU time for refreshes that use an on-premises data gateway is not shown in Profiler traces because, as I said, all the work done by the Power Query engine is done on the gateway machine and not in the Power BI Service. Making refreshes use a gateway, even when you don’t need to, can be a way of taking load off a Power BI Premium capacity if it’s overloaded.

This in turn raises the question of how you measure Power Query CPU usage on a gateway? As far as I know it isn’t possible for individual Power Query queries (I could be wrong though), although the gateway logs do allow you to capture CPU usage for the whole machine. Better gateway monitoring tools are on the way but this seems like a good time to mention my colleague Rui Romano’s open source gateway monitoring solution (article | repo) which makes understanding the gateway logs a lot easier.

3 responses

  1. I’ve been looking for some improved monitoring of the ADF gateways. I’m going to see what can be stolen from Rui Romano’s solution

  2. Pingback: Gateways and the CPU Cost of Power BI Dataset Refresh – Curated SQL

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: