Why Does Power BI Query My Data Source More Than Once?

Chris Webb Get & Transform, M, Power BI, Power Query, Uncategorized October 13, 2019October 15, 2019 5 Minutes

This week I was honoured to be a guest on the Guy In A Cube channel, and for my topic I decided to tackle one of life’s eternal questions: why does Power BI query my data source more than once when I refresh my dataset?

You can watch the video here:

Although I’ve never answered this question directly in a blog post before, nevertheless almost every technique I showed in the video is something I’ve blogged about so I thought it would be useful to collect all the links to these posts in one place to provide some background to what I show in the video.

The first thing to say is that this is really a Power Query question, not just a Power BI question; a lot of what I show will therefore be relevant to Excel’s Get&Transform, Power BI dataflows and every other manifestation of Power Query out there, although I can’t guarantee that all the options and behaviour will be the same in these places.

If you’re developing in Power BI Desktop and you think that refresh is taking a long time, you should definitely check whether the Power Query engine is hitting your data source more than once. There are lots of ways to do this. Some data sources have tools that show when they are queried, such as the Run History screen in Microsoft Flow that I show in the video or SQL Server Profiler. Other ways include using Fiddler for web services or Process Monitor for files.

Next, if you find that Power Query is indeed querying your data source multiple times, the most common explanation is that it’s doing a “background refresh”. This happens by default when you refresh a dataset and is Power Query refreshing all the data previews that you see when you click on a step in the Power Query Editor window. In most cases this is so quick you don’t notice it but when you have lots of queries with lots of steps it can be a gigantic problem. You can turn it off in the Options dialog in Power BI Desktop, as shown in the video, or in the Excel Query Options dialog. I blogged about this issue here.

Another possible reason is the data privacy settings you have applied. This is a gigantic, complex topic and something that I and other people have blogged about many times. I wrote a multi-part series of posts explaining data privacy settings starting here and there’s also a great explanation of what’s going on in behind the scenes written by the dev team that I link to here; I also wrote a post here showing an example of the performance impact of data privacy checks. As I say in the post, however, do not play around with these settings unless you really know what you’re doing.

If you’re ok with writing some M code, using the technique I blogged about here to implement basic query folding on a web data source can be important – in the example in the video, it was doing this that reduced the number of calls to the web service from six to three.

Some transformations can also lead to data being read from a data source more than once. For example, in this post I showed how a merge transformation leads to multiple reads from the same Excel file and how to stop this happening and drastically improve performance.

Moving on, another important lesson to learn is how referenced queries are evaluated. This is one of the most counter-intuitive things about Power Query! With the set of referenced queries used in my demo:

…where three queries called Referenced Query 1, Referenced Query 2 and Referenced Query 3 each reference a query called Call Web Service and are loaded into a dataset (and where Call Web Service is not loaded into the dataset), most people would assume that when a refresh takes place the following happens:

The Call Web Service query is run, getting the data from the web service
The data returned is then passed to Referenced Query 1, Referenced Query 2 and Referenced Query 3

This is wrong. In fact what happens is this:

Referenced Query 1 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.
Referenced Query 2 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.
Referenced Query 3 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.

This explains why the Call Web Service query is still being evaluated three times once the M code above has been added to it. You can read a detailed explanation of how referenced queries are evaluated here.

There is something we can do to help here though: the Power Query engine can cache the results returned from some types of data source, such as web services, but to take full advantage of this caching in this particular scenario you have to turn off the Enable Parallel Loading Of Tables option on the Options dialog as I show here. With that done, each of the three referenced queries run one after the other, instead of in parallel. When the first of these queries runs its calls the Call Web Service query, which gets data from the web service; when the next two queries run, though, and they trigger two more evaluations of Call Web Service, the data for the call to the web service has been cached. This caching can be turned off using the IsRetry option in Web.Contents as Curt Hagenlocher explains in a comment here.

The final thing shown in the video is that the infamous Table.Buffer M function does not improve performance in this case and in fact makes performance worse. If you use Table.Buffer inside the Call Web Service query it will indeed buffer the result of the call to the web service into memory, but the data that is buffered cannot be shared between separate query executions – so the buffering happens three times, once for each time Call Web Service is evaluated by the three referenced queries. Again this is something discussed in this post. Of course there are plenty of other scenarios where Table.Buffer will help performance, but these will only be in cases where the same query requests data from the same data source more than once.

Phew, that was a lot. I don’t pretend to know everything about when and why Power Query gets data from a data source multiple times but this is the sum total of my knowledge right now. I hope it’s useful!

[Update: it’s just been pointed out to me, quite rightly, that the names of my queries are confusing. For example, “Referenced Query 1” is the query doing the referencing, not the query being referenced – which is Call Web Service. I haven’t changed the names because I want the blog post to remain consistent with the video, and I hope the diagrams and the context make everything clear]

Published by Chris Webb

My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel. View all posts by Chris Webb

Published October 13, 2019October 15, 2019

25 thoughts on “Why Does Power BI Query My Data Source More Than Once?”

sam says:

October 14, 2019 at 10:53 am

Hi Chris,
Will it help if in the First step of a Reference Query we say
let
Source = Table.Buffer(DataFromCSVFile),
Next Steps,

in
NextSteps

Loading...

Reply
1. Chris Webb says:
  
  October 15, 2019 at 9:46 am
  
  No, it won’t.
  
  Loading...
  
  Reply
Parker Stevens says:

October 14, 2019 at 12:40 pm

Thanks Chris! Great stuff. Will switching off the “enable parallel loading of tables” in PBI Desktop carry over to PBI Service as well? I was under the impression that it does not.

Loading...

Reply
1. Chris Webb says:
  
  October 15, 2019 at 9:45 am
  
  You’re right, it does not carry over to the Service so it’s only useful while developing. I should have made that clearer, sorry.
  
  Loading...
  
  Reply
  1. Serg says:
    
    June 23, 2020 at 6:00 am
    
    Hey Chris
    How does this behave in Service?
    
    Loading...
  2. Henry Bruce says:
    
    July 1, 2021 at 10:53 pm
    
    While I can’t find any documentation for this, as of today, the Power BI Service does appear to respect the Enable Parallel Loading Of Tables option. I have a Python script as a data source (running through a gateway in personal mode from the service that makes a series of REST calls to authenticate & return data) that is executed multiple times when the option is checked, but only executed once when unchecked. And with strategic placement of some lengthy InvokeAfter delays (as suggested by Chris in https://blog.crossjoin.co.uk/2019/03/26/power-bi-caching-parallelism-and-power-query-refresh-performance/) I can ensure that subsequent queries referencing this data source get their data from the persistent cache. Hope this helps others stuck with this thorny issue!
    
    Loading...
iman yu says:

October 14, 2019 at 7:57 pm

Hello, great post!
I have a fundamental question. The ‘referenced query’ in the scenario here, is it the one that we create by going to query editor and right click to select ‘reference’ (the other option is to duplicate).
Is that what you mean by referenced query?

Thanks,
Iman

Loading...

Reply
1. Chris Webb says:
  
  October 15, 2019 at 9:44 am
  
  Hi Iman, thanks for pointing that out – the naming I used was confusing and I’ll update the post appropriately (although I won’t change the names so they stay consistent with the video). “Referenced Query 1” and the other two similarly-named queries were created by right-clicking on Call Web Service and selecting Reference. So really the aren’t the queries that are referenced, they are the queries doing the referencing.
  
  Loading...
  
  Reply
Pingback: Introduction To M In Power Query And Power BI « Chris Webb's BI Blog
Pingback: When Power Query Hits Data Sources Repeatedly – Curated SQL
Rajesh says:

December 20, 2019 at 8:38 pm

Hi Chris, I’m referencing Query 4 to 5 times.. I want to reduce the number of hits to the server, so that my refresh time will reduce.. Please help with M code

Loading...

Reply
Brian says:

April 28, 2020 at 8:16 pm

I am having the same multi pull problem in Excel Power Query. Is there features to disable Parallel Loading of tables?

Loading...

Reply
Burhan Qazi says:

June 17, 2020 at 10:05 am

Hi,
my issue not solved yet. i have a simple recon queries on my dashboard. i have set all your properties mentioned in this video but still my queries run 3 times. i am using hive odbc and power bi desktop.

Loading...

Reply
Serg says:

June 23, 2020 at 6:18 am

Hi Chris
I am loading 6 csv files from SharePoint (2m rows each) as separate queries. Data load disabled for these queries. Then I aggregate these datasets and append (union) them into a new one (Table.Combine()). In this example, would every of these 6 queries be run twice in Power BI Service?

Loading...

Reply
Smauro says:

June 26, 2020 at 10:21 am

Hi Chris,

I’ve noticed that when using Table.Buffer() you’ve got 6 calls and say you don’t know why. You must know though since it’s kinda implied in your later explanation.
For anyone wondering, it’s simple:
You’ve got the no caching option in the same query, so buffering the table actually calls it again to buffer it this time = two calls per query * 3 queries = 6

Cheers

Loading...

Reply
lbendlin says:

June 28, 2020 at 2:13 am

Chris,
is any of that repeated querying visible in the Diagnostics data, or can it only be measured via Fiddler?

Loading...

Reply
Raghav G says:

July 22, 2020 at 5:17 pm

Chris,
Thanks for sharing and it is informative. Just have one question here.We are having Power BI Report server which is On-premise and not service. The settings you shown in Power BI Desktop (Parallel load and Unchecking Preview data) will carry forward to Report server (or) Will there be same settings at Report server level.?

Loading...

Reply
Vinícius Mantoan Ribeiro da Costa says:

June 14, 2021 at 10:50 pm

Hi Chris.

I see some duplicated query when I use ODBC.QUERY.
Is it a “expected” behavior? Any idea of why it happens?

Thanks.
Vinícius Mantoan

Loading...

Reply
1. Chris Webb says:
  
  June 15, 2021 at 4:08 pm
  
  Probably, yes. See https://blog.crossjoin.co.uk/2020/07/05/why-is-power-bi-running-my-sql-query-twice/
  
  Loading...
  
  Reply
Andy says:

July 3, 2021 at 9:44 am

Thanks Chris. This is useful. Question. Is there any future plan that Power BI can automatically halt the dependents while it is executing (something like making sure it is singleton blocking call)? This will be very helpful to ensure we have the advantages of parallelism and no duplicate call at the same time. More importantly, I would think customers will be much happier with faster Power BI without configuration at first impression. 🙂

Loading...

Reply
1. Chris Webb says:
  
  July 5, 2021 at 5:26 pm
  
  There are no plans for this, sorry.
  
  Loading...
  
  Reply
  1. Laurent says:
    
    July 3, 2022 at 5:02 pm
    
    This is a show stopper for me and I don’t understand why it’s not fixed. I thought power bi followed the same approach as excel (i.e. a “source” cell is not re-evaluated once)
    
    Loading...
Pritam Suresh Musale says:

October 13, 2022 at 12:27 pm

I am facing same issue in Power BI dataflow. How to resolve it?

Loading...

Reply
CTO says:

August 16, 2023 at 10:17 pm

How do you pass multiple columns from your selected columns

Loading...

Reply
Guido Leenders says:

June 19, 2024 at 7:49 pm

Another multi download of the same dataset, even with a single dataset defined, occurs specifically on Excel with Power Query and OData. The same M-code only downloads the dataset only once. It seems to be bound to more Excel Power Query versions after 2019.

Power Query on Excel first start downloading a few thousand rows (mileage varies, might be time based) and then pauses downloading. It starts a second connection to the same URL and retrieves ALL rows.

Then it continues retrieving the remaining rows from the first download.

The reason for the second download being started say after 7 seconds is unknown. It does not look like to be sampling, because no limit is placed on the result set.

Both downloads have all the same HTTP-headers and URL.

Please ensure that your OData-producer can handle out-of-order requests (not in our case currently 🙂

Loading...

Reply