Why Does Power BI Query My Data Source More Than Once?

This week I was honoured to be a guest on the Guy In A Cube channel, and for my topic I decided to tackle one of life’s eternal questions: why does Power BI query my data source more than once when I refresh my dataset?

You can watch the video here:

Although I’ve never answered this question directly in a blog post before, nevertheless almost every technique I showed in the video is something I’ve blogged about so I thought it would be useful to collect all the links to these posts in one place to provide some background to what I show in the video.

The first thing to say is that this is really a Power Query question, not just a Power BI question; a lot of what I show will therefore be relevant to Excel’s Get&Transform, Power BI dataflows and every other manifestation of Power Query out there, although I can’t guarantee that all the options and behaviour will be the same in these places.

If you’re developing in Power BI Desktop and you think that refresh is taking a long time, you should definitely check whether the Power Query engine is hitting your data source more than once. There are lots of ways to do this. Some data sources have tools that show when they are queried, such as the Run History screen in Microsoft Flow that I show in the video or SQL Server Profiler. Other ways include using Fiddler for web services or Process Monitor for files.

Next, if you find that Power Query is indeed querying your data source multiple times, the most common explanation is that it’s doing a “background refresh”. This happens by default when you refresh a dataset and is Power Query refreshing all the data previews that you see when you click on a step in the Power Query Editor window. In most cases this is so quick you don’t notice it but when you have lots of queries with lots of steps it can be a gigantic problem. You can turn it off in the Options dialog in Power BI Desktop, as shown in the video, or in the Excel Query Options dialog. I blogged about this issue here.

Another possible reason is the data privacy settings you have applied. This is a gigantic, complex topic and something that I and other people have blogged about many times. I wrote a multi-part series of posts explaining data privacy settings starting here and there’s also a great explanation of what’s going on in behind the scenes written by the dev team that I link to here; I also wrote a post here showing an example of the performance impact of data privacy checks. As I say in the post, however, do not play around with these settings unless you really know what you’re doing.

If you’re ok with writing some M code, using the technique I blogged about here to implement basic query folding on a web data source can be important – in the example in the video, it was doing this that reduced the number of calls to the web service from six to three.

Some transformations can also lead to data being read from a data source more than once. For example, in this post I showed how a merge transformation leads to multiple reads from the same Excel file and how to stop this happening and drastically improve performance.

Moving on, another important lesson to learn is how referenced queries are evaluated. This is one of the most counter-intuitive things about Power Query! With the set of referenced queries used in my demo:

…where three queries called Referenced Query 1, Referenced Query 2 and Referenced Query 3 each reference a query called Call Web Service and are loaded into a dataset (and where Call Web Service is not loaded into the dataset), most people would assume that when a refresh takes place the following happens:

  1. The Call Web Service query is run, getting the data from the web service
  2. The data returned is then passed to Referenced Query 1, Referenced Query 2 and Referenced Query 3

This is wrong. In fact what happens is this:

  1. Referenced Query 1 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.
  2. Referenced Query 2 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.
  3. Referenced Query 3 is refreshed. Because it references the Call Web Service query, that means the Call Web Service query is also evaluated.

This explains why the Call Web Service query is still being evaluated three times once the M code above has been added to it. You can read a detailed explanation of how referenced queries are evaluated here.

There is something we can do to help here though: the Power Query engine can cache the results returned from some types of data source, such as web services, but to take full advantage of this caching in this particular scenario you have to turn off the Enable Parallel Loading Of Tables option on the Options dialog as I show here. With that done, each of the three referenced queries run one after the other, instead of in parallel. When the first of these queries runs its calls the Call Web Service query, which gets data from the web service; when the next two queries run, though, and they trigger two more evaluations of Call Web Service, the data for the call to the web service has been cached. This caching can be turned off using the IsRetry option in Web.Contents as Curt Hagenlocher explains in a comment here.

The final thing shown in the video is that the infamous Table.Buffer M function does not improve performance in this case and in fact makes performance worse. If you use Table.Buffer inside the Call Web Service query it will indeed buffer the result of the call to the web service into memory, but the data that is buffered cannot be shared between separate query executions – so the buffering happens three times, once for each time Call Web Service is evaluated by the three referenced queries. Again this is something discussed in this post. Of course there are plenty of other scenarios where Table.Buffer will help performance, but these will only be in cases where the same query requests data from the same data source more than once.

Phew, that was a lot. I don’t pretend to know everything about when and why Power Query gets data from a data source multiple times but this is the sum total of my knowledge right now. I hope it’s useful!

[Update: it’s just been pointed out to me, quite rightly, that the names of my queries are confusing. For example, “Referenced Query 1” is the query doing the referencing, not the query being referenced – which is Call Web Service. I haven’t changed the names because I want the blog post to remain consistent with the video, and I hope the diagrams and the context make everything clear]

24 thoughts on “Why Does Power BI Query My Data Source More Than Once?

  1. Hi Chris,
    Will it help if in the First step of a Reference Query we say
    let
    Source = Table.Buffer(DataFromCSVFile),
    Next Steps,

    in
    NextSteps

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      No, it won’t.

  2. Thanks Chris! Great stuff. Will switching off the “enable parallel loading of tables” in PBI Desktop carry over to PBI Service as well? I was under the impression that it does not.

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      You’re right, it does not carry over to the Service so it’s only useful while developing. I should have made that clearer, sorry.

      1. While I can’t find any documentation for this, as of today, the Power BI Service does appear to respect the Enable Parallel Loading Of Tables option. I have a Python script as a data source (running through a gateway in personal mode from the service that makes a series of REST calls to authenticate & return data) that is executed multiple times when the option is checked, but only executed once when unchecked. And with strategic placement of some lengthy InvokeAfter delays (as suggested by Chris in https://blog.crossjoin.co.uk/2019/03/26/power-bi-caching-parallelism-and-power-query-refresh-performance/) I can ensure that subsequent queries referencing this data source get their data from the persistent cache. Hope this helps others stuck with this thorny issue!

  3. Hello, great post!
    I have a fundamental question. The ‘referenced query’ in the scenario here, is it the one that we create by going to query editor and right click to select ‘reference’ (the other option is to duplicate).
    Is that what you mean by referenced query?

    Thanks,
    Iman

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Hi Iman, thanks for pointing that out – the naming I used was confusing and I’ll update the post appropriately (although I won’t change the names so they stay consistent with the video). “Referenced Query 1” and the other two similarly-named queries were created by right-clicking on Call Web Service and selecting Reference. So really the aren’t the queries that are referenced, they are the queries doing the referencing.

  4. Hi Chris, I’m referencing Query 4 to 5 times.. I want to reduce the number of hits to the server, so that my refresh time will reduce.. Please help with M code

  5. Hi,
    my issue not solved yet. i have a simple recon queries on my dashboard. i have set all your properties mentioned in this video but still my queries run 3 times. i am using hive odbc and power bi desktop.

  6. Hi Chris
    I am loading 6 csv files from SharePoint (2m rows each) as separate queries. Data load disabled for these queries. Then I aggregate these datasets and append (union) them into a new one (Table.Combine()). In this example, would every of these 6 queries be run twice in Power BI Service?

  7. Hi Chris,

    I’ve noticed that when using Table.Buffer() you’ve got 6 calls and say you don’t know why. You must know though since it’s kinda implied in your later explanation.
    For anyone wondering, it’s simple:
    You’ve got the no caching option in the same query, so buffering the table actually calls it again to buffer it this time = two calls per query * 3 queries = 6

    Cheers

  8. Chris,
    Thanks for sharing and it is informative. Just have one question here.We are having Power BI Report server which is On-premise and not service. The settings you shown in Power BI Desktop (Parallel load and Unchecking Preview data) will carry forward to Report server (or) Will there be same settings at Report server level.?

  9. Thanks Chris. This is useful. Question. Is there any future plan that Power BI can automatically halt the dependents while it is executing (something like making sure it is singleton blocking call)? This will be very helpful to ensure we have the advantages of parallelism and no duplicate call at the same time. More importantly, I would think customers will be much happier with faster Power BI without configuration at first impression. 🙂

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      There are no plans for this, sorry.

      1. This is a show stopper for me and I don’t understand why it’s not fixed. I thought power bi followed the same approach as excel (i.e. a “source” cell is not re-evaluated once)

Leave a Reply to Burhan QaziCancel reply