Data Privacy Settings And Data Refresh Performance In Excel And Power BI

I have known for a long time that data privacy checks can have an impact on the performance of data refresh in Power BI and Excel, but on a recent performance tuning engagement I had the chance to observe just how much of a difference changing these settings can make. Unfortunately I can’t share the M code for the queries I was working with but the scenario is very common:

  • The only data source was a single Excel workbook with multiple worksheets
  • There were multiple Power Query/M queries reading data from these worksheets; the output of these queries was not being loaded into the Power BI dataset
  • There were further Power Query/M queries that referenced these source queries, none of which were being loaded into the Power BI dataset, and they all fed into a single query whose output was being loaded into a dataset
  • Overall data volumes were very small, less than 1MB, and the final query only returned approximately 5000 rows
  • Data refresh times were exceedingly slow, even with the “allow data preview to download in the background” setting turned off, and there was a long period where the Power Query engine was stuck in the “Evaluating” phase of the query during refresh in Power BI Desktop before the dialog showed the amount of data being loaded:

    image

  • Data refresh times also varied by +/-25% (I have a theory why this is happening – if I can confirm it I will blog about this) from one refresh to another

As a benchmark, here are some typical refresh timings for the same table in Power BI Desktop (December 2018 version) and Excel for Office 365 (build 11126.20118 click-to-run) with the default setting of data privacy checks turned on and the data privacy level set for the source Excel file set to Public:

Power BI Desktop Excel
Evaluating Phase (Seconds) 122 173
Data Load Phase (Seconds) 140 113
Total Refresh Time (Seconds) 262 286

Here are the same timings with the Power Query engine set to combine data according to each file’s data privacy settings, and the data privacy checks for the current file turned off:

image

image

Power BI Desktop Excel
Evaluating Phase (Seconds) 69 91
Data Load Phase (Seconds) 108 112
Total Refresh Time (Seconds) 177 203

Clearly there are some substantial query performance gains to be had by turning off data privacy checks and these gains are made in the Evaluating phase. In this case the queries in both Power BI Desktop and Excel were around 30% faster with data privacy checks turned off.

BUT is it safe to turn off data privacy checks? I strongly advise you to read my series on data privacy settings starting here, in particular this post on disabling checks, to understand the issues involved. You should not turn off data privacy checks without understanding what they do because you may end up sending sensitive data to somewhere it shouldn’t go and maybe even break the law. What’s more, if you have published a Power BI dataset you cannot disable data privacy checks for scheduled refresh unless you are using the gateway in Personal mode and have configured it with the “Fast Combine” option turned on, as shown here. Therefore turning off privacy checks is only really useful for Power Query users in Excel or to speed up development in Power BI Desktop.

In this particular case there is only one data source, an Excel workbook, a source which does not support query folding, so as far as I can see there is no risk in turning off data privacy checks because there is no way data from this Excel workbook can be sent anywhere else. Now why couldn’t the Power Query engine work this out?

8 responses

  1. Hey Chris, for reference, in my tests a couple of years ago, I landed on the exact same thing you did here, including the 30%. Subject to the warnings you give on privacy (of course), turning off privacy is the fastest way to get a significant performance increase. And I agree on PQ working this out for you. That should be a given.

  2. Pingback: Why Does Power BI Query My Data Source More Than Once? « Chris Webb's BI Blog

  3. Hello Chris

    First of all, thank you very much for this article and the last 4 you did, I learn a lot of things about data privacy but one question is still remaining.

    For instance, I use an Excel or multiple excel files on my computer. When I import them, the privacy level is by default at “none” .
    Does it change anything when this option is disabled ? Which kind of transformation between them could be improved by disabling it ?
    Because they are flat file, I need to import them and there is no query folding, that’s why I try to understand…

    • “None” just means that data privacy settings are not yet set, and may be inherited from another data source such as the folder that the files are in. It’s hard to say which transformations will benefit from this.

  4. Thanks for this. I was pulling my hair out after the latest desktop upgrade. I have multiple large excel files in a single folder. I point my data source to the folder, BI imports and merges new additions.

    Turning off this setting made the world of difference to the performance and load time.

Leave a Reply to Chris Webb Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: