Data Privacy Settings And Data Refresh Performance In Excel And Power BI

I have known for a long time that data privacy checks can have an impact on the performance of data refresh in Power BI and Excel, but on a recent performance tuning engagement I had the chance to observe just how much of a difference changing these settings can make. Unfortunately I can’t share the M code for the queries I was working with but the scenario is very common:

  • The only data source was a single Excel workbook with multiple worksheets
  • There were multiple Power Query/M queries reading data from these worksheets; the output of these queries was not being loaded into the Power BI dataset
  • There were further Power Query/M queries that referenced these source queries, none of which were being loaded into the Power BI dataset, and they all fed into a single query whose output was being loaded into a dataset
  • Overall data volumes were very small, less than 1MB, and the final query only returned approximately 5000 rows
  • Data refresh times were exceedingly slow, even with the “allow data preview to download in the background” setting turned off, and there was a long period where the Power Query engine was stuck in the “Evaluating” phase of the query during refresh in Power BI Desktop before the dialog showed the amount of data being loaded:

    image

  • Data refresh times also varied by +/-25% (I have a theory why this is happening – if I can confirm it I will blog about this) from one refresh to another

As a benchmark, here are some typical refresh timings for the same table in Power BI Desktop (December 2018 version) and Excel for Office 365 (build 11126.20118 click-to-run) with the default setting of data privacy checks turned on and the data privacy level set for the source Excel file set to Public:

Power BI Desktop Excel
Evaluating Phase (Seconds) 122 173
Data Load Phase (Seconds) 140 113
Total Refresh Time (Seconds) 262 286

Here are the same timings with the Power Query engine set to combine data according to each file’s data privacy settings, and the data privacy checks for the current file turned off:

image

image

Power BI Desktop Excel
Evaluating Phase (Seconds) 69 91
Data Load Phase (Seconds) 108 112
Total Refresh Time (Seconds) 177 203

Clearly there are some substantial query performance gains to be had by turning off data privacy checks and these gains are made in the Evaluating phase. In this case the queries in both Power BI Desktop and Excel were around 30% faster with data privacy checks turned off.

BUT is it safe to turn off data privacy checks? I strongly advise you to read my series on data privacy settings starting here, in particular this post on disabling checks, to understand the issues involved. You should not turn off data privacy checks without understanding what they do because you may end up sending sensitive data to somewhere it shouldn’t go and maybe even break the law. What’s more, if you have published a Power BI dataset you cannot disable data privacy checks for scheduled refresh unless you are using the gateway in Personal mode and have configured it with the “Fast Combine” option turned on, as shown here. Therefore turning off privacy checks is only really useful for Power Query users in Excel or to speed up development in Power BI Desktop.

In this particular case there is only one data source, an Excel workbook, a source which does not support query folding, so as far as I can see there is no risk in turning off data privacy checks because there is no way data from this Excel workbook can be sent anywhere else. Now why couldn’t the Power Query engine work this out?

5 responses

  1. Hey Chris, for reference, in my tests a couple of years ago, I landed on the exact same thing you did here, including the 30%. Subject to the warnings you give on privacy (of course), turning off privacy is the fastest way to get a significant performance increase. And I agree on PQ working this out for you. That should be a given.

  2. Pingback: Why Does Power BI Query My Data Source More Than Once? « Chris Webb's BI Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: