Data Privacy Settings In Power BI/Power Query, Part 1: Performance Implications

One of the most confusing, under-documented and widely-misunderstood features of Power BI and Power Query (or Excel “Get & Transform” or whatever you want to call it) are the data privacy settings. I get caught out by them all the time, so I thought it would be a good idea to write a series of blog posts demonstrating how they work and what effect they have in different scenarios using example M queries.

Before carrying on, I suggest you read the official Microsoft documentation on the subject here:
https://powerbi.microsoft.com/en-us/documentation/powerbi-desktop-privacy-levels/
It gives you a good grounding in what the different data privacy levels are and where you can set them in the Power BI UI. The same options are available in Excel with Power Query/Get & Transform.

In this first post I’m going to look at what the performance implications of different privacy levels can be. Let’s say you have two data sources. First, an Excel workbook with a single table in that contains the name of a day of the week:

image

The second is the DimDate table in the Adventure Works DW SQL Server sample database:

image

Here’s an M query called FilterDay that returns the day name from the table in the Excel workbook:

let
    Source = Excel.Workbook(
		File.Contents("C:\Filter.xlsx")
		, null, true),
    FilterDay_Table = Source{[Item="FilterDay",Kind="Table"]}[Data],
    ChangedType = 
	Table.TransformColumnTypes(
		FilterDay_Table,
		{{"Parameter", type text}}),
    Output = ChangedType{0}[#"Parameter"]
in
    Output

image

Here’s an M query called DimDate that filters data from the DimDate table in the Adventure Works DW database, returning only the rows where the EnglishDayNameOfWeek column matches the value returned by the FilterDay query above.

let
    Source = Sql.Databases("localhost"),
    DB = Source{[Name="Adventure Works DW"]}[Data],
    dbo_DimDate = DB{[Schema="dbo",Item="DimDate"]}[Data],
    RemovedColumns = Table.SelectColumns(dbo_DimDate,
		{"DateKey", "EnglishDayNameOfWeek"}),
    FilteredRows = Table.SelectRows(RemovedColumns, 
		each ([EnglishDayNameOfWeek] = FilterDay))
in
    FilteredRows

The first time you run this second query you’ll be prompted to enter credentials to connect to SQL Server, and then (assuming you haven’t set any of the options that get Power BI to ignore privacy levels) you’ll see the “Information is required about data privacy” prompt:

image

Clicking “Continue” allows you to set a privacy level for the SQL Server database and the root of the drive that the Excel workbook is on:

image

The dropdown boxes in the left-hand column allow you to be more specific in what you set privacy levels on; in the case of the Excel file you can choose to set permissions at all folders in the path down to the workbook and the workbook itself; in the case of SQL Server there is the option of setting privacy levels on the database rather than the server.

You can choose any of the three privacy levels (Public, Organizational or Private) for the SQL Server instance and the Excel file and the query will still run:

image

This is where things start to get confusing. At this point the workbook has a privacy level set to None, which is the default for newly-created data sources:

image

However, this “None” setting means it inherits its data privacy settings from the level set for the c:\ drive set earlier, even though this setting isn’t displayed on the “Data sources in current file” tab. Hopefully the fact this is happening will be addressed in a future release: I think the current UI is misleading.

For the purposes of this blog post the data privacy settings for SQL Server are irrelevant and it’s the settings on the Excel workbook that are important. So, to make things clearer, let’s say you explicitly set the data privacy setting for SQL Server to private and the data privacy setting for the Excel workbook itself to Public like so:

image

If you right-click on the FilteredRows step of the query that returns data from the DimDate table in SQL Server and select “View Native Query” (see here for more details on this feature) to see the SQL query generated in the background for this step like so:

image

You’ll see that query folding is taking place for this step and the filter on the EnglishDayNameOfWeek column is taking place in the SQL query:

image

select [_].[DateKey],
    [_].[EnglishDayNameOfWeek]
from 
(
    select [DateKey],
        [EnglishDayNameOfWeek]
    from [dbo].[DimDate] as [$Table]
) as [_]
where [_].[EnglishDayNameOfWeek] = 'Friday'

Query folding is almost always a good thing for the performance of a query. For more details on what query folding is, see here.

If, however, you set the privacy level for the Excel workbook to Private like so:

image

…even though the DimDate query still works, query folding does not take place for the Filtered Rows step. The View Native Query right-click option is greyed out, and Profiler shows that the following SQL is executed when the query is refreshed:

select [$Ordered].[DateKey],
    [$Ordered].[EnglishDayNameOfWeek]
from 
(
    select [DateKey],
        [EnglishDayNameOfWeek]
    from [dbo].[DimDate] as [$Table]
) as [$Ordered]
order by [$Ordered].[DateKey]

Note that there is no WHERE clause in this query and that the whole of the DimDate table is returned from SQL Server.

Data from a data source that has a privacy level of Private can never be sent to another data source. That is exactly what needs to happen for query folding to take place though: a value from the Excel workbook – the text “Friday” – needs to be embedded in the WHERE clause of the SQL query sent to SQL Server in order for filtering to happen inside the database.  The risk with query folding is that a DBA could monitor the queries that are being run on SQL Server, look at the WHERE clauses, and see data from your Excel workbook. That’s maybe not a problem with day names, but potentially an issue if you were working with more sensitive data like customer names or addresses. Therefore, with the Excel workbook’s privacy level set to Private, the whole of the DimDate table is downloaded into the M engine and the filtering has to take place there to maintain the privacy of the data in Excel. The query still runs but it will probably be a lot slower than it would have been had query folding taken place. With the privacy level of the Excel workbook set to Public, on the other hand, it is ok to send data from Excel to SQL Server so query folding does take place.

To sum up, in this post I have shown how different data privacy settings can affect the performance of a query by determining whether query folding takes place or not. In part 2 of this series I will show how different data privacy settings can determine whether a query executes at all.

[This post was updated on June 19th 2017 with details on how the “None” privacy level behaves and how permissions can be set at different levels and inherited – thanks to Curt Hagenlocher for providing these details]

15 thoughts on “Data Privacy Settings In Power BI/Power Query, Part 1: Performance Implications

  1. Hi Chris, nice post! A couple of things:
    1) It is not true to say query folding does not take place. It just stops before the filter step. If there was no query folding taking place, no SQL script would be sent to the server at all.
    2) In this particular example, it is not possible for privacy to matter in a WHERE clause, since you’re filtering based on values already in the database. Unfortunately, I have yet to see an example where setting privacy to private would provide meaningful results, or why the user wouldn’t always ignore the privacy setting to get the results he/she needs.

    • Hi Colin,

      While I agree I could have been more specific by saying that query folding doesn’t take place for the filtering step, I think a lot depends on how you define the term “query folding”. Since the query can’t execute without sending some kind of SQL to the database, in this example I would say that it is right to say that query folding is not taking place at all with the workbook’s privacy level set to private – but that’s because the definition of “query folding” I have in my head is the pushing of filters back to the data source, not the running of the original query. It all depends on how you define the term though.

      I’m also going to have to disagree with you on point (2) here as well. In my example, if a DBA was looking at the SQL queries being run, there is a privacy risk. Even if all of the possible weekday values exist in SQL Server, the presence of values in the WHERE clause proves that these values also exist in the workbook and this is something that the DBA could not have known before. While I accept that the privacy levels are irritating and while I also ignore them in a lot of cases, I can appreciate why they are there. It only takes one obscure scenario where sensitive data is leaked from Excel (and let’s face it there are so many Excel users out there that there’s a good chance of even the most obscure scenarios happening) to create a gigantic PR problem for Microsoft.

  2. Oops, it appears that I didn’t properly post my follow-up response:

    On the issue of query folding:
    1) As long as Power Query pushes SQL commands to be executed on the server, query folding has taken place
    2) Query folding can be partial, and often is because steps like changing a column type are never folded.
    3) In this particular case, the selection of the DateKey and EnglishDayNameOfWeek is executed on the server, so partial folding has occurred, even if the filtering step hasn’t folded.

    You already know all of this so perhaps we’re arguing semantics, and in your post, “partial folding” could have been used as a qualification. This is not a nitpick. I was genuinely confused when you mentioned that no folding had taken place.

    I don’t really understand your comment in regards to the privacy risk. With whom does the risk lie, and what is the risk?

    Thx

    • Sorry, you’re right – I had forgotten that I was selecting columns too, so yes folding is taking place for that step but not for the filtering. I’ll update the blog post accordingly. I still don’t think that any execution of SQL on the server counts as query folding, just logic beyond the simple selection of a table, but you’re right this just comes down to semantics.

      Regarding the privacy risk, that lies with the data leaving the original data source and going to someone other than a person who has permission to see that data. Let’s say Chris has permission to see the data in the Excel workbook – therefore Chris can use Power Query or Power BI to pull data from that workbook. However if Chris wants to combine that data with data from SQL Server and query folding takes place, some of the data from Excel will get embedded in the SQL queries that get sent to SQL Server. A SQL Server DBA will be able to see that query and see some data from Excel that they would not otherwise have permission to see. We can argue whether the Power Query/Power BI UI does a good job of explaining what’s happening in the background and whether it’s even relevant in 99% of cases, but it’s definitely a breach of security.

  3.  I am not understand about your Data Privacy Settings in Power B1/Power Query?? You are using Paramter for list only for Friday? please give me step by step how you arriving to get the list of day( Name of the Week)?  I know how to use Power Queries before. This is very interesting, thanks.  

    Sent: Wednesday, May 24, 2017 at 5:40 PM

  4. Hey Chris,

    Thanks for the response. I think that in situations within which I work, Organizational would be the privacy setting to choose (or ignore :)). Otherwise, the user will never be able to issue a query against the local database server if filters have to hidden from the DBA.

    In case you weren’t aware, you can right-click on any step to see what’s folded to that point. So in this case, you can right-click on RemovedColumns to see the SQL text (it’s why I’ve argued that some folding is done, even if the filter step isn’t folded). Also saves the hassle of using Profiler. Being able to click on steps to see where the folding has ended (View Native Query option grayed out) is very convenient when troubleshooting, or determining how the steps might be best ordered.

    Looking forward to part 2!

    • Hi Colin,

      Yes, I knew you could click on any step and see the SQL – I was just being an idiot and forgetting there was other stuff in the query that was being folded.

      Re the privacy settings, Organizational would indeed be the appropriate setting in most cases if you are querying a local SQL Server instance. It all depends on the data source really.

  5. Hi Frank,

    You posted your comment while I was posting my previous comment, but you raise an interesting question. There are really two ways to create parameters. The one that Chris describes here is based on end-user selections in an Excel workbook. It has always been possible to create this type of parameter in Power Query/Power BI.

    Another type of parameter functionality was initiated in Power BI that allows the user to select parameter values when opening a Power BI template file. Although this functionality has been extended to Power Query (and Excel 2016), I have not found it to be of much use (unless you’re importing your model into Power BI Desktop), because there is no way to access these parameters as workbook selections for the end user.

    A discussion comparing and contrasting the two parameter methods would be a good topic for a future post.

    • I mostly use parameters for development purposes, to create something similar to global variables that can be changed easily. I know you can do this with normal queries too but at least with parameters you can type new values without having to open the Query Editor. I agree they’re less useful in Excel though; Excel has the advantage of the worksheet, so in most cases you can read this type of ‘global variable’ value from a cell in the worksheet rather than have to bother with parameters.

  6. Thanks for the post. I got your context here, we are concern about performance which depends on full or partial query folding.

    I followed your steps, but I am not able to repro the partial query folding scenario.

    If I make excel as Private and database as public
    Current file Privacy as Ignore the Privacy levels and potentially improve performance: Then full query folding is happening.
    Current file Privacy as Combine data according to your Privacy Level settings for each source: Then it is throwing an error saying privacy level are not compatible.
    But in your case, your are seeing partial query folding.

    Please let me know if I miss anything.

    • In this post I’m not changing the settings you are referring to (ie “Ignore the Privacy levels” etc) at all, I’m only changing the privacy levels for each data source. That comes later in the series! Do not go to the Options dialog, go to the “Data Source Settings” button on the Home tab of the Query Editor to change the privacy levels for each source.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s