Power BI Data Privacy Levels And Cloud /Web-Based Data Sources Or Dataflows

UPDATE: it is now possible to set privacy levels for cloud data sources in the Power BI portal. See https://powerbi.microsoft.com/en-us/blog/privacy-levels-for-cloud-data-sources/

I have already blogged in great detail many times about Power BI/Power Query data privacy settings (see this series for example) but there’s always something new to learn. Recently I was asked a question by Ian Eckert about how Power BI handles data privacy for cloud or web-based data sources after a dataset has been published, and it prompted yet more revelations…

Consider the following M query:

[sourcecode language=’text’ padlinenumbers=’true’]
let
Source = Xml.Tables(
Web.Contents(“https://blog.crossjoin.co.uk/feed/”)
),
channel = Source{0}[channel],
language = channel{0}[language],
out = Json.Document(
Web.Contents(
https://data.gov.uk/api”,
[
RelativePath=”3/action/package_search”,
Query=[q=language]
]
)
),
result = out[result],
results = result[results],
#”Converted to Table” = Table.FromList(
results,
Splitter.SplitByNothing(),
null,
null,
ExtraValues.Error
),
Column1 = #”Converted to Table”{0}[Column1],
#”Converted to Table1″ = Record.ToTable(Column1)
in
#”Converted to Table1″
[/sourcecode]

It doesn’t do anything particularly interesting, but it does take data from one web-based data source (the RSS feed for this blog) and sends it to another (the UK government’s open data metadata search web service). As a result, in Power BI Desktop, if you set the data privacy settings for both data sources to Public then the query runs, but if you set the data privacy settings for both data sources to Private:

image

image

…As expected, you get the following error:

image

Formula.Firewall: Query ‘Test’ (step ‘Converted to Table1’) is accessing data sources that have privacy levels which cannot be used together. Please rebuild this data combination.

Now the strange thing is that, when you publish the dataset that contains this M query, refresh always works. Why? What’s more, other datasets that do something similar will always fail when refreshed.

It turns out that when you publish a dataset that uses cloud or web-based data sources like the two used here, the Power BI service does not use the data privacy settings you have set in Power BI Desktop but instead it automatically assigns data privacy levels as follows:

  • Data sources, like the ones used here, that use Anonymous authentication are automatically given the privacy level Public
  • All other data sources are given the privacy level Private.

Interestingly, Power BI dataflows also count as cloud-based data sources and because they do not use Anonymous authentication they default to Private too, so if you are combining data from a dataflow with another data source in your dataset then you need to be careful of this.

What’s more there is at the time of writing no way to change these data privacy levels in the Power BI web-based portal. Hopefully this will change soon.

There are some workarounds though!

First of all, you can force refresh to take place through a gateway. This might sound strange because in theory, if you’re only using cloud or web-based data sources, a gateway should not be necessary. However there are already similar scenarios where a gateway is needed, for example if you are scraping data from a web page you need to use a gateway, and if you are combining data from a cloud-based data source with an on-premises data source you also need to use a gateway. If you add your cloud/web-based data sources as data sources in your gateway (unfortunately it does not seem to be possible to add a dataflow as a data source in a gateway, though) you can set their data privacy levels in the Advanced Settings section in the Manage Gateways screen:

image

You will also need to set the “Use a data gateway” option to On in the Settings dialog for your dataset after it has been published:

image

The other workaround is to copy the M code for your query and paste it into a new blank M query in an entity in a dataflow, as Matthew Roche shows here. While it does not seem to be possible to set data privacy levels for individual data sources when creating an entity, it is possible to turn off data privacy checks for an entity completely. If you create a query that sends data from one data source to another (regardless, as far as I can see, of the authentication mechanism used), you will see the following message in the Power Query Online query editor:

image

The evaluation was cancelled because combining data from multiple sources may reveal data from one source to another. Click Continue if the possibility of revealing data is okay.

If you click Continue, data privacy checks are turned off and the query runs; you can also click the Options button on the ribbon and check the “Allow combining data from multiple sources” option:

image

If one of your data sources is already itself a dataflow you may need to do some editing of the M query to make things work, but as Matthew Roche shows here it is possible to have an entity in a dataflow refer to another entity without using a computed entity (which is a Premium-only feature).

[Thanks to Arthi Ramasubramanian Iyer from Microsoft for providing background information for this post]

3 thoughts on “Power BI Data Privacy Levels And Cloud /Web-Based Data Sources Or Dataflows

  1. Hello Chris, in the Data Source Settings window this message appears:
    “It is possible that some data sources may not appear in the list due to hand-created queries”
    also in Edit Queries I can not visualize any table, I get this:
    “Expression.Error: Access to the resource is forbidden. (Edit Settings)”.

    I do not know what the reason for this is and this means that I can not refresh my reports, I would appreciate it if you could give me a solution in this regard.

    The database is taken from the project center of the Project Web App.

Leave a Reply