Credentials, Data Privacy Settings And Data Sources In Power Query/Power BI

Recently I’ve been doing some more investigations into how data privacy settings work in Power BI. This is a subject I’ve blogged about in great detail already in a series of posts last year, but this functionality is so complex that there is always more to learn. I don’t have any profound new insights to offer; instead this blog post is a write up of a series of experiments whose results shed light onto how the process of setting data privacy levels works end-to-end.

Consider the following M query:

let
    Source = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_search?q=cows"
                )
            ),
    result = Source[result],
    results = result[results],
    results1 = results{0},
    id = results1[id],
    output = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_show?id=" & id
                )
            )
in
    output

The query does the following:

  • Runs a query against the UK government’s open data search API (the same API I use in this post on the RelativePath and Query options for the Web.Contents() function) to get a list of datasets related to the search term “cows” via the Package Search endpoint
  • Gets the first dataset returned by the search and finds its ID
  • Sends this ID to the Package Show endpoint in the same API to get the full JSON representation of this data set. Note that the entire URL is dynamically generated and that the Query option of Web.Contents() is not used here.

It’s a typical scenario where data privacy settings can cause problems: data from one data source, the package_search endpoint, is sent to another data source, the package_show endpoint. My series from last year on data privacy settings provides some useful background information on why this is such an important thing for the Power Query engine.

Assuming that you have never used this API before, when you try to run this query in the Power Query Editor in Power BI Desktop, you’ll see the following prompt to edit the credentials used:

image

Before you click the Edit Credentials button, there are two interesting things to point out. First, in the Query Dependencies view, you see this:

image

Notice that the Package Search endpoint is shown but not the Package Show endpoint.

Second, if you click the Data Source Settings button, you’ll see the following in the dialog that appears:

image

Not only does it only show the Package Search endpoint, there is a warning that says:

“Some data sources may not be listed because of hand-authored queries”

This refers to the output step in the query that calls the Package Show endpoint with the dynamically-generated url.

Closing this dialog and going back to the Query Editor, if you click the Edit Credentials button, you can set credentials for the data source (anonymous access is fine in this case). These credentials can be set at all levels in the path down to https://data.gov.uk/api/3/action/package_search.

image

Setting credentials at the level of https://data.gov.uk means you only get prompted once; however if you select https://data.gov.uk/api/3/action/package_search from the dropdown list and click Connect you will get prompted again to set credentials, this time with a dropdown that shows all paths down to package show:

image

Assuming you set credentials at the level of https://data.gov.uk and click Connect, then Fiddler shows that a call is made to https://data.gov.uk/api/3/action/package_search?q=cows, presumably to check whether the credentials entered actually work and you move back to the Query Editor.

image

Next, in the Query Editor, you see the data privacy settings prompt:

image

Clicking Continue brings up the data privacy levels dialog:

image

You have the choice to ignore privacy levels for this file, but of course you should always try to avoid doing that. You also have two dropdown boxes that both show https://data.gov.uk on the left-hand side and another two dropdown boxes next to them, although only the top one of these is enabled.

In the first column of dropdown boxes, in the first dropdown, you can see all points in the path from https://data.gov.uk to https://data.gov.uk/api/3/action/package_search:

image

In the dropdown box immediately underneath you can see for the first time all points in the path from https://data.gov.uk to https://data.gov.uk/api/3/action/package_show:

image

If you select https://data.gov.uk in the top-left dropdown only the top-right dropdown is enabled, and in the top-right dropdown you can set the data privacy levels Public, Organizational and Private.

image

The meanings of these three levels are described in my earlier series and here, and I won’t go into detail about what they mean in this post. The bottom-right dropdown is disabled because if you set privacy levels for https://data.gov.uk then all urls that start with this path inherit this setting. This is similar to what happens with the None data privacy setting that I describe here, I think.

Setting the data privacy level to Public on https://data.gov.uk in the top-right dropdown means the query runs successfully:

image

The expected activity is shown in Fiddler:

image

And at last the Package Show endpoint is shown in the Query Dependencies view:

image

The Data Source Settings dialog shows the same as it does above in the “Data sources in current file” tab, including the warning about hand-authored queries, but on the “Global permissions” tab there is now an entry for https://data.gov.uk :

image

Although you only set a privacy level for https://data.gov.uk earlier, it’s interesting to note that the entry for https://data.gov.uk/api/3/action/package_search has a privacy level set explicitly to Public and not to None:

image

Stepping back a few steps to the Privacy Levels dialog, if you set a privacy level of Private for https://data.gov.uk like so:

image

…then the query fails with the error “Formula.Firewall: Query ‘Query1’ (step ‘output’) is accessing data sources that have privacy levels which cannot be used together. Please rebuild this data combination.”:

image

From my point of view, this is the first really interesting finding: the two endpoints, https://data.gov.uk/api/3/action/package_search and https://data.gov.uk/api/3/action/package_show, are considered as separate data sources (which tallies with what is shown in the Query Dependencies view) even though they have both inherited their data privacy level setting from https://data.gov.uk. Since they are both Private then data cannot be sent from one source to the other, hence the error.

The second interesting finding becomes apparent if you follow the steps above with a slightly different version of the original query that uses the Query option in the call to the Package Show endpoint:

let
    Source = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_search?q=cows"
                )
            ),
    result = Source[result],
    results = result[results],
    results1 = results{0},
    id = results1[id],
    output = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_show",
                [Query=[#"id"=id]]
                )
            )
in
    output

In this case when you look in the Data Source Settings dialog you now see both endpoints listed and you no longer see the “hand-authored queries” warning:

image

It looks like whatever method it is that the Power Query engine searches for data sources inside a query is confused by dynamically generated urls – which might also explain why data sources that use dynamic urls can’t be refreshed after publishing.

Listing Windows Language Code Identifiers And Their Associated Date And Number Formats With M In Power BI/Power Query

In a comment on my blog post about international date and number formats and changing data types with the “using locale” option in Power Query/Power BI, Jan Karel Pieterse asked if there was any way to get a list of the thousand and decimal separators used for number formatting by each language and region. Since this is exactly the kind of geeky question that fascinates me I decided to write an M query to answer it and – for bonus points – to find the default date format used too.

To start off, I found a table of all Windows Language Code Identifiers on this page:

https://msdn.microsoft.com/en-us/library/cc233982.aspx

Of course this can be loaded into Power Query easily using the “From Web” source. After that it’s easy to add a column to the table that takes a sample date (March 22nd 2018) and number (one hundred thousand and one tenth) and converts it to text using the language code identifier on each row: the Text.From() function does this for dates, and for numbers you have to use Number.ToText() if you want to get thousand separators and decimal separators. There are a few minor problems to deal with, such as the fact that Power Query doesn’t know what to do with dates for the “Congo Swahili” language code identifier and some rows have multiple language tags, but nothing serious.

Here’s the full code:

let
    //Sample dates and numbers to show
    SampleDate = #date(2018,3,22),
    SampleNumber = 100000+(1/10),
    //MS web page with list of language tags
    LocaleWebPage =
        Web.Page(
        Web.Contents(
         "https://msdn.microsoft.com/en-us/library/cc233982.aspx"
         )
         ),
    LocaleList = LocaleWebPage{1}[Data],
    RemoveColumns =
        Table.SelectColumns(
            LocaleList,
            {"Language", "Location (or type)", "Language tag"}
            ),
    SplitColumn =
        Table.SplitColumn(
            RemoveColumns,
            "Language tag",
            Splitter.SplitTextByAnyDelimiter(
                {",","or"},
                QuoteStyle.Csv
                ),
                {"Language tag"}
                ),
    //Create example columns
    DateExample =
        Table.AddColumn(
            SplitColumn,
            "Date",
            each Text.From(SampleDate, [Language tag])
            , Text.Type),
    NumberExample =
        Table.AddColumn(
            DateExample,
            "Number",
            each Number.ToText(SampleNumber,"N", [Language tag])
            , Text.Type),
    //Remove any rows containing errors
    RemoveErrors = Table.RemoveRowsWithErrors(NumberExample)
in
    RemoveErrors

Here’s some of the output:

image

So, if you’ve ever wondered how the Cornish speakers of south-west England like to format their dates or whether the Oromo speakers of Ethiopia use a comma or a full stop as a decimal separator, wonder no more. And if you are not interested in M at all and just want to download an Excel workbook with a list of all LCIDs and how numbers and dates are formatted for them, you can do so here.

BI Survey 18 Highlights

Every year, in return for publicising the BI Survey (the largest survey of BI tool customers and vendors in the world) here on my blog, I get a free copy of the results and the chance to blog about some of the more interesting findings. Here are a few points that stood out for me this year in the BI Survey 18:

  • The top 5 tools acquired by respondents were Power BI, Excel, Qlikview, Tableau and SSRS in that order – so Microsoft has three of the top 5 most popular BI tools. Given that users often have trouble understanding the different types of BI that Power BI, Excel and SSRS are suited to, Microsoft’s long-standing strategy of close ties between Power BI and Excel and the forthcoming integration of SSRS reports into Power BI Premium looks very wise.
  • Power BI has – unsurprisingly – one of the best price-to-value ratios reported of all BI tools. The flip side of this is that customers are less happy with the level of support that Microsoft offers; I guess this is all relative to more expensive BI tools which need armies of people from the vendor or from a BI consultancy to implement them, and who are more likely to bend over backwards for their customers.
  • Power BI is now the number two client tool used for SSAS after SSRS, beating Excel. I find that hard to believe but I guess it’s possible.
  • Compared to its obvious competitors, and indeed to all other BI tools, Power BI gets some very good scores. There is no clear leader (and if there was, I would be very suspicious about the survey’s methodology) but it’s clear that Microsoft is now one of the leading BI vendors and given that it’s sustaining the levels of investment that brought it to this position, I think it Power BI will continue to grow and prosper in 2019.
%d bloggers like this: