Credentials, Data Privacy Settings And Data Sources In Power Query/Power BI

Recently I’ve been doing some more investigations into how data privacy settings work in Power BI. This is a subject I’ve blogged about in great detail already in a series of posts last year, but this functionality is so complex that there is always more to learn. I don’t have any profound new insights to offer; instead this blog post is a write up of a series of experiments whose results shed light onto how the process of setting data privacy levels works end-to-end.

Consider the following M query:

let
    Source = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_search?q=cows"
                )
            ),
    result = Source[result],
    results = result[results],
    results1 = results{0},
    id = results1[id],
    output = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_show?id=" & id
                )
            )
in
    output

The query does the following:

  • Runs a query against the UK government’s open data search API (the same API I use in this post on the RelativePath and Query options for the Web.Contents() function) to get a list of datasets related to the search term “cows” via the Package Search endpoint
  • Gets the first dataset returned by the search and finds its ID
  • Sends this ID to the Package Show endpoint in the same API to get the full JSON representation of this data set. Note that the entire URL is dynamically generated and that the Query option of Web.Contents() is not used here.

It’s a typical scenario where data privacy settings can cause problems: data from one data source, the package_search endpoint, is sent to another data source, the package_show endpoint. My series from last year on data privacy settings provides some useful background information on why this is such an important thing for the Power Query engine.

Assuming that you have never used this API before, when you try to run this query in the Power Query Editor in Power BI Desktop, you’ll see the following prompt to edit the credentials used:

image

Before you click the Edit Credentials button, there are two interesting things to point out. First, in the Query Dependencies view, you see this:

image

Notice that the Package Search endpoint is shown but not the Package Show endpoint.

Second, if you click the Data Source Settings button, you’ll see the following in the dialog that appears:

image

Not only does it only show the Package Search endpoint, there is a warning that says:

“Some data sources may not be listed because of hand-authored queries”

This refers to the output step in the query that calls the Package Show endpoint with the dynamically-generated url.

Closing this dialog and going back to the Query Editor, if you click the Edit Credentials button, you can set credentials for the data source (anonymous access is fine in this case). These credentials can be set at all levels in the path down to https://data.gov.uk/api/3/action/package_search.

image

Setting credentials at the level of https://data.gov.uk means you only get prompted once; however if you select https://data.gov.uk/api/3/action/package_search from the dropdown list and click Connect you will get prompted again to set credentials, this time with a dropdown that shows all paths down to package show:

image

Assuming you set credentials at the level of https://data.gov.uk and click Connect, then Fiddler shows that a call is made to https://data.gov.uk/api/3/action/package_search?q=cows, presumably to check whether the credentials entered actually work and you move back to the Query Editor.

image

Next, in the Query Editor, you see the data privacy settings prompt:

image

Clicking Continue brings up the data privacy levels dialog:

image

You have the choice to ignore privacy levels for this file, but of course you should always try to avoid doing that. You also have two dropdown boxes that both show https://data.gov.uk on the left-hand side and another two dropdown boxes next to them, although only the top one of these is enabled.

In the first column of dropdown boxes, in the first dropdown, you can see all points in the path from https://data.gov.uk to https://data.gov.uk/api/3/action/package_search:

image

In the dropdown box immediately underneath you can see for the first time all points in the path from https://data.gov.uk to https://data.gov.uk/api/3/action/package_show:

image

If you select https://data.gov.uk in the top-left dropdown only the top-right dropdown is enabled, and in the top-right dropdown you can set the data privacy levels Public, Organizational and Private.

image

The meanings of these three levels are described in my earlier series and here, and I won’t go into detail about what they mean in this post. The bottom-right dropdown is disabled because if you set privacy levels for https://data.gov.uk then all urls that start with this path inherit this setting. This is similar to what happens with the None data privacy setting that I describe here, I think.

Setting the data privacy level to Public on https://data.gov.uk in the top-right dropdown means the query runs successfully:

image

The expected activity is shown in Fiddler:

image

And at last the Package Show endpoint is shown in the Query Dependencies view:

image

The Data Source Settings dialog shows the same as it does above in the “Data sources in current file” tab, including the warning about hand-authored queries, but on the “Global permissions” tab there is now an entry for https://data.gov.uk :

image

Although you only set a privacy level for https://data.gov.uk earlier, it’s interesting to note that the entry for https://data.gov.uk/api/3/action/package_search has a privacy level set explicitly to Public and not to None:

image

Stepping back a few steps to the Privacy Levels dialog, if you set a privacy level of Private for https://data.gov.uk like so:

image

…then the query fails with the error “Formula.Firewall: Query ‘Query1’ (step ‘output’) is accessing data sources that have privacy levels which cannot be used together. Please rebuild this data combination.”:

image

From my point of view, this is the first really interesting finding: the two endpoints, https://data.gov.uk/api/3/action/package_search and https://data.gov.uk/api/3/action/package_show, are considered as separate data sources (which tallies with what is shown in the Query Dependencies view) even though they have both inherited their data privacy level setting from https://data.gov.uk. Since they are both Private then data cannot be sent from one source to the other, hence the error.

The second interesting finding becomes apparent if you follow the steps above with a slightly different version of the original query that uses the Query option in the call to the Package Show endpoint:

let
    Source = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_search?q=cows"
                )
            ),
    result = Source[result],
    results = result[results],
    results1 = results{0},
    id = results1[id],
    output = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_show",
                [Query=[#"id"=id]]
                )
            )
in
    output

In this case when you look in the Data Source Settings dialog you now see both endpoints listed and you no longer see the “hand-authored queries” warning:

image

It looks like whatever method it is that the Power Query engine searches for data sources inside a query is confused by dynamically generated urls – which might also explain why data sources that use dynamic urls can’t be refreshed after publishing.

Using Function.ScalarVector() To Optimise Calls To M Functions

One of the most common issues faced when calling web services in M is that the the easiest way of doing so – creating a function that calls the web service, then calling the function once per row in a table using the Invoke Custom Function button – is very inefficient. It’s a much better idea to batch up calls to a web service, if the web service supports this, but doing this forces you to write more complex M code. It’s a problem I wrestled with last year in my custom connector for the Cognitive Services API, and in that case I opted to create functions that can be passed lists instead (see here for more information on how functions with parameters of type list work); I’m sure the developers working on the new AI features in dataflows had to deal with the same problem. This problem is not limited to web services either: calculations such as running totals also need to be optimised in the same way if they are to perform well in M. The good news is that there is a new M function Function.ScalarVector() that allows you to create functions that combine the ease-of-use of row-by-row requests with the efficiency of batch requests.

I’m going to try to keep things as simple as possible while showing how it works. Consider the following M function, called SingleValueUpper:

(InputText as text) => 
    Text.Upper(InputText)

It takes a single text value and returns the same text converted to upper case. Now consider the following query called Product that returns a table of fruit names in lower case:

#table(
    type table [Fruit=text],
    {{"apples"},{"oranges"},{"pears"}}
    )

image

The function can be called on this table in a third query with the following expression (which is basically what the Invoke Custom Function button produces):

Table.AddColumn(
    Product, 
    "Fruit Upper", 
    each SingleValueUpper([Fruit])
    )

The output of this is:

image

In this example the function is called three times – once for each row – and as I said if you were calling a web service in this way, for example, it would be very inefficient.

Now consider the following M function, called ListUpper:

(InputText as list) => 
    List.Transform(
        InputText, 
        each Text.Upper(_)
        )

This function takes a list of text values, converts each of the text values to upper case, and returns a list of the results. Calling the function with the contents of the Fruit column of the table returned by the Product query, like so:

ListUpper(Product[Fruit])

…returns the following list:

image

If you were calling a web service like this – perhaps by making a single POST request with all of these values in the body – it would be much more efficient.

Function.ScalarVector allows you to call a function like this through the Invoke Function Button. The following function, called ScalarVectorUpper in my example, shows how:

Function.ScalarVector(
    type function(Fruit as text) as text,
    (InputTable) =>
        let
            BufferTable = Table.Buffer(InputTable),
            InputList = BufferTable[Fruit],
            CallFunction = ListUpper(InputList)            
        in
            CallFunction
)

image

Function.ScalarVector is given two arguments:

  • A function type whose parameters match the columns from the input table that we’re interested in using. In this case the input table is the Product table and it has one text column called Fruit; we want to pass the contents of the Fruit column to this function.
  • A function that is passed a table (in this case called InputTable) that has the same columns as those listed in the function type. The documentation says this function “must not enumerate its input table more than once” so it uses Table.Buffer() to make sure this is the case; it then takes the contents of the Fruit column from this table and passes it to the ListUpper function defined above, and returns a list of upper case text values.

Basically, it creates a function that looks like it is being called on a row-by-row basis but which is actually called once for the whole table.

The result of calling this function on the table returned by the Product query:

Table.AddColumn(
    Product, 
    "Fruit Upper", 
    each ScalarVectorUpper([Fruit])
    )

…is the same as before, but potentially a lot more efficient:

image

This may seem obscure, but personally I think this is incredibly useful and important. Anyone writing a custom connector in the future (although Function.ScalarVector() isn’t available in the current release of the Power Query SDK I assume it will appear soon) is likely to want to use it, and remember also that custom connectors can be used to create M function libraries and I believe this scenario will be officially supported too.

You can download the example pbix file for this post here.

The CredentialQuery Option For Web.Contents() In Power BI Custom Data Connectors

There’s a new, as yet undocumented, option for the Web.Contents() M function that is only available for custom data connectors and not in Power BI Desktop: CredentialQueryOption. It does the same thing as the Query option, but it stops the values you are passing to the query parameter from being logged in Power BI’s diagnostic logs, so it’s useful if you are passing sensitive information such as passwords.

For example, consider the following call to Web.Contents() used in a custom data connector:

Web.Contents(
 "https://data.gov.uk/api", 
 [RelativePath="3/action/package_search", 
 Query=[q="cows"]
]
)

When the custom data connector is used in Power BI Desktop, if diagnostic logging is turned on in the Options dialog:

image

…then, if you look in the log files you’ll see the entire url called including the search term “cows”:

image

However, if you change the M code to:

Web.Contents(
 "https://data.gov.uk/api", 
 [RelativePath="3/action/package_search", 
 CredentialQuery=[q="cows"]
]
)

So that the CredentialQuery option is used instead of the Query option, the search term will not appear in the log file.

Thanks to Curt Hagenlocher for providing this information.

Can You Use Power Query In Power BI Or Excel To Write Data To A Data Source?

The short answer: no.

The long answer:

It is, technically, possible to use Power Query (in either Power BI or Excel) to write data to a data source; for example, here’s an old post I wrote showing how to run a UPDATE statement to update data in a table in SQL Server. However because the Power Query engine may evaluate a value or statement more than once when a query is executed, it is therefore not safe or supported to use it to try to create, insert, update, delete or otherwise change data in a data source – it should only be used to read data. You may see some functions listed in the M function reference that suggest otherwise but they aren’t supported either, and indeed may not work at all.

Thanks to Matt Masson of the dev team for providing the official position on this.

A Function To Generate The M Code For A Table Type

This is going to sound obscure, and it is, but recently I’ve been using the #table() intrinsic function in M a lot – specifically the version that takes a table type as its first parameter (as I describe here) – and because it’s a bit of a pain to have to write the M code for a table type by hand, I’ve written an M function that takes a table and returns the text for the M code that is needed to define a table type. Here it is:

(InputTable as table) as text =>
let
    Source = 
        Table.Schema(InputTable),
    SortRows = 
        Table.Sort(
            Source,
            {{"Position", Order.Ascending}}),
    RemoveColumns = 
        Table.SelectColumns(
            SortRows,
            {"Name", "TypeName"}),
    AddCustom = 
        Table.AddColumn(
            RemoveColumns, 
            "TypeNames", 
            each 
            Expression.Identifier([Name]) & " = " & [TypeName]),
    Output = 
        "[" & Text.Combine(AddCustom[TypeNames], ", ") & "]"
in
    Output

Nothing complex here, but now I’ve posted this I know that in the future I’ll be able to Google for it when I’m working onsite with a customer and I need it!

To give you an idea of how it works, take the table that is returned by the following M expression, which calls the public TripPin OData web service:

OData.Feed(
"https://services.odata.org/TripPinRESTierService/Airports", 
null, 
[Implementation="2.0"])

image

Passing this table to the function above returns the following text, the M code for a record that lists the names of the columns in this table and their data types, suitable for use with #table:

[Name = Text.Type, IcaoCode = Text.Type, 
IataCode = Text.Type, Location = Record.Type]

Using Power Query And Microsoft Flow To Automate The Creation Of CSV Files

I was super-excited when, a few days ago, the long-awaited integration of Power Query into Microsoft Flow was announced. I was then gutted when work commitments meant I couldn’t blog about it right away and Erik Svensen beat me to it with this excellent post showing how to push data from SQL Server via Power Query and Flow to a Power BI streaming dataset:

https://eriksvensen.wordpress.com/2018/09/25/powerquery-everywhere-now-in-microsoftflow-as-well/

Anyway, the integration of Power Query into Flow opens up so many interesting possibilities so I couldn’t resist writing about it myself today. For example: something I am asked about a lot is whether it is possible to use Power Query to automate the creation of CSV files. You can’t do this in a supported way in Power Query in Excel or Power BI (and yes I know about all of the slightly hacky ways people have done it like this), but it is absolutely something you can do with Power Query in Flow.

Here’s how:

image

First of all I have a Schedule – Recurrence trigger, which allows you to run a flow on a schedule. In this case I’m setting the Flow to run once a day at 6am UK time:

image

Next I have the all-important Transform data using Power Query action. At the moment you can only use Power Query with SQL Server data sources, and (at least when I tested it) it only seems to work with Azure SQL Database data sources and not on-premises SQL Server databases via a gateway – hopefully this gets fixed soon. Clicking the Edit Query button:

image

…opens up the Power Query online editor, which will be familiar to anyone who has used Power Query in Excel or Power BI Desktop, and you build your query:

image

Next I have a Create CSV Table action to convert the results returned by the query to csv format:

image

And finally I take the csv data and save it to a CSV file in OneDrive For Business with a Create File action:

image 

Run the Flow and the CSV file is created:

image

And I can view the contents in Excel Online:

image

The Power Query/Flow integration is still in Preview and I found a few things didn’t work reliably: for example the first few times I ran my Flow I got errors saying that it couldn’t connect to the Azure SQL Database, even though it clearly could while I was designing the query, but that error went away after a while. What’s more it only works for SQL Server data sources right now and I really hope that it is enabled for all the other data sources that Power Query can connect to, especially Excel. These are just teething troubles though, and it’s clear that this is going to be revolutionary for Power Query and Flow users alike!

I really need to improve my Flow skills now…

Converting Decimal Numbers To Hexadecimal In Power Query M

This is a very short post! A lot of people have blogged about how to convert numbers between different bases in M (see for example Maxim Zelensky’s very elegant solution for converting from binary to decimal), but today I noticed there was a very easy way to convert a decimal number to hexadecimal using the Number.ToText() function: you just need to use “x” in the second parameter. For example:

Number.ToText(12, "x") //returns c
Number.ToText(123, "x") //returns 7b

I’m sure this will come in handy somewhere…

%d bloggers like this: