Storing Large Images In Power BI Datasets

Jason Thomas and Gerhard Brueckl have both blogged on the subject of storing images as text inside a Power BI dataset:

http://sqljason.com/2018/01/embedding-images-in-power-bi-using-base64.html

https://blog.gbrueckl.at/2018/01/storing-images-powerbi-analysis-services-data-models/

Since they wrote those posts, however, Power BI has added the ability to set the Data Category property on measures as well as columns in tables. This means it is now possible to have the output of a DAX measure displayed as an image in a Power BI report and this in turn opens up a lot of new possibilities – including the ability to work around the maximum size of a text value that can be loaded into Power BI (see my previous blog post for more details) and therefore work with larger images.

Here’s a rather lovely picture of a rose:

[Image: the rose photo – 2014-08-18 17.02.10_20Pct]

The original is about 2.1MB; I have a folder on my PC where different versions of this picture, saved at different percentages of the original size, are stored:

[Screenshot: folder containing the resized versions of the image]

Using the technique that Gerhard wrote about, where a picture can be stored as text in a single cell of a Power BI dataset and then displayed (in this case I’m using the Image by CloudScope custom visual), some truncation of the image occurs even with the smallest files because of the 32766 character limit on the length of a text value that can be loaded into Power BI. Here’s what you see when you display the version of the picture that is 20% of the original size, a file of only 113KB:

[Screenshot: the truncated image displayed in the report]

To work around this, you need to split the text representation of the image into multiple smaller text values stored across multiple rows, each of which is under the 32766 character limit, and then reassemble them in a DAX measure after the data has been loaded.

Splitting the text up in M is actually not that hard, but it is hard to do efficiently. Here’s an example of an M query that reads all the data from all of the files in the folder above and returns a table:

let
    //Get list of files in folder
    Source = Folder.Files("C:\Users\Chris\Documents\PQ Pics"),
    //Remove unnecessary columns
    RemoveOtherColumns = Table.SelectColumns(Source,{"Content", "Name"}),
    //Creates Splitter function
    SplitTextFunction = Splitter.SplitTextByRepeatedLengths(30000),
    //Converts table of files to list
    ListInput = Table.ToRows(RemoveOtherColumns),
    //Function to convert binary of photo to multiple
    //text values
    ConvertOneFile = (InputRow as list) =>
        let
            BinaryIn = InputRow{0},
            FileName = InputRow{1},
            BinaryText = Binary.ToText(BinaryIn, BinaryEncoding.Base64),
            SplitUpText = SplitTextFunction(BinaryText),
            AddFileName = List.Transform(SplitUpText, each {FileName,_})
        in
            AddFileName,
    //Loops over all photos and calls the above function
    ConvertAllFiles = List.Transform(ListInput, each ConvertOneFile(_)),
    //Combines lists together
    CombineLists = List.Combine(ConvertAllFiles),
    //Converts results to table
    ToTable = #table(type table[Name=text,Pic=text],CombineLists),
    //Adds index column to output table
    AddIndexColumn = Table.AddIndexColumn(ToTable, "Index", 0, 1)
in
    AddIndexColumn

In my next post I’ll show you my original version of this query, explain why it was slow, and try to explain how the version above works and why it is much faster.

Here’s what the query above returns:

[Screenshot: the output table with Name, Pic and Index columns]

The Pic column contains the split text values, each of which is less than the 32766 character limit, so when this table is loaded into Power BI no truncation occurs. The Index column is necessary because without it we would not be able to recombine all the split values in the correct order.

The only thing left to do is to create a measure that uses the DAX ConcatenateX() function to concatenate all of the pieces of text back into a single value, like so:

Display Image = 
IF(
    HASONEVALUE('PQ Pics'[Name]),
    "data:image/jpeg;base64, " & 
    CONCATENATEX(
        'PQ Pics', 
        'PQ Pics'[Pic],
        ,
        'PQ Pics'[Index],
        ASC)
        )

…set the data category of this measure to be “Image URL”:

[Screenshot: the measure’s Data Category set to Image URL]

…and then display the value of the image in a report:

[Screenshots: the reassembled image displayed in a report]

Unfortunately, as I also mentioned in my previous post, most DAX functions (and that includes ConcatenateX()) have a limit of around 2.1 million characters so the original 2.1MB file still can’t be displayed, alas:

[Screenshot: the error shown when displaying the full-size image]

However, I do think this technique will be useful because it allows you to work with much larger pictures than before.

It can be useful in other situations too. I recently came across a great new custom visual called PDF Viewer that can display PDF files stored in text form in a Power BI report:

[Screenshot: the PDF Viewer custom visual]

The example file for this visual shows how a large PDF file can be split across two columns in a table; the technique I describe here is a more practical solution to this problem.

What Is The Maximum Length Of A Text Value In Power BI?

What is the maximum length of a text value in Power BI? It turns out that this is a more complex question than you might think!

The maximum length of a text value that the Power Query engine can load into a single cell in a table in a dataset is 32766 characters – any more than that and the text will be silently truncated. However, if you’re working with text inside the Power Query engine you’ll find that you can work with much longer text values.  To illustrate this, consider the following M query:

let
    Source = 
        #table(
            type table[charcount = number],
            {
                {1},
                {10000},
                {30000},
                {40000}
                }
                ),
    #"Added Custom" = 
        Table.AddColumn(
            Source, 
            "LongText", 
            each Text.Repeat("1", [charcount]),
            type text
            ),
    #"Inserted Text Length" = 
        Table.AddColumn(
            #"Added Custom", 
            "Length", 
            each Text.Length([LongText]), 
            Int64.Type
            )
in
    #"Inserted Text Length"

It creates a table with four rows and three columns. The first column contains the numbers 1, 10000, 30000 and 40000; the second column contains the character “1” repeated the number of times given in the first column; the third column returns the length of the text in the second column using the Text.Length() M function. Here’s the output in the Power Query Editor, which is pretty much as you’d expect:

[Screenshot: the query output in the Power Query Editor]

I’m not sure if there is a maximum length for text values in M; I experimented with adding an extra row to the table above with a 900,000,000 character text value and Text.Length() was able to return the correct value, albeit after a bit of a wait.
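
For reference, here’s a minimal sketch of the kind of test I mean – note that the 900,000,000 character value makes it slow to run:

let
    //build a very long text value, as in the experiment described above
    BigText = Text.Repeat("1", 900000000),
    //Text.Length still returns the correct length, albeit after a wait
    Result = Text.Length(BigText)
in
    Result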

Load the table above into your Power BI dataset though, and add a DAX calculated column with the following expression:

DAX Length = LEN('LengthsDemo'[LongText])

…and you can see in the Data pane of the main Power BI Desktop window that the long text value in the last row has been truncated to 32766 characters:

[Screenshot: the Data pane showing the truncated text length]

Once you’ve loaded your data into Power BI the documentation says that the maximum length of a text value is “268,435,456 Unicode characters (256 mega characters) or 536,870,912 bytes”. The bad news is that many DAX functions such as ConcatenateX() put a limit on the length of the text values that they can work with at around 2.1 million characters (thank you Jeffrey Wang for providing this information – it isn’t documented anywhere at the moment). If you exceed this limit you’ll get the following error:

Function ‘PLACEHOLDER’ encountered a Text that exceeds the maximum allowable length.

In summary, then, there are two different practical limits on the maximum length of a text value in Power BI: the 32766 character limit on text being loaded into Power BI, and the 2.1 million character limit in DAX functions. The first of these can be worked around with some clever M – you need to split long text values up into multiple smaller values stored in different columns or rows – but even if you do this, the second limit may stop you recreating the original value after the data has been loaded.
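
To give a rough idea of what that splitting looks like in M, here’s a minimal sketch – the text value and the chunk size here are arbitrary:

let
    //a long text value standing in for something like a base64-encoded file
    LongText = Text.Repeat("A", 100000),
    //split it into chunks that stay under the 32766 character load limit
    Chunks = Splitter.SplitTextByRepeatedLengths(30000)(LongText),
    //one row per chunk, plus an index so the original order can be restored later
    ToTable = Table.FromList(Chunks, Splitter.SplitByNothing(), {"Chunk"}),
    AddIndex = Table.AddIndexColumn(ToTable, "Index", 0, 1)
in
    AddIndex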

Why is this useful or important? How can you split text values up in M in the most efficient way? I’ll come to that in my next two posts!

Power Query Comes To Azure Data Factory With Wrangling Data Flows

One of the many big announcements at Build this week, and one that caused a lot of discussion on Twitter, was about Wrangling Data Flows in Azure Data Factory. You can read the blog post here:

https://azure.microsoft.com/en-us/blog/analytics-in-azure-remains-unmatched-with-new-innovations/

…but what isn’t clear from this is that it’s basically Power Query Online integrated into ADF. You can see it in action by watching the following video – the demo of Wrangling Data Flows starts at around the 21 minute mark:

https://mybuild.techcommunity.microsoft.com/sessions/76997


As the presenter says, the Power Query Online editor generates M in the background as you would expect and “we are going to take this M and translate it into Spark and run it over big data”. Query folding to Spark, basically. More technical detail about all this is available here:

https://github.com/gauravmalhot/wranglingdataflow

…including a document discussing which M functions currently support query folding and which ones as yet don’t. Obviously, this feature will only work well if as much query folding as possible takes place.

This feels like a much more significant win for team Power Query than the integration with SSIS that was announced recently, if only because SSIS is a bit legacy and ADF is the cool new thing. I wonder if this opens up the possibility of integration between Power BI dataflows and ADF in the future, as another example of how self-service BI solutions can be easily transitioned into centrally-managed, enterprise-grade BI solutions? If that happens I hope someone sorts out the dataflow/data flow naming mess.

You can sign up for the preview of Wrangling Data Flows here.

Extracting All The M Code From A Power BI Dataset Using The DISCOVER_M_EXPRESSIONS DMV

DMVs (Dynamic Management Views) are, as the Analysis Services documentation states, “queries that return information about model objects, server operations, and server health”. They’re also available in Azure Analysis Service, Power BI and Power Pivot and are useful for a variety of reasons, for example for generating documentation.

Several as-yet undocumented DMVs have appeared in Power BI recently and one that caught my eye was DISCOVER_M_EXPRESSIONS. Unfortunately, when I tried to run it in DAX Studio against an open Power BI file I got an error saying it was only available in the Power BI Service:

[Screenshot: the error message in DAX Studio]

Luckily, now that XMLA Endpoints are in preview and SQL Server Management Studio 18 has been released (which supports connections to Power BI via XMLA Endpoints), we can test it against a published dataset stored in a Premium capacity. The following query, which can be run from a DAX query window in SQL Server Management Studio:

select * from
$system.discover_m_expressions

…returns a list of all the Power Query queries in the selected dataset and their M code:

[Screenshot: the DMV output in SQL Server Management Studio]

If you don’t have Premium you can run the same query from an Excel table against any published dataset using the technique I blogged about here:

[Screenshot: the same DMV output in an Excel table]

I know there are other methods for doing this (for example using copy/paste), but it’s useful to be able to do it via a DMV because it means you can automate the process of extracting all your M code easily.
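
For example – and this is just a sketch, with a placeholder workspace URL and dataset name – you could run the DMV from Power Query itself against the XMLA Endpoint using the AnalysisServices.Database() function and its Query option:

let
    //connect to a Premium workspace via its XMLA Endpoint and run the DMV
    //(the workspace URL and dataset name below are placeholders)
    Source = AnalysisServices.Database(
        "powerbi://api.powerbi.com/v1.0/myorg/MyWorkspace",
        "MyDataset",
        [Query = "select * from $system.discover_m_expressions"]
    )
in
    Source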

Some of the other new DMVs look like they are worthy of a blog post too – I can guess what most of them do from their names, but others are more mysterious and perhaps hint at features that have not been announced yet.

Table.Buffer() Does Not Buffer Nested Values

Here’s yet another entry in the list of useful things I learned from Ehren von Lehe on the Power Query MSDN forum: Table.Buffer() does not buffer nested table, record or list values inside the cells of a table. From this thread:

Buffering is shallow. It will force the evaluation of any scalar cell values, but will leave non-scalar values (records, lists, tables, etc.) as-is.

It’s actually really difficult to come up with a simple demo query to prove this (the Power Query engine is too clever about not evaluating things it doesn’t need for the final output of a query), but it’s fairly easy to understand the principle. Whenever you have an expression that returns a table like this:

[Screenshot: a table with a nested table value in one of its cells]

…if you use Table.Buffer() on this table it will only buffer the scalar values (in this case the text values in the Name and Signature columns). The nested table values, as in the highlighted cell, will not be buffered in memory, and if you try to access their contents this may result in another call back to the underlying data source.
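
Here’s a minimal sketch of the kind of table I mean – it isn’t a proof of the behaviour, for the reasons given above, just an illustration of a table with both scalar and nested values:

let
    //a table with one scalar column and one nested table column
    Source = #table(
        {"Name", "Nested"},
        {
            {"A", #table({"Value"}, {{1}, {2}})},
            {"B", #table({"Value"}, {{3}, {4}})}
        }
    ),
    //Table.Buffer forces evaluation of the scalar Name values, but the
    //nested tables in the Nested column are left as-is and are not pinned
    //in memory by this call
    Buffered = Table.Buffer(Source)
in
    Buffered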

Excel.Workbook() And The delayTypes Option In Power Query/Power BI

A while ago I found myself wondering – as I often do about this kind of thing – what the undocumented third parameter of the Excel.Workbook() M function (called delayTypes) actually does. I found a forums post from Guy Hunkin of the Excel Power Query team in 2017 here, which gives the following summary:

Originally, Excel.Workbook used to read the entire workbook data to accurately assign types to the columns. This was EXTREMELY slow on large workbooks.

Having this in mind, we added this flag to delay the behavior. When set to “true”, we don’t infer any column types. Instead, the UI uses its normal inference algorithm to generate a Table.TransformColumnTypes step with the inferred types and if it inferred wrongly, the user can update the transformation explicitly.

I also posted a question on the Power Query MSDN forum and got some more details from Colin Banfield, one of the many M experts who hang out there, as well as doing some investigations of my own. This post summarises what I found out.

First of all, a simple demonstration of what it does. Consider the following Excel table:

[Screenshot: the source Excel table with TextColumn, IntegerColumn, DecimalColumn and DateColumn]

Connecting to this table in the December 2018 release of Power BI Desktop generates the following M query:

let
    Source = 
    Excel.Workbook(
        File.Contents(
            "C:\DelayTypesTest.xlsx"
            ), 
            null, 
            true
            ),
    SourceData_Table = 
    Source{[Item="SourceData",Kind="Table"]}[Data],
    #"Changed Type" = 
    Table.TransformColumnTypes(
        SourceData_Table,
        {
            {"TextColumn", type text}, 
            {"IntegerColumn", Int64.Type}, 
            {"DecimalColumn", type number}, 
            {"DateColumn", type date}
            }
            )
in
    #"Changed Type"

Notice that the delayTypes option is now automatically set to true and that there is a “Changed Type” step. The output of this query, as seen in the Power Query Editor, is what you would expect:

[Screenshot: the query output with column data types set]

Compare this with the output of the following query which has delayTypes set to false and has no “Changed Type” step:

let
    Source = 
    Excel.Workbook(
        File.Contents(
            "C:\DelayTypesTest.xlsx"
            ), 
            null, 
            false
            ),
    SourceData_Table = 
    Source{[Item="SourceData",Kind="Table"]}[Data]
in
    SourceData_Table

[Screenshot: the query output with delayTypes set to false]

There are two things to notice:

  • With delayTypes set to false no “Changed Type” step is necessary because the data types on the columns are set by Excel.Workbook(). With delayTypes set to true all the columns returned by Excel.Workbook() have their data type set to Any, and a subsequent “Changed Type” step is necessary to set the data types to something useful.
  • With delayTypes set to true the IntegerColumn and DecimalColumn columns have different data types assigned to them by the “Changed Type” step: IntegerColumn becomes Int64.Type whereas DecimalColumn becomes type number. With delayTypes set to false both columns are set to type number. So it looks like Excel.Workbook() with delayTypes set to false will only convert columns to primitive data types.

The next question is: if this is primarily a performance optimisation, how much of an impact does it have? I built four queries that read all the data from an 80MB xlsx file to test the performance of all the variations of delayTypes true/false and having a “Changed Type” step present or not (just in case there was an overhead to having a “Changed Type” step). Here are the results – the main timings are in minutes and seconds, and in brackets is the length of the “Evaluating” phase of query execution:

                              delayTypes=false    delayTypes=true
no “Changed Type” step        2:08 (1:21)         1:06 (0:00)
“Changed Type” step present   2:19 (1:30)         1:08 (0:00)

I wouldn’t read too much into these exact timings because in my experience timings for the same query can fluctuate quite a lot, but one thing is clear: setting delayTypes to true results in much better performance. Indeed with delayTypes set to false nearly half the time was spent in the “Evaluating” phase whereas with delayTypes set to true there was no “Evaluating” phase at all. As a result I can’t think of a reason not to set the delayTypes option to true.

It’s worth noting that delayTypes is set to false by default if you don’t specify this option, so be aware of this if you are writing your own M code.
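
In other words, if you want the faster behaviour in hand-written M you need to pass true yourself – a minimal sketch, with a placeholder file path:

Excel.Workbook(
    File.Contents("C:\MyWorkbook.xlsx"), //placeholder path
    null,  //useHeaders
    true   //delayTypes: do not infer column types while reading the workbook
    )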

Power BI/Power Query/M Pre-cons At SQLKonferenz, SQLBits and SQL Saturday Iceland

If you’re learning Power BI and you want to deepen your understanding of its data transformation and loading functionality, the Power Query engine and the M language, I’m running one-day pre-conference seminars on this subject at SQLKonferenz, SQLBits and SQL Saturday Iceland over the next few months.

You’ll learn about all the functionality in the Power Query Editor window plus concepts such as merging, appending, parameters, functions, data privacy and M. I hope to see you at one of them!

ODBC Data Sources, The SqlCapabilities Option And Power Query/Power BI Data Refresh Performance

A few weeks ago I received a great tip from Simon Nuss about a Power Query feature relating to ODBC data sources that I hadn’t seen before (although it’s in Power BI Desktop and Excel 365, so it can’t be that new) and which is potentially important for the performance of data refresh and of the Power Query Editor’s preview pane. It’s only visible if you click “Advanced options” when creating your ODBC data source:

[Screenshot: the ODBC data source dialog with Advanced options expanded]

It’s the “Supported row reduction clauses” option:

[Screenshot: the Supported row reduction clauses option]

What does it do? Well, first of all let’s see what happens if you don’t set it. Let’s say you have an ODBC data source that uses the Microsoft ODBC Driver for SQL Server to connect to the Adventure Works DW database. If you create a new query that uses this data source to get the contents of the DimDate table it works pretty much as you’d expect; here’s the M code:

let
    Source =
        Odbc.DataSource(
            "dsn=Test SQL",
            [HierarchicalNavigation=true]
        ),
    #"Adventure Works DW_Database" =
        Source{
            [Name="Adventure Works DW",Kind="Database"]
            }[Data],
    dbo_Schema =
        #"Adventure Works DW_Database"{
            [Name="dbo",Kind="Schema"]
            }[Data],
    DimDate_Table =
        dbo_Schema{[Name="DimDate",Kind="Table"]}[Data]
in
    DimDate_Table

[Screenshot: the DimDate table in the Power Query Editor]

However, if you look in Profiler when the contents of the Preview window are refreshed, you’ll see that the Power Query engine is running a query that gets the entire contents of the table:

[Screenshot: the Profiler trace showing a query for the whole table]

Obviously, if you are connecting to a big table this could make the Power Query Editor slow. What’s more, as this blog post shows, the Power Query Editor only actually wants to display 1000 rows and the fact that the query is requesting all the rows from the table suggests that even basic query folding is not happening on this ODBC data source.

Now let’s see what happens if you do select something from the “Supported row reduction clauses” dropdown. As the screenshot above shows, there are four values in the dropdown that relate to the form of SQL supported by your ODBC source to only get the first N rows from a table. If you don’t know which one to choose you can click the Detect button and it will select one for you. In this case I selected TOP:

[Screenshot: TOP selected in the Supported row reduction clauses dropdown]

Having done this, when you connect to the same DimDate table, you’ll see that now the Power Query Editor is only requesting the top 1000 rows from the table:

[Screenshot: the Profiler trace showing a SELECT TOP 1000 query]

Also, if you look at the M code for the query, you’ll see that the Odbc.DataSource function now has the as-yet undocumented SqlCapabilities option set in its second parameter:

Odbc.DataSource(
    "dsn=Test SQL",
    [
        HierarchicalNavigation = true,
        SqlCapabilities = [LimitClauseKind = LimitClauseKind.Top]
    ]
)

Actually it’s not quite undocumented because it is mentioned here as something that is only available if you’re using Odbc.DataSource in a custom connector; obviously things have changed.

Setting this option can make a big difference to the responsiveness of the Power Query Editor when you’re working with large tables and/or slow data sources – I saw this myself this week when working with one of my customers. It’s also going to be important if you use the Keep Rows/Keep Top Rows button on the ribbon or the Table.FirstN() M function; I suspect there are other, more obscure, reasons why it might speed up refresh even if you aren’t filtering the rows of the table, but I’m not 100% sure.
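
For example, here’s a sketch of a variation on the DimDate query from earlier that keeps only the first 100 rows; with the row reduction clause configured I’d expect this filter to fold to a TOP clause rather than the whole table being read:

let
    Source =
        Odbc.DataSource(
            "dsn=Test SQL",
            [
                HierarchicalNavigation = true,
                SqlCapabilities = [LimitClauseKind = LimitClauseKind.Top]
            ]
        ),
    #"Adventure Works DW_Database" =
        Source{[Name="Adventure Works DW",Kind="Database"]}[Data],
    dbo_Schema =
        #"Adventure Works DW_Database"{[Name="dbo",Kind="Schema"]}[Data],
    DimDate_Table =
        dbo_Schema{[Name="DimDate",Kind="Table"]}[Data],
    //this row filter should now be pushed down to SQL Server as a SELECT TOP 100 query
    Top100Rows = Table.FirstN(DimDate_Table, 100)
in
    Top100Rows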

One last thing to mention is that there is another undocumented function which is what, I think, the Detect button in the screenshots above uses to find out how to set the SqlCapabilities option. It’s called Odbc.InferOptions and it can be used like so:

Odbc.InferOptions("dsn=Test SQL")

This function returns a record containing a field called LimitClauseKind that tells you the type of limit clause that is supported:

[Screenshots: the record returned by Odbc.InferOptions and its LimitClauseKind field]

Here’s a list of the possible LimitClauseKind values I found using #shared:

[Screenshot: the list of LimitClauseKind values]
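
If you want to reproduce that list yourself, a query along these lines (just a sketch) pasted into a blank query should return the LimitClauseKind values from #shared:

let
    //turn the #shared environment into a table of name/value pairs
    SharedTable = Record.ToTable(#shared),
    //keep only the LimitClauseKind enumeration values
    LimitClauseKinds = Table.SelectRows(
        SharedTable,
        each Text.StartsWith([Name], "LimitClauseKind")
    )
in
    LimitClauseKinds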

Data Privacy Settings And Data Refresh Performance In Excel And Power BI

I have known for a long time that data privacy checks can have an impact on the performance of data refresh in Power BI and Excel, but on a recent performance tuning engagement I had the chance to observe just how much of a difference changing these settings can make. Unfortunately I can’t share the M code for the queries I was working with but the scenario is very common:

  • The only data source was a single Excel workbook with multiple worksheets
  • There were multiple Power Query/M queries reading data from these worksheets; the output of these queries was not being loaded into the Power BI dataset
  • There were further Power Query/M queries that referenced these source queries, none of which were being loaded into the Power BI dataset, and they all fed into a single query whose output was being loaded into a dataset
  • Overall data volumes were very small, less than 1MB, and the final query only returned approximately 5000 rows
  • Data refresh times were exceedingly slow, even with the “allow data preview to download in the background” setting turned off, and there was a long period where the Power Query engine was stuck in the “Evaluating” phase of the query during refresh in Power BI Desktop before the dialog showed the amount of data being loaded:

    [Screenshot: the refresh dialog during the long Evaluating phase]

  • Data refresh times also varied by +/-25% from one refresh to another (I have a theory about why this happens – if I can confirm it I will blog about it)

As a benchmark, here are some typical refresh timings for the same table in Power BI Desktop (December 2018 version) and Excel for Office 365 (build 11126.20118 click-to-run) with the default setting of data privacy checks turned on and the data privacy level set for the source Excel file set to Public:

                               Power BI Desktop    Excel
Evaluating Phase (Seconds)     122                 173
Data Load Phase (Seconds)      140                 113
Total Refresh Time (Seconds)   262                 286

Here are the same timings with the Power Query engine set to combine data according to each file’s data privacy settings, and the data privacy checks for the current file turned off:

[Screenshots: the data privacy options in the Options dialog]

                               Power BI Desktop    Excel
Evaluating Phase (Seconds)     69                  91
Data Load Phase (Seconds)      108                 112
Total Refresh Time (Seconds)   177                 203

Clearly there are some substantial query performance gains to be had by turning off data privacy checks, and these gains are made in the Evaluating phase. In this case the queries in both Power BI Desktop and Excel were around 30% faster with data privacy checks turned off.

BUT is it safe to turn off data privacy checks? I strongly advise you to read my series on data privacy settings starting here, in particular this post on disabling checks, to understand the issues involved. You should not turn off data privacy checks without understanding what they do because you may end up sending sensitive data to somewhere it shouldn’t go and maybe even break the law. What’s more, if you have published a Power BI dataset you cannot disable data privacy checks for scheduled refresh unless you are using the gateway in Personal mode and have configured it with the “Fast Combine” option turned on, as shown here. Therefore turning off privacy checks is only really useful for Power Query users in Excel or to speed up development in Power BI Desktop.

In this particular case there is only one data source, an Excel workbook, a source which does not support query folding, so as far as I can see there is no risk in turning off data privacy checks because there is no way data from this Excel workbook can be sent anywhere else. Now why couldn’t the Power Query engine work this out?

Credentials, Data Privacy Settings And Data Sources In Power Query/Power BI

Recently I’ve been doing some more investigations into how data privacy settings work in Power BI. This is a subject I’ve blogged about in great detail already in a series of posts last year, but this functionality is so complex that there is always more to learn. I don’t have any profound new insights to offer; instead this blog post is a write up of a series of experiments whose results shed light onto how the process of setting data privacy levels works end-to-end.

Consider the following M query:

let
    Source = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_search?q=cows"
                )
            ),
    result = Source[result],
    results = result[results],
    results1 = results{0},
    id = results1[id],
    output = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_show?id=" & id
                )
            )
in
    output

The query does the following:

  • Runs a query against the UK government’s open data search API (the same API I use in this post on the RelativePath and Query options for the Web.Contents() function) to get a list of datasets related to the search term “cows” via the Package Search endpoint
  • Gets the first dataset returned by the search and finds its ID
  • Sends this ID to the Package Show endpoint in the same API to get the full JSON representation of this data set. Note that the entire URL is dynamically generated and that the Query option of Web.Contents() is not used here.

It’s a typical scenario where data privacy settings can cause problems: data from one data source, the package_search endpoint, is sent to another data source, the package_show endpoint. My series from last year on data privacy settings provides some useful background information on why this is such an important thing for the Power Query engine.

Assuming that you have never used this API before, when you try to run this query in the Power Query Editor in Power BI Desktop, you’ll see the following prompt to edit the credentials used:

[Screenshot: the Edit Credentials prompt]

Before you click the Edit Credentials button, there are two interesting things to point out. First, in the Query Dependencies view, you see this:

[Screenshot: the Query Dependencies view]

Notice that the Package Search endpoint is shown but not the Package Show endpoint.

Second, if you click the Data Source Settings button, you’ll see the following in the dialog that appears:

[Screenshot: the Data Source Settings dialog]

Not only does it only show the Package Search endpoint, there is a warning that says:

“Some data sources may not be listed because of hand-authored queries”

This refers to the output step in the query that calls the Package Show endpoint with the dynamically-generated url.

Closing this dialog and going back to the Query Editor, if you click the Edit Credentials button, you can set credentials for the data source (anonymous access is fine in this case). These credentials can be set at all levels in the path down to https://data.gov.uk/api/3/action/package_search.

[Screenshot: the credentials dialog showing the levels in the path]

Setting credentials at the level of https://data.gov.uk means you only get prompted once; however, if you select https://data.gov.uk/api/3/action/package_search from the dropdown list and click Connect you will get prompted again to set credentials, this time with a dropdown that shows all paths down to the Package Show endpoint:

[Screenshot: the second credentials prompt]

Assuming you set credentials at the level of https://data.gov.uk and click Connect, Fiddler shows that a call is made to https://data.gov.uk/api/3/action/package_search?q=cows, presumably to check whether the credentials entered actually work, and then you move back to the Query Editor.

[Screenshot: the call to the package_search endpoint captured in Fiddler]

Next, in the Query Editor, you see the data privacy settings prompt:

[Screenshot: the data privacy settings prompt]

Clicking Continue brings up the data privacy levels dialog:

[Screenshot: the privacy levels dialog]

You have the choice to ignore privacy levels for this file, but of course you should always try to avoid doing that. You also have two dropdown boxes that both show https://data.gov.uk on the left-hand side and another two dropdown boxes next to them, although only the top one of these is enabled.

In the first column of dropdown boxes, in the first dropdown, you can see all points in the path from https://data.gov.uk to https://data.gov.uk/api/3/action/package_search:

[Screenshot: the dropdown showing paths down to package_search]

In the dropdown box immediately underneath you can see for the first time all points in the path from https://data.gov.uk to https://data.gov.uk/api/3/action/package_show:

[Screenshot: the dropdown showing paths down to package_show]

If you select https://data.gov.uk in the top-left dropdown only the top-right dropdown is enabled, and in the top-right dropdown you can set the data privacy levels Public, Organizational and Private.

[Screenshot: the Public, Organizational and Private privacy levels]

The meanings of these three levels are described in my earlier series and here, and I won’t go into detail about what they mean in this post. The bottom-right dropdown is disabled because if you set privacy levels for https://data.gov.uk then all urls that start with this path inherit this setting. This is similar to what happens with the None data privacy setting that I describe here, I think.

Setting the data privacy level to Public on https://data.gov.uk in the top-right dropdown means the query runs successfully:

[Screenshot: the query output]

The expected activity is shown in Fiddler:

[Screenshot: the Fiddler trace]

And at last the Package Show endpoint is shown in the Query Dependencies view:

[Screenshot: the Query Dependencies view showing both endpoints]

The Data Source Settings dialog shows the same as it does above in the “Data sources in current file” tab, including the warning about hand-authored queries, but on the “Global permissions” tab there is now an entry for https://data.gov.uk:

[Screenshot: the Global permissions tab]

Although you only set a privacy level for https://data.gov.uk earlier, it’s interesting to note that the entry for https://data.gov.uk/api/3/action/package_search has a privacy level set explicitly to Public and not to None:

[Screenshot: the Public privacy level on the package_search endpoint]

Stepping back a few steps to the Privacy Levels dialog, if you set a privacy level of Private for https://data.gov.uk like so:

[Screenshot: the Private privacy level set for https://data.gov.uk]

…then the query fails with the error “Formula.Firewall: Query ‘Query1’ (step ‘output’) is accessing data sources that have privacy levels which cannot be used together. Please rebuild this data combination.”:

[Screenshot: the Formula.Firewall error]

From my point of view, this is the first really interesting finding: the two endpoints, https://data.gov.uk/api/3/action/package_search and https://data.gov.uk/api/3/action/package_show, are considered to be separate data sources (which tallies with what is shown in the Query Dependencies view) even though they have both inherited their data privacy level setting from https://data.gov.uk. Since they are both Private, data cannot be sent from one source to the other, hence the error.

The second interesting finding becomes apparent if you follow the steps above with a slightly different version of the original query that uses the Query option in the call to the Package Show endpoint:

let
    Source = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_search?q=cows"
                )
            ),
    result = Source[result],
    results = result[results],
    results1 = results{0},
    id = results1[id],
    output = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_show",
                [Query=[#"id"=id]]
                )
            )
in
    output

In this case when you look in the Data Source Settings dialog you now see both endpoints listed and you no longer see the “hand-authored queries” warning:

[Screenshot: the Data Source Settings dialog showing both endpoints]

It looks like whatever method the Power Query engine uses to find data sources inside a query is confused by dynamically generated urls – which might also explain why data sources that use dynamic urls can’t be refreshed after publishing.
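
Incidentally, I believe one common way of avoiding a dynamically generated url like this is to keep the base url static and pass the variable parts using the RelativePath and Query options of Web.Contents() – here’s a sketch of how the original query could be rewritten along those lines:

let
    Source =
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_search?q=cows"
                )
            ),
    result = Source[result],
    results = result[results],
    results1 = results{0},
    id = results1[id],
    //the base url stays static; the endpoint path and the id are passed
    //via the RelativePath and Query options instead
    output =
        Json.Document(
            Web.Contents(
                "https://data.gov.uk",
                [RelativePath = "api/3/action/package_show",
                 Query = [#"id" = id]]
                )
            )
in
    output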
