Handling Multiple URL Query Parameters With The Same Name Using Web.Contents In Power Query/Power BI

Some time ago I wrote a pair of popular posts about using the Query and RelativePath options of the Web.Contents function in Power Query and why they are important for dataset refresh. I have recently learned something extra about this subject which merits a new post, though: how to handle multiple URL query parameters with the same name.

In the examples for this post I’ll be using a free, fake web service available at http://jsonplaceholder.typicode.com/ which requires no authentication so you will be able to run the code I show yourself. Let’s start by considering the following call to the comments resource of this API:

http://jsonplaceholder.typicode.com/comments?postId=1

In M, you can use the Web.Contents function and the Query option to call the API like so:

//Generates the URL http://jsonplaceholder.typicode.com/comments?postId=1
Web.Contents("http://jsonplaceholder.typicode.com/comments",[Query=[postId="1"]])

Now this API – unlike many others – allows you to pass multiple URL query parameters of the same name. For example this is a valid call:

http://jsonplaceholder.typicode.com/comments?postId=1&postId=2

The problem is, if you try to do the following in M:

Web.Contents("http://jsonplaceholder.typicode.com/comments",[Query=[postId="1",postId="2"]])

…you’ll get an error, because the Query option takes a record and a record can’t have two fields with the same name.

Here’s where it gets interesting. In the working example above, the postId field in the Query record contains a text value. The field can contain a list of text values instead, though, and this is how you solve the problem:

//Generates the URL http://jsonplaceholder.typicode.com/comments?postId=1&postId=2
Web.Contents("http://jsonplaceholder.typicode.com/comments",[Query=[postId={"1","2"}]])

This results in a URL with two query parameters that are both called postId but have the values 1 and 2 respectively.
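
Since the field takes a list, you can also build that list dynamically rather than hard-coding it. Here’s a minimal sketch – the PostIds step is just a stand-in for wherever your list of IDs actually comes from:

//Generates the URL http://jsonplaceholder.typicode.com/comments?postId=1&postId=2&postId=3
let
    //In a real query this list might come from a parameter or another query
    PostIds = {1, 2, 3},
    PostIdsAsText = List.Transform(PostIds, Text.From),
    Source = Json.Document(
        Web.Contents(
            "http://jsonplaceholder.typicode.com/comments",
            [Query = [postId = PostIdsAsText]]
        )
    )
in
    Source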

There’s something else worth mentioning about this too. Sometimes when you’re generating a URL you may not want to add a query parameter to it if a value is null. One way of handling this is to start with an empty record and then add fields to it using the Record.AddField function, but using an empty list provides another approach. For example:

//Generates the URL http://jsonplaceholder.typicode.com/comments
//and *not* http://jsonplaceholder.typicode.com/comments?postId=
Web.Contents("http://jsonplaceholder.typicode.com/comments",[Query=[postId={}]])

…generates a URL without the postId query parameter in. This means you can write a simple function with an optional parameter for postId like this:

(optional myPostId as text) =>
Json.Document(
    Web.Contents("http://jsonplaceholder.typicode.com/comments",[Query=[postId=myPostId ?? {}]])
    )

If a text value is passed to myPostId then the postId query parameter is added to the URL; if no value is passed, myPostId is null and the ?? null coalescing operator (see Ben Gribaudo’s post here for more information on this operator) can be used to replace it with an empty list.
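
For comparison, here’s a sketch of the Record.AddField approach mentioned above, written as the same kind of function – the step names are mine and there are other equally valid ways to structure it:

//Build the Query record conditionally instead of using an empty list
(optional myPostId as text) =>
let
    QueryRecord =
        if myPostId = null then
            []
        else
            Record.AddField([], "postId", myPostId),
    Result = Json.Document(
        Web.Contents(
            "http://jsonplaceholder.typicode.com/comments",
            [Query = QueryRecord]
        )
    )
in
    Result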

One last thing to mention: if you need to see the web service calls generated by Web.Contents when you’re testing in the Power Query Editor, you can use the query diagnostics functionality as I show here.

[I’m extremely grateful to Dave Maiden for letting me know about all this – it answers a question I have been asked several times in the past]

Comparing The Performance Of Reading Data From Files With File.Contents And Web.Contents In Power Query And Power BI

In my last post I mentioned the Power Query engine’s persistent cache, which in some scenarios caches the data read from a data source when a query is refreshed. Another important nugget of information that Ehren von Lehe of the Power Query dev team mentioned in a post on the Power Query MSDN forum recently is the fact that if you use File.Contents to get data from a file then the persistent cache is not used, but if you use Web.Contents to get data from the same file then the persistent cache is used. I guess the thinking here is that there is no point creating an on-disk cache containing the contents of a file that is already on disk.

Using Process Monitor (see here and here for more information on how to do this) to view how much data is read from disk when a query runs, it is possible to see this in action. Take a slow query that reads from a 150MB csv file and results in five separate reads of that file – it’s more or less the same scenario from my second Process Monitor blog post here, with the query itself described in a lot of detail here. Here’s a graph of the data captured by Process Monitor when this query is refreshed, with time in seconds on the X axis and the amount of data read in MB on the Y axis:

image

Since this query uses File.Contents to get the data from the csv file, the persistent cache is not used; as you can see, the graph shows clearly that the full contents of the csv file are read five times.

The same query altered to use Web.Contents shows just two full reads:

image

I have no idea why the file is read twice rather than once, but it’s definitely different behaviour to the version that uses File.Contents.

As far as I can see it is possible to replace File.Contents with Web.Contents in every case. So, if you have the following expression:

File.Contents("c:\users\myuser\Desktop\file.txt")

You can just replace it with:

Web.Contents("c:\users\myuser\Desktop\file.txt")

Which one is faster though? Just because a query reads data from disk more often does not necessarily mean that it will be slower. In the above scenario, with the csv file stored on my local hard drive, the Web.Contents version of the query refreshes in 18 seconds while the File.Contents version refreshes in 14 seconds. Replacing the csv file with an Excel file that contains the same data (remember that Excel files are a lot slower than csv files to read data from, as I showed here) results in the version of the query that uses File.Contents running in 205 seconds, while the version that uses Web.Contents runs in 297 seconds. So it looks like, in most cases, File.Contents is the right choice when reading data from a file (as you would hope).

However, when using the same csv file stored on a network file share, the Web.Contents version takes 23 seconds while the File.Contents version takes 25 seconds. So maybe if you are dealing with files that are stored remotely over a slow connection it might be worth replacing File.Contents with Web.Contents to see if you get any performance benefits. There may be other situations where Web.Contents is the faster choice too. If you test this and see a difference, let me know by leaving a comment!

UPDATE: please also read Curt Hagenlocher’s comment below – Web.Contents may be changed in the future so it only works with http/https

The CredentialQuery Option For Web.Contents() In Power BI Custom Data Connectors

There’s a new, as yet undocumented, option for the Web.Contents() M function that is only available for custom data connectors and not in Power BI Desktop: CredentialQuery. It does the same thing as the Query option, but it stops the values you pass as query parameters from being logged in Power BI’s diagnostic logs, so it’s useful if you are passing sensitive information such as passwords.

For example, consider the following call to Web.Contents() used in a custom data connector:

Web.Contents(
 "https://data.gov.uk/api", 
 [RelativePath="3/action/package_search", 
 Query=[q="cows"]
]
)

When the custom data connector is used in Power BI Desktop, if diagnostic logging is turned on in the Options dialog:

image

…then, if you look in the log files you’ll see the entire url called including the search term “cows”:

image

However, if you change the M code to:

Web.Contents(
 "https://data.gov.uk/api", 
 [RelativePath="3/action/package_search", 
 CredentialQuery=[q="cows"]
]
)

…so that the CredentialQuery option is used instead of the Query option, the search term will not appear in the log file.

Thanks to Curt Hagenlocher for providing this information.

Webcast: Accessing Web Services With Power BI And Power Query

Earlier this week I gave a webcast on accessing web services with Power BI, Power Query and M on Reza Rad’s YouTube channel. You can watch it here:

It’s an introduction to the subject: I cover the basics of using Web.Contents but don’t go into all the obscure details of what each of the options for it do (most of which I have blogged about anyway). I hope you find it useful!

Displaying Routes On A Map In Power BI

In last week’s post I described the new Power Query M functions for working with Well Known Text objects, and in a post a few weeks ago I showed how you can use the Icon Map custom visual to display Well Known Text data. In this post I’ll show you how you can put all this together to do something really useful: display routes on a map in a Power BI report.

First of all you’ll need to download the latest version of the Icon Map custom visual here, because at the time of writing the version in AppSource doesn’t have the WKT functionality needed. Second, you’ll need a data source with a table of latitudes and longitudes that represent points on a route that you want to plot. I’m going to use the Azure Maps Get Route Directions API as my data source; to keep things simple I’m going to use the Shared Key authentication method for this API, but I’ll explain below why this can be a problem in the real world. To follow the example you’ll need to create an Azure Maps Account in the Azure Portal so you can get your own key.

Here’s a simple example to start off with. The following M code:

let
  Source = Json.Document(
      Web.Contents(
          "https://atlas.microsoft.com/route/directions/json",
          [
            ApiKeyName = "subscription-key",
            Query = [
              #"api-version" = "1.0",
              query = "51.502243,-0.140073:51.517777,-0.138621",
              travelMode = "pedestrian"
            ]
          ]
        )
    ),
  routes = Source[routes],
  routes1 = routes{0},
  legs = routes1[legs],
  legs1 = legs{0},
  points = legs1[points],
  PointsList = List.Transform(
      points,
      each GeographyPoint.From([longitude], [latitude])
    ),
  LineRecord = [Kind = "LINESTRING", Points = PointsList],
  WKTLineString = Geography.ToWellKnownText(LineRecord),
  ToTable = #table(
      type table [Category = number, WKT = text, Size = number],
      {{1, WKTLineString, 1}}
    )
in
  ToTable

…calls the Azure Maps Get Route Directions API to get a walking route between two locations in central London. When you run the query you’ll be prompted to authenticate – choose the Web API option and enter your key. It’s a very basic example of how the API can be used: I have assumed there’s only one leg in the route, then taken the list of latitudes and longitudes for that single leg, turned that into a list of points and then turned that list into a linestring as I showed in last week’s blog post. Here’s the output in the Icon Map visual:

This is what the table returned by the query looks like:

The Category and Size columns aren’t very interesting; Icon Map needs these columns to display data. The WKT column contains the linestring object. Here’s how the visual is set up in Power BI:

We can do better than this though. In the example pbix file I’ve uploaded here, I’ve created a more sophisticated query that asks for six different routes between the two locations used above that all go via one waypoint (so there are two legs in the route data returned). I won’t go through the M code because there’s nothing interesting in it, but the end result is a report that allows you to plot these six different routes in different colours on a map and use a slicer to hide and show them, along with some information on their length and travel time:

One last thing to point out regarding these examples is that they won’t refresh if you publish them to the Power BI Service because I’ve used the ApiKeyName option in Web.Contents to handle key authentication (see here) and that isn’t supported in the Service yet. To work around this you’ll need to hard code the Azure Maps API key in your query and use the Skip Test Connection option in the Power BI Service.
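
For reference, here’s a hedged sketch of what the hard-coded version of the Web.Contents call above might look like – the key value is obviously a placeholder for your own Azure Maps key:

//The key is passed as an ordinary query parameter instead of via the ApiKeyName option
Web.Contents(
    "https://atlas.microsoft.com/route/directions/json",
    [Query = [
        #"subscription-key" = "InsertYourAzureMapsKeyHere",
        #"api-version" = "1.0",
        query = "51.502243,-0.140073:51.517777,-0.138621",
        travelMode = "pedestrian"
    ]]
)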

Using The “Skip Test Connection” Option For Power BI Data Sources To Avoid Scheduled Refresh Failures

One of the most popular blog posts I have written in recent years is this post on how the RelativePath and Query options for the Web.Contents() M function can be used to allow datasets that use web services as data sources to refresh successfully in the Power BI Service. It’s very frustrating when you write some complex M code in Power BI Desktop and find that your dataset can’t be refreshed after it has been published! While, at the time of writing, this problem is still present, yesterday’s announcement of the new “Skip Test Connection” option for data sources in the Power BI Service does mean there are more scenarios where your datasets can be made to refresh.

As I mentioned in that original post, the trick of using the RelativePath and Query options with Web.Contents() only works if the first parameter passed to Web.Contents() is a url that itself can be called on its own without an error. For example, the following expression:

Web.Contents(
	"http://www.myservice.com",
	[RelativePath="hello",
	 Query=[q="SearchTerm"]
	])

…will only refresh successfully in the Power BI Service if http://www.myservice.com can be reached as a url on its own without returning an error. I believe this is because this url is used when the Power BI Service is preparing to run the query to check whether the credentials it has stored are valid.  As the blog post says there is a workaround where dummy query parameter values can be used in that first parameter, but this won’t work in all cases. Instead, now you can turn on the “Skip Test Connection” option on the data source in the Power BI Service and the dataset will refresh even if http://www.myservice.com on its own returns an error.

Similarly this new option helps when you are using API keys passed in through query parameters for authentication. In Power BI Desktop and Excel Get&Transform/Power Query, you have the option to use the Web API authentication type to store sensitive API keys that need to be passed through url parameters securely rather than embed them in your M code. Unfortunately this is not supported in the Power BI Service yet and (as I found out on a recent consultancy engagement) there are other complications with using Anonymous authentication. For example, say you need to call a web service with an M expression something like this:

Web.Contents("http://www.myservice.com?key=password123")

…where the key query parameter contains an API key used for authentication. If you set this data source to use Anonymous authentication in Power BI Desktop or Excel it will refresh with no problems. However, after you publish to the Power BI Service you will not be able to refresh your dataset unless you turn on the “Skip Test Connection” option, because the Power BI Service apparently strips out any query parameters from the url when it is testing the connection. In this case, if it strips out the key parameter, then obviously the call will fail because the means of authentication has been removed.
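
For completeness, here’s a rough sketch of the same call written with the Query option (reusing the post’s placeholder url and key). It keeps the key out of the base url, but the connection test still calls the base url on its own, so you may still need the “Skip Test Connection” option:

//The API key is supplied through the Query option rather than embedded in the url
Web.Contents(
    "http://www.myservice.com",
    [Query = [key = "password123"]]
)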

This new option does not mean every dataset can now be refreshed. As far as I can see, if you dynamically generate a url for use with Web.Contents() and do not use the RelativePath and/or Query options you will not be able to refresh your dataset, because the Power BI Service will not be able to analyse your code to discover what urls are present in it. I suspect that if you see the “Some data sources may not be listed because of hand-authored queries” warning in the Data Source Settings dialog in Power BI Desktop (which I blogged about here) then you will not be able to refresh your dataset in the Power BI Service, but I need to do more testing to be sure.

As always, if you test this yourself please let me know about any interesting things you find by leaving a comment!

Troubleshooting Power BI Timeouts, Part 2: Timeouts Specified In Power Query Functions

In the first post in this series I showed how the Power BI Service applies a limit on the total amount of time it takes to refresh a dataset in the Power BI Service, except when you initiate your refresh via an XMLA Endpoint. In this post I’ll look at the various timeouts that can be configured in Power Query functions that are used to access data.

Every time a Power BI Import mode dataset connects to a data source it goes through a Power Query query, and inside the code of that Power Query query will be an M function that connects to a specific type of data source. Most – but not all – of these M functions have the option to set timeouts. The M functions that connect to relational databases (for example Sql.Database which is used to connect to SQL Server-related sources, or Odbc.DataSource which is used to connect to ODBC sources) typically allow you to configure two types of timeout:

  • A connection timeout, which specifies how long to wait when trying to open a connection to the data source
  • A command timeout, which specifies how long the query to get data from the source is allowed to run
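
Here’s a rough sketch of how those two timeouts can be set on an ODBC source – the DSN name is hypothetical and the exact options supported vary by function:

//Both values are durations created with #duration(days, hours, minutes, seconds)
Odbc.DataSource(
    "dsn=MyDsn",
    [
        //Wait up to 15 seconds for the connection to open
        ConnectionTimeout = #duration(0, 0, 0, 15),
        //Allow the query sent to the source to run for up to 5 minutes
        CommandTimeout = #duration(0, 0, 5, 0)
    ]
)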

Some other functions have other timeouts more appropriate to the data source they access: for example Web.Contents and OData.Feed have a Timeout property for the HTTP requests they make behind the scenes. Other functions (for example Excel.Workbook) have no timeout-related properties that you can set at all. You should check the documentation, either online or by using the #shared intrinsic variable in the Power Query Editor, to see what options are available and what the default timeouts are.
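
For example, here’s a hedged sketch of setting the Timeout option on Web.Contents – the url is one used elsewhere on this blog and the thirty-second value is arbitrary:

//Override the default HTTP timeout for this request
Json.Document(
    Web.Contents(
        "https://data.gov.uk/api/3/action/package_search",
        [Query = [q = "cows"], Timeout = #duration(0, 0, 0, 30)]
    )
)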

Here’s a simple example of how to set a timeout when connecting to SQL Server. First of all, I created a scalar function called ForceDelay in TSQL that returns a value of 1 after a specified number of seconds, so the query:

select [dbo].[ForceDelay](10) as MyColumn

…takes 10 seconds to run.

When you connect to SQL Server in Power BI/Power Query you can paste a SQL query in the connection dialog under Advanced Options; when you do so you’ll also see the option to set the command timeout in minutes:

Here’s the M code generated when you use the SQL query above, set the Command timeout to 1 minute and click OK:

let
  Source = Sql.Database(
    "localhost",
    "AdventureWorksDW2017",
    [
      Query = "select [dbo].[ForceDelay](10) as MyColumn",
      CommandTimeout = #duration(0, 0, 1, 0)
    ]
  )
in
  Source

Notice that the CommandTimeout option has been set on the Sql.Database function, and that the value passed to it is a duration of one minute defined using #duration. Since the SQL query takes 10 seconds to run and the timeout is 1 minute, it runs successfully.
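
If you find the four arguments of #duration hard to remember, this trivial sketch confirms that the value above really is one minute:

//#duration takes days, hours, minutes and seconds
let
    OneMinute = #duration(0, 0, 1, 0),
    //Returns 60
    TotalSeconds = Duration.TotalSeconds(OneMinute)
in
    TotalSeconds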

However, if you set the CommandTimeout option to 5 seconds like so:

let
  Source = Sql.Database(
    "localhost",
    "AdventureWorksDW2017",
    [
      Query = "select [dbo].[ForceDelay](10) as MyColumn",
      CommandTimeout = #duration(0, 0, 0, 5)
    ]
  )
in
  Source

…then the query will take longer than the timeout and you’ll see the following error in the Power Query Editor:

The error message is:

DataSource.Error: Microsoft SQL: Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

If you hit the timeout when refreshing in the main Power BI window you’ll see the same message in the refresh dialog box:

So, as you can see, if you are working with large data volumes or slow queries you will need to be aware of the default timeouts set in the M functions you’re using and alter them if need be.

A Look At Lobe – A Free, Easy-To-Use Tool For Training Machine Learning Models

A few months ago I heard about a new tool from Microsoft called Lobe which makes it easy to train machine learning models. It’s nothing to do with Power BI but I find anything to do with self-service data analytics interesting, and when I finally got round to playing with it today I thought it was so much fun that it deserved a blog post.

You can download it and learn more at https://www.lobe.ai/ and there’s a great ten minute video describing how to use it here:

The most impressive thing about it is not what it does but how it does it: a lot of tools claim to make machine learning easy for non-technical users but Lobe really is easy to use. My AI/ML knowledge is very basic but I got up and running with it extremely quickly.

To test it out I downloaded lots of pictures of English churches and trained a model to detect whether the church had a tower or a spire. After I labelled the pictures appropriately:

…Lobe was able to train the model:

I could test it inside the tool. The model was able to tell whether a church had a tower:

…or a spire:

…very reliably!

If I have one criticism it’s that when you want to use your model things get a lot more technical, at least compared to something like AI Builder for Power Apps and Power Automate, but I guess that’s because it is just a tool for training models. There have been some recent improvements here though (see this blog post) and Lobe does provide a local API for testing purposes that can be consumed in Power BI with some custom M code.

Here’s an example of how to call the local API in Power Query:

let
  Source = Folder.Files("C:\Churches"),
  #"Removed Other Columns"
    = Table.SelectColumns(
    Source,
    {"Content", "Name"}
  ),
  #"Added Custom" = Table.AddColumn(
    #"Removed Other Columns",
    "CallAPI",
    each Text.FromBinary(
      Web.Contents(

        //Insert Lobe Connect URL here                              
        "http://localhost...",
        [
          Content = Json.FromValue(
            [
              image = Binary.ToText(
                [Content],
                BinaryEncoding.Base64
              )
            ]
          ),
          Headers = [
            #"Content-Type"
              = "application/json"
          ]
        ]
      )
    )
  ),
  #"Parsed JSON"
    = Table.TransformColumns(
    #"Added Custom",
    {{"CallAPI", Json.Document}}
  ),
  #"Expanded CallAPI"
    = Table.ExpandRecordColumn(
    #"Parsed JSON",
    "CallAPI",
    {"predictions"},
    {"predictions"}
  ),
  #"Expanded predictions"
    = Table.ExpandListColumn(
    #"Expanded CallAPI",
    "predictions"
  ),
  #"Expanded predictions1"
    = Table.ExpandRecordColumn(
    #"Expanded predictions",
    "predictions",
    {"label", "confidence"},
    {"label", "confidence"}
  ),
  #"Pivoted Column" = Table.Pivot(
    #"Expanded predictions1",
    List.Distinct(
      #"Expanded predictions1"[label]
    ),
    "label",
    "confidence",
    List.Sum
  ),
  #"Changed Type"
    = Table.TransformColumnTypes(
    #"Pivoted Column",
    {
      {"Tower", type number},
      {"Spire", type number}
    }
  ),
  #"Removed Columns"
    = Table.RemoveColumns(
    #"Changed Type",
    {"Content"}
  )
in
  #"Removed Columns"

You can export models to a variety of other places for production use, including Azure Functions and Azure Machine Learning.

Definitely something to keep an eye on, especially because it will soon be able to do object detection and data classification as well as image classification.

Credentials, Data Privacy Settings And Data Sources In Power Query/Power BI

Recently I’ve been doing some more investigations into how data privacy settings work in Power BI. This is a subject I’ve blogged about in great detail already in a series of posts last year, but this functionality is so complex that there is always more to learn. I don’t have any profound new insights to offer; instead this blog post is a write up of a series of experiments whose results shed light onto how the process of setting data privacy levels works end-to-end.

Consider the following M query:

let
    Source = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_search?q=cows"
                )
            ),
    result = Source[result],
    results = result[results],
    results1 = results{0},
    id = results1[id],
    output = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_show?id=" & id
                )
            )
in
    output

The query does the following:

  • Runs a query against the UK government’s open data search API (the same API I use in this post on the RelativePath and Query options for the Web.Contents() function) to get a list of datasets related to the search term “cows” via the Package Search endpoint
  • Gets the first dataset returned by the search and finds its ID
  • Sends this ID to the Package Show endpoint in the same API to get the full JSON representation of this data set. Note that the entire URL is dynamically generated and that the Query option of Web.Contents() is not used here.

It’s a typical scenario where data privacy settings can cause problems: data from one data source, the package_search endpoint, is sent to another data source, the package_show endpoint. My series from last year on data privacy settings provides some useful background information on why this is such an important thing for the Power Query engine.

Assuming that you have never used this API before, when you try to run this query in the Power Query Editor in Power BI Desktop, you’ll see the following prompt to edit the credentials used:

image

Before you click the Edit Credentials button, there are two interesting things to point out. First, in the Query Dependencies view, you see this:

image

Notice that the Package Search endpoint is shown but not the Package Show endpoint.

Second, if you click the Data Source Settings button, you’ll see the following in the dialog that appears:

image

Not only does it only show the Package Search endpoint, there is a warning that says:

“Some data sources may not be listed because of hand-authored queries”

This refers to the output step in the query that calls the Package Show endpoint with the dynamically-generated url.

Closing this dialog and going back to the Query Editor, if you click the Edit Credentials button, you can set credentials for the data source (anonymous access is fine in this case). These credentials can be set at all levels in the path down to https://data.gov.uk/api/3/action/package_search.

image

Setting credentials at the level of https://data.gov.uk means you only get prompted once; however if you select https://data.gov.uk/api/3/action/package_search from the dropdown list and click Connect you will get prompted again to set credentials, this time with a dropdown that shows all paths down to package_show:

image

Assuming you set credentials at the level of https://data.gov.uk and click Connect, then Fiddler shows that a call is made to https://data.gov.uk/api/3/action/package_search?q=cows, presumably to check whether the credentials entered actually work and you move back to the Query Editor.

image

Next, in the Query Editor, you see the data privacy settings prompt:

image

Clicking Continue brings up the data privacy levels dialog:

image

You have the choice to ignore privacy levels for this file, but of course you should always try to avoid doing that. You also have two dropdown boxes that both show https://data.gov.uk on the left-hand side and another two dropdown boxes next to them, although only the top one of these is enabled.

In the first column of dropdown boxes, in the first dropdown, you can see all points in the path from https://data.gov.uk to https://data.gov.uk/api/3/action/package_search:

image

In the dropdown box immediately underneath you can see for the first time all points in the path from https://data.gov.uk to https://data.gov.uk/api/3/action/package_show:

image

If you select https://data.gov.uk in the top-left dropdown only the top-right dropdown is enabled, and in the top-right dropdown you can set the data privacy levels Public, Organizational and Private.

image

The meanings of these three levels are described in my earlier series and here, and I won’t go into detail about what they mean in this post. The bottom-right dropdown is disabled because if you set privacy levels for https://data.gov.uk then all urls that start with this path inherit this setting. This is similar to what happens with the None data privacy setting that I describe here, I think.

Setting the data privacy level to Public on https://data.gov.uk in the top-right dropdown means the query runs successfully:

image

The expected activity is shown in Fiddler:

image

And at last the Package Show endpoint is shown in the Query Dependencies view:

image

The Data Source Settings dialog shows the same as it does above in the “Data sources in current file” tab, including the warning about hand-authored queries, but on the “Global permissions” tab there is now an entry for https://data.gov.uk :

image

Although you only set a privacy level for https://data.gov.uk earlier, it’s interesting to note that the entry for https://data.gov.uk/api/3/action/package_search has a privacy level set explicitly to Public and not to None:

image

Stepping back a few steps to the Privacy Levels dialog, if you set a privacy level of Private for https://data.gov.uk like so:

image

…then the query fails with the error “Formula.Firewall: Query ‘Query1’ (step ‘output’) is accessing data sources that have privacy levels which cannot be used together. Please rebuild this data combination.”:

image

From my point of view, this is the first really interesting finding: the two endpoints, https://data.gov.uk/api/3/action/package_search and https://data.gov.uk/api/3/action/package_show, are considered as separate data sources (which tallies with what is shown in the Query Dependencies view) even though they have both inherited their data privacy level setting from https://data.gov.uk. Since they are both Private, data cannot be sent from one source to the other, hence the error.

The second interesting finding becomes apparent if you follow the steps above with a slightly different version of the original query that uses the Query option in the call to the Package Show endpoint:

let
    Source = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_search?q=cows"
                )
            ),
    result = Source[result],
    results = result[results],
    results1 = results{0},
    id = results1[id],
    output = 
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api/3/action/package_show",
                [Query=[#"id"=id]]
                )
            )
in
    output

In this case when you look in the Data Source Settings dialog you now see both endpoints listed and you no longer see the “hand-authored queries” warning:

image

It looks like whatever method it is that the Power Query engine searches for data sources inside a query is confused by dynamically generated urls – which might also explain why data sources that use dynamic urls can’t be refreshed after publishing.
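
For completeness, here’s a hedged sketch of the same query rewritten so that both calls share a single static base url, with the endpoint paths moved into the RelativePath option – a pattern from my earlier posts that avoids dynamically generated urls altogether:

let
    Source =
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api",
                [RelativePath = "3/action/package_search", Query = [q = "cows"]]
            )
        ),
    result = Source[result],
    results = result[results],
    results1 = results{0},
    id = results1[id],
    output =
        Json.Document(
            Web.Contents(
                "https://data.gov.uk/api",
                [RelativePath = "3/action/package_show", Query = [id = id]]
            )
        )
in
    output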

Excel Dynamic Arrays And FilterXML

I’ll admit that I’m a bit less excited about Excel dynamic arrays than I was when I last blogged about them. Don’t get me wrong: from a pure Excel point-of-view they are still very cool, but I’ve since found out that the CubeValue function can’t be used with dynamic arrays which prevents me from doing all the really fun SSAS/Power BI/Power Pivot things I wanted to do with cube formulas.

It’s not all doom and gloom though. Several years ago I blogged about the then-new WebService and FilterXML functions (here and here). I very quickly found that the WebService function was very limited indeed and that Power Query did everything it did but better; on the other hand I felt FilterXML had some unexplored potential, although I never got round to doing any exploring. Fast forward to last year and I saw that FilterXML was one of a number of existing functions that are affected by the new dynamic array behaviour, a change that makes it even more flexible.

Consider the following, publicly-available XML file:

http://www.hmrc.gov.uk/softwaredevelopers/rates/exrates-monthly-0719.xml

It’s a list of exchange rates published by the UK tax authorities and the contents look like this:

image

Just the kind of data you want to bring into Excel. The WebService function doesn’t work with this URL for some reason, but it’s very easy to bring this data into an Excel table with a few clicks in Power Query using the Web data source:

image

If you prefer to work with dynamic arrays though (and I’m sure there are lots of reasons why that might be true), you can do that with a combination of Power Query and FilterXML.

The first thing to do is to use Power Query to load the entire XML document into a single cell in a worksheet. Here’s a query to do this:

let
    Source = 
    Text.FromBinary(
        Web.Contents(
            "http://www.hmrc.gov.uk/softwaredevelopers" &
            "/rates/exrates-monthly-0719.xml"
        )
    ),
    #"Converted to Table" = 
    #table(
        1, 
        {{Source}}
    ),
    #"Cleaned Text" = 
    Table.TransformColumns(
        #"Converted to Table",
        {{"Column1", Text.Clean, type text}}
    ),
    #"Replaced Value" = 
    Table.ReplaceValue(
        #"Cleaned Text",
        "> <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    ),
    #"Replaced Value1" = 
    Table.ReplaceValue(
        #"Replaced Value",
        ">  <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    ),
    #"Replaced Value2" = 
    Table.ReplaceValue(
        #"Replaced Value1",
        ">   <",
        "><",
        Replacer.ReplaceText,
        {"Column1"}
    ),
    #"Replaced Value3" = 
    Table.ReplaceValue(
        #"Replaced Value2",
        ">    <",
        "><",
        Replacer.ReplaceText,{"Column1"}
    )
in
    #"Replaced Value3"

One interesting point to make here: the FilterXML function does not like spaces between closing and opening angle brackets in XML (maybe this is why WebService errors too?) so I’m removing all occurrences of this, as well as removing any unprintable characters. Here’s the output, a table with one column and one row where the only cell contains the full XML:

image

You can then use an Excel formula like this to run an XPath query against this XML document:

=FILTERXML($A$2, "/exchangeRateMonthList/exchangeRate/countryName")

to get a list of all the country names spilling out to as many rows in the worksheet as necessary:

image

[If you want a comparison with how FilterXML used to work in all its CTRL+SHIFT+ENTER glory see the “Scraping a whole XML document” section here; if you want to learn XPath, the query language used by FilterXML, there is a good tutorial here]

I’m not an XPath expert, or even an Excel expert, so I’ll finish here but hopefully this will prove useful to someone. You can download an Excel workbook containing the demos from this post here – note that neither Power Query nor dynamic arrays work in Excel Online yet, so don’t look at the workbook in the browser.
