The Power Query XML Functions, Missing Or Null Values And Inconsistent Schema Inference

A few months ago one of my colleagues at Microsoft, David Browne, showed me an interesting Power Query problem with how the Xml.Tables and Xml.Document M functions handle null or missing values. I’m posting the details here because the problem seems fairly common, it causes a lot of confusion and it’s not easy to deal with.

In XML there are two ways to represent a null or missing value:<a></a> or omitting the element completely. Unfortunately the Xml.Tables and Xml.Document M functions handle these inconsistently: they treat the <a></a> form as a table but the other as a scalar.

For example consider the following M query that takes some XML (with no missing values), passes it to Xml.Tables and then expands the columns:

let
    Source = "<?xml version=""1.0"" encoding=""UTF-8""?>
    <Record>
     <Location>
        <id>123</id>
        <code>abc</code>
     </Location>
     <Location>
        <id>123</id>
        <code>hello</code>     
     </Location>
    <Location>
        <id>123</id>
        <code>124</code>     
     </Location>
</Record>",
   t0 = Xml.Tables(Source),
    #"Expanded Table" = Table.ExpandTableColumn(t0, "Table", {"id", "code"}, {"id", "code"})
   
in
   #"Expanded Table"

Here’s the output:

Notice how the code column contains scalar values.

Now compare this with a query where the XML contains a missing value using the <a></a> form but where the query is otherwise identical:

let
    Source = "<?xml version=""1.0"" encoding=""UTF-8""?>
    <Record>
     <Location>
        <id>123</id>
        <code>abc</code>
     </Location>
     <Location>
        <id>123</id>
        <code></code>     
     </Location>
    <Location>
        <id>123</id>
        <code>124</code>     
     </Location>
</Record>",
   t0 = Xml.Tables(Source),
    #"Expanded Table" = Table.ExpandTableColumn(t0, "Table", {"id", "code"}, {"id", "code"})
   
in
   #"Expanded Table"

Here’s the output:

Notice how the code column now contains values of type table because of the presence of a missing value.

Finally, here’s one last example where the missing value is handled by omitting the element:

let
    Source = "<?xml version=""1.0"" encoding=""UTF-8""?>
    <Record>
     <Location>
        <id>123</id>
        <code>abc</code>
     </Location>
     <Location>
        <id>123</id> 
     </Location>
    <Location>
        <id>123</id>
        <code>124</code>     
     </Location>
</Record>",
   t0 = Xml.Tables(Source),
    #"Expanded Table" = Table.ExpandTableColumn(t0, "Table", {"id", "code"}, {"id", "code"})
   
in
   #"Expanded Table"

Here’s the output:

Notice how code now contains scalar values but the value in that column on the second row is null.

You can probably guess the kind of problems this can cause: if your source XML sometimes contains missing values and sometimes doesn’t you can end up with errors if you try to expand a column that sometimes contains table values and sometimes doesn’t. It’s not a bug but sadly this is just one of those things which is now very difficult for the Power Query team to change.

There’s no easy way to work around this unless you can change the way your data source generates its XML, which is unlikely. Probably the best thing you can do is use the Xml.Document function; it has the same problem but it’s slightly easier to deal with given how it returns values in a single column, which means you can use a custom column to trap table values something like this:

 let
  Source
    = "<?xml version=""1.0"" encoding=""UTF-8""?>
    <Record>
     <Location>
        <id>123</id>
        <code>abc</code>
     </Location>
     <Location>
        <id>123</id> 
        <code></code>
     </Location>
    <Location>
        <id>123</id>
        <code>124</code>     
     </Location>
</Record>",
  t0 = Xml.Document(Source),
  Value = t0{0}[Value],
  #"Expanded Value"
    = Table.ExpandTableColumn(
    Value,
    "Value",
    {
      "Name",
      "Namespace",
      "Value",
      "Attributes"
    },
    {
      "Name.1",
      "Namespace.1",
      "Value.1",
      "Attributes.1"
    }
  ),
  #"Removed Other Columns"
    = Table.SelectColumns(
    #"Expanded Value",
    {"Name.1", "Value.1"}
  ),
  #"Added Custom" = Table.AddColumn(
    #"Removed Other Columns",
    "Custom",
    each
      if Value.Is([Value.1], type table) then
        null
      else
        [Value.1]
  )
in
  #"Added Custom"

[Thanks to David Browne for the examples and solutions, and to Curt Hagenlocher for confirming this is the way the Xml.Tables and Xml.Document functions are meant to behave]

3 responses

  1. Pingback: Power Query XML Inconsistencies – Curated SQL

Leave a Reply to Nathan Watkins Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: