Power BI DirectQuery Best Practices For Snowflake And Other Databases

Recently I collaborated with a number of people from Microsoft and Snowflake to write a blog post on best practices for using Power BI in DirectQuery mode on Snowflake. You can read it here:

https://medium.com/@jerellegainey/best-practices-for-using-power-bi-in-directquery-mode-with-snowflake-bfd1312ca7ab

It builds on what's already in the Power BI guidance documentation for DirectQuery, adding some advice specific to Snowflake. It also has a few other tips that apply to all DirectQuery sources and which aren't in the guidance docs (yet), such as the importance of setting the Maximum Connections Per Data Source property (which I also blogged about recently here) and the fact that you can increase it to 30 on a Premium capacity, as well as the importance of always testing DirectQuery performance in the Power BI Service rather than in Power BI Desktop. As a result it's worth reading if you are thinking of using Power BI in DirectQuery mode with Synapse, BigQuery or any other source.

If you are considering using DirectQuery on a project I have one last piece of advice: think very carefully about why you need to use DirectQuery and not Import mode. Many people don't think this through and end up in trouble. In my opinion Import mode should be the default choice simply because it will almost always perform better than DirectQuery mode, whatever back-end database you're using.

Gateways And Measuring Power Query CPU Usage During Power BI Dataset Refresh

After last week’s post on measuring Power Query CPU usage during dataset refresh, someone asked an obvious question that I should have addressed: does using a gateway change anything? After all, if you’re using a gateway to connect to an on-premises data source then all the Power Query queries transforming the data from that source will be executed on the gateway machine and not in the Power BI Service.

Let’s do a quick test to find out. I couldn’t use the same Power Query query I used in last week’s post (it turns out you can’t force the use of a gateway when there isn’t an external data source) so instead I used another dataset that connects to a large CSV stored in ADLSgen2 storage and does a group by operation – something which is guaranteed to be very expensive in terms of CPU for Power Query.
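
To give a sense of the kind of query involved, here's a minimal sketch of the pattern in M: connect to a CSV file in ADLSgen2, then group and aggregate. This isn't the exact query I used; the storage account URL, container, file name and column names below are all invented for illustration.

let
  //hypothetical ADLSgen2 account and container
  Files = AzureStorage.DataLake("https://myaccount.dfs.core.windows.net/mycontainer"),
  //navigate to a single (hypothetical) CSV file and parse it
  CsvContent = Files{[Name = "large.csv"]}[Content],
  Csv = Csv.Document(CsvContent, [Delimiter = ",", Encoding = 65001]),
  PromotedHeaders = Table.PromoteHeaders(Csv, [PromoteAllScalars = true]),
  ChangedTypes = Table.TransformColumnTypes(PromotedHeaders, {{"SalesAmount", type number}}),
  //the group by is the expensive part: the Power Query engine has to do all of this work itself
  GroupedRows = Table.Group(
    ChangedTypes,
    {"ProductKey"},
    {{"Total Sales", each List.Sum([SalesAmount]), type nullable number}}
  )
in
  GroupedRows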

Here’s what Profiler shows for the refresh operation when no gateway is used:

The refresh took around 30 seconds and used around 44 seconds of CPU time.

Here’s what Profiler shows when the refresh does use a gateway:

The refresh takes a lot longer, around 103 seconds (as you would expect – instead of loading the data from ADLSgen2 storage in the cloud to the Power BI Service, it has to take a round trip via the gateway on my PC) but the important thing is that the CPU time is now very low – 141 milliseconds.

So, as you might expect, the CPU time for refreshes that use an on-premises data gateway is not shown in Profiler traces because, as I said, all the work done by the Power Query engine is done on the gateway machine and not in the Power BI Service. Making refreshes use a gateway, even when you don’t need to, can be a way of taking load off a Power BI Premium capacity if it’s overloaded.

This in turn raises the question of how you measure Power Query CPU usage on a gateway. As far as I know it isn't possible for individual Power Query queries (I could be wrong though), although the gateway logs do allow you to capture CPU usage for the whole machine. Better gateway monitoring tools are on the way, but this seems like a good time to mention my colleague Rui Romano's open source gateway monitoring solution (article | repo), which makes understanding the gateway logs a lot easier.

Measuring Power Query CPU Usage During Power BI Dataset Refresh

Some time ago I wrote a post about how optimising for CPU Time is almost as important as optimising for Duration in Power BI, especially if you're working with Power BI Premium Gen2. This is fairly straightforward if you're optimising DAX queries or Analysis Services engine-related activity for refreshes. But what about Power Query-related activity? You may have a small dataset, but if you're doing a lot of complex transformations in Power Query they could end up using a lot of CPU, even after the CPU smoothing that Premium Gen2 applies to background activity. How can you measure how expensive your Power Query queries are in terms of CPU? In this post I'll show you how.

Let’s consider two Power Query queries that return a similar result and which are connected to two different tables in the same Power BI dataset. The first query returns a table with one column and one row, where the only value is a random number returned by the Number.Random M function:

#table(type table [A=number],{{Number.Random()}})

The second query also returns a table with a single value in it:

let
  InitialList = {1 .. 1000000},
  RandomNumbers = List.Transform(
    InitialList,
    each Number.Random()
  ),
  FindMin = List.Min(RandomNumbers),
  Output = #table(
    type table [A = number],
    {{FindMin}}
  )
in
  Output

This second query, however, generates one million random numbers, finds the minimum and returns that value – which of course is a lot slower and more expensive in terms of CPU.

If you run a SQL Server Profiler trace connected to Power BI Desktop and refresh each of the two tables in the dataset separately, the Command End event for the refresh will tell you the duration of the refresh and also the amount of CPU Time used by the Analysis Services engine for the refresh (there will be several Command End events visible in Profiler but only one with any significant activity, so it will be easy to spot the right one). In Desktop, however, the Command End event does not include any CPU used by the Power Query Engine. Here’s what the Command End event for the first Power Query query above looks like in Desktop:

As you would expect the values in both the Duration and CPU Time columns are low. Here is what the Command End event looks like for the second query above:

This time the refresh is much slower (the Duration value is much larger than before) but the CPU Time value is still low, because the Analysis Services engine is still only receiving a table with a single value in it. All the time taken by the refresh is taken in the Power Query engine.

If you publish a dataset containing these queries to a Premium workspace in the Power BI Service, connect Profiler to the XMLA Endpoint for the workspace, and then refresh the two tables again, you won't notice much difference for the first, fast query:

[Note that in this screenshot I’ve chosen a comparable Command End event to the one I used in Desktop, although for some reason it doesn’t show the duration. The overall refresh duration, which includes some extra work to do a backup, is around 2 seconds]

However, for the second, slower query you can see that the CPU Time for the Command End event is much higher. This is because in the Power BI Service the event’s CPU Time includes all the Power Query-related activity as well as all Analysis Services engine activity:

This is a simple example where there is very little work being done in the Analysis Services engine, which means that pretty much all the CPU Time can be attributed to the Power Query engine. In the real world, when you're working with large amounts of data, it will be harder to understand how much work is being done in the Analysis Services engine and how much is being done in the Power Query engine. This is where Power BI Desktop comes in, I think. In Desktop you know you are only seeing the CPU used by the Analysis Services engine, so I'll bet that if there is a big difference in the ratio of CPU Time to Duration for your refresh in Power BI Desktop compared to the Power BI Service, it's highly likely that that difference is due to Power Query engine activity and that's where you should concentrate your optimisation efforts. For example (with purely made-up numbers), if a refresh shows 20 seconds of CPU Time in Desktop but 80 seconds of CPU Time in the Service for a similar Duration, the extra 60 seconds or so is probably the Power Query engine at work.

Of course the next question is how can you optimise Power Query queries so they use less CPU? I don’t know, I haven’t done it yet – but when I have something useful to share I’ll blog about it…

Cancelling Power BI Dataset Refreshes With The Enhanced Refresh API

The most exciting (at least for me) feature in the new Enhanced Refresh API (blog announcement | docs) is the ability to cancel a dataset refresh that's currently in progress. Up until now, as this blog post by my colleague Michael Kovalsky shows, this has been quite difficult to do: not only do you need to use the XMLA Endpoint but you also need to take into account that in many cases the Power BI Service will automatically restart the refresh even after you've cancelled it. Now, though, if (and only if) you start the refresh using the Enhanced Refresh API you can also cancel it via the same API. This is important because I've seen a few cases where rogue refreshes have consumed a lot of CPU on a Premium capacity and caused throttling, even after all CPU smoothing has taken place, and Power BI admins have struggled to cancel the refreshes.

This and all the other great functionality the new API includes (the ability to refresh individual tables or partitions! control over parallelism!) means that it can handle many of the advanced scenarios that, in the past, you’d have had to write some complex TMSL commands for; in my opinion anyone working on an enterprise-level dataset in Power BI Premium should be using it for their refreshes.

But Chris, I hear you say, I'm a data person and find working with APIs confusing and difficult! Yeah, me too – which is why I was interested when I saw this tweet by Stephen Maguire about the .NET Interactive notebook for Visual Studio Code that he's built for the Enhanced Refresh API:

https://github.com/samaguire/PowerBINotebooks

It’s a really great set of examples for learning how to use the Enhanced Refresh API through PowerShell and the notebook format makes it a lot more user-friendly than just another bunch of scripts. I highly recommend that you check it out.

How The “Maximum Connections Per Data Source” Property On Power BI DirectQuery Datasets Can Affect Report Performance

If you’re working with DirectQuery in Power BI then one of the most important properties you can set on your dataset is the “Maximum connections per data source” property. You can find it on the Published Dataset Settings tab in the Options dialog in Power BI Desktop:

The description of what it does in the guidance documentation is pretty comprehensive:

You can set the maximum number of connections DirectQuery opens for each underlying data source. It controls the number of queries concurrently sent to the data source.

The setting is only enabled when there’s at least one DirectQuery source in the model. The value applies to all DirectQuery sources, and to any new DirectQuery sources added to the model.

Increasing the Maximum Connections per Data Source value ensures more queries (up to the maximum number specified) can be sent to the underlying data source, which is useful when numerous visuals are on a single page, or many users access a report at the same time. Once the maximum number of connections is reached, further queries are queued until a connection becomes available. Increasing this limit does result in more load on the underlying data source, so the setting isn’t guaranteed to improve overall performance.

When the model is published to Power BI, the maximum number of concurrent queries sent to the underlying data source also depends on the environment. Different environments (such as Power BI, Power BI Premium, or Power BI Report Server) each can impose different throughput constraints.

I thought it would be interesting to do some experiments to see how this property behaves, what you see in Profiler (or Log Analytics) when connections are queued up, and how you can find an optimal value for your dataset.

The first thing to mention – and this is something I only realised relatively recently – is that this property applies to DirectQuery on Power BI datasets and Analysis Services as well as traditional DirectQuery to external databases. I’m a lot more comfortable with Power BI than any relational database so I decided to do my testing with a DirectQuery dataset connected back to another Power BI dataset; the behaviour of the feature is the same as with DirectQuery to a relational database.

For my tests I created a simple dataset – let's call it Dataset A – with not much data but a really inefficient DAX measure on it. I then created a composite model dataset – let's call this Dataset B – with a DirectQuery connection to Dataset A. Finally I created a report with a Live connection to Dataset B containing 25 card visuals, each of which used the inefficient measure with a different filter. The DAX query for each of these cards, when run on its own through DAX Studio, took around 28 seconds, with almost all that time spent in the Formula Engine. The datasets and reports were published to a PPU workspace and all tests were run in the Power BI Service and not in Power BI Desktop (at the time of writing, things work differently in Desktop – which means you should always test the performance of DirectQuery reports in the Service and not in Desktop). I ran Profiler traces using the Query Begin and Query End events on both Dataset A and Dataset B during my tests.

First of all, let's see what happened when the Maximum Connections Per Data Source property on Dataset B was set to 1. This means that Dataset B is only allowed to have one connection open to Dataset A to run its DirectQuery queries. When the report was run, right at the start the Profiler trace on Dataset B showed 25 Query Begin events, indicating all 25 queries for the 25 card visuals were being run in parallel; the Profiler trace on Dataset A showed just 1 Query Begin event:

This is what you would expect: since Dataset B can only use one connection to Dataset A it can only run one query at a time and the other 24 queries have to queue up to wait for the connection. When the first query against Dataset A completed, another one started and so on. Since the maximum length of time that a DAX query can run in the Power BI Service is 225 seconds, after 225 seconds any remaining queries timed out. At that point 8 queries had completed, so 8 cards were rendered, and all the remaining cards showed the timeout error I blogged about here:

At the end, the Profiler trace against Dataset A showed 8 completed Query Begin/End pairs:

The Profiler trace against Dataset B, meanwhile, showed the 25 Query Begin events followed by 8 Query End events for the successful queries and then 17 Query End events with timeout errors for the unsuccessful queries.

One interesting thing to notice is the durations of the queries. As I said, when run on their own each of these queries took around 28 seconds, and the Profiler trace on Dataset A shows each query taking around 28 seconds. If you look at the successful queries on Dataset B you’ll see that their duration goes up in increments of around 28 seconds: the first takes 29984ms, the second takes 58191ms, the third takes 87236ms and so on until you hit the 225 second timeout limit. This shows that the duration of the queries against Dataset B, the composite model, includes the time waiting to acquire a connection. Notice also that the CPU Time of the queries against Dataset B is minimal because it only includes the CPU used by the query for Dataset B; you have to add the CPU Time to the related queries on Dataset A to get the total CPU Time used by these queries.

The important question is, though, what is the effect of increasing the Maximum Connections Per Data Source property? Increasing it will increase the number of queries run in parallel, but is more parallelism always better? I reran my tests with the property set to 5, 10 (which is the default value and the maximum that can be used for datasets not in a Premium capacity) and 30 (which is the maximum value that can be used for datasets in any form of Premium capacity). Here are the results:

Maximum Connections Per Data Source | Number of Visuals That Render Successfully In 225 Seconds
1  | 8
5  | 16
10 | 10
30 | 8

As you can see, increasing the parallelism a little helps more than increasing it a lot, and in this case reducing the value from the default was better than increasing it: overloading your source with a lot of expensive parallel queries is often a bad thing. This test isn't representative of most real-world reports – you shouldn't have one visual, let alone 30, with queries that run for as long as 30 seconds, and the best way to optimise a report like this would be to display the same data in a smaller number of visuals – but I think it's a useful illustration of how this property works and how it can affect report performance.

Build Web Sites And Embed Power BI Reports In Them Using Power Pages

In amongst all the announcements at Build recently, you may have heard about a new member of the Power Platform being launched: Power Pages. You can read the docs here, and there’s a good, detailed video overview here, but here’s a quick summary of what it is:

Microsoft Power Pages is a secure, enterprise-grade, low-code software as a service (SaaS) platform for creating, hosting, and administering modern external-facing business websites. Whether you’re a low-code maker or a professional developer, Power Pages enables you to rapidly design, configure, and publish websites that seamlessly work across web browsers and devices.

So what? I’m not a web designer and I’m pretty sure most of you aren’t either, so why blog about it here? Most data and BI people don’t need to build web sites… do they?

Well I was playing around with it and noticed one important detail:

It has built-in support for embedding Power BI reports into the web sites you build! You can read more about this here and here. What's more, it supports all forms of Power BI embedding (which can be an extremely confusing subject): as well as Power BI Embedded for sharing reports with external users, you can use regular Power BI Premium, Secure Embedding (which doesn't need Premium) and Publish to Web for sharing with the general public. It also supports embedding of reports and dashboards (though not paginated reports), as well as more complex security scenarios if you have the relevant web development skills.

As someone with no web development skills whatsoever it was very easy for me to build a web site with a Power BI report embedded into it:

When would this be useful? I talk to a lot of people who want to share Power BI reports with external users. You can use Azure B2B for this, although it doesn’t give you the smooth experience a custom-built web site does – but using a custom-built web site of course requires you to actually build that web site, and not everyone has a web developer available to do this work. This is where I see Power Pages being extremely useful: self-service web development for self-service data people, letting you share data securely outside your organisation quickly and easily.

Stopping Some Users Seeing Certain Columns Or Measures In Your Power BI Report With Object Level Security And Field Parameters

If you have sensitive data in your Power BI dataset you may need to stop some users seeing the data in certain columns or measures. There is only one way to achieve this: you have to use Object Level Security (OLS) in your dataset. It's not enough to exclude those measures or columns from your reports or to hide them, because there will always be ways for enterprising users to see data they shouldn't be allowed to see. However, the problem with OLS up to now has been that it didn't play nicely with Power BI reports, so you had to create multiple versions of the same report for different security roles. The good news is that there's now a way to create one report connected to a dataset with OLS and have it display different columns and measures to users with different permissions.

Let’s say you have a dataset and report that looks like this:

As you can see, it displays the names and addresses of employees along with sales and bonus data.

Now let’s say that the address and bonus data should only be visible to HR and everyone else should only be able to see the names and sales values. As I said, the only way to achieve this is to create a role that uses OLS to deny access to the address and bonus columns. Gilbert Quevauvilliers has a great post showing how to set up OLS using Tabular Editor here so I won’t go into detail about how to do this, but here’s how I configured the role in Tabular Editor 3:

If you publish the report and test the role in the browser, you’ll see that you get a “The visual has unrecognized fields” error because the table in the report uses the Address and Bonus fields which, of course, the user cannot access because of the OLS:

Security is working as expected but wouldn’t it be great if, instead of seeing an error here, you could build a single report that displays all the fields when the user has permission to see them and only displays the Name and Sales fields to users who are members of the role with OLS applied?

Well, now you can thanks to the new field parameters feature. The intended use of field parameters is to enable the end users of your reports to choose the fields displayed in visuals using a slicer. Behind the scenes when you create a field parameter a table is added to your dataset with one row for each field you have chosen; this effectively makes the fields used in a visual data-driven, and you can use Row Level Security (RLS) on the table created for your field parameter to control which fields are displayed in your visual and solve the problem.

Going back to our report, the next step is to create a new field parameter with all the columns and measures used in the table:

Notice that the “Add slicer to this page” checkbox is deselected because you don’t need a slicer on the report here. Here’s what the table created for the field parameter looks like:

With the field parameter created it can be used instead of the individual fields in the table definition:

You can then edit the role that already has OLS in it to apply RLS on the field parameter table, so only the rows for the fields that are allowed by the OLS are returned:

'FieldsForMyTable'[FieldsForMyTable] = "Name" ||
'FieldsForMyTable'[FieldsForMyTable] = "Sales"

It’s important that the OLS and RLS are defined in the same role because of the restriction on combining OLS and RLS from different roles.

With all this done, when you view the report through the role you only see Name and Sales displayed:

It’s important to stress that the OLS is still securing the data here – the RLS is just preventing the errors.

One downside of this technique is that things could get complicated if you have multiple visuals that need to display different combinations of secured and non-secured fields in a report. There could also be a performance penalty: when a visual uses a field parameter an extra DAX query is run on the field parameter table to determine the fields to display, and while these queries should be extremely fast most of the time there’s always a risk that they somehow slow your report down.

In conclusion, this workaround isn’t ideal but I think it’s the best way to work with OLS in Power BI reports that’s possible at the moment.

[Thanks to John Vulner for background information on how field parameters work]

Calculating Percentiles In Power Query Online

[This post was originally published on the Power Query blog which no longer exists. I have republished my posts from there here to make sure the content remains available.]

There's a new aggregation option available for the Group By transformation in Power Query Online: percentiles. To see how it works, let's look at a simple example.

Consider the following Excel table containing information about invoices from different countries:

If you connect to this table from a Power BI or Power Platform dataflow, you can select the Country column and click the Group by button on the Transform tab in the ribbon:

When you do this the Group By dialog will open with the Country column selected in the Group by dropdown box. This is where you’ll see the new Percentile option in the Operation dropdown box; if you select it another box will appear underneath the Operation dropdown box where you can enter a number between 0 and 100 to indicate which percentile should be returned when the data is aggregated. You will also need to enter a new column name to contain the aggregated data and select a column containing the numeric data to be aggregated. Here’s what the Group By dialog would look like if you wanted to find the value for the 50th percentile from the Amount column for each country:

Click OK and here’s the output of the transformation:

This functionality uses the List.Percentile M function to do its work.
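
Behind the scenes the Group By dialog generates a Table.Group step that calls List.Percentile. Here's a rough, self-contained sketch of the kind of M code involved; the inline sample data stands in for the Excel table in this example, and the code the UI actually generates may differ in its details:

let
  //inline sample data standing in for the Invoices table
  Source = #table(
    type table [Country = text, Amount = number],
    {{"UK", 100}, {"UK", 300}, {"France", 200}, {"France", 400}, {"France", 600}}
  ),
  //List.Percentile expects a value between 0 and 1, so 0.5 returns the 50th percentile
  GroupedRows = Table.Group(
    Source,
    {"Country"},
    {{"50th Percentile Amount", each List.Percentile([Amount], 0.5), type nullable number}}
  )
in
  GroupedRows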

One last thing to mention is that if you're using SQL Server or Azure Data Explorer as your data source, query folding will take place for this transformation.

Understanding The “We Couldn’t Fold The Expression To The Data Source” Error In Power BI

If you’re using DirectQuery mode in Power BI you may occasionally run into the following error message:

Couldn’t load the data for this visual

OLE DB or ODBC error: [Expression.Error] We couldn’t fold the expression to the data source. Please try a simpler expression..

What does it mean and how can you fix it?

To understand what's going on here you must first understand what query folding is. There's some great documentation here that I strongly recommend you read, but in a nutshell query folding refers to how the Power Query engine inside Power BI can push calculation and transformation logic back to whatever data source you're using in the form of a query – for example a SQL query if your data source is a relational database. Most of the time when people talk about query folding they are talking about Import mode, but it's even more important in DirectQuery mode: not only does every transformation you create in the Power Query Editor have to fold, but every DAX query generated by the visuals on your report (including all your DAX calculations) has to be folded into one or more queries against your data source too.
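
To illustrate what folding means, here's a minimal sketch of a foldable query against a SQL Server source (the server, database, table and column names are made up): the Table.SelectRows step gets translated into a WHERE clause in the SQL query sent to the database, rather than being evaluated row by row by the Power Query engine.

let
  //hypothetical SQL Server database
  Source = Sql.Database("myserver", "AdventureWorksDW"),
  Sales = Source{[Schema = "dbo", Item = "FactInternetSales"]}[Data],
  //this filter folds: it becomes a WHERE clause in the SQL query sent to the source
  FilteredRows = Table.SelectRows(Sales, each [SalesAmount] > 100)
in
  FilteredRows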

You can do some pretty complex things in the Power Query Editor and in DAX, and the error message above is what you get when Power BI admits defeat and says it can't translate a DAX query generated by a visual on a report into a query against your data source. The cause is likely to be a combination of several of the following:

  • A complex data model
  • Complex DAX used in measures or calculated columns
  • The use of dynamic M parameters
  • Complex transformations created in the Power Query Editor

Unfortunately it’s hard to be more specific because Power BI can fold different transformations to different data sources and this error almost never occurs in simple scenarios.

How can you avoid it? Again, I can only offer general advice:

  • Don’t do any transformations in the Power Query Editor if you’re using DirectQuery mode. If you want to use DirectQuery you should always make sure your data is modelled appropriately in whatever data source you’re using before you start designing your dataset in Power BI.
  • Keep your data model as simple as possible. For example, avoiding bi-directional relationships is a good idea.
  • Try to implement as much of the logic for your calculations as possible in your data source, reducing the amount of DAX you need to write.
  • Try to write your DAX in a different way in the hope that Power BI will be able to fold it.

The Power Query XML Functions, Missing Or Null Values And Inconsistent Schema Inference

A few months ago one of my colleagues at Microsoft, David Browne, showed me an interesting Power Query problem with how the Xml.Tables and Xml.Document M functions handle null or missing values. I’m posting the details here because the problem seems fairly common, it causes a lot of confusion and it’s not easy to deal with.

In XML there are two ways to represent a null or missing value: <a></a> or omitting the element completely. Unfortunately the Xml.Tables and Xml.Document M functions handle these inconsistently: they treat the <a></a> form as a table but the other as a scalar.

For example consider the following M query that takes some XML (with no missing values), passes it to Xml.Tables and then expands the columns:

let
    Source = "<?xml version=""1.0"" encoding=""UTF-8""?>
    <Record>
     <Location>
        <id>123</id>
        <code>abc</code>
     </Location>
     <Location>
        <id>123</id>
        <code>hello</code>     
     </Location>
    <Location>
        <id>123</id>
        <code>124</code>     
     </Location>
</Record>",
    t0 = Xml.Tables(Source),
    #"Expanded Table" = Table.ExpandTableColumn(t0, "Table", {"id", "code"}, {"id", "code"})
in
    #"Expanded Table"

Here’s the output:

Notice how the code column contains scalar values.

Now compare this with a query where the XML contains a missing value using the <a></a> form but where the query is otherwise identical:

let
    Source = "<?xml version=""1.0"" encoding=""UTF-8""?>
    <Record>
     <Location>
        <id>123</id>
        <code>abc</code>
     </Location>
     <Location>
        <id>123</id>
        <code></code>     
     </Location>
    <Location>
        <id>123</id>
        <code>124</code>     
     </Location>
</Record>",
    t0 = Xml.Tables(Source),
    #"Expanded Table" = Table.ExpandTableColumn(t0, "Table", {"id", "code"}, {"id", "code"})
in
    #"Expanded Table"

Here’s the output:

Notice how the code column now contains values of type table because of the presence of a missing value.

Finally, here’s one last example where the missing value is handled by omitting the element:

let
    Source = "<?xml version=""1.0"" encoding=""UTF-8""?>
    <Record>
     <Location>
        <id>123</id>
        <code>abc</code>
     </Location>
     <Location>
        <id>123</id> 
     </Location>
    <Location>
        <id>123</id>
        <code>124</code>     
     </Location>
</Record>",
    t0 = Xml.Tables(Source),
    #"Expanded Table" = Table.ExpandTableColumn(t0, "Table", {"id", "code"}, {"id", "code"})
in
    #"Expanded Table"

Here’s the output:

Notice how code now contains scalar values but the value in that column on the second row is null.

You can probably guess the kind of problems this can cause: if your source XML sometimes contains missing values and sometimes doesn’t you can end up with errors if you try to expand a column that sometimes contains table values and sometimes doesn’t. It’s not a bug but sadly this is just one of those things which is now very difficult for the Power Query team to change.

There's no easy way to work around this unless you can change the way your data source generates its XML, which is unlikely. Probably the best thing you can do is use the Xml.Document function: it has the same problem, but it's slightly easier to deal with because it returns values in a single column, which means you can use a custom column to trap table values, something like this:

let
  Source
    = "<?xml version=""1.0"" encoding=""UTF-8""?>
    <Record>
     <Location>
        <id>123</id>
        <code>abc</code>
     </Location>
     <Location>
        <id>123</id> 
        <code></code>
     </Location>
    <Location>
        <id>123</id>
        <code>124</code>     
     </Location>
</Record>",
  t0 = Xml.Document(Source),
  Value = t0{0}[Value],
  #"Expanded Value"
    = Table.ExpandTableColumn(
    Value,
    "Value",
    {
      "Name",
      "Namespace",
      "Value",
      "Attributes"
    },
    {
      "Name.1",
      "Namespace.1",
      "Value.1",
      "Attributes.1"
    }
  ),
  #"Removed Other Columns"
    = Table.SelectColumns(
    #"Expanded Value",
    {"Name.1", "Value.1"}
  ),
  #"Added Custom" = Table.AddColumn(
    #"Removed Other Columns",
    "Custom",
    each
      if Value.Is([Value.1], type table) then
        null
      else
        [Value.1]
  )
in
  #"Added Custom"

[Thanks to David Browne for the examples and solutions, and to Curt Hagenlocher for confirming this is the way the Xml.Tables and Xml.Document functions are meant to behave]