I’m Posting On The Power Query Blog Too!

In the future you’re going to see me writing blog posts on the official Power Query blog as well as here on my own personal blog, and indeed the first of these posts went live a few hours ago. It’s about a new M function called Table.StopFolding which, as the name suggests, stops query folding from taking place:

https://powerquery.microsoft.com/en-us/blog/stop-query-folding-with-table-stopfolding/

I’m doing this a) because I was asked very nicely by the Power Query team if I could help out, and b) because it doesn’t make sense for announcements about new Power BI or Power Query functionality, however obscure, to be made on my own personal blog rather than on an official product blog. This isn’t going to affect the number of posts here though.

The Power Query XML Functions, Missing Or Null Values And Inconsistent Schema Inference

A few months ago one of my colleagues at Microsoft, David Browne, showed me an interesting Power Query problem with how the Xml.Tables and Xml.Document M functions handle null or missing values. I’m posting the details here because the problem seems fairly common, it causes a lot of confusion and it’s not easy to deal with.

In XML there are two ways to represent a null or missing value: an empty element like <a></a>, or omitting the element completely. Unfortunately the Xml.Tables and Xml.Document M functions handle these two forms inconsistently: they treat the empty-element form as a nested table but the omitted-element form as a scalar.

For example, consider the following M query that takes some XML (with no missing values), passes it to Xml.Tables and then expands the columns:

let
    Source = "<?xml version=""1.0"" encoding=""UTF-8""?>
    <Record>
     <Location>
        <id>123</id>
        <code>abc</code>
     </Location>
     <Location>
        <id>123</id>
        <code>hello</code>     
     </Location>
    <Location>
        <id>123</id>
        <code>124</code>     
     </Location>
</Record>",
    t0 = Xml.Tables(Source),
    #"Expanded Table" = Table.ExpandTableColumn(t0, "Table", {"id", "code"}, {"id", "code"})
in
    #"Expanded Table"

Here’s the output, shown here as a text table:

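Name     | id  | code
---------|-----|-------
Location | 123 | abc
Location | 123 | hello
Location | 123 | 124
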
Notice how the code column contains scalar values.

Now compare this with a query where the XML contains a missing value using the <a></a> form but where the query is otherwise identical:

let
    Source = "<?xml version=""1.0"" encoding=""UTF-8""?>
    <Record>
     <Location>
        <id>123</id>
        <code>abc</code>
     </Location>
     <Location>
        <id>123</id>
        <code></code>     
     </Location>
    <Location>
        <id>123</id>
        <code>124</code>     
     </Location>
</Record>",
    t0 = Xml.Tables(Source),
    #"Expanded Table" = Table.ExpandTableColumn(t0, "Table", {"id", "code"}, {"id", "code"})
in
    #"Expanded Table"

Here’s the output, shown here as a text table (where Table indicates an unexpanded nested table value):

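Name     | id  | code
---------|-----|-------
Location | 123 | Table
Location | 123 | Table
Location | 123 | Table
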
Notice how the code column now contains values of type table because of the presence of a missing value.

Finally, here’s one last example where the missing value is handled by omitting the element:

let
    Source = "<?xml version=""1.0"" encoding=""UTF-8""?>
    <Record>
     <Location>
        <id>123</id>
        <code>abc</code>
     </Location>
     <Location>
        <id>123</id> 
     </Location>
    <Location>
        <id>123</id>
        <code>124</code>     
     </Location>
</Record>",
    t0 = Xml.Tables(Source),
    #"Expanded Table" = Table.ExpandTableColumn(t0, "Table", {"id", "code"}, {"id", "code"})
in
    #"Expanded Table"

Here’s the output, shown here as a text table:

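Name     | id  | code
---------|-----|-------
Location | 123 | abc
Location | 123 | null
Location | 123 | 124
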
Notice how the code column now contains scalar values again, but the value in that column on the second row is null.

You can probably guess the kind of problems this can cause: if your source XML sometimes contains missing values and sometimes doesn’t, you can end up with errors when you try to expand a column that sometimes contains table values and sometimes doesn’t. It’s not a bug, but sadly it’s one of those behaviours that is now very difficult for the Power Query team to change.

There’s no easy way to work around this unless you can change the way your data source generates its XML, which is unlikely. Probably the best thing you can do is use the Xml.Document function instead: it has the same problem, but it’s slightly easier to deal with because it returns all its values in a single column, which means you can use a custom column to trap table values, something like this:

let
  Source
    = "<?xml version=""1.0"" encoding=""UTF-8""?>
    <Record>
     <Location>
        <id>123</id>
        <code>abc</code>
     </Location>
     <Location>
        <id>123</id> 
        <code></code>
     </Location>
    <Location>
        <id>123</id>
        <code>124</code>     
     </Location>
</Record>",
  t0 = Xml.Document(Source),
  Value = t0{0}[Value],
  #"Expanded Value"
    = Table.ExpandTableColumn(
    Value,
    "Value",
    {
      "Name",
      "Namespace",
      "Value",
      "Attributes"
    },
    {
      "Name.1",
      "Namespace.1",
      "Value.1",
      "Attributes.1"
    }
  ),
  #"Removed Other Columns"
    = Table.SelectColumns(
    #"Expanded Value",
    {"Name.1", "Value.1"}
  ),
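  // An empty element like <code></code> comes through as a nested (empty) table
  // in Value.1; replace those table values with null and keep scalar values as-is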
  #"Added Custom" = Table.AddColumn(
    #"Removed Other Columns",
    "Custom",
    each
      if Value.Is([Value.1], type table) then
        null
      else
        [Value.1]
  )
in
  #"Added Custom"

[Thanks to David Browne for the examples and solutions, and to Curt Hagenlocher for confirming this is the way the Xml.Tables and Xml.Document functions are meant to behave]

Understanding Power Query Query Diagnostics Data With The Diagnostics.ActivityID M Function

I was looking at the output of Power Query’s Query Diagnostics feature recently (again) and trying to understand it better. One of the more confusing aspects of it is the way that the Power Query engine may evaluate a query more than once during a single refresh. This is documented in the note halfway down this page, which says:

Power Query might perform evaluations that you may not have directly triggered. Some of these evaluations are performed in order to retrieve metadata so we can best optimize our queries or to provide a better user experience (such as retrieving the list of distinct values within a column that are displayed in the Filter Rows experience). Others might be related to how a connector handles parallel evaluations.

I came up with the following M query to illustrate this:

#table(
    type table 
    [#"Activity ID"=text], 
    {{Diagnostics.ActivityId()}}
    )

If you paste this code into a new blank query:

…you have a query that returns a table with a single cell: the text value returned by the Diagnostics.ActivityId M function, which I blogged about here. The output – copied from the Data pane of the main Power BI window – looks like this:

The Diagnostics.ActivityId function is interesting because it returns an identifier for the currently-running query evaluation, so in the table above the value in the Activity ID column is the identifier of the evaluation that returned that table.

If you run a Query Diagnostics trace when refreshing this query, you’ll see that the Activity Id column of the Diagnostics_Detailed trace query contains evaluation identifier values:

The following query takes the output of a Diagnostics_Detailed trace query and gets just the unique values from the Id and Activity Id columns:

let
    Source = #"Diagnostics_Detailed_2022-04-24_19:40",
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Id", "Activity Id"}),
    #"Removed Duplicates" = Table.Distinct(#"Removed Other Columns")
in
    #"Removed Duplicates"

This makes it easy to see that my query was actually evaluated (or at least partially evaluated) three times when I clicked refresh. Since the value in the Activity Id column for Id 4.10 matches the value in the table loaded into my dataset, I know that that was the evaluation that loaded my table into the dataset.

New Options For The Table.Buffer Function In Power Query

The March 2022 release of Power BI Desktop includes some new options for the Table.Buffer M function that can be set in a new second parameter. The options are of type BufferMode.Type; if you look at the built-in documentation for this type you’ll see the following:

The two allowed values are:

  • BufferMode.Eager: The entire value is immediately buffered in memory before continuing
  • BufferMode.Delayed: The type of the value is computed immediately but its contents are not buffered until data is needed, at which point the entire value is immediately buffered

Anyone with previous experience of Table.Buffer will see that BufferMode.Eager is the existing behaviour, but what is BufferMode.Delayed for?

It turns out that it’s there to make development faster. Consider the following M query that loads data from a CSV file with seven columns and a million rows, and then uses Table.Buffer to buffer that table into memory:

let
  Source = Csv.Document(
    File.Contents(
      "C:\NumbersMoreColumns.csv"
    ),
    [
      Delimiter  = ",",
      Columns    = 7,
      Encoding   = 65001,
      QuoteStyle = QuoteStyle.None
    ]
  ),
  #"Promoted Headers"
    = Table.PromoteHeaders(
    Source,
    [PromoteAllScalars = true]
  ),
  #"Changed Type"
    = Table.TransformColumnTypes(
    #"Promoted Headers",
    {
      {"A", Int64.Type},
      {"B", Int64.Type},
      {"C", Int64.Type},
      {"D", Int64.Type},
      {"E", Int64.Type},
      {"F", Int64.Type},
      {"G", Int64.Type}
    }
  ),
  BufferTable = Table.Buffer(
    #"Changed Type"
  )
in
  BufferTable

When I refresh this query in Power BI Desktop on my PC I see the “Evaluating…” message for 20 seconds before the data starts to load:

If, however, I add the second parameter [BufferMode = BufferMode.Delayed] to Table.Buffer like so:

let
  Source = Csv.Document(
    File.Contents(
      "C:\NumbersMoreColumns.csv"
    ), 
    [
      Delimiter  = ",", 
      Columns    = 7, 
      Encoding   = 65001, 
      QuoteStyle = QuoteStyle.None
    ]
  ), 
  #"Promoted Headers"
    = Table.PromoteHeaders(
    Source, 
    [PromoteAllScalars = true]
  ), 
  #"Changed Type"
    = Table.TransformColumnTypes(
    #"Promoted Headers", 
    {
      {"A", Int64.Type}, 
      {"B", Int64.Type}, 
      {"C", Int64.Type}, 
      {"D", Int64.Type}, 
      {"E", Int64.Type}, 
      {"F", Int64.Type}, 
      {"G", Int64.Type}
    }
  ), 
  BufferTable = Table.Buffer(
    #"Changed Type", 
    [BufferMode = BufferMode.Delayed]
  )
in
  BufferTable

Then, when I run my refresh, the “Evaluating…” message only appears very briefly before the data starts to load:

It’s important to stress that after the “Evaluating…” phase the data load takes exactly the same amount of time – it’s only the “Evaluating…” phase that is faster. Nonetheless, this can save you a lot of time as a developer. I have been told that when these options become available in dataflows they will make validation (which occurs when you close Power Query Online after editing a dataflow) much faster too – in fact this functionality was developed partly to solve the dataflow validation problem.

[Thanks to Curt Hagenlocher for this information]

Update: something I should make clear is that this functionality is only useful for people who are already using Table.Buffer in their queries. If you’re not using Table.Buffer already then these changes won’t be of any benefit or interest.

Expanding Azure Data Explorer Dynamic Columns In Power Query

Azure Data Explorer has a data type called dynamic which can be used to hold scalar values as well as arrays and property bags; you can read about it in the docs here. For example (following on from my recent series on DirectQuery on Log Analytics, starting here), the ApplicationContext column in the PowerBIDatasetsWorkspace table – which holds the IDs of the dataset, report and visual that generated a DAX query (see this post for more background) – is of type dynamic:

This is what the contents of the column look like:

You can easily extract the individual property values from this column in KQL, and indeed I did so in the KQL queries in that post. The interesting thing, though, is that you can also extract these values in Power Query M very easily and – crucially – maintain query folding, in both Import mode and DirectQuery mode, by using the Record.FieldOrDefault M function in a custom column, without needing to write any KQL yourself. Record.FieldOrDefault has the added advantage of returning a default value (null, unless you specify otherwise) rather than an error when the field is missing.

To do this, first of all you have to enter a table name or KQL query in the third parameter of the AzureDataExplorer.Contents function. When you do this you can treat a dynamic column as a record, even if the Power Query UI doesn’t display it as such. Here’s an example M query that shows all of this in action on the PowerBIDatasetsWorkspace table that contains Power BI data in Log Analytics:

let
  Source = AzureDataExplorer.Contents(
    "InsertClusterName", 
    "InsertDBName", 
    "PowerBIDatasetsWorkspace", 
    [
      MaxRows                 = null, 
      MaxSize                 = null, 
      NoTruncate              = null, 
      AdditionalSetStatements = null
    ]
  ), 
  #"Added Custom" = Table.AddColumn(
    Source, 
    "Custom", 
    each Record.FieldOrDefault(
      [ApplicationContext], 
      "DatasetId"
    )
  )
in
  #"Added Custom"

From this query, here’s what the Custom Column dialog for the #”Added Custom” step looks like:

Here’s the output in the Power Query Editor:

And here’s the KQL query that this M query folds to (taken from the View Native Query dialog in the Power Query Editor):

PowerBIDatasetsWorkspace
| extend ["Custom"]=["ApplicationContext"]["DatasetId"]

I admit that this is a super-obscure tip but I think it’s fascinating nonetheless, especially given how nested structures are becoming more and more common in the world of big data. It would be great to have similar behaviour in other connectors…

Thanks to my colleague Itay Sagui (whose blog has several posts on Power BI/Azure Data Explorer integration) for this information.

Simulating Slow Data Sources In Power BI

As a postscript to my series on Power BI refresh timeouts (see part 1, part 2 and part 3) I thought it would be useful to document how I was able to simulate a slow data source in Power BI without using large data volumes or deliberately complex M code.

It’s relatively easy to create an M query that returns a table of data after a given delay. For example, this query returns a table with one column and one row after one hour and ten minutes:

let
  Source = Function.InvokeAfter(
    () => #table({"A"}, {{1}}), 
    #duration(0, 1, 10, 0)
  )
in
  Source

Some notes:

  • I’m using #table to return the table without having to connect to a data source. More details on how to use #table can be found here.
  • The delay is achieved using the Function.InvokeAfter M function, with the amount of time to wait specified using #duration.

A more interesting problem is how to create an M query that, instead of waiting for a given duration and then returning a table immediately, returns the rows of a table one at a time with a delay between each row. Here’s a query that does that, returning ten rows one second at a time:

let
  NumberOfRows = 10,
  DaysToWait = 0,
  HoursToWait = 0,
  MinutesToWait = 0,
  SecondsToWait = 1,
  Source = #table(
    {"A"},
    List.Transform(
      {1 .. NumberOfRows},
      each Function.InvokeAfter(
        () => {1},
        #duration(
          DaysToWait,
          HoursToWait,
          MinutesToWait,
          SecondsToWait
        )
      )
    )
  )
in
  Source

Last of all, to simulate a slow SQL Server data source – not being much good at T-SQL at all – I borrowed some code from this thread on Stack Overflow to create a function that returns a scalar value after a specified number of seconds:

CREATE FUNCTION [dbo].[ForceDelay] (@seconds int) RETURNS int AS
BEGIN
    DECLARE @endTime datetime2(0) = DATEADD(SECOND, @seconds, GETDATE());
    WHILE (GETDATE() < @endTime)
        SET @endTime = @endTime;
    RETURN 1;
END

I showed how to call this function from Power Query using a native SQL query here.
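
If you don’t want to click through, here’s a minimal sketch of such a call (the server and database names are hypothetical, and the 10-second delay is just an example):

let
  Source = Sql.Database(
    "localhost",   // hypothetical server name
    "MyDatabase",  // hypothetical database name
    [Query = "SELECT dbo.ForceDelay(10) AS A"]  // blocks for ~10 seconds, then returns 1
  )
in
  Source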

Power BI/Power Query Data Privacy Settings And Errors Caused By Nested Values

Over the past few years I’ve blogged and presented extensively on the subject of Power Query’s data privacy settings (see here for a post with links to all this content). I thought I knew everything there was to know… but of course I didn’t, and I’ve recently learned about an issue that can cause mysterious errors.

As always, it’s easiest to show how the problem can occur with an example. Here’s a table of airport names taken from the TripPin public OData feed:

Note how the Location column contains nested values of data type Record, and note that I have not expanded this column.

Here’s another query with sales data for these airports coming from Excel:

Now let’s say we want to join these two queries together using a Merge operation in the Power Query Editor. When you do this, because you are combining data from two different sources and because OData supports query folding, you will be prompted to set data privacy settings on these sources (unless you have already done so at some point in the past) because a Merge could result in data being sent from Excel to the OData source.

If you set the data privacy levels to Private on each source, like so:

…you are telling Power Query that it should never send data from these sources to any other source. As a result, Power Query has to load the data from both sources, buffer that data in memory, and do the Merge inside its own engine.

When you do the Merge everything looks normal at first:

But when you expand the Airports column you’ll see that the nested values in the Location column have been turned into the text “[Record]” and as a result can no longer be expanded.

This is because Power Query has had to buffer the values in the Airports query but it is unable to buffer nested values (I wrote about this here).

There are two ways to fix this. First of all, you can change the data privacy settings or turn them off completely. I don’t recommend turning them off completely because this is only possible in Power BI Desktop and not in the Power BI Service, and even changing the data privacy settings can lead to some unexpected issues later on. For example, if you set the data privacy levels for both sources to Public like so:

…then no buffering is needed (because data can now be sent from one source to another) and the nested values in the Location field can be expanded:

…and of course you go ahead and expand them:

However, people always forget that you have to set your data privacy levels again after publishing your dataset to the Power BI Service. And if you or someone else subsequently sets the data privacy levels back to Private you’ll get the following error in the output of the query:

The error message here is:

“Expression.Error: We cannot convert the value “[Record]” to type Record.”

Depending on the data type of the nested field you might get:

“Expression.Error: We cannot convert the value “[Table]” to type Table.”

or

“Expression.Error: We cannot convert the value “[List]” to type List.”

…instead.

The second way to fix the problem is easier and probably safer: you just need to expand the Location column before the Merge operation instead of after it. That way there are no nested fields present when the Merge takes place so all the values can be buffered. Here’s what the Airports table looks like after the Location column has been expanded, before the Merge:

…and here’s the output of the Merge even when the data privacy levels for both sources are set to Private:
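
In M, the reordering looks something like this sketch – the query, column and join key names here are all hypothetical, but a Merge created in the Power Query Editor UI generates a Table.NestedJoin step like the one below:

let
  // Expand the nested Location record into scalar columns *before* merging
  // ("Address" and "City" are hypothetical field names)
  ExpandedAirports = Table.ExpandRecordColumn(
    Airports,
    "Location",
    {"Address", "City"},
    {"Address", "City"}
  ),
  // The Merge now only has to buffer scalar values, so it succeeds
  // even when both sources are set to Private
  Merged = Table.NestedJoin(
    Sales,
    {"AirportCode"},
    ExpandedAirports,
    {"IcaoCode"},
    "Airports",
    JoinKind.LeftOuter
  )
in
  Merged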

Handling “Select All” For Slicers Bound To Dynamic M Parameters In Power BI

Last year I blogged about handling multi-select scenarios for dynamic M parameters. Since then support for “Select All” has been added (see here) but since a regular slicer with no items selected behaves the same as one with all items selected, some extra logic is needed to make a slicer bound to a dynamic M parameter behave like a regular slicer.

To illustrate this, I created a simple dataset with two tables in it:

  1. A DirectQuery table linked to the AdventureWorksDW DimDate table (a simple date dimension table), stored in Snowflake
  2. An Import mode table called Day Name that contains one column and seven rows containing the names of the days of the week

I then created an M parameter called DayParam and bound it to the DayName column of the Day Name table and enabled the Multi-select and Select all options:

Here’s the M code that shows how to use the DayParam parameter to filter the DimDate table on the EnglishDayNameOfWeek column, and handle the scenario where “Select All” has been selected as well as the scenario where no item has been selected:

let
  Source = Snowflake.Databases(
    "xyz.snowflakecomputing.com",
    "DEMO_WH"
  ),
  AWORKS_Database = Source
    {
      [
        Name = "AWORKS",
        Kind = "Database"
      ]
    }
    [Data],
  PUBLIC_Schema = AWORKS_Database
    {[Name = "PUBLIC", Kind = "Schema"]}
    [Data],
  DIMDATE_Table = PUBLIC_Schema
    {[Name = "DIMDATE", Kind = "Table"]}
    [Data],
  #"Filtered Rows" =
    if DayParam = null then
      DIMDATE_Table
    else if Type.Is(
      Value.Type(DayParam),
      List.Type
    )
    then
      if DayParam = {"__SelectAll__"} then
        DIMDATE_Table
      else
        Table.SelectRows(
          DIMDATE_Table,
          each List.Contains(
            DayParam,
            [ENGLISHDAYNAMEOFWEEK]
          )
        )
    else
      Table.SelectRows(
        DIMDATE_Table,
        each [ENGLISHDAYNAMEOFWEEK]
          = DayParam
      )
in
  #"Filtered Rows"

The #”Filtered Rows” step does the following:

  • If nothing is selected in the slicer then the DayParam parameter will contain a null value, and in this case no filter will be applied to the DimDate table
  • If the DayParam parameter is of type list (when there is a multi-select in the slicer) then
    • If it’s a list containing a single text value of “__SelectAll__”, the default value returned when the slicer has “Select all” selected, then no filter will be applied to the DimDate table
    • Otherwise List.Contains will be used to filter the DimDate table by all the selected values
  • Otherwise the DayParam parameter will contain a single scalar value (because a single value has been selected in the slicer) and the DimDate table will be filtered by that value

Here’s how it works:

Multi-Value Parameters In Power Query Online

There’s a nice new feature in Power Query Online (the version of Power Query used in Dataflows): parameters of type List. You can see this as a new option in the Type dropdown in the “Manage parameters” dialog:

Why is this interesting? In the past, Power Query parameters were always single values like a date or a string; now a parameter can contain multiple values.

There’s one other new feature in Power Query Online that goes along with this: In and Not In filters, which can use these new List parameters.
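
As a sketch of how the two features fit together (the parameter, table and column names below are made up), an In filter boils down to a List.Contains condition over the List parameter:

let
  // hypothetical List-type parameter; in Power Query Online this would be
  // created in the Manage parameters dialog with its Type set to List
  ProductsParam = {"Apples", "Oranges"},
  Source = #table(
    type table [Product = text, Sales = number],
    {{"Apples", 1}, {"Pears", 2}, {"Oranges", 3}}
  ),
  // an "In" filter keeps only the rows whose Product value is in the list
  #"Filtered rows" = Table.SelectRows(
    Source,
    each List.Contains(ProductsParam, [Product])
  )
in
  #"Filtered rows"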

I’m sure there are other cool things you can do with this but I’ll leave them to future blog posts.

Preserving Data Types With SQL Queries In Power Query And Power BI

My post earlier this year on enabling query folding when using SQL queries as a data source in Power Query provoked a lot of interest. This post adds one more useful detail: how to preserve the original data types of the columns in your query when using this technique with SQL Server-related sources.

Consider the DimDate table in the AdventureWorksDW2017 sample database for SQL Server:

Notice that the FullDateAlternateKey column has the data type Date.

If you connect to this table in the normal way, by selecting it in the Navigation pane when you connect to your SQL Server instance, the M code for your Power Query query will look something like this:

let
  Source = Sql.Databases("localhost"), 
  AdventureWorksDW2017 = Source
    {[Name = "AdventureWorksDW2017"]}
    [Data], 
  dbo_DimDate = AdventureWorksDW2017
    {[Schema = "dbo", Item = "DimDate"]}
    [Data]
in
  dbo_DimDate

Unsurprisingly, the FullDateAlternateKey column in the Power Query query also has a data type of Date, as indicated by the calendar icon on the left side of the column header in the Power Query Editor:

However, if you use Value.NativeQuery to run a SQL query to get the same data and set EnableFolding=true, like so:

let
  Source = Sql.Databases("localhost"), 
  AdventureWorksDW2017 = Source
    {[Name = "AdventureWorksDW2017"]}
    [Data], 
  Q = Value.NativeQuery(
    AdventureWorksDW2017, 
    "Select * From DimDate", 
    null, 
    [EnableFolding = true]
  )
in
  Q

…you’ll see that the FullDateAlternateKey column comes through as a DateTime type instead:

The same thing would happen with a column of type Time too, i.e. it would come through as a DateTime.

If you want the types in the output of Value.NativeQuery to match the types in the output of the first Power Query query above there’s an extra option you need to add: PreserveTypes=true.

let
  Source = Sql.Databases("localhost"), 
  AdventureWorksDW2017 = Source
    {[Name = "AdventureWorksDW2017"]}
    [Data], 
  Q = Value.NativeQuery(
    AdventureWorksDW2017, 
    "Select * From DimDate", 
    null, 
    [
      PreserveTypes = true, 
      EnableFolding = true
    ]
  )
in
  Q

In the output of this query, FullDateAlternateKey has the data type Date again:

This option is only available for the SQL Server connector (and connectors related to it) at the time of writing.

[Thanks to Curt Hagenlocher for this information]
