New Options For The Table.Buffer Function In Power Query

The March 2022 release of Power BI Desktop includes some new options for the Table.Buffer M function that can be set in a new second parameter. The options are of type BufferMode.Type; if you look at the built-in documentation for this type you’ll see the following:

The two allowed values are:

  • BufferMode.Eager: The entire value is immediately buffered in memory before continuing
  • BufferMode.Delayed: The type of the value is computed immediately but its contents are not buffered until data is needed, at which point the entire value is immediately buffered

Anyone with previous experience of Table.Buffer will see that BufferMode.Eager is the existing behaviour, but what is BufferMode.Delayed for?

It turns out that it’s there to make development faster. Consider the following M query that loads data from a CSV file with seven columns and a million rows in, and then uses Table.Buffer to buffer that table into memory:

let
  Source = Csv.Document(
    File.Contents(
      "C:\NumbersMoreColumns.csv"
    ),
    [
      Delimiter  = ",",
      Columns    = 7,
      Encoding   = 65001,
      QuoteStyle = QuoteStyle.None
    ]
  ),
  #"Promoted Headers"
    = Table.PromoteHeaders(
    Source,
    [PromoteAllScalars = true]
  ),
  #"Changed Type"
    = Table.TransformColumnTypes(
    #"Promoted Headers",
    {
      {"A", Int64.Type},
      {"B", Int64.Type},
      {"C", Int64.Type},
      {"D", Int64.Type},
      {"E", Int64.Type},
      {"F", Int64.Type},
      {"G", Int64.Type}
    }
  ),
  BufferTable = Table.Buffer(
    #"Changed Type"
  )
in
  BufferTable

When I refresh this query in Power BI Desktop on my PC I see the “Evaluating…” message for 20 seconds before the data starts to load:

If, however, I add the second parameter [BufferMode = BufferMode.Delayed] to Table.Buffer like so:

let
  Source = Csv.Document(
    File.Contents(
      "C:\NumbersMoreColumns.csv"
    ), 
    [
      Delimiter  = ",", 
      Columns    = 7, 
      Encoding   = 65001, 
      QuoteStyle = QuoteStyle.None
    ]
  ), 
  #"Promoted Headers"
    = Table.PromoteHeaders(
    Source, 
    [PromoteAllScalars = true]
  ), 
  #"Changed Type"
    = Table.TransformColumnTypes(
    #"Promoted Headers", 
    {
      {"A", Int64.Type}, 
      {"B", Int64.Type}, 
      {"C", Int64.Type}, 
      {"D", Int64.Type}, 
      {"E", Int64.Type}, 
      {"F", Int64.Type}, 
      {"G", Int64.Type}
    }
  ), 
  BufferTable = Table.Buffer(
    #"Changed Type", 
    [BufferMode = BufferMode.Delayed]
  )
in
  BufferTable

Then, when I run my refresh, the “Evaluating…” message only appears very briefly before the data starts to load:

It’s important to stress that after the “Evaluating…” phase the data load takes exactly the same amount of time – it’s only the “Evaluating…” phase that is faster. This can save you a lot of time as a developer, nonetheless. I have been told when these options are available in dataflows they will make validation (which occurs when you close Power Query Online after editing a dataflow) much faster too – in fact this developed to partially solve the dataflow validation problem.

[Thanks to Curt Hagenlocher for this information]

Update: something I should make clear is that this functionality is only useful for people who are already using Table.Buffer in their queries. If you’re not using Table.Buffer already then these changes won’t be of any benefit or interest.

14 responses

  1. Typo?:
    The type of the value is computed immediately but its contents are **NOT** buffered until data is needed…

    Chris, would you please do a blog sometime about the Azure-managed VNET gateway for Power BI? That team has had their stuff in preview for a whole year and it is unreliable and opaque and impossible to troubleshoot when it is misbehaving (which is often). This public preview was announced March of 2021. Based on what limited visibility I have, and based on the pace of change, I’d guess it will be another year or more before GA.

    I suspect that all the customers of this feature are having as much trouble as we are. We have opened a permanent support case, to alert product engineers each time there is a failure, and get their feedback every time. This is necessary because nobody has access to view the gateway logs except the product team (not even the support team). The rate of failures is very high (approx. 1 to 5%, depending on the week and the phase of the moon).

    …I wish that Microsoft public previews would be rated somehow, based on the expected GA date and/or based on the level of engineering investment & talent that is being allocated to a given feature. This is a public preview that I would have certainly avoided in retrospect. But after struggling for months and months, I now have a lot of my own skin in the game.

  2. Do we know when this function enhancement will become available in dataflows?

    (Trying this method in a dataflow this morning results in “2 arguments were passed to a function which expects 1”)

  3. Pingback: Power Query BufferMode Types – Curated SQL

  4. 1] Are the `*.Buffer()` family of functions able to release memory, if references go out of scope?

    For example: If you want to force sorting before merging two tables, where the final merge does not include all records in A — will A still be in memory?

    2] If you want to force evaluation, but don’t actually need the full table in memory at once, is there an alternative?

    For example: When generating multiple random values, you can end up in a situation where lazy eval causes `Table.AddColumn` to generate a random number once, applied to every row.

    I used `Table.AddIndexColumn` to force eval (causing all random numbers) instead of using `*.Buffer()` because I was thinking that

    – That it doesn’t have to load the entire table into memory before processing — it allows streaming
    – That it doesn’t waste memory by buffering everything

    • 1) It depends. Tables frequently hold references to the tables which were used to produce them, so even if the buffered table itself was no longer directly available it might still be pinned in memory by some other value.

      2) In short, there’s no function which is currently guaranteed to do this. Table.AddIndexColumn does achieve this today, and I’ve seen enough people use it that way that we probably won’t ever make it a foldable function for risk of breaking the behavior of existing queries. FWIW, random numbers should be fixed now so that they don’t do weird things with Table.AddColumn.

  5. Hi Chris, this actually makes me think – I have wrapped a Table.Buffer() around my sorting steps to force the step before subsequent steps … but maybe you know of a better approach?

  6. I’ve been using Table.Buffer for a long time and I’m happy to hear it will make it’s way to dataflows to make validation quicker.

    What I don’t understand is what makes the new type faster? Both types are buffered into memory but at different times – one is immediately and the other when the data is needed, which would be at refresh for the latter, correct? I don’t understand how they affect the evaluation time differently when loading. If anything, wouldn’t the Eager type evaluate quicker?

    • What makes it faster is that it allows the buffering to only take place when the dataset or dataflow is actually refreshing, and not when various metadata checks (ie the stuff that happens when the “Evaluating” message appears in Desktop, or the “Validating” phase when you close a dataflow) happen before a refresh takes place when you’re developing.

  7. Instead of all this why not just have a Manual mode of Calculation similar to Power Pivot or Excel which can be switched on during development which forces PQ to work with just the first 1000 rows in the preview.

  8. Pingback: Chris Webb's BI Blog: I’m Posting On The Power Query Blog Too! Chris Webb's BI Blog

Leave a Reply to Mark Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: