Power Query/M Optimisation: Getting The Maximum Value From A Column, Part 2

In part 1 of this series – which I strongly recommend you read before reading this post – I showed how removing columns from a table can make a dramatic improvement to the performance of certain transformations in Power Query. In this post I’ll show some tricks taught to me by Curt Hagenlocher of the dev team that can improve performance even more.

First of all, let me remind you of my original, totally unoptimised query, which takes over a minute to run:

let
    Source = Csv.Document(
        File.Contents("C:\Users\chwebb\Downloads\pp-2018.csv"),
        [Delimiter = ",", Columns = 16, Encoding = 1252, QuoteStyle = QuoteStyle.None]
        ),
    #"Changed Type" = Table.TransformColumnTypes(
        Source,
        {{"Column2", Int64.Type}}
        ),
    #"Sorted Rows" = Table.Sort(
        #"Changed Type",
        {{"Column2", Order.Descending}}
        ),
    Column2 = #"Sorted Rows"{0}[Column2]
in
    Column2

In particular, I’d like to draw your attention to the last step, where the position and lookup operators (I blogged about them here) are used to return the value in the Column2 column of the first row of the sorted table:

#"Sorted Rows"{0}[Column2]

Breaking this expression down:

  • #"Sorted Rows" returns the sorted table from the previous step in the query
  • {0} returns the first row of that table as a record
  • [Column2] returns the value in the Column2 field from that record
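As a minimal illustration of these two operators (using a small inline table rather than the CSV source; the table and values here are made up for the example):

```m
let
    // a tiny two-column table for illustration
    t = #table(
        {"Column1", "Column2"},
        {{"a", 10}, {"b", 20}}
        ),
    // {0} returns the first row as a record: [Column1 = "a", Column2 = 10]
    FirstRow = t{0},
    // [Column2] then returns the value in the Column2 field of that record
    Value = FirstRow[Column2]
in
    Value
```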

It turns out that swapping the order of {0} and [Column2] in this expression has a big impact on performance. Using the following expression in the last step instead:

#"Sorted Rows"[Column2]{0}

…reduces query execution time to 5-7 seconds! I’m told this is because asking for the column first does the same thing internally as manually removing all the other columns in the table, which, as I showed in my last post, has a dramatic impact on performance.
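Putting that change in context, the only difference from the original query is the final step:

```m
let
    Source = Csv.Document(
        File.Contents("C:\Users\chwebb\Downloads\pp-2018.csv"),
        [Delimiter = ",", Columns = 16, Encoding = 1252, QuoteStyle = QuoteStyle.None]
        ),
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Column2", Int64.Type}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type", {{"Column2", Order.Descending}}),
    // column first, then row: internally this behaves like removing
    // all the other columns before taking the first row
    Column2 = #"Sorted Rows"[Column2]{0}
in
    Column2
```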

It’s important to understand that this is not a general recommendation to request the column before the row in expressions like this, because in other scenarios requesting the row before the column might perform better. You need to test expressions like this written both ways to see which gives you the best performance.

One piece of general advice that Curt did give me, though, was that most of the optimisations that the Power Query engine can use only happen on tables – not records or lists – so you should always try to work with tables whenever you can. For this particular query, using the Table.FirstN function to get the first row of the table like so:

Table.FirstN(#"Sorted Rows", 1)[Column2]{0}

…allows for another internal optimisation to kick in, taking the query execution time down to around 2 seconds – the same performance as the original query in my previous post.
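For completeness, the full query with this final optimisation applied would look like this:

```m
let
    Source = Csv.Document(
        File.Contents("C:\Users\chwebb\Downloads\pp-2018.csv"),
        [Delimiter = ",", Columns = 16, Encoding = 1252, QuoteStyle = QuoteStyle.None]
        ),
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Column2", Int64.Type}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type", {{"Column2", Order.Descending}}),
    // stay in table-land as long as possible: take the first row as a
    // one-row table, then extract the column and finally the value
    MaxValue = Table.FirstN(#"Sorted Rows", 1)[Column2]{0}
in
    MaxValue
```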

9 responses

  1. Pingback: Power Query/M Optimisation: Getting The Maximum Value From A Column, Part 1 « Chris Webb's BI Blog

    • “most of the optimisations that the Power Query engine can use only happen on tables – not records or lists”
      Wow, I thought the opposite! Good to know, thank you.

  2. Pingback: Optimizing Max Value Performance in Power Query – Curated SQL

  3. Interesting, Chris.

    I got curious so I downloaded the same CSV file and experimented a bit on my laptop.
    In Power BI Desktop I created four different queries, based on your various code examples:
    1. The fast one (List.Max)
    2. The slowest one (#"Sorted Rows"{0}[Column2])
    3. The improved one (#"Sorted Rows"[Column2]{0})
    4. As #3 above, but here I merged all steps into one single step.

    I then refreshed each query and measured duration very simply by using a stopwatch.
    Approximate query refresh durations (seconds):
    1. 3
    2. 155
    3. 17
    4. 8

    What surprised me a bit (not yet being very experienced with the M engine) is the performance improvement from simply merging several steps into a single one (query 4). This seems to indicate that a notable price is paid for having multiple steps, compared to merging the same code into one single step.

    What are your thoughts on this, maybe this is indeed expected behaviour?

  4. Pingback: Tables, Numbers, Immutability And Power Query Performance « Chris Webb's BI Blog
