Nested Calculations In Power Query

Quite often, in Power Query, you want to ‘nest’ calculations and transformations – apply them not across the whole table, but repeat the same calculation or transformation across multiple groups of rows within that table. Let me give you an example…

Take the following input table:

image

Imagine you wanted to add a column showing the rank of each row by Sales. In Power Query you would just need to:

  1. Load the data
  2. Sort the table in descending order by Sales
  3. Add an index column starting at 1, which is the rank

You would end up with the following output:

image

…and here’s the M code, all of which is generated by the UI:

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    #"Sorted Rows" = Table.Sort(Source,{{"Sales", Order.Descending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Rank", 1, 1)
in
    #"Added Index"

Now imagine you wanted to rank Sales within each month. That’s to say you want to apply the same steps as shown in the previous query but for each month individually to get an output like this:

image

One way to tackle this, and problems like it, is to do the following. First, do a Group By on the column you want to repeat the calculation over, in this case Month, and use the All Rows aggregation operation. This will result in a table with one row for each month and a column containing nested tables, as shown below:

image

Each one of these tables contains the rows from the original table for the month.

You can then take your original transformation and turn it into a function, either in a separate query or as a step in your current query. Here’s an example of how the query above can be turned into a function that takes a table and returns a table with a rank column added:

(tabletorank as table) as table =>
     let
      SortRows = Table.Sort(tabletorank,{{"Sales", Order.Descending}}),
      AddIndex = Table.AddIndexColumn(SortRows, "Rank", 1, 1)
     in
      AddIndex

 

Next, you need to pass each nested table to this function. You could do that in a calculated column, but the most elegant way I think is by using the Table.TransformColumns() function which takes a function and applies it to every value in a column (see here for another example of how to use it).

Finally, you get the final output by clicking on the Expand icon in the AllRows column and then choosing to expand all the columns in the nested table except the ones you originally grouped on:

image

image

Here’s the full M code:

let
    //Get data from Excel
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    //Group by Month
    Grouped = Table.Group(Source, {"Month"}, {{"AllRows", each _, type table}}),
    //Declare a function that adds a Rank column to a table
    RankFunction = (tabletorank as table) as table =>
     let
      SortRows = Table.Sort(tabletorank,{{"Sales", Order.Descending}}),
      AddIndex = Table.AddIndexColumn(SortRows, "Rank", 1, 1)
     in
      AddIndex,
    //Apply that function to the AllRows column
    AddedRank = Table.TransformColumns(Grouped, {"AllRows", each RankFunction(_)}),
    //Expand the tables in the AllRows column again
    ExpandAgain = Table.ExpandTableColumn(AddedRank, "AllRows", 
       {"Product", "Sales", "Rank"}, {"Product", "Sales", "Rank"})
in
    ExpandAgain

You can download the example workbook here.

This pattern could also be applied to other types of calculation that need to be nested, for example running totals or shares.

28 thoughts on “Nested Calculations In Power Query

      • About 80% from your book is related to general transformations via UI. At the same time 80% from your blog about handling quite specific cases. If you structure these cases I think you will get really cool 2nd book. Anyway, your work is very helpfull.

  1. Great post Chris,
    I’m wondering if there is a way to utilize this approach to rank more than one type of context. For example, I have product data with varying levels of ranking needs such as product line, brand, season, region, etc. Im wondering if the function might be “reused” for every contextual ranking need like a loop?
    I am currently doing the different rankings using power pivot but my users could definitely benefit from the data being in a flat format like this.
    Power query has changed my whole job in a completely irreversible kind of way! Thanks for your great posts!
    Kellan

  2. One small question, Chris.

    You wrote that TransformColumns is more elegant than Calculated Column.

    What do you think about performance differences between these two approaches?

    • There shouldn’t be a performance difference; they basically translate to the same operators under the covers.

  3. Wow, how elegant & useful (could have used it a couple of times already), so thanks for sharing!

    There seems to be a way to do it without the function, so it could be used without the advanced editor. Simply replace the steps after “Grouping” by this:

    AddRank = Table.AddColumn(Grouped, “Rank”, each Table.AddIndexColumn(Table.Sort([AllRows], “Sales”), “Rank”, 1, 1)),

    & expand on Rank

  4. Hi, it seemes an easy way, but if you have a equal Value twice, it won’t identify it and just count’s down the Index. Isn’t there a real Ranking function`?

    • No, there isn’t a built in rank function but it’s fairly easy to write one yourself. I showed how to do this in the final chapter of my Power Query book, published last year. The code for the function declaration would be something like this:
      (ValueToRank) =>
      Table.RowCount(
      Table.SelectRows(SourceTable, each [ValueColumn]>ValueToRank)
      ) + 1

  5. Chris –

    Just ran across this today (2015-09-04) and find it very interesting. How did I miss the “all rows” operation in the “group by?”

    Anyway, this technique is useful to me for duplicate analysis. I don’t like the straight “remove duplicates” approach because its naive…who knows which records will drop.

    With the technique you’ve outlined here, I will now group by the column for which I want to dup-check. My group by condition will include both “all rows” and “Count rows.” When I expand I’ll have a dup count for each record. Very cool!

    Thanks very much for sharing!

    Dave

  6. Chris-
    Let’s say we have 20 sales columns (Sales#1; Sales#2…). I was wondering How you would have done to use this very usefull rank function over these 20 columns.
    Thanks for your great posts.
    Erwan

  7. Hi Chris;
    Thanks a lot for your prompt reply. Actually, I tried to apply the ranking function to several columns but I am stuck in attempting to retrieve the Table.ColumnNames for the Table.Group(Source, {“Item”}, {{“AllRows”, each _, type table}}) and pass the column name to the function.

    Would you have any idea about how to proceed to achieve this?

    I think the ablity to apply a function over grouped rows for multiple columns would be very helpful for a lot of people.

    Thanks again for your help

  8. Hello – am learning loads from your blog but have a transformation question which is a bit more complex than this example, although this example seems relevant. I’m not a programmer, have basic SQL, have some rudimentary success reading M in that I can follow and modify examples although have difficulty creating from scratch.

    Possibilities (to my mind) to solve this include indexing by a countif or possibly a partitioned index or in any ways a doubling up of this thread’s example.

    Here’s my example data and desired output (column headers and values).

    DATA

    Product Product Type Producer Type(P or Q Only) Producer Rank(1-3 only) ProducerID

    W1 C1 Q Q Q-11
    W1 C1 P P1 P-999
    W1 C1 P P2 P-555
    W1 C2 Q Q Q-3
    W1 C2 P P1 P-999
    W1 C2 P P2 P-222
    W1 C2 P P3 P-555
    W2 C1 Q Q Q-3
    W2 C1 P P1 P-123
    W2 C1 P P2 P-999
    W3 C4 Q Q Q-5
    W3 C4 P P1 P-777
    W3 C4 P P2 P-999

    DESIRED OUTPUT

    Product Product Type Product-ProductType ProducerID-“Q” ProducerID -“P” Rank1 ProducerID -“P” Rank2 ProducerID -“P” Rank3
    W1 C1 W1-C1 Q-11 P-999 P-555
    W1 C2 W1-C2 Q-3 P-999 P-222 P-555
    W2 C1 W2-C1 Q-3 P-123 P-999
    W3 C4 W3-C4 Q-5 P-777 P-999

    In basic Excel it’s reasonably straightforward – I do a lookup of a lookup in order to bring the ProducerID all into a single row against the corresponding Product-ProductType. Can PQ help? It’d be a great model for my work if it’s possible, even if complicated because for millions of rows of data lookups take a long time!

    • Hi Saul,

      From what I can see, you want read in the first table three lines at a time and then a single line out for each group of three? The basic approach in that case would be to put an index column on the table (starting at 0), then do an integer divide on that column by three (so the first three lines would have a value of 0, the next three a value of 1, the next three a value of 2 and so on) then do a pivot on that column.

  9. Hi Chris – Thanks for the reply (I’m amazed how you keep on top of your blog!)

    That approach sounds OK although I’ve now learned the requirement has changed (often the way!) and so upped in complexity. In the example I gave, that would mean the data now has to include up to 9 Ps for multiple Qs per W-C.

    So the desired output now is not a unique row per W-C as given above, but an unknown number of repeated W-C rows for each Q (so unique key becomes W-C-Q), plus all the related Ps in up to 9 columns to the right of each. P and Q IDs in are repeatable.

    I think it’s too much to ask more here – I’ll see if can commandeer a programmer at my end.

    Thanks again and I’ll definitely keep reading up on this.

  10. Oh perhaps it would! I think my mental stumbling block – other than still learning and wrapping my head around which of the seemingly several possible solutions be best! – was that the number of Ps and Qs against each item is variable and unknown in the data (ie. each W-C has an unknown number of associated rows). Your suggestion I think still holds, with a divide by a count for each key.

    Thanks again.

  11. Hello Chris! I started working with Power BI just few weeks ago, but I already find your blog extremely useful. I stumbled upon this post when trying to calculate ranking in a nested table in Power BI Desktop. I used your approach, however I’d like to go one step further. Say I have some data, grouped by ProjectName, and the nested records contain data from 2015 and 2016. I would like to calculate the rank based on a slicer, that is present on my report page. So if my slicer is set to show only data from 2016 the nested table should contain only rows from 2016, and so rank will be different. How would I go about that? Is that possible to expose a function from a data source, so the slicer values would be input parameters to that function? And then the result would be presented on the dashboard. Thanks.

    • If you want to calculate a rank based on a slicer in a report, you’ll have to do the rank calculation in DAX in a measure. This blog post is only for calculations that are done during data load, and I guess you don’t want to have to reload your data each time a slicer changes.

      • Thanks Chris, I got that far already. However I am dealing with a different issue right now: I can’t seem to get sth similar to T-SQL Row_number because RANKX repeats the ranking for the same values, and I cannot rank by multiple columns (as I would in T-SQL). This is not related to this post though 🙂

  12. Hi Criss,
    I tried your example and it’s working fine. Many thanks for that!

    As you mentioned in your artikel: “This pattern could also be applied to other types of calculation that need to be nested, for example running totals or shares.”
    I took your example a step further. I want to add a extra calculation to the nested table.
    (later on i want to calculate the difference between the values and achief this:)
    Month Product Sales Rank Difference
    Jan Grapes 33 1 33
    Jan Oranges 22 2 -11
    Jan Apples 11 3 -11
    Feb Grapes 77 1 77
    Feb Apples 55 2 -22
    Feby Oranges 44 3 -11
    March Oranges 99 1 99
    March Apples 88 2 -11
    March Grapes 66 3 -22
    Based on this info:
    http://excel-inside.pro/blog/2015/11/05/absolute-and-relative-references-in-power-query/

    I just added a column to the function after AddIndex:

    AddIndex = Table.AddIndexColumn(SortRows, “Rank”, 1, 1),
    AddKol1 = Table.AddColumn(AddIndex,”Column1″, each 1)
    in
    AddIndex

    The issue is:
    when i now manually open this table at this point, i don’t see my new added Column1
    And i just can’t figure out why, because -as you said- this pattern could also be applied to other types of calculations,,,
    Can you help me with this one?

    This is the full copy of code at this moment:
    let
    Bron = Excel.Workbook(File.Contents(“D:\….\NestedCalc.xlsx”), null, true),
    Sales_Sheet = Bron{[Item=”Sales”,Kind=”Sheet”]}[Data],
    #”Headers met verhoogd niveau” = Table.PromoteHeaders(Sales_Sheet),
    #”Type gewijzigd” = Table.TransformColumnTypes(#”Headers met verhoogd niveau”,{{“Month”, type text}, {“Product”, type text}, {“Sales”, Int64.Type}}),
    //Group by Month
    #”Grouped” = Table.Group(#”Type gewijzigd”, {“Month”}, {{“AllRows”, each _, type table}}),
    //Declare a function that adds a Rank column to a table
    RankFunction = (tabletorank as table) as table =>
    let
    SortRows = Table.Sort(tabletorank,{{“Sales”, Order.Descending}}),
    AddIndex = Table.AddIndexColumn(SortRows, “Rank”, 1, 1),
    AddKol1 = Table.AddColumn(AddIndex,”Column1″, each 1)
    //Example of later to use calculation
    // AddKol1 = Table.AddColumn(AddIndex,”Column1″, each try AddIndex[Sales]{[Rank]} – AddIndex[Sales]{[Rank]-1} otherwise 0)
    in
    AddIndex,
    //Apply that function to the AllRows column
    AddedRank = Table.TransformColumns(Grouped, {“AllRows”, each RankFunction(_)}),
    //Expand the tables in the AllRows column again
    ExpandAgain = Table.ExpandTableColumn(AddedRank, “AllRows”,
    {“Product”, “Sales”, “Rank”, “Column1”}, {“Product”, “Sales”, “Rank”, “Column1”})
    in
    ExpandAgain

    • It’s because the let expression used in the RankFunction step returns the value of AddIndex and not the value of AddKol1. You need to do this:

      RankFunction = (tabletorank as table) as table =>
      let
      SortRows = Table.Sort(tabletorank,{{“Sales”, Order.Descending}}),
      AddIndex = Table.AddIndexColumn(SortRows, “Rank”, 1, 1),
      AddKol1 = Table.AddColumn(AddIndex,”Column1″, each 1)
      //Example of later to use calculation
      // AddKol1 = Table.AddColumn(AddIndex,”Column1″, each try AddIndex[Sales]{[Rank]} – AddIndex[Sales]{[Rank]-1} otherwise 0)
      in
      AddKol1,

      Note the last line of code.

  13. As it often is, this was a beginners mistake. Completely read over it, as usual.
    I just forgot to adjust one word in the function! How stupid….
    It’s my mental stumbling block in this development proces, no faith in my resolving power.
    Many thanks for your reaction Cris! I completely got stuck an you pulled me out of it.
    Keep up this good work. You just earned much kuddo’s!
    Now i can go on and make a marvelous Power BI dashboard to please someone else.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s