Aggregating By Local Groups In Power Query

When looking through the Power Query library reference I noticed that the Table.Group() function has a very useful option to aggregate data using what it calls local groups. Rather than explain what this does, let me show you…

Consider the following table showing all of the days in January and whether an employee was on holiday, at work or off sick on any given day in January 2014:

image

Importing this into Power Query and finding the number of days spent on each activity is trivial using the Group By functionality in the Power Query window. Here’s the script that the UI generates:

let

    Source = Excel.CurrentWorkbook(){[Name="Calendar"]}[Content],

    GroupedRows = Table.Group(Source, {"Activity"}, 

        {{"Count of Days", each Table.RowCount(_), type number}})

in

    GroupedRows


And here’s the output it generates:

So far so good. But wouldn’t it be useful to know about distinct time ranges spent on each activity? For example, you can see from the first screenshot that this particular employee was off sick from Friday January 17th to Tuesday January 21st, and then again from Friday January 24th to Monday January 27th; you might want to see these aggregated into two separate time ranges. Table.Group() also allows you to do this.

First, here’s a script with an example:

let

    Source = Excel.CurrentWorkbook(){[Name="Calendar"]}[Content],

    FilteredRows = Table.SelectRows(

                    Source

                    , each ([Day Of Week] <> "Saturday" and [Day Of Week] <> "Sunday")),

    TimeRanges = Table.Group(

                    FilteredRows

                    , "Activity"

                    , {

                        {"Start Date", each List.Min([Date]), type date}

                        , {"End Date", each List.Max([Date]), type date}

                        , {"Number of Days", each List.Count([Date]), type number}

                      }

                    , GroupKind.Local)

in

    TimeRanges

Here’s the output:

You can see from this screenshot that I now have one row for each consecutive range of days (ignoring weekends) spent either on vacation, working or off sick.

Step-by-step, here’s an explanation of what’s happening in the script:

  • Source imports the data from the table in the worksheet
  • FilteredRows filters out the days that fall on a weekend
  • TimeRanges uses the Table.Group function to do all the interesting stuff:
    • It takes the table returned by the FilteredRows step
    • Does a Group By on the Activity column
    • It calculates the min, max and count of the Date column and adds them as new columns
    • The GroupKind.Local optional parameter is the key to getting the behaviour you can see here. The default type of grouping, GroupKind.Global, does a standard group by across the whole table as seen in the first example above. GroupKind.Local on the other hand aggregates only over consecutive sequences of rows, and this means we see three separate time ranges for the activity “Working” and two separate groups for “Sick”.

Pretty cool, isn’t it?

You can download the sample workbook here.

17 thoughts on “Aggregating By Local Groups In Power Query

  1. Hi Chris, This is really awesome trick. But I have just opposite problem. I have data similar to last screenshot above, however repeated some times e.g. 1/1/14 to 1/2/2014 – 2 days Sick Leave, 1/3/14 to 1/6/14 – Annual Leave, etc. Now I need to test whether a person has taken any type of 5 consecutive business days leave. i.e. including sick / annual, etc.

    1. Have a look at this video which i posted without using PowerPivot or Power Query. It would be really great if you can help me in developing a model with PowerPivot for this.

      1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
        Chris Webb says:

        It’s not going to be easy to do, but it will be possible. The solution will be something like this:
        1) Create a calculation in a calculated column that returns a table of dates between the start and end date for each row. Then expand this column so you now have one row for each date in each date range.
        2) Add another calculated column that returns 1 if the date range is for any kind of leave, 0 otherwise.
        3) Do a normal group by on this table so you have one row for each date; sum by the value created in step 2. This will result in a table with one row for each date and a column that is 0 when no leave was taken on the date, >0 if any kind of leave was taken.
        4) Aggregate by local groups as in the post

  2. Chris, thanks for your response. I am really new to DAX, hence If you can really elaborate this little more that would be great. Thank you in advance.

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      I will try, but as I said this is a complicated problem and it will take me some time to write it up.

  3. Hi all and thanks
    I’ve found issues when making a Table.NestedJoin using the output table of a Table.Group & GroupKind.Local as the left table of the NestedJoin.
    It includes the grouped rows of the Table.Group in the join

    Anyone with the same problem?

Leave a Reply to LuisCancel reply