Adding More Aggregate Columns To The Output Of Table.Profile

A few years ago I blogged about the Table.Profile M function and how you could use it to create a table of descriptive statistics for your data:

https://blog.crossjoin.co.uk/2016/01/12/descriptive-statistics-in-power-bim-with-table-profile/

Since that post was written a new, optional second parameter has been added to the function called additionalAggregates which allows you to add your own custom columns containing aggregate values to the output of Table.Profile, so I thought I’d write a follow-up on how to use it.

Consider the following query:

[sourcecode language='text'  padlinenumbers='true']
let
    Source = 
    #table(
        type table[Month = text, Sales = number],
        {
            {"Jan",1},
            {"Feb",5},
            {"Mar",17},
            {"Apr",1}
        }
    ),
    Profile = 
    Table.Profile(
        Source
    )
in
    Profile
[/sourcecode]

 

There are two steps here. Source returns the following table using the #table function:

image

The second step uses Table.Profile to return a table of summary statistics:

image

[Several columns returned aren’t shown in the screenshot above]

If you want some extra columns on this table you can now add them with the second additionalAggregates parameter of Table.Profile like so:

[sourcecode language='text'  htmlscript='false' highlight='15,16,17,18']
let
    Source = 
    #table(
        type table[Month = text, Sales = number],
        {
            {"Jan",1},
            {"Feb",5},
            {"Mar",17},
            {"Apr",1}
        }
    ),
    Profile = 
    Table.Profile(
        Source,
        {
        {"Median", each Type.Is(_, type number), List.Median},
        {"Mode", each Type.Is(_, type number), List.Mode}
        }
    )
in
    Profile
[/sourcecode]

In this example I’ve added two extra columns called Median and Mode, which return the median and mode values of every numeric column:

image

The additionalAggregates parameter takes a list, where each item is itself a list of three items:

  • The name of the new column to add.
  • A function to call to determine where the aggregate column will be applied to a given column in the source table. In this case I’m checking to see if a column is type number and only returning a median or mode if it is.
  • A function to call to calculate the aggregate value across the column: in this case List.Median and List.Mode. It’s a good idea to use an aggregate function that does not require storing every value in the column in memory (ideally one that can be folded, if you’re using a data source where query folding is supported) here to ensure fast performance.

[Thanks to Curt Hagenlocher for the information here]

10 thoughts on “Adding More Aggregate Columns To The Output Of Table.Profile

  1. The additional aggregate columns return nulls if the table is imported (for example from an Excel Table) – despite the numerical column made as to type number
    Bug ?

    1. No, not a bug – unfortunately – just a symptom of how types work in M, which is very confusing. See
      https://social.technet.microsoft.com/Forums/en-US/ce44c332-f1a2-4c02-846b-55d529ad8202/questions-on-type?forum=powerquery
      https://social.technet.microsoft.com/Forums/en-US/636e9b44-6820-4ff2-ab60-5dd6a5307bd2/type-conversion-mysteries?forum=powerquery
      for an idea of what I’m talking about! The problem here is that the Type.Is function doesn’t do what you’re expecting it to do if you have a table whose data types have been set with Table.TransformColumnTypes. It might be easier just to say
      each true
      instead of
      each Type.Is(_, type number)

      1. @Chris – That technique works for numerical columns – For Text Columns you get results as Text which would be confusing for a normal user

    1. This. I spent 5 minutes trying to figure out why it wouldn’t work before I remembered how strict some functions are with types.

      Also, the performance is super quick! I expected I’d need to optimize the following code – but nope!

      {“Top5Sample”, each Type.Is(_, type nullable any), each Text.Combine(List.FirstN(List.RemoveItems(List.Transform(List.Distinct(_), Text.From), {“”}), 5), “, “)}

  2. Your post is the only place that allowed me to confirm my suspicion that the version of Excel 2016 I am forced to use (and cannot update) contains a version of the M Table.Profile() function that accepts only one parameter. No other theory could explain the error I was receiving in a query I was attempting. Nowhere else have I found any indication that this M function has evolved to add the 2nd parameter. Not even at Microsoft.

    Thank you very much for confirming my suspicion.

Leave a Reply