A few years ago I blogged about the Table.Profile M function and how you could use it to create a table of descriptive statistics for your data:
https://blog.crossjoin.co.uk/2016/01/12/descriptive-statistics-in-power-bim-with-table-profile/
Since that post was written a new, optional second parameter has been added to the function called additionalAggregates which allows you to add your own custom columns containing aggregate values to the output of Table.Profile, so I thought I’d write a follow-up on how to use it.
Consider the following query:
let Source = #table( type table[Month = text, Sales = number], { {"Jan",1}, {"Feb",5}, {"Mar",17}, {"Apr",1} } ), Profile = Table.Profile( Source ) in Profile
There are two steps here. Source returns the following table using the #table function:
The second step uses Table.Profile to return a table of summary statistics:
[Several columns returned aren’t shown in the screenshot above]
If you want some extra columns on this table you can now add them with the second additionalAggregates parameter of Table.Profile like so:
let Source = #table( type table[Month = text, Sales = number], { {"Jan",1}, {"Feb",5}, {"Mar",17}, {"Apr",1} } ), Profile = Table.Profile( Source, { {"Median", each Type.Is(_, type number), List.Median}, {"Mode", each Type.Is(_, type number), List.Mode} } ) in Profile
In this example I’ve added two extra columns called Median and Mode, which return the median and mode values of every numeric column:
The additionalAggregates parameter takes a list, where each item is itself a list of three items:
- The name of the new column to add.
- A function to call to determine where the aggregate column will be applied to a given column in the source table. In this case I’m checking to see if a column is type number and only returning a median or mode if it is.
- A function to call to calculate the aggregate value across the column: in this case List.Median and List.Mode. It’s a good idea to use an aggregate function that does not require storing every value in the column in memory (ideally one that can be folded, if you’re using a data source where query folding is supported) here to ensure fast performance.
[Thanks to Curt Hagenlocher for the information here]