A few years ago I blogged about the Table.Profile M function and how you could use it to create a table of descriptive statistics for your data:
https://blog.crossjoin.co.uk/2016/01/12/descriptive-statistics-in-power-bim-with-table-profile/
Since that post was written a new, optional second parameter has been added to the function called additionalAggregates which allows you to add your own custom columns containing aggregate values to the output of Table.Profile, so I thought I’d write a follow-up on how to use it.
Consider the following query:
[sourcecode language='text' padlinenumbers='true'] let Source = #table( type table[Month = text, Sales = number], { {"Jan",1}, {"Feb",5}, {"Mar",17}, {"Apr",1} } ), Profile = Table.Profile( Source ) in Profile [/sourcecode]
There are two steps here. Source returns the following table using the #table function:
The second step uses Table.Profile to return a table of summary statistics:
[Several columns returned aren’t shown in the screenshot above]
If you want some extra columns on this table you can now add them with the second additionalAggregates parameter of Table.Profile like so:
[sourcecode language='text' htmlscript='false' highlight='15,16,17,18'] let Source = #table( type table[Month = text, Sales = number], { {"Jan",1}, {"Feb",5}, {"Mar",17}, {"Apr",1} } ), Profile = Table.Profile( Source, { {"Median", each Type.Is(_, type number), List.Median}, {"Mode", each Type.Is(_, type number), List.Mode} } ) in Profile [/sourcecode]
In this example I’ve added two extra columns called Median and Mode, which return the median and mode values of every numeric column:
The additionalAggregates parameter takes a list, where each item is itself a list of three items:
- The name of the new column to add.
- A function to call to determine where the aggregate column will be applied to a given column in the source table. In this case I’m checking to see if a column is type number and only returning a median or mode if it is.
- A function to call to calculate the aggregate value across the column: in this case List.Median and List.Mode. It’s a good idea to use an aggregate function that does not require storing every value in the column in memory (ideally one that can be folded, if you’re using a data source where query folding is supported) here to ensure fast performance.
[Thanks to Curt Hagenlocher for the information here]
The additional aggregate columns return nulls if the table is imported (for example from an Excel Table) – despite the numerical column made as to type number
Bug ?
No, not a bug – unfortunately – just a symptom of how types work in M, which is very confusing. See
https://social.technet.microsoft.com/Forums/en-US/ce44c332-f1a2-4c02-846b-55d529ad8202/questions-on-type?forum=powerquery
https://social.technet.microsoft.com/Forums/en-US/636e9b44-6820-4ff2-ab60-5dd6a5307bd2/type-conversion-mysteries?forum=powerquery
for an idea of what I’m talking about! The problem here is that the Type.Is function doesn’t do what you’re expecting it to do if you have a table whose data types have been set with Table.TransformColumnTypes. It might be easier just to say
each true
instead of
each Type.Is(_, type number)
@Chris – That technique works for numerical columns – For Text Columns you get results as Text which would be confusing for a normal user
This post from Lars Schreiber is a must-read if you’re having trouble testing for data types: https://ssbi-blog.de/technical-topics-english/pitfalls-with-table-columnsoftype
In the context for this post, if you are loading from Excel then you would need to say
{“Median”, each Type.Is(_, type nullable number), List.Median}
I think
This. I spent 5 minutes trying to figure out why it wouldn’t work before I remembered how strict some functions are with types.
Also, the performance is super quick! I expected I’d need to optimize the following code – but nope!
{“Top5Sample”, each Type.Is(_, type nullable any), each Text.Combine(List.FirstN(List.RemoveItems(List.Transform(List.Distinct(_), Text.From), {“”}), 5), “, “)}
Finally! I had to manually do this and it was a NIGHTMARE to optimize!
Your post is the only place that allowed me to confirm my suspicion that the version of Excel 2016 I am forced to use (and cannot update) contains a version of the M Table.Profile() function that accepts only one parameter. No other theory could explain the error I was receiving in a query I was attempting. Nowhere else have I found any indication that this M function has evolved to add the 2nd parameter. Not even at Microsoft.
Thank you very much for confirming my suspicion.
Thanks ever so much, you’ve saved me a lot of hassle!
@Chris How can I use the Optional aggregate field to get the Column’s “Type”?