Rob Collie (or rather David Hager, writing on Rob Collie’s blog) recently posted a detailed explanation of how to use the new DAX TopN() function inside a calculation; I assume, however, that’s it going to be more commonly used inside queries. Here’s a simple example that returns the top 10 rows from DimDate by the sum of Sales Amount:
evaluate(
topn(
10
, DimDate
, calculate(sum(FactInternetSales[SalesAmount]))
, 0)
)
Finding the top n of something is a pretty common requirement; an equally common requirement is finding multiple top ns for another list of values, such as the top 10 products sold every year, remembering that you may well have a different top 10 products for each year. In MDX you would, of course, solve this by using the TopCount() function inside the Generate() function and in DAX the solution is very similar, using either the DAX Generate() or GenerateAll() functions. Here’s a query that shows how to do this:
define
measure DimDate[Sum of Sales] = sum(FactInternetSales[SalesAmount])
evaluate(
addcolumns(
filter(
generate(
values(DimDate[CalendarYear])
, topn(
10
, values(DimProduct[EnglishProductName])
, DimDate[Sum of Sales]
, 0)
)
, DimDate[Sum of Sales]>0)
, “Sum of Sales”
, DimDate[Sum of Sales]
)
)
order by
DimDate[CalendarYear] asc
, DimDate[Sum of Sales] desc
Here’s some of the output:
There’s quite a lot going on here, so let me explain some things…
First, notice how I can define a new calculated measure in the Define clause of a DAX query, as follows:
define
measure DimDate[Sum of Sales] = sum(FactInternetSales[SalesAmount])
I need to use this expression several times in the query, so it makes sense to define it just once. Next, here’s how I get the top 10 Products for each Year:
generate(
values(DimDate[CalendarYear])
, topn(
10
, values(DimProduct[EnglishProductName])
, DimDate[Sum of Sales]
, 0)
)
The Generate() function iterates through every row in the table returned by values(DimDate[CalendarYear]) and evaluates the TopN() call in the context of each row; it then unions each top 10 together. In the main query I’m also using the Filter function to remove all the rows returned where there were no sales, just to make the results look tidier. The GenerateAll() function does pretty much the same as the Generate() function, but it will return a row containing null values in situations where the second parameter (in this case, the call to TopN) returns a table with no rows. Last of all, I’m using the AddColumns() function to ensure that the Sum Of Sales is added on to the resultset, and also using the Order By clause to ensure that the rows are returned in a meaningful order.
In part 6, I’ll look at the Row() function.
Nice Post. I’ve tried the top N along with Generate. The result comes in seconds when I use Calendar Year or Calendar Month. I have a branch table with around 200 distinct Branches. When I add this also, it comes within few seconds but takes more time than the period. However, when I tried to do the same thing using Employee Code from Employee Masters (approx 3000 distinct values with around 2000 sales guys) for a single month, it was still running even after 10 mins and I decided not to wait for the result. I guess the generate function for top N works well only for low distinct values. For unique values above 1000s, normal sql top N queries seem to work faster than DAX.
Chris, what is your experience on this?
Just to add further to my reply above, while I was running the Generate + top N for the Employee Code (top 5 records on sales amount), I decided to check the memory usage, and it was more or less stable with only the CPU spiking up and down throughout the period
It’s hard to say what’s going on here, sorry – but I agree, it seems strange it’s taking so long.
I’ve submitted a Connect Issue on the same. Let’s wait and see what the feedback/work around.
https://connect.microsoft.com/SQLServer/feedback/details/1024573
The DAX code is very hard to read 🙁 Would it be possible to have it formatted on the page?