Even if blogging about MDX feels, these days, a bit like blogging about COBOL (I’ll be returning to DAX soon, I promise), here’s an interesting MDX problem I came across the other day that I thought was writing about.

Calculating ranks is one of those things in MDX that is slightly trickier than it first appears. There’s a RANK function, of course, but in order to get good performance from it you need to know what you’re doing. It’s fairly widely known that with normal ranks what you need to do is to order the set you’re using before you find the rank of a tuple inside that set. Consider the following query on Adventure Works:

WITH

MEMBER MEASURES.REGULARRANK AS

RANK([Customer].[Customer].CURRENTMEMBER,

ORDER(

[Customer].[Customer].[Customer].MEMBERS

, [Measures].[Internet Sales Amount]

, BDESC))

SELECT

{[Measures].[Internet Sales Amount], [Measures].REGULARRANK}

ON COLUMNS,

ORDER(

[Customer].[Customer].[Customer].MEMBERS

, [Measures].[Internet Sales Amount]

, BDESC)

ON ROWS

FROM [Adventure Works]

It’s unbearably slow (in fact I killed the query rather than wait for it to complete) because, in the calculated member, what we’re doing is ordering the set of all customers every time we calculate a rank. Obviously we don’t need to do this, so the solution to this problem is of course to order the set just once, use a named set to store the result, and then reference the named set in the calculated member as follows:

WITH

SET

ORDEREDCUSTOMERS AS

ORDER(

[Customer].[Customer].[Customer].MEMBERS

, [Measures].[Internet Sales Amount]

, BDESC)

MEMBER MEASURES.REGULARRANK AS

RANK([Customer].[Customer].CURRENTMEMBER,

ORDEREDCUSTOMERS)

SELECT

{[Measures].[Internet Sales Amount], [Measures].REGULARRANK}

ON COLUMNS,

ORDEREDCUSTOMERS

ON ROWS

FROM [Adventure Works]

This query now executes in 5 seconds on my laptop. You probably knew all this already though.

But what happens if you need to handle tied ranks? The approach above doesn’t give you tied ranks because the RANK function, in its two-parameter form, simply finds the position of a tuple in a set, and no two tuples can occupy the same position in a set. That’s why you get results like this:

Even though Courtney A. Edwards and Jackson L. Liu have the same value for Internet Sales Amount, their ranks are 799 and 800 respectively, because Courtney A. Edwards comes before Jackson L. Liu in the ORDEREDCUSTOMERS set.

BOL tells us of the three-parameter form of RANK that does give us tied ranks. This is how you use it:

WITH

SET

ORDEREDCUSTOMERS AS

ORDER(

[Customer].[Customer].[Customer].MEMBERS

, [Measures].[Internet Sales Amount]

, BDESC)

MEMBER [Measures].REGULARRANKTIED AS

RANK([Customer].[Customer].CURRENTMEMBER

,[Customer].[Customer].[Customer].MEMBERS,[Measures].[Internet Sales Amount])

SELECT

{[Measures].[Internet Sales Amount], [Measures].REGULARRANKTIED}

ON COLUMNS,

ORDEREDCUSTOMERS

ON ROWS

FROM [Adventure Works]

But, unfortunately, the performance is as bad as the original version of the non-tied rank calculation, ie incredibly bad, because once again we’re sorting the set every time we calculate a rank. So how can we get tied ranks and good performance?

The first approach I tried was to use a recursive calculation, which used the named set approach to calculate the non-tied rank and then checked to see if the CurrentMember on Customer had the same value for Internet Sales Amount as the member before it in the set of Ordered Customers; if it did, it displayed the rank of the previous Customer. Here it is:

WITH

SET

ORDEREDCUSTOMERS AS

ORDER(

[Customer].[Customer].[Customer].MEMBERS

, [Measures].[Internet Sales Amount]

, BDESC)

MEMBER MEASURES.REGULARRANK AS

RANK([Customer].[Customer].CURRENTMEMBER, ORDEREDCUSTOMERS)

MEMBER MEASURES.TIEDRANK AS

IIF(

[Measures].[Internet Sales Amount] =

(ORDEREDCUSTOMERS.ITEM(MEASURES.REGULARRANK-2), [Measures].[Internet Sales Amount])

AND MEASURES.REGULARRANK>1

, (ORDEREDCUSTOMERS.ITEM(MEASURES.REGULARRANK-2), MEASURES.TIEDRANK)

,MEASURES.REGULARRANK)

SELECT

{[Measures].[Internet Sales Amount], MEASURES.TIEDRANK,[Measures].REGULARRANK}

ON COLUMNS,

ORDEREDCUSTOMERS

ON ROWS

FROM [Adventure Works]

Now this particular query performs pretty well – 6 seconds on my laptop, only marginally worse than the non-tied rank. And it gives the correct results; the middle column of values below shows the tied rank:

Unfortunately, the performance of this approach varies a lot depending on the number of tied ranks that are present in the set. If we slice the query by the year 2001, when there were a lot more customers with tied ranks, as follows:

WITH

ORDEREDCUSTOMERS AS

ORDER(

[Customer].[Customer].[Customer].MEMBERS

, [Measures].[Internet Sales Amount]

, BDESC)

MEMBER MEASURES.REGULARRANK AS

RANK([Customer].[Customer].CURRENTMEMBER, ORDEREDCUSTOMERS)

MEMBER MEASURES.TIEDRANK AS

IIF(

[Measures].[Internet Sales Amount] =

(ORDEREDCUSTOMERS.ITEM(MEASURES.REGULARRANK-2), [Measures].[Internet Sales Amount])

AND MEASURES.REGULARRANK>1

, (ORDEREDCUSTOMERS.ITEM(MEASURES.REGULARRANK-2), MEASURES.TIEDRANK)

,MEASURES.REGULARRANK)

SELECT

{[Measures].[Internet Sales Amount], MEASURES.TIEDRANK,[Measures].REGULARRANK}

ON COLUMNS,

ORDEREDCUSTOMERS

ON ROWS

FROM [Adventure Works]

WHERE([Date].[Calendar Year].&[2001])

…then performance gets really bad once again.

Then I came up with a new approach. After ordering the set of all Customers, I made a second pass over it and created a second set with exactly the same number of items in it: for every customer in the first set, in the second set I added the current Customer if that Customer did not have a tied rank; if the Customer did have a tied rank, I added the first Customer in the original set that shared its tied rank. So if there were four customers, A, B, C and D, and if A had sales of 1, B had sales of 2, C had sales of 2 and D had sales of 3, then this new set would contain the members A, B, B, D. I could then say, for Customer C, that it was the third Customer in the original set, but the third item in the new set was B, and that was the Customer whose rank I needed to display for Customer C. So each item in this second set gives us the member whose rank we need to display for the member in the same position in the set of ordered Customers.

Here’s the MDX:

WITH

ORDEREDCUSTOMERS AS

ORDER(

[Customer].[Customer].[Customer].MEMBERS

, [Measures].[Internet Sales Amount]

, BDESC)

MEMBER MEASURES.REGULARRANK AS

RANK([Customer].[Customer].CURRENTMEMBER, ORDEREDCUSTOMERS)

SET CUSTOMERSWITHTIES AS

GENERATE(

{INTERSECT({ORDEREDCUSTOMERS.ITEM(0)} AS FIRSTTIE,{}), ORDEREDCUSTOMERS} AS ORDEREDCUSTOMERS2

, IIF(

({ORDEREDCUSTOMERS2.CURRENT AS CURRCUST}.ITEM(0), [Measures].[Internet Sales Amount]) =

({ORDEREDCUSTOMERS2.ITEM(ORDEREDCUSTOMERS2.CURRENTORDINAL-2) AS PREVCUST}.ITEM(0), [Measures].[Internet Sales Amount])

, IIF(

(PREVCUST.ITEM(0), [Measures].[Internet Sales Amount])

=

(FIRSTTIE.ITEM(0), [Measures].[Internet Sales Amount])

, {FIRSTTIE}

, {PREVCUST AS FIRSTTIE})

, {CURRCUST})

, ALL)

MEMBER MEASURES.HASTIE AS

RANK([Customer].[Customer].CURRENTMEMBER, CUSTOMERSWITHTIES)

MEMBER MEASURES.TIEDRANK AS

(MEASURES.REGULARRANK, CUSTOMERSWITHTIES.ITEM(MEASURES.REGULARRANK-1))

SELECT

{[Measures].[Internet Sales Amount], MEASURES.TIEDRANK,[Measures].REGULARRANK}

ON COLUMNS,

ORDEREDCUSTOMERS

ON ROWS

FROM [Adventure Works]

The named set CUSTOMERSWITHTIES is where the interesting stuff happens. I’m iterating over the set ORDEREDCUSTOMERS using the GENERATE function, and using inline named sets to store the current Customer in the iteration, the previous Customer, and the first Customer containing the shared tied rank (see here for a similar example of using named sets). It consistently executes in 12 seconds regardless of how you slice the query, so it’s not as good as the best performance of the recursive approach but it’s much, much better than the worst performance of the recursive approach. If anyone has any other ideas on how to solve this problem, I’d love to hear them. I’m still sure there’s a better way of doing this…

Of course, what I really want is for the Formula Engine to be able to optimise queries containing set functions like Order in scenarios like this – I’d want it to know that when a particular set operation returns the same result for a block of cells, it should only perform that set operation once. However, even this wouldn’t necessarily be good enough in all cases – there are plenty of situations where you need to perform the same expensive set operation like a sort or a filter in multiple similar calculations, and you’d like to share the result of this set operation between calculations. For example, you might have a calculated member that counted the number of Customers who bought something both this month and in the previous month, and a second calculated member that counted the number of Customers who not only bought this month and in the previous month and spent more than $1000. In both cases you end up finding the set of Customers who bought this month and last month, which may take a long time to do. This is why I think it would be useful to be able to have calculated members return set objects, which can then be cached, so you can share the set between multiple other calculated members; if you agree, please vote on this Connect.

awesome article Chris! Definitely one to bookmark. I did have a few thoughts about your opening sentence though. Check it out here http://fkearney.spaces.live.com/blog/cns!E91A5DFF9098DE4A!316.entry

Great stuff.

Thanks!!! You saved me.

Hi, How to do this exact same thing for ascending order rank? I tried changing BDESC to BASC in line 7 but not sure what I am missing. Any ideas? Thank you!

I just tried it with BDESC changed to BASC and it seems to be working ok for me…?

Hi Chris, thank you for the wonderful sharing,however we found the Ranking Order after Tied was not in correct sequence. Any suggestion in order for us to keep them in order. (With your example we look into item 120 & 121 is tied, therefore it become 120 & 120, however the sub-sequence 122 which suppose to be 121 still remain as 122)

I skipped ranks after a tie on purpose (so that, in your example, you get 120,120,122) – it’s one of the ways that tied ranks are usually handled, so it is ‘correct’, but it’s not the only way. It sounds like you want the other way of handling ties and I’d need to think about that a bit…