Advanced Ranking and Dynamically-Generated Named Sets in MDX

Calculating ranks is a classic MDX problem, and as is usually the case Mosha has an excellent blog entry on the subject which I would encourage you to read if you haven’t done so already:
http://sqljunkies.com/WebLog/mosha/archive/2006/03/14/mdx_ranking.aspx

His advice can be summarised fairly simply – where possible declare a named set which is ordered to use in the Rank function rather than try to do this ordering every time you want to calculate a rank because it’ll give you much better performance. This is all you need to know for 99% of the rank calculations you’ll ever write; however, I’ve been doing a bit of thinking around the remaining 1% of scenarios and here’s what I’ve come up with.

First of all it’s worth pointing out that the Order function isn’t always the most efficient way of ordering a set. As Mosha points out in this blog entry:
http://sqljunkies.com/WebLog/mosha/archive/2007/04/19/stored_procs_best_practices.aspx
… there’s a performance bug in the Order function which makes it very slow. Mosha’s own example query:

with member y as
count(
Order( [Customer].[Customer].[Customer].MEMBERS
*[Product].[Category].[Category].MEMBERS
*[Product].[Style].[Style].MEMBERS
, Measures.[Internet Sales Amount], BDESC))
select y on 0 from [Adventure Works]

…runs in 40 seconds on my laptop, and while he shows how a sproc can improve performance he doesn’t mention that if you rewrite to use the TopCount function you get even better performance. The following query runs in 4 seconds on a cold cache on my machine:

with
member y as count(topcount(
   {[Customer].[Customer].[Customer].MEMBERS
  *[Product].[Category].[Category].MEMBERS
  *[Product].[Style].[Style].MEMBERS} as myset
, count(myset), Measures.[Internet Sales Amount]))
select y on 0
from [Adventure Works]

The TopCount function uses a different algorithm to the Order function and it’s optimised to return the top n members of an ordered set; for a relatively small set like the one in this example it performs better than the existing buggy implementation of Order but that may not always be the case. And of course it’s highly likely that the Order function will be fixed in a future service pack so at some point it will start performing better than TopCount. As a result, use this approach at your own risk and test thoroughly!

But let’s get back onto the subject of ranking proper. The obvious problem that Mosha doesn’t deal with in his article is what happens when you have to calculate a rank on more than one criteria? To take Mosha’s example, what about calculating the rank of Employees by Reseller Sales Amount for several Months, where those Months are to appear on Columns? If you know what those Months are going to be in advance it’s fairly straightforward because you can create multiple named sets to use; the worst problem you’re going to have is that your query is going to be pretty long. But what if you don’t know what those Months are going to be or how many of them there are? For example, you might be filtering months on another criteria or using the TopPercent function. There’s no way you can create the named sets you need to get good performance if you don’t know how many sets you’re going to need, is there? If you had complete control over the code of your client tool then you could dynamically generate your MDX query string to give you all the named sets you needed, but that would be a pain even if it was possible at all (and it wouldn’t be with many off-the-shelf tools like Proclarity Desktop). Well, one solution to this problem simply uses one named set for all your Months:

WITH
SET MyMonths as TopPercent([Date].[Calendar].[Month].Members, 20, [Measures].[Reseller Sales Amount])
SET MyEmployees as [Employee].[Employee].[Employee].MEMBERS
SET OrderedEmployees AS
GENERATE(MyMonths,
ORDER(
[Employee].[Employee].[Employee].members
, ([Measures].[Reseller Sales Amount],[Date].[Calendar].CurrentMember), BDESC)
, ALL)
MEMBER [Measures].[Employee Rank] AS
RANK([Employee].[Employee].CurrentMember,
SubSet(
OrderedEmployees
,(RANK([Date].[Calendar].CurrentMember, MyMonths)-1) * Count(MyEmployees)
, Count(MyEmployees)
))

SELECT
MyMonths
*
[Measures].[Employee Rank]
ON 0
,MyEmployees ON 1
from [Adventure Works]

The set MyMonths contains the months that I’m interested in, and as I said because it uses the TopPercent function I don’t know in advance how many Months it will contain. However I do know that there’s a static number of Employees that I want to rank so in my OrderedEmployees set I use the Generate function to create a concatenated list of ordered Employees for each Month (note the use of the ALL flag here to make sure Generate doesn’t do a distinct on the set it returns). In my Employee Rank calculation I can then use the Subset function to pick out the section of this set which returns the ordered list of Employees for the current month: it’s the subset that starts at index  (RANK([Date].[Calendar].CurrentMember, MyMonths)-1) * Count(MyEmployees) and is Count(MyEmployees) members long.

BUT… of course this only works because we know there are the same amount of Employees each month. What happens if we change the calculation and ask if the current Employee is in the top 75% of Employees by Reseller Sales Amount for each month? In this case, 8 Employees make up the top 75% for August 2003 but there are 10 for September 2003 and so on so this approach isn’t any use. 

The solution to this problem came to me while I was driving home down the motorway on the way back from my in-laws’ house on Saturday afternoon, and when it did my wife asked me why I had suddenly started smiling so broadly – this is something that will get all you MDX fetishists out there (all three of you) equally excited. Basically it’s a way of dynamically generating named sets within a query. Let’s take a look at the whole solution first:

WITH
SET MyMonths as TopPercent([Date].[Calendar].[Month].Members, 20, [Measures].[Reseller Sales Amount])
SET MyEmployees as [Employee].[Employee].[Employee].MEMBERS
SET MyMonthsWithEmployeeSets as
Generate(
MyMonths
, Union(
{[Date].[Calendar].CurrentMember}
,
StrToSet("
Intersect({},
{TopPercent(MyEmployees, 75, ([Measures].[Reseller Sales Amount],[Date].[Calendar].CurrentMember))
as EmployeeSet" + Cstr(MyMonths.CurrentOrdinal) + "})")
))

MEMBER [Measures].[TopEmployee] AS
iif(
RANK([Employee].[Employee].CurrentMember,
StrToSet("EmployeeSet" + Cstr(Rank([Date].[Calendar].CurrentMember, MyMonths)))
)<>0, "Yes", "No")

SELECT
MyMonthsWithEmployeeSets
*
[Measures].[TopEmployee]
ON 0
,MyEmployees ON 1
from [Adventure Works]

This executes in 2 seconds on a cold cache on my laptop, compared to 52 seconds for the equivalent query which evaluates the TopPercent for every single cell, so it’s definitely a big improvement. What I’m doing is in the set declaration for MyMonthsWithEmployeeSets using a Generate function to iterate over the set of Months I’m going to display on columns and return exactly the same set, but while doing so find the set of the top 75% Employees for each Month and store it in a named set declared inline. The way I do this is to return a set containing the current Month from the iteration and union it with an expression which returns an empty set; the top 75% set is evaluated and stored inside the expression which returns the empty set. The fun bit is that the expression which returns the empty set is inside a string which is passed into StrToSet, and as a result I can dynamically generate the names of my named sets using a static string plus the result of the currentordinal function. In the example above I end up creating five named sets called EmployeeSet1 to EmployeeSet5, one for each Month. Thanks to the fact that I can reference these sets in another calculated member so long as it’s evaluated further down the execution tree (see http://www.sqlserveranalysisservices.com/OLAPPapers/ReverseRunningSum.htm), assuming I construct my SELECT statement appropriately I can then get at the contents of these sets within my TopEmployee calculated member using another call to StrToSet and some string manipulation to determine the name of the set that I’m after. How cool is that?

12 thoughts on “Advanced Ranking and Dynamically-Generated Named Sets in MDX

  1. Hi Chris,
     
    How about this approach – results seem similar at first glance, but not sure how well it can be generalized:
    >>

    WITH
    SET MyMonths as TopPercent([Date].[Calendar].[Month].Members
    , 20, [Measures].[Reseller Sales Amount])
    SET MyEmployees as [Employee].[Employee].[Employee].MEMBERS
    SET MyTopEmployees as
    Generate(
    MyMonths
    ,
    {[Date].[Calendar].CurrentMember} *
    TopPercent(MyEmployees
    , 75, [Measures].[Reseller Sales Amount]))
    MEMBER [Measures].[TopEmployee] AS
    iif(
    RANK(([Date].[Calendar].CurrentMember
    , [Employee].[Employee].CurrentMember)
    , MyTopEmployees)
    > 0, "Yes", "No")
    SELECT
    MyMonths
    *
    [Measures].[TopEmployee]
    ON 0
    , MyEmployees ON 1
    from [Adventure Works]
    >>
     – Deepak

    1. Hi Deepak,

      I modified calculated member to return the Rank and checked that Rank is continue across the months and not re-initialized for every month.
      It should have returned Rank starting from 1 for every month but it is continue across the months whereas Chris above method worked fine.

      – Suhail

  2. I need to have dynamic Ratings calculation in cube:-

    Rating Dimension:

    (ID, MainVariance, MaxVariance, Rating, StartDate, EndDate)

    Rating Dimension values:

    SELECT 1 AS ID, 0 AS MinVariance, 0.10 AS MaxVariance, 1 AS Rating, ’01/01/1900′ AS StartDate, ’06/06/2079′ AS EndDate FROM ApplDim

    UNION

    SELECT 2 AS ID, 0.11 AS MinVariance, 0.30 AS MaxVariance, 2 AS Rating, ’01/01/1900′ AS StartDate, ’06/06/2079′ AS EndDate FROM ApplDim AS ApplDim_5

    UNION

    SELECT 3 AS ID, 0.31 AS MinVariance, 1 AS MaxVariance, 3 AS Rating, ’01/01/1900′ AS StartDate, ’06/06/2079′ AS EndDate

    FROM ApplDim AS ApplDim_4

    UNION

    SELECT 4 AS ID, 0 AS MinVariance, 0.20 AS MaxVariance, 1 AS Rating, ’01/01/2011′ AS StartDate, ’12/31/2011′ AS EndDate

    FROM ApplDim AS ApplDim_3

    UNION

    SELECT 5 AS ID, 0.21 AS MinVariance, 0.50 AS MaxVariance, 2 AS Rating, ’01/01/2011′ AS StartDate, ’12/31/2011′ AS EndDate

    FROM ApplDim AS ApplDim_2

    UNION

    SELECT 6 AS ID, 0.51 AS MinVariance, 1 AS MaxVariance, 3 AS Rating, ’01/01/2011′ AS StartDate, ’12/31/2011′ AS EndDate

    ——–

    Created a set in Cube as “RatingSet” :

    RatingSet

    {

    [Rating].[Rating].[Rating] *

    [Rating].[End Date].[End Date] *

    [Rating].[Max Variance].[Max Variance] *

    [Rating].[Min Variance].[Min Variance] *

    [Rating].[Start Date].[Start Date]

    }

    Calculation which needs to be made dynamic:-

    IIF( ([Measures].[Variance Percent]>=0 and [Measures].[Variance Percent]0.10 and [Measures].[Variance Percent]<= 0.30)

    ,2

    ,3 –Rating

    )

    )

    i.e. Variance Percent will be 0, 10, 30, 50 etc… and Rating will be 0, 1, 2 3 etc…

    Now, How to make it dynamic.

    Any help will be appreciated.

    Please suggest asap.

    Many Thanks in Advance.

  3. Itried this but now working as per expectations:
    WITH DYNAMIC SET Test2 as

    IIF( ([Measures].[Ownership Variance Percent]>=[Rating].[Min Variance].[Min Variance].members.Item(0)

    and [Measures].[Ownership Variance Percent][Rating].[Min Variance].[Min Variance].members.Item(1)

    and [Measures].[Ownership Variance Percent]<= [Rating].[Max Variance].[Max Variance].members.Item(1))
    ,[Rating].[Rating].[Rating].members.item(1)

    ,[Rating].[Rating].[Rating].members.item(2)

    )

    )

  4. Thanks Chris.

    Could you please help me in creating the bucket as per my requirement, posted above?

  5. Cheers for this Chris, took me ages to get it working against my data, but then actually installed AdventureWorks to break it down and got there in the end 🙂 Basically I needed to produce a ranking of all Artists with sales in the last 7 days and their ranking relative to each shop their music had sold in. Also had to do album and track sales but just used a find/replace to get the other two. Code for anyone interested:

    WITH
    SET myShops AS
    TopPercent
    (
    [Dim Shop].[Shop ID].[Shop ID].MEMBERS
    ,100
    ,[Measures].[Artist Sales]
    )
    SET myArtists AS
    Filter
    (
    Except
    (
    [Dim Catalogue].[Artist Id].[Artist Id].MEMBERS
    ,{
    [Dim Catalogue].[Artist Id].&[0]
    ,[Dim Catalogue].[Artist Id].UnknownMember
    }
    )
    ,
    [Measures].[Artist Sales] > 0
    )
    SET myShopsWithArtistsSets AS
    Generate
    (
    myShops
    ,Union
    (
    {[Dim Shop].[Shop ID].CurrentMember}
    ,StrToSet
    (”
    Intersect({},
    {order(myArtists,([Artist Sales],[Dim Shop].[Shop ID].currentmember),desc)
    as ArtistSet”
    +
    Cstr(myShops.CurrentOrdinal)
    + “})”
    )
    )
    ,ALL
    )
    MEMBER [Measures].[Artist Rank] AS
    Rank
    (
    [Dim Catalogue].[Artist Id].CurrentMember
    ,StrToSet
    (
    “ArtistSet” + Cstr(Rank([Dim Shop].[Shop ID].CurrentMember,myShops))
    )
    )
    SELECT
    [Artist Rank] ON 0
    ,
    myShopsWithArtistsSets * myArtists ON 1
    FROM datawarehouse;

  6. Thanks for this Chris, that is just brilliant!
    It took me a few hours to make it work with my Cube but it’s perfect now.

    Cheers

Leave a Reply to PuneetCancel reply