Optimising Returning Customers Calculations in MDX

One of the more popular blog posts from my archives (86 comments so far) is the one I wrote on “Counting New and Returning Customers in MDX”. The trouble with all of the calculations in there is that they execute in cell-by-cell mode, and therefore perform quite badly.

For example, take the following query on Adventure Works to find the number of returning customers (customers who have bought from us today and have also bought something before in the past):

with

member measures.[Returning Customers V1] as

count(

intersect(

nonempty([Customer].[Customer].[Customer].members, [Measures].[Internet Sales Amount])

,

nonempty([Customer].[Customer].[Customer].members, 

    [Measures].[Internet Sales Amount] * {null : [Date].[Date].currentmember.prevmember})

)

)

 

select {measures.[Returning Customers V1]} on 0,

[Date].[Date].[Date].members

on 1

from 

[Adventure Works]

 

On a cold cache this takes 47 seconds on my laptop and a quick look in Profiler shows this executes in cell-by-cell mode. In the comments on the original post Deepak Puri suggested an alternative approach using the Customer Count distinct count measure:

with

member measures.customerstodate as

aggregate(null:[Date].[Date].currentmember, [Measures].[Customer Count])

 

member measures.customerstoprevdate as

([Date].[Date].currentmember.prevmember, [Measures].customerstodate)

 

member measures.newcustomers as

measures.customerstodate - measures.customerstoprevdate

 

member measures.[Returning Customers V2] as

[Measures].[Customer Count] - measures.newcustomers

 

select {measures.[Returning Customers V2]} on 0,

[Date].[Date].[Date].members

on 1

from 

[Adventure Works]

Interestingly, this performs even worse than the previous query (although I would have expected it to be better). So how can we write a query that returns in a reasonable amount of time?

I haven’t found a way to do this for a calculated measure defined on the server, to be used in a true ad hoc query environment like Excel (any suggestions welcome – please leave a comment if you can do it), but I have got a way of optimising this calculation for scenarios where you have control over the MDX being used, such as in SSRS.

Here’s the query:

with

 

set customerdates as

nonempty(

[Date].[Date].[Date].members

*

[Customer].[Customer].[Customer].members

, [Measures].[Internet Sales Amount])

 

set nondistinctcustomers as

generate(

customerdates,

{[Customer].[Customer].currentmember}, all)

 

member measures.customercountsum as

sum(null:[Date].[Date].currentmember, [Measures].[Customer Count])

 

member measures.[Returning Customers V3] as

count(

intersect(

subset(nondistinctcustomers

    , (measures.customercountsum, [Date].[Date].currentmember.prevmember)

    , [Measures].[Customer Count])

,

head(nondistinctcustomers

    , (measures.customercountsum, [Date].[Date].currentmember.prevmember))

)

)

 

 

select {measures.[Returning Customers V3]} on 0,

[Date].[Date].[Date].members

on 1

from 

[Adventure Works]

 

On my laptop, this query executes in around 5 seconds on a cold cache. The reason it’s so much faster is also the reason it can’t be used in ad hoc queries – it uses named sets to find all the combinations of customer date needed by the query in one operation. Here’s a step-by-step explanation of how it works:

  • First of all, the customerdates set gets a set of tuples containing every single combination of day and customer where a purchase was made, using a simple Nonempty().
  • Next, the nondistinctcustomers set takes the customerdates set and removes the dates, so what we are left with is a list of customers. It’s not a list of distinct customers, however – a given customer may appear more than once. This still represents a list of the customers that bought something each day, it’s just that we no longer have any information about which day we’re looking at.
  • The customercountsum measure allows us to take the list of customers in the nondistinctcustomers set and find out which customers bought something in any given day. It’s a running sum of the Customer Count measure. This is a distinct count measure, and usually you wouldn’t use the Sum() function on a distinct count, but it’s important we do here. How is it used? For example, let’s imagine we had just three days of data: on the first day we had three customers, on the second four customers and on the third five customers. That would result in the nondistinctcustomers set containing twelve (not necessarily distinct) customers. We can then use the running sum of a distinct count of customers to find out the index of the item in nondistinctcustomers that is the last customer in the list for each day. So on day two we would have a running sum of seven, and therefore the seventh item in nondistinctcustomers gives us the last customer in the list for that day.
  • Finally, the Returning Customers V3 measure gives us the number of returning customers each day. It uses the customercountsum measure to find the subsets of the nondistinctcustomers set that represent the customers that bought on the current day and the customers that bought on all days up to yesterday, then uses the Intersect() function to find the returning customers.

11 thoughts on “Optimising Returning Customers Calculations in MDX

      1. You could probably get something to work, but in order to make the calculation work in all possible scenarios and for all client tools the code would end up being horrible – using the AXIS() function etc. I generally prefer to tell customers that they are better off not trying unless they are really, really good at MDX!

  1. This is more good stuff. Mind a quick help? I have attempted this code but the returning metric is always the same as the original distinct count (VolAtt = RetVol). Mind showing me where I’m off?

    Here’s my code:
    WITH MEMBER VolAtt AS (
    [Volunteer Attendance].[Volunteer Attendance Status].&[Attended]
    , [Volunteer Attendance].[Source].&[New System]
    , [Measures].[Volunteer Distinct Count]
    )

    SET VolDates AS
    NONEMPTY(
    [Fiscal Calendar].[Date].[Date].MEMBERS
    * [Person].[Person].[Person].MEMBERS
    , [VolAtt]
    )

    SET NonDistinctVol AS
    GENERATE(
    VolDates
    , {[Person].[Person].CURRENTMEMBER},ALL
    )

    MEMBER VolCountSum AS
    SUM(NULL:[Fiscal Calendar].[Date].CURRENTMEMBER,VolAtt)

    MEMBER RetVol AS
    COUNT(
    INTERSECT(
    SUBSET(
    NonDistinctVol
    , (VolCountSum,[Fiscal Calendar].[Date].CURRENTMEMBER.PREVMEMBER)
    , VolAtt
    )
    , HEAD(NonDistinctVol
    ,(VolCountSum,[Fiscal Calendar].[Date].CURRENTMEMBER.PREVMEMBER)
    )
    )
    )

    MEMBER NewVol AS
    VolAtt-RetVol

    SELECT {
    VolAtt
    ,RetVol
    ,NewVol
    } ON 0
    FROM People
    WHERE (
    [Fiscal Calendar].[Date].&[20130721]
    )

    Thx!
    Chris

    1. Instead of using the calculated measure VolAtt, can you rewrite the calculation to use just [Measures].[Volunteer Distinct Count] and then put [Volunteer Attendance].[Volunteer Attendance Status].&[Attended]
      and [Volunteer Attendance].[Source].&[New System] in the WHERE clause?

      1. Thanks for that, I gave it a go, but the results didn’t change. It seems to be that the subset() within the intersect() of [Returning Customers V3] (RetVol) returns no results. Any ideas?

        On a similar note, I used your code as a starting point and stumbled across a method that seems to work quite well. Performance is pretty good too. Thoughts?

        WITH MEMBER VolAtt AS (
        [Volunteer Attendance].[Volunteer Attendance Status].&[Attended]
        , [Volunteer Attendance].[Source].&[New System]
        , [Measures].[Volunteer Attendance Raw]
        )

        MEMBER VolLTDPrev AS
        SUM(NULL:[Fiscal Calendar].[Fiscal Calendar].CURRENTMEMBER.PREVMEMBER,VolAtt)

        SET VolCurPd AS
        GENERATE(
        NONEMPTY([Person].[Person].[Person].MEMBERS * [Service].[Service].[Service].MEMBERS,VolAtt)
        ,{[Person].[Person].CURRENTMEMBER}
        )
        SET VolLTDPrevPd AS
        GENERATE(
        NONEMPTY([Person].[Person].[Person].MEMBERS * [Service].[Service].[Service].MEMBERS,VolLTDPrev)
        ,{[Person].[Person].CURRENTMEMBER}
        )
        SET VolPrevPd AS
        GENERATE(
        NONEMPTY([Person].[Person].[Person].MEMBERS * [Service].[Service].[Service].MEMBERS
        ,(VolAtt,[Fiscal Calendar].[Fiscal Calendar].CURRENTMEMBER.PREVMEMBER)
        )
        ,{[Person].[Person].CURRENTMEMBER}
        )

        MEMBER RetVol AS COUNT(INTERSECT(VolCurPd,VolLTDPrevPd))
        MEMBER NewVol AS COUNT(EXCEPT(VolCurPd,VolLTDPrevPd))
        MEMBER LostVol AS COUNT(EXCEPT(VolPrevPd,VolCurPd))

        SELECT {RetVol,NewVol,LostVol} ON 0
        FROM People
        WHERE [Fiscal Calendar].[Fiscal Calendar].[2013-Jun]

        Btw, welcome to Sydney. I hope the weather holds for you. If you have a chance to swing by Single Origin Coffee Roasters in Surry Hills you’ll find the best coffee in the area.

        Thanks for the great code and being so quick to help!
        Chris

      2. Also, here’s a sample against the AW Cube:

        WITH
        MEMBER CustLTDPrev AS
        SUM(NULL:[Date].[Calendar].CURRENTMEMBER.PREVMEMBER,[Measures].[Internet Sales Amount])

        SET CustCurPd AS
        GENERATE(
        NONEMPTY([Customer].[Customer].[Customer].MEMBERS,[Measures].[Internet Sales Amount])
        ,{[Customer].[Customer].CURRENTMEMBER}
        )
        SET CustLTDPrevPd AS
        GENERATE(
        NONEMPTY([Customer].[Customer].[Customer].MEMBERS,CustLTDPrev)
        ,{[Customer].[Customer].CURRENTMEMBER}
        )
        SET CustPrevPd AS
        GENERATE(
        NONEMPTY([Customer].[Customer].[Customer].MEMBERS
        ,([Measures].[Internet Sales Amount],[Date].[Calendar].CURRENTMEMBER.PREVMEMBER)
        )
        ,{[Customer].[Customer].CURRENTMEMBER}
        )

        MEMBER RetCust AS COUNT(INTERSECT(CustCurPd,CustLTDPrevPd))
        MEMBER NewCust AS COUNT(EXCEPT(CustCurPd,CustLTDPrevPd))
        MEMBER LostCust AS COUNT(EXCEPT(CustPrevPd,CustCurPd))

        SELECT {RetCust,NewCust,LostCust} ON 0
        FROM [Adventure Works]
        WHERE [Date].[Calendar].[Month].&[2007]&[10]

      3. Thanks Chris! I went to the Aroma coffee festival down by the Rocks on Sunday morning, and I think the coffee shop you mentioned had a stall there…

  2. BTW (if you haven’t already blogged about this issue) for this to work as calculated members/sets in the cube, all sets have be Dynamic and references to those sets in other calculated members/sets must be wrapped in STRTOSET(”). Evidently Dynamic sets are evaluated with every query even if they are not referenced in that query.

    MEMBER RetCust AS COUNT(INTERSECT(STRTOSET(‘CustCurPd’),STRTOSET(‘CustLTDPrevPd’)))

    http://www.sqljason.com/2011/04/performance-problems-with-dynamic-named.html

Leave a Reply to Chris RossCancel reply