Optimising Returning Customers Calculations in MDX

One of the more popular blog posts from my archives (86 comments so far) is the one I wrote on “Counting New and Returning Customers in MDX”. The trouble with all of the calculations in there is that they execute in cell-by-cell mode, and therefore perform quite badly.

For example, take the following query on Adventure Works to find the number of returning customers (customers who have bought from us today and have also bought something before in the past):

with

member measures.[Returning Customers V1] as

count(

intersect(

nonempty([Customer].[Customer].[Customer].members, [Measures].[Internet Sales Amount])

,

nonempty([Customer].[Customer].[Customer].members, 

    [Measures].[Internet Sales Amount] * {null : [Date].[Date].currentmember.prevmember})

)

)

 

select {measures.[Returning Customers V1]} on 0,

[Date].[Date].[Date].members

on 1

from 

[Adventure Works]

 

On a cold cache this takes 47 seconds on my laptop and a quick look in Profiler shows this executes in cell-by-cell mode. In the comments on the original post Deepak Puri suggested an alternative approach using the Customer Count distinct count measure:

with

member measures.customerstodate as

aggregate(null:[Date].[Date].currentmember, [Measures].[Customer Count])

 

member measures.customerstoprevdate as

([Date].[Date].currentmember.prevmember, [Measures].customerstodate)

 

member measures.newcustomers as

measures.customerstodate - measures.customerstoprevdate

 

member measures.[Returning Customers V2] as

[Measures].[Customer Count] - measures.newcustomers

 

select {measures.[Returning Customers V2]} on 0,

[Date].[Date].[Date].members

on 1

from 

[Adventure Works]

Interestingly, this performs even worse than the previous query (although I would have expected it to be better). So how can we write a query that returns in a reasonable amount of time?

I haven’t found a way to do this for a calculated measure defined on the server, to be used in a true ad hoc query environment like Excel (any suggestions welcome – please leave a comment if you can do it), but I have got a way of optimising this calculation for scenarios where you have control over the MDX being used, such as in SSRS.

Here’s the query:

with

 

set customerdates as

nonempty(

[Date].[Date].[Date].members

*

[Customer].[Customer].[Customer].members

, [Measures].[Internet Sales Amount])

 

set nondistinctcustomers as

generate(

customerdates,

{[Customer].[Customer].currentmember}, all)

 

member measures.customercountsum as

sum(null:[Date].[Date].currentmember, [Measures].[Customer Count])

 

member measures.[Returning Customers V3] as

count(

intersect(

subset(nondistinctcustomers

    , (measures.customercountsum, [Date].[Date].currentmember.prevmember)

    , [Measures].[Customer Count])

,

head(nondistinctcustomers

    , (measures.customercountsum, [Date].[Date].currentmember.prevmember))

)

)

 

 

select {measures.[Returning Customers V3]} on 0,

[Date].[Date].[Date].members

on 1

from 

[Adventure Works]

 

On my laptop, this query executes in around 5 seconds on a cold cache. The reason it’s so much faster is also the reason it can’t be used in ad hoc queries – it uses named sets to find all the combinations of customer date needed by the query in one operation. Here’s a step-by-step explanation of how it works:

  • First of all, the customerdates set gets a set of tuples containing every single combination of day and customer where a purchase was made, using a simple Nonempty().
  • Next, the nondistinctcustomers set takes the customerdates set and removes the dates, so what we are left with is a list of customers. It’s not a list of distinct customers, however – a given customer may appear more than once. This still represents a list of the customers that bought something each day, it’s just that we no longer have any information about which day we’re looking at.
  • The customercountsum measure allows us to take the list of customers in the nondistinctcustomers set and find out which customers bought something in any given day. It’s a running sum of the Customer Count measure. This is a distinct count measure, and usually you wouldn’t use the Sum() function on a distinct count, but it’s important we do here. How is it used? For example, let’s imagine we had just three days of data: on the first day we had three customers, on the second four customers and on the third five customers. That would result in the nondistinctcustomers set containing twelve (not necessarily distinct) customers. We can then use the running sum of a distinct count of customers to find out the index of the item in nondistinctcustomers that is the last customer in the list for each day. So on day two we would have a running sum of seven, and therefore the seventh item in nondistinctcustomers gives us the last customer in the list for that day.
  • Finally, the Returning Customers V3 measure gives us the number of returning customers each day. It uses the customercountsum measure to find the subsets of the nondistinctcustomers set that represent the customers that bought on the current day and the customers that bought on all days up to yesterday, then uses the Intersect() function to find the returning customers.

11 thoughts on “Optimising Returning Customers Calculations in MDX

  1. gbrueckl – Austria – Gerhard has been implementing traditional BI and data warehouse solutions based on the Microsoft Data Platform for over 10 years and has been sharing his knowledge with the community ever since. With the rise of “Big Data” over the last years he also shifted his interests towards new technologies, mainly related to the Hadoop eco system to fight the masses of data he frequently encounters at his customers. Having this strong background with the Microsoft Data Platform, he always tries to combine this “old” and the “new” world which still have a lot of things in common!
    gbrueckl says:

    Its quite late and I probably missed something but why shouldn’t dynamic named sets defined in the cube not work?

    1. gbrueckl – Austria – Gerhard has been implementing traditional BI and data warehouse solutions based on the Microsoft Data Platform for over 10 years and has been sharing his knowledge with the community ever since. With the rise of “Big Data” over the last years he also shifted his interests towards new technologies, mainly related to the Hadoop eco system to fight the masses of data he frequently encounters at his customers. Having this strong background with the Microsoft Data Platform, he always tries to combine this “old” and the “new” world which still have a lot of things in common!
      gbrueckl says:

      anyway, excellent post!

      1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
        Chris Webb says:

        You could probably get something to work, but in order to make the calculation work in all possible scenarios and for all client tools the code would end up being horrible – using the AXIS() function etc. I generally prefer to tell customers that they are better off not trying unless they are really, really good at MDX!

  2. Christopher Ross – Sydney, Australia – I'm simply a problem solver. and Somehow that led me to the wonderful world of BI. One of the greatest things in life is to equip others for success, and that's just what a good BI consultant does. We provide organisations with the tools to turn the raw power of data in to the harnessed energy of information. If the people and businesses I serve can see that I've been a part of helping them make great decisions, save time and make money then my days are made!
    Chris Ross says:

    This is more good stuff. Mind a quick help? I have attempted this code but the returning metric is always the same as the original distinct count (VolAtt = RetVol). Mind showing me where I’m off?

    Here’s my code:
    WITH MEMBER VolAtt AS (
    [Volunteer Attendance].[Volunteer Attendance Status].&[Attended]
    , [Volunteer Attendance].[Source].&[New System]
    , [Measures].[Volunteer Distinct Count]
    )

    SET VolDates AS
    NONEMPTY(
    [Fiscal Calendar].[Date].[Date].MEMBERS
    * [Person].[Person].[Person].MEMBERS
    , [VolAtt]
    )

    SET NonDistinctVol AS
    GENERATE(
    VolDates
    , {[Person].[Person].CURRENTMEMBER},ALL
    )

    MEMBER VolCountSum AS
    SUM(NULL:[Fiscal Calendar].[Date].CURRENTMEMBER,VolAtt)

    MEMBER RetVol AS
    COUNT(
    INTERSECT(
    SUBSET(
    NonDistinctVol
    , (VolCountSum,[Fiscal Calendar].[Date].CURRENTMEMBER.PREVMEMBER)
    , VolAtt
    )
    , HEAD(NonDistinctVol
    ,(VolCountSum,[Fiscal Calendar].[Date].CURRENTMEMBER.PREVMEMBER)
    )
    )
    )

    MEMBER NewVol AS
    VolAtt-RetVol

    SELECT {
    VolAtt
    ,RetVol
    ,NewVol
    } ON 0
    FROM People
    WHERE (
    [Fiscal Calendar].[Date].&[20130721]
    )

    Thx!
    Chris

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Instead of using the calculated measure VolAtt, can you rewrite the calculation to use just [Measures].[Volunteer Distinct Count] and then put [Volunteer Attendance].[Volunteer Attendance Status].&[Attended]
      and [Volunteer Attendance].[Source].&[New System] in the WHERE clause?

      1. Christopher Ross – Sydney, Australia – I'm simply a problem solver. and Somehow that led me to the wonderful world of BI. One of the greatest things in life is to equip others for success, and that's just what a good BI consultant does. We provide organisations with the tools to turn the raw power of data in to the harnessed energy of information. If the people and businesses I serve can see that I've been a part of helping them make great decisions, save time and make money then my days are made!
        Chris Ross says:

        Thanks for that, I gave it a go, but the results didn’t change. It seems to be that the subset() within the intersect() of [Returning Customers V3] (RetVol) returns no results. Any ideas?

        On a similar note, I used your code as a starting point and stumbled across a method that seems to work quite well. Performance is pretty good too. Thoughts?

        WITH MEMBER VolAtt AS (
        [Volunteer Attendance].[Volunteer Attendance Status].&[Attended]
        , [Volunteer Attendance].[Source].&[New System]
        , [Measures].[Volunteer Attendance Raw]
        )

        MEMBER VolLTDPrev AS
        SUM(NULL:[Fiscal Calendar].[Fiscal Calendar].CURRENTMEMBER.PREVMEMBER,VolAtt)

        SET VolCurPd AS
        GENERATE(
        NONEMPTY([Person].[Person].[Person].MEMBERS * [Service].[Service].[Service].MEMBERS,VolAtt)
        ,{[Person].[Person].CURRENTMEMBER}
        )
        SET VolLTDPrevPd AS
        GENERATE(
        NONEMPTY([Person].[Person].[Person].MEMBERS * [Service].[Service].[Service].MEMBERS,VolLTDPrev)
        ,{[Person].[Person].CURRENTMEMBER}
        )
        SET VolPrevPd AS
        GENERATE(
        NONEMPTY([Person].[Person].[Person].MEMBERS * [Service].[Service].[Service].MEMBERS
        ,(VolAtt,[Fiscal Calendar].[Fiscal Calendar].CURRENTMEMBER.PREVMEMBER)
        )
        ,{[Person].[Person].CURRENTMEMBER}
        )

        MEMBER RetVol AS COUNT(INTERSECT(VolCurPd,VolLTDPrevPd))
        MEMBER NewVol AS COUNT(EXCEPT(VolCurPd,VolLTDPrevPd))
        MEMBER LostVol AS COUNT(EXCEPT(VolPrevPd,VolCurPd))

        SELECT {RetVol,NewVol,LostVol} ON 0
        FROM People
        WHERE [Fiscal Calendar].[Fiscal Calendar].[2013-Jun]

        Btw, welcome to Sydney. I hope the weather holds for you. If you have a chance to swing by Single Origin Coffee Roasters in Surry Hills you’ll find the best coffee in the area.

        Thanks for the great code and being so quick to help!
        Chris

      2. Christopher Ross – Sydney, Australia – I'm simply a problem solver. and Somehow that led me to the wonderful world of BI. One of the greatest things in life is to equip others for success, and that's just what a good BI consultant does. We provide organisations with the tools to turn the raw power of data in to the harnessed energy of information. If the people and businesses I serve can see that I've been a part of helping them make great decisions, save time and make money then my days are made!
        Chris Ross says:

        Also, here’s a sample against the AW Cube:

        WITH
        MEMBER CustLTDPrev AS
        SUM(NULL:[Date].[Calendar].CURRENTMEMBER.PREVMEMBER,[Measures].[Internet Sales Amount])

        SET CustCurPd AS
        GENERATE(
        NONEMPTY([Customer].[Customer].[Customer].MEMBERS,[Measures].[Internet Sales Amount])
        ,{[Customer].[Customer].CURRENTMEMBER}
        )
        SET CustLTDPrevPd AS
        GENERATE(
        NONEMPTY([Customer].[Customer].[Customer].MEMBERS,CustLTDPrev)
        ,{[Customer].[Customer].CURRENTMEMBER}
        )
        SET CustPrevPd AS
        GENERATE(
        NONEMPTY([Customer].[Customer].[Customer].MEMBERS
        ,([Measures].[Internet Sales Amount],[Date].[Calendar].CURRENTMEMBER.PREVMEMBER)
        )
        ,{[Customer].[Customer].CURRENTMEMBER}
        )

        MEMBER RetCust AS COUNT(INTERSECT(CustCurPd,CustLTDPrevPd))
        MEMBER NewCust AS COUNT(EXCEPT(CustCurPd,CustLTDPrevPd))
        MEMBER LostCust AS COUNT(EXCEPT(CustPrevPd,CustCurPd))

        SELECT {RetCust,NewCust,LostCust} ON 0
        FROM [Adventure Works]
        WHERE [Date].[Calendar].[Month].&[2007]&[10]

      3. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
        Chris Webb says:

        Thanks Chris! I went to the Aroma coffee festival down by the Rocks on Sunday morning, and I think the coffee shop you mentioned had a stall there…

      4. Christopher Ross – Sydney, Australia – I'm simply a problem solver. and Somehow that led me to the wonderful world of BI. One of the greatest things in life is to equip others for success, and that's just what a good BI consultant does. We provide organisations with the tools to turn the raw power of data in to the harnessed energy of information. If the people and businesses I serve can see that I've been a part of helping them make great decisions, save time and make money then my days are made!
        Chris Ross says:

        Ah yeah then you’re all set! Hope the week goes well

        Chris

  3. Christopher Ross – Sydney, Australia – I'm simply a problem solver. and Somehow that led me to the wonderful world of BI. One of the greatest things in life is to equip others for success, and that's just what a good BI consultant does. We provide organisations with the tools to turn the raw power of data in to the harnessed energy of information. If the people and businesses I serve can see that I've been a part of helping them make great decisions, save time and make money then my days are made!
    Chris Ross says:

    BTW (if you haven’t already blogged about this issue) for this to work as calculated members/sets in the cube, all sets have be Dynamic and references to those sets in other calculated members/sets must be wrapped in STRTOSET(”). Evidently Dynamic sets are evaluated with every query even if they are not referenced in that query.

    MEMBER RetCust AS COUNT(INTERSECT(STRTOSET(‘CustCurPd’),STRTOSET(‘CustLTDPrevPd’)))

    http://www.sqljason.com/2011/04/performance-problems-with-dynamic-named.html

Leave a Reply to Chris RossCancel reply