Counting Customers Who Have Bought All Selected Products

Market basket analysis is a fairly well understood problem for SSAS – it’s best handled using many-to-many relationships, as shown in the excellent The Many-to-Many Revolution  white paper. However the other week I was asked a question that was an interesting variation on this problem, and which shows up one of the limitations of the many-to-many approach.

Let me show you an example. The following PivotTable on the Adventure Works cube shows the number of distinct customers who bought products in the product category Bikes:

image

The Customer Count measure is a distinct count measure, the Country hierarchy from the Customer dimension is shown on rows, and the slicer here shows the Category hierarchy from the Product dimension. If you select Bikes and Clothing in the slicer, like so:

image

… the measure values are larger because what you’re seeing now is the number of distinct customers who bought products in either the category Bikes or the category Clothing.

But what if you want to see the number of customers who bought Bikes AND Clothing? Well, setting up a many-to-many relationship will give you this (and a lot more useful stuff too) but there are some limitations with this approach:

  • Adding the m2m relationship and the extra measure groups it needs increases processing time
  • You have to select your two product categories from two different (but identical) hierarchies
  • You can only select two product categories – if you want to add a third to your AND filter you need to add a third m2m relationship, and so on. It’s not very flexible.

It is possible to get around these issues with a pure MDX approach, and apply an AND filter that works with however many product categories that the user selects.

The key to understanding how this works is to see how you would write an MDX calculated member that does an AND filter on two product categories:

WITH

MEMBER MEASURES.[Bikes And Clothing] AS

COUNT(

EXISTING 

NONEMPTY(

NONEMPTY(

[Customer].[Customer].[Customer].MEMBERS,

([Measures].[Internet Sales Amount],[Product].[Category].&[3])),

([Measures].[Internet Sales Amount],[Product].[Category].&[1]))

)

SELECT

{MEASURES.[Bikes And Clothing]}

ON 0,

NON EMPTY

[Customer].[Country].[Country].MEMBERS

ON 1

FROM

[Adventure Works]

image

Here I’ve used two nested NonEmpty() functions to filter the set of all customers, first to get the ones that bought Clothing, then to filter these customers again to get the ones that bought Bikes; there’s also an EXISTING there to get only the ones in the current country.

Once again, to add more product categories to the AND filter you need to add more nested NonEmpty() functions… which means you need a way to dynamically generate the code, which of course you can do using StrToSet(). Now normally I avoid using StrToSet() inside MDX calculations because it can cause serious performance problems but in this case it’s the best choice.

The following named set and calculated measure show how to solve the problem for Adventure Works:

CREATE DYNAMIC SET [Selected Product Categories] as  

EXISTING [Product].[Category].[Category].MEMBERS;

 

CREATE MEMBER CURRENTCUBE.MEASURES.[Distinct Customers with all selected Product Categories] AS

IIF(

[Measures].[Internet Sales Amount]=0,

NULL,

COUNT(

EXISTING

STRTOSET(

GENERATE(

[Selected Product Categories],

"NONEMPTY(", "")

+ "[Customer].[Customer].[Customer].MEMBERS" +

GENERATE(

[Selected Product Categories],

", ([Measures].[Internet Sales Amount], " +

[Selected Product Categories].CURRENT.UNIQUENAME

+ "))", "")

)

)

);

The dynamic named set is there to help work out what is selected in the slicer in my Excel worksheet, as shown in Mosha’s old post here. The calculated measure is where all the interesting stuff happens: it uses the ability of the Generate() function to iterate over a set (in this case the dynamic named set), evaluate a string expression and concatenate these strings. The output of this is a series of nested NonEmpty()s, which then goes to StrToSet() to be evaluated, and then the contents of that set are counted.

image

This technique does not replace using a m2m relationship, because it will not allow you to show a query with product category on rows and columns and the number of customers who bought each combination shown. However, if all you want to do is show the number of customers who have bought one specific combination, this has a lot of advantages.

For all of your MDX fans out there, here’s another approach I came up with which doesn’t perform quite as well but is so much fun I had to share it:

CREATE MEMBER CURRENTCUBE.MEASURES.[Distinct Customers with all selected Product Categories V2] AS

IIF(

[Measures].[Internet Sales Amount]=0,

NULL,

COUNT(

UNION(

INTERSECT(EXISTING [Customer].[Customer].[Customer].MEMBERS AS MYCUSTOMERS, {}),

GENERATE(

[Selected Product Categories],

IIF(

[Selected Product Categories].CURRENT.ITEM(0) IS TAIL([Selected Product Categories]).ITEM(0),

NONEMPTY(MYCUSTOMERS, [Measures].[Internet Sales Amount]),

INTERSECT(NONEMPTY(MYCUSTOMERS, [Measures].[Internet Sales Amount]) AS MYCUSTOMERS, {})

)

)

)

)

);

I’m using Generate() here again, but this time I’m using it with inline named sets (which are even worse for performance than StrToSet) in a way similar to this classic post. The theory is the same though: it’s a way of calling NonEmpty() multiple times over a set with an arbitrary number of items in.

Here’s what the calculation does:

  • Inside the first Intersect() I’m declaring an inline named set called MYCUSTOMERS, then doing the Intersect() with an empty set to return… and empty set. But I’ve managed to declare my inline named set, which is the point here.
  • I’m then iterating over the same dynamic named set shown in my earlier example using Generate(), and:
    • For all but the last product category in that set, I’m doing the NonEmpty() on the contents of MYCUSTOMERS on the current product category and then overwriting the contents of MYCUSTOMERS with the output. However, for these iterations of Generate() I’m returning an empty set.
    • For the last product category in the set I’m actually returning the output of NonEmpty() over MYCUSTOMERS for the current product category. The reason I’m only returning something other than an empty set on the last iteration of Generate() is that Generate() returns the union of all the sets returned by each iteration, and that would give me the wrong results! I only want to output the set from the final iteration.

10 thoughts on “Counting Customers Who Have Bought All Selected Products

  1. Hi Chris,

    Is it possible to use a hierarchy for the selection ?
    For example the hierarchy “product categories”, with levels category – subcategory – product, by selecting a combination of members of the different levels ?

  2. CREATE DYNAMIC SET [testvanilse] as
    EXCEPT(EXISTING [Kost Vergoeding].[Vergoedingsgroep Hierarchie].MEMBERS, [Kost Vergoeding].[Vergoedingsgroep Hierarchie].[All]);

    CREATE MEMBER CURRENTCUBE.MEASURES.[testvanilsebasket]
    AS IIF([Measures].[01 Bruto]=0,NULL,
    COUNT(EXISTING
    STRTOSET(
    GENERATE(
    [testvanilse],
    “NONEMPTY(“, “”)
    + “[Pers Identificatie].[Naam].[Naam].MEMBERS” +
    GENERATE(
    [testvanilse],
    “, ([Measures].[01 Bruto], ” +
    [testvanilse].CURRENT.UNIQUENAME
    + “))”, “”))));

    My hierarchy has 3 levels : Vergoedingsgroep, Vergoedingscode and Vergoedingssubcode.
    When I select a Vergoedingsgroep in the slicer, all underlying Vergoedingscodes and Vergoedingssubcodes are also selected.

    • Ah, I think I understand what the problem is here. If you pass every member of a hierarchy (except the All Member) into the dynamic set then that set will include each subcategory and every member in it. Since the code filters by every member in the set then a subcategory will only appear if every product in the subcategory has also been bought. You’ll need to exclude the underlying products and only include the subcategory in the set to get the behaviour you want, I think.

  3. Hi Chris,
    Thanks for the great post. I want to do the same in a Tabular model – Is it possible to do this in DAX(add more than two categories to AND filter) ?
    Thanks.

  4. Great post, thank you Chris.

    I’m having the same problem as Ilse Haak, though. From what I can see the issue comes from the dynamic set. When I apply a filter on a member, the dynamic set gets all the descendants and the ancestors of that member, which is obviously not what I need.

    For instance, in a hierarchy Year-Quarter-Month, if I select 2014-Q4 (Quarter) for instance, the dynamic set would be composed of 2014-Q4 (Quarter), but also 2014 (Year), 2014-10, 2014-11 and 2014-12 (Month). I just want 2014-Q4! Any idea on how to do this? I’m pretty confident that if I fix this the rest would work fine…

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s