EXISTING AND NONEMPTY

Here’s an interesting MDX nugget. The other day I was working on something similar to the following query, that finds the count of the number of customers who bought something in the current month as well as the previous month:

WITH
MEMBER MEASURES.TEST AS
COUNT(NONEMPTY(
NONEMPTY(
EXISTING [Customer].[Customer].[Customer].MEMBERS
, [Measures].[Internet Sales Amount])
, ([Measures].[Internet Sales Amount], [Date].[Calendar].CURRENTMEMBER.PREVMEMBER))
)
SELECT MEASURES.TEST
*
[Product].[Subcategory].[Subcategory].MEMBERS ON 0,
[Date].[Calendar Year].[Calendar Year].MEMBERS
*
[Customer].[Country].[Country].MEMBERS ON 1
FROM [Adventure Works]

On my laptop, this runs in almost 16 seconds – not good. What I was doing here was making the assumption that it was a good idea to do the EXISTING first, to filter the set of customers down as much as possible, then do the two NONEMPTYs. In fact, what I found was that by moving the EXISTING to after the NONEMPTYs had been evaluated the query ran much faster:

WITH
MEMBER MEASURES.TEST AS
COUNT(EXISTING 
NONEMPTY(
NONEMPTY(
[Customer].[Customer].[Customer].MEMBERS
, [Measures].[Internet Sales Amount])
, ([Measures].[Internet Sales Amount], [Date].[Calendar].CURRENTMEMBER.PREVMEMBER))
)
SELECT MEASURES.TEST
*
[Product].[Subcategory].[Subcategory].MEMBERS ON 0,
[Date].[Calendar Year].[Calendar Year].MEMBERS
*
[Customer].[Country].[Country].MEMBERS ON 1
FROM [Adventure Works]

On my laptop now, the above query executes in 3 seconds. Why is this? To be honest, I’m not completely sure… looking in Profiler and Perfmon I see differences, but not many; these things are better left to the likes of Mosha who know what’s going on inside in the formula engine. But anyway I thought I’d post this up because it’s a common form of calculation – so if you are using NonEmpty and Existing together, do the NonEmpty first and then the Existing.

8 thoughts on “EXISTING AND NONEMPTY

  1. Yes, very odd indeed. At first I thought it would be something to do with AS evaluating Existing when determining the axis and then again when deriving the values whereas the second evaluates the axis using NonEmpty, potentially creating a smaller set of data over which to evaluate Existing. (Note, I say "something like" \’cos I hadn\’t quite got my head around what I was actually thinking.)Anyway, testing that half-baked most likely hopelessly inane random thought, I can see that the difference in the perfmon counters for "Total Cells Calculated", "Total Existing", "Total Non Empty" and "Total Sonar SubCubes" for the two queries is zero across the board. Um, doesn\’t that imply a similar (I would say "same" but this clearly not true) execution plan? The "with" statement means the use of local cache only so I presume no "cache magic" is happening, so like you I am left with one thought… "Huh?"

  2. It could be something to do with the interfaces that the sets returned by NonEmpty and Existing return (which won\’t make much sense unless you were at Mosha\’s MDX seminar) but that\’s just a guess…

  3. My 2 cents.Existing is a loop operation and as such a costly, not prefered one in database worlds. It iterates over the entire set one has provided next to it, eliminating members that don\’t comply with current coordinates enforced by other related hierarchies of involved dimension.NonEmpty() is everything but a loop operation. It\’s a true set operation in its nature, probably a sort of scan operation over indexed fields. And as such – very fast.By limiting potentially huge set of members using NonEmpty() first, a smaller portion is left for iteration as everyone instinctively senses. Therefore, it is faster, as Chris has discovered, to apply that immediatelly, not after. The same as in OLTP world, where the most selective condition be should be applied first, right?The problem is that NonEmpty() has a bad reputation of destroying sets and therefore causing outer operations to slow down. It is in my opinion a doubtful reputation, because this special function encompasses strenght for both performance gain and performance deteoration. One should just respect its behaviors and think how it will apply in particular scenario. Just because there are natural disasters like tornados doesn\’t mean we should give up on energy sources like the wind, etc.

  4. I generally try to avoid using NonEmpty inside a calculated measure as it generates one storage engine query per cell calculated. For example, your second MDX query above generates over 1000 storage engine queries.

    The following rewrite pattern seems to perform better on many queries I’ve tried it on:

    WITH
    MEMBER MEASURES.TEST AS
    Sum(
    {[Customer].[Customer].[Customer].MEMBERS}
    *[Customer].[Country].CurrentMember
    ,IIf(
    IsEmpty([Measures].[Internet Sales Amount])
    OR IsEmpty(([Measures].[Internet Sales Amount], [Date].[Calendar].CURRENTMEMBER.PREVMEMBER))
    ,null
    ,1
    )
    )
    SELECT MEASURES.TEST
    *
    [Product].[Subcategory].[Subcategory].MEMBERS ON 0,
    [Date].[Calendar Year].[Calendar Year].MEMBERS
    *
    [Customer].[Country].[Country].MEMBERS ON 1
    FROM [Adventure Works]

  5. Chris, interesting stuff, but I have tried the method you described here in another aggregation mdx logic only to discover that it has the adverse effect. The code below seems to run faster 25 seconds but I’m sure there’s a way to make it run even faster, any ideas?

    With Member [Measures].[Over 250 K Headline %]
    As

    –Nonempty({Existing([Period].[Financial Year].[Financial Year].members,[Matters – Dynamic])},[Measures].[Fees Billed Amount])

    — Over 250K
    1 – (

    Sum(
    Existing
    filter(
    Nonempty({Existing([Period].[Financial Year].[Financial Year].members,[Matters – Dynamic])},[Measures].[Fees Billed Amount])
    ,[Measures].[Fees Billed Amount] > 250000
    )

    ,[Measures].[Fees Billed To Bill Amount]

    )/
    Sum(
    Existing
    filter(
    Nonempty({Existing([Period].[Financial Year].[Financial Year].members,[Matters – Dynamic])},[Measures].[Fees Billed Amount])
    ,[Measures].[Fees Billed Amount] > 250000
    )

    ,[Measures].[Fees Billed Standard Amount]

    ) )
    ,format_string=’#.##’

    Select

    [Period].[Financial Year].[Financial Year].members
    On Columns
    ,

    {[Measures].[Over 250 K Headline %]}
    On Rows

    From [Pricing]

Leave a Reply to Le MDX c’est facile. Enfin presque. | François Jehl sur Microsoft BICancel reply