EXISTING AND NONEMPTY

Chris Webb MDX March 31, 2009June 1, 2010 1 Minute

Here’s an interesting MDX nugget. The other day I was working on something similar to the following query, that finds the count of the number of customers who bought something in the current month as well as the previous month:

WITH
MEMBER MEASURES.TEST AS
COUNT(NONEMPTY(
NONEMPTY(
EXISTING [Customer].[Customer].[Customer].MEMBERS
, [Measures].[Internet Sales Amount])
, ([Measures].[Internet Sales Amount], [Date].[Calendar].CURRENTMEMBER.PREVMEMBER))
)
SELECT MEASURES.TEST
*
[Product].[Subcategory].[Subcategory].MEMBERS ON 0,
[Date].[Calendar Year].[Calendar Year].MEMBERS
*
[Customer].[Country].[Country].MEMBERS ON 1
FROM [Adventure Works]

On my laptop, this runs in almost 16 seconds – not good. What I was doing here was making the assumption that it was a good idea to do the EXISTING first, to filter the set of customers down as much as possible, then do the two NONEMPTYs. In fact, what I found was that by moving the EXISTING to after the NONEMPTYs had been evaluated the query ran much faster:

WITH
MEMBER MEASURES.TEST AS
COUNT(EXISTING
NONEMPTY(
NONEMPTY(
[Customer].[Customer].[Customer].MEMBERS
, [Measures].[Internet Sales Amount])
, ([Measures].[Internet Sales Amount], [Date].[Calendar].CURRENTMEMBER.PREVMEMBER))
)
SELECT MEASURES.TEST
*
[Product].[Subcategory].[Subcategory].MEMBERS ON 0,
[Date].[Calendar Year].[Calendar Year].MEMBERS
*
[Customer].[Country].[Country].MEMBERS ON 1
FROM [Adventure Works]

On my laptop now, the above query executes in 3 seconds. Why is this? To be honest, I’m not completely sure… looking in Profiler and Perfmon I see differences, but not many; these things are better left to the likes of Mosha who know what’s going on inside in the formula engine. But anyway I thought I’d post this up because it’s a common form of calculation – so if you are using NonEmpty and Existing together, do the NonEmpty first and then the Existing.

Published by Chris Webb

My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel. View all posts by Chris Webb

Published March 31, 2009June 1, 2010

8 thoughts on “EXISTING AND NONEMPTY”

Phil says:

March 31, 2009 at 5:01 pm

Yes, very odd indeed. At first I thought it would be something to do with AS evaluating Existing when determining the axis and then again when deriving the values whereas the second evaluates the axis using NonEmpty, potentially creating a smaller set of data over which to evaluate Existing. (Note, I say "something like" \’cos I hadn\’t quite got my head around what I was actually thinking.)Anyway, testing that half-baked most likely hopelessly inane random thought, I can see that the difference in the perfmon counters for "Total Cells Calculated", "Total Existing", "Total Non Empty" and "Total Sonar SubCubes" for the two queries is zero across the board. Um, doesn\’t that imply a similar (I would say "same" but this clearly not true) execution plan? The "with" statement means the use of local cache only so I presume no "cache magic" is happening, so like you I am left with one thought… "Huh?"

Loading...

Reply
Chris says:

April 1, 2009 at 11:23 am

It could be something to do with the interfaces that the sets returned by NonEmpty and Existing return (which won\’t make much sense unless you were at Mosha\’s MDX seminar) but that\’s just a guess…

Loading...

Reply
Tomislav says:

September 18, 2009 at 3:52 pm

My 2 cents.Existing is a loop operation and as such a costly, not prefered one in database worlds. It iterates over the entire set one has provided next to it, eliminating members that don\’t comply with current coordinates enforced by other related hierarchies of involved dimension.NonEmpty() is everything but a loop operation. It\’s a true set operation in its nature, probably a sort of scan operation over indexed fields. And as such – very fast.By limiting potentially huge set of members using NonEmpty() first, a smaller portion is left for iteration as everyone instinctively senses. Therefore, it is faster, as Chris has discovered, to apply that immediatelly, not after. The same as in OLTP world, where the most selective condition be should be applied first, right?The problem is that NonEmpty() has a bad reputation of destroying sets and therefore causing outer operations to slow down. It is in my opinion a doubtful reputation, because this special function encompasses strenght for both performance gain and performance deteoration. One should just respect its behaviors and think how it will apply in particular scenario. Just because there are natural disasters like tornados doesn\’t mean we should give up on energy sources like the wind, etc.

Loading...

Reply
Chris says:

September 20, 2009 at 9:23 pm

Hmm, well, I think only someone on the dev team will be able to answer this question.

Loading...

Reply
Greg Galloway says:

June 30, 2011 at 6:14 pm

I generally try to avoid using NonEmpty inside a calculated measure as it generates one storage engine query per cell calculated. For example, your second MDX query above generates over 1000 storage engine queries.

The following rewrite pattern seems to perform better on many queries I’ve tried it on:

WITH
MEMBER MEASURES.TEST AS
Sum(
{[Customer].[Customer].[Customer].MEMBERS}
*[Customer].[Country].CurrentMember
,IIf(
IsEmpty([Measures].[Internet Sales Amount])
OR IsEmpty(([Measures].[Internet Sales Amount], [Date].[Calendar].CURRENTMEMBER.PREVMEMBER))
,null
,1
)
)
SELECT MEASURES.TEST
*
[Product].[Subcategory].[Subcategory].MEMBERS ON 0,
[Date].[Calendar Year].[Calendar Year].MEMBERS
*
[Customer].[Country].[Country].MEMBERS ON 1
FROM [Adventure Works]

Loading...

Reply
Pingback: Le MDX c’est facile. Enfin presque. | François Jehl sur Microsoft BI
Dan says:

September 24, 2014 at 11:51 am

Chris, interesting stuff, but I have tried the method you described here in another aggregation mdx logic only to discover that it has the adverse effect. The code below seems to run faster 25 seconds but I’m sure there’s a way to make it run even faster, any ideas?

With Member [Measures].[Over 250 K Headline %]
As

–Nonempty({Existing([Period].[Financial Year].[Financial Year].members,[Matters – Dynamic])},[Measures].[Fees Billed Amount])

— Over 250K
1 – (

Sum(
Existing
filter(
Nonempty({Existing([Period].[Financial Year].[Financial Year].members,[Matters – Dynamic])},[Measures].[Fees Billed Amount])
,[Measures].[Fees Billed Amount] > 250000
)

,[Measures].[Fees Billed To Bill Amount]

)/
Sum(
Existing
filter(
Nonempty({Existing([Period].[Financial Year].[Financial Year].members,[Matters – Dynamic])},[Measures].[Fees Billed Amount])
,[Measures].[Fees Billed Amount] > 250000
)

,[Measures].[Fees Billed Standard Amount]

) )
,format_string=’#.##’

Select

[Period].[Financial Year].[Financial Year].members
On Columns
,

{[Measures].[Over 250 K Headline %]}
On Rows

From [Pricing]

Loading...

Reply
1. Chris Webb says:
  
  September 24, 2014 at 12:48 pm
  
  Hi Dan,
  
  Do you need to use Existing twice inside each of your Sum() expressions? I don’t think you do, and that might help. Also check out this post on optimising this type of calculation: http://sqlblog.com/blogs/mosha/archive/2007/11/22/optimizing-count-filter-expressions-in-mdx.aspx
  
  Loading...
  
  Reply