Arbitrary-shaped sets and the Storage Engine cache

Here’s a companion-piece to my post last week about query context and the formula engine cache – another scenario where you can easily stop caching taking place without knowing it, which has already been documented (although there is at least one important point to add) but again needs more visibility. This time the problem is that when you write an MDX query with an arbitrary-shaped set in the Where clause it stops Analysis Services using the storage engine cache. Queries that suffer from this will always read from disk and always perform as well or as badly as they did the first time they were run – so if cold cache performance is a problem for you, then this is an issue you need to understand and avoid.

Rather than repeat the information, let me direct you to the blog entry where I first found out about this problem, on Thomas Keyser’s blog from 2006:
http://schastar.spaces.live.com/blog/cns!12BCB785A5D8B3D4!135.entry

I can confirm that everything he says is still relevant on SSAS2008 except for the last query, where he has the whole of the Product.[Product Categories] hierarchy in the Where clause – run it twice and the second time you run it you’ll see it does hit the storage engine cache. One other point I picked up on Mosha’s MDX seminar in November is that it is possible for Analysis Services to think a set is arbitrary-shaped when it really isn’t. Take the following query:

SELECT [Measures].[Internet Sales Amount] ON 0,
[Date].[Calendar Year].MEMBERS ON 1
FROM [Adventure Works]
WHERE
({[Product].[Category].&[1], [Product].[Category].&[3]}
,[Customer].[Country].&[Australia])

This does not have an arbitrary-shaped set in the Where clause, and as a result the second time you run it you’ll see it hit the storage engine cache. However, if you rewrite the query so you have a set of tuples in the Where clause as follows:

SELECT [Measures].[Internet Sales Amount] ON 0,
[Date].[Calendar Year].MEMBERS ON 1
FROM [Adventure Works]
WHERE
({([Customer].[Country].&[Australia], [Product].[Category].&[1]),
([Customer].[Country].&[Australia], [Product].[Category].&[3])})

Even though you might think this query is equivalent to the first one, you’ll see that it does not use the storage engine cache.

What can we do about this then? Not a lot with most client tools; I’ve not checked, but I’d be surprised if any of them generated their MDX to avoid this situation. If your users frequently use certain arbitrary-shaped sets the only thing you could maybe do is hack your dimension data to make them non-arbitrary – but that would almost certainly end up being a bad compromise; otherwise you’d just have to build aggregations to make cold cache queries fast.

However, if you’re using SSRS then of course you can rewrite the MDX yourself. Let’s build a quick report on AdventureWorks that displays this problem:

image

As you can see, I’ve got a multiselect parameter on the slicer that has a default selection of members from two different levels from [Product].[Product Categories] – an arbitrary shaped set. Here’s the MDX that gets generated:

SELECT
NON EMPTY
{ [Measures].[Internet Sales Amount] }
ON COLUMNS,
NON EMPTY
{ ([Date].[Calendar Year].[Calendar Year].ALLMEMBERS ) }
DIMENSION PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME
ON ROWS
FROM (
SELECT ( STRTOSET(@ProductProductCategories, CONSTRAINED) )
ON COLUMNS FROM [Adventure Works])
WHERE (
IIF( STRTOSET(@ProductProductCategories, CONSTRAINED).Count = 1,
STRTOSET(@ProductProductCategories, CONSTRAINED),
[Product].[Product Categories].currentmember ) )
CELL PROPERTIES VALUE, BACK_COLOR, FORE_COLOR, FORMATTED_VALUE,
FORMAT_STRING, FONT_NAME, FONT_SIZE, FONT_FLAGS

And here’s how I would rewrite it:

SELECT
{ [Measures].[Internet Sales Amount] }
ON COLUMNS,
NON EMPTY
{ ([Date].[Calendar Year].[Calendar Year].ALLMEMBERS ) }
ON ROWS
FROM
[Adventure Works]
WHERE(
DESCENDANTS(
STRTOSET(@ProductProductCategories, CONSTRAINED),
[Product].[Product Categories].LEVELS(
MAX(STRTOSET(@ProductProductCategories, CONSTRAINED),
[Product].[Product Categories].CURRENTMEMBER.LEVEL.ORDINAL))
)
)
CELL PROPERTIES VALUE

What I’ve done here is:

  • Get rid of the completely useless Non Empty on columns (Why is it there? We don’t even want to remove empty columns – surely doing that would break the report?)
  • Removed the subselect and used a Where clause instead to do the slicing, so if we needed it we could use the formula engine cache (see here for why).
  • Removed any cell or dimension properties that aren’t used by the report (see here for why, although it’s only relevant for really big reports).
  • Used an expression in the Where clause to find the descendants of all members in the parameter set at the level of the lowest member of the set, using the Descendants, Levels and Max functions. This I think will turn all arbitrary-shaped set selections into non-arbitrary-shaped sets.

5 thoughts on “Arbitrary-shaped sets and the Storage Engine cache

Leave a Reply