Storage Engine Caching, Measures and Measure Groups

I’ve been doing some performance tuning work on SSAS Multidimensional recently that has forced me to look at some behaviour I’ve observed several times but never properly understood: what happens with Storage Engine caching when you are querying multiple measures in the same measure group. Here are some of my findings (thanks, as always, to Akshai and Marius for answering my questions on this) although this post only deals with a few basic scenarios…

Consider the following, quite basic cube built from Adventure Works. It has one measure group and two measures, Sales Amount and Tax Amount, that both have AggregateFunction Sum:

image

And a single Date dimension with the following attribute relationships:

image

If I run a Profiler trace, clear the cache and run the following query twice:

SELECT
{[Measures].[Sales Amount]}
ON 0,
[Date].[Year].[Year].MEMBERS
ON 1
FROM
[Measure Caching]

I can see that the first time the query is run it doesn’t hit cache, and the second time the query (in the second red box below) is run it does hit the Storage Engine cache:

image

This is as you’d expect. However, now look what happens when I run a query that returns the Tax Amount measure – which was not in the original query – without clearing the cache:

SELECT
{[Measures].[Tax Amount]}
ON 0,
[Date].[Year].[Year].MEMBERS
ON 1
FROM
[Measure Caching]

image

Even though this is the first time I’ve queried for this measure since the cache was cleared, this query still hits the cache. This is because when you query for one measure, the SSAS Storage Engine will retrieve data for all other measures in the same measure group for the granularity of data requested.

This means that the AggregateFunction property of a measure is significant here. If I add a new measure to the cube with AggregateFunction set to Count instead of Sum:

image

I see the same thing happening, ie queries that request data for Sales Amount or Tax Amount also warm the SE cache with values for Internet Sales Count. This is because a query for Internet Sales Count can be answered with data of the same granularity as a query for Sales Amount. However, if I add a new measure called Last Sales Amount with AggregateFunction Last Non-Empty:

image

And then clear the cache, and run the two following queries one after the other:

SELECT
{[Measures].[Sales Amount]}
ON 0,
[Date].[Year].[Year].MEMBERS
ON 1
FROM
[Measure Caching]

SELECT
{[Measures].[Last Sales Amount]}
ON 0,
[Date].[Year].[Year].MEMBERS
ON 1
FROM
[Measure Caching]

I can see that the first query does not warm the cache for the second query – both queries go to disk:

image

Why is this happening? Why isn’t the cache being used? A clue lies in the Query Subcube Verbose event for both queries. For the first query, using Sales Amount, the following granularity of data is being requested:

Dimension 0 [Date] (0 0 0 *)  [Date]:0  [Month]:0  [Quarter]:0  [Year]:*

Whereas the second query, using Last Sales Amount, requests this granularity:

Dimension 0 [Date] (* 0 0 *)  [Date]:*  [Month]:0  [Quarter]:0  [Year]:*

Both queries have Years on rows, but because Last Sales Amount is semi-additive the values returned are actually from the Year and Date granularity. So, when the semi-additive measure is requested in the second query the data needed for it is not in the Storage Engine cache: the first query requested data at the Year granularity only.

From what I understand, the logic governing this behaviour is very complex and the exact query plan that gets generated will depend on the overall design of your cube, the AggregateFunction used for the measures in each measure group (measures with measure expressions are going to work in a similar way to semi-additive measures) and the queries you’re running. However it is useful to be aware of this kind of behaviour when designing and tuning SSAS cubes. For example, it could be that if you have a large number of measures (tens or even a hundred) in the same measure group it could be worth splitting them out into separate measure groups to improve performance, especially if some measures are never queried together – you would need to test this thoroughly first though. This behaviour would also be relevant in cases where you’re designing aggregations manually.

7 thoughts on “Storage Engine Caching, Measures and Measure Groups

  1. Interesting. I wonder if that is why there is the “warning” that if you have similar measure groups they should be combined into one? This would provide better query performance because all measures are cached at once.

    1. That’s part of the reason, I think, though as I said in some cases there could be a performance hit for having a large number of measures in one measure group. As always, it depends.

  2. thanks for the insight.
    so, the pattern wold be to put your highly used measures together and/or related measures together and then the less used measures. would you then think that breaking the semi-additive measures apart make sense.

    it seems making sure any calculated measures should be grouped together as well?

    thanks

    1. Yes, putting measures that are frequently used together in the same measure group might be a good idea. You’d want to test to be sure though. This doesn’t apply to calculated measures though – they are only ever calculated when they are explicitly requested.

  3. Hi Chriss ,

    I am worrying on the SSAS desgin which i recently encountered . Let me explain the scenario clearly

    I have 3 tables 1)Prodcuts 2)SKU 3) SkuTargetEnvironments .
    The relationShip b/w Products and SkuTargetEnviroments is One -Many Relation in Database.

    Prodcuts table has the data for all the Apps related to version of Phones

    Productid SkuID ProductName CreatedTime Other Columns
    1 10 FreeCell 12-Nov-2012
    1 12 FreeCell 13-Nov-2012
    1 13 FreeCell 15-Nov-2012
    1 14 FreeCell 19-Nov-2012
    2 15 OneWay 23-Nov-2012
    3 16 AngryBirdsAndEggs 28-Nov-2012

    Sku table has all the details related to SKU .It has SKUid ,productID,SkuTargetEnvironmentID and other Attributes

    SkuID ProductID SKuTargetEnvironmentID Other Columns
    10 1 50
    12 1 51
    13 1 52
    14 1 53
    15 2 54
    16 3 55

    SkuTargetEnviromentTable has the actual details of the Phone App.

    SkuTargetEnvironmentID LocaleName Platform ResolutioName Other Columns
    50 English Android WXVGA
    51 Spanish Andriod HD
    52 Espanol Andriod HD-WXVGA
    53 French AndriodS HD
    54 English OpenSource HD
    55 Spanish Mango WXVGA

    If you see tha above data , Products and SKU has One-Many relation and SKU to SKUTargetEnviromentTable as One- Many Relation .
    Now my goal is to get the product Count for a Platform or Resolution .User may select one or more values from any attributes in SKUTargetEnvironment Table.
    I created a DSV named query which as the all the ProductIds, SKUIds and SkutargetEnviromentIDs and Named that as “FactProductSKUs”.This Fact data has the following data

    ProductID SKUIDs SKuTargetEnvironmentID
    1 10 50
    1 12 51
    1 13 52
    1 14 53
    2 15 54
    3 16 55

    I made 2 dimensions one Sku and other for SKUTargetEnvironment Tables and related them to the “FactProductSKUs”.
    Now To get count of products for a Platform or Locale , i created a DistinctCount Measure Group measure on “FactProductSKUs” and i am able to get desired output upon filtering

    with Locale or Platform.

    Now my question , Do you call the relationship b/w Prodcuts and SKUTargetEnvironment is Many-Many Relation from Cube Prespective . If So , i am able to get the count of

    products without implementing Many-Many Relation in Cube ?. If not , Can you tell me how to identify a Many-Many relation in Cube basing on the above example ?

    Can you please tell me a solution for my problem.

    1. Hi Rajesh,

      If you only have three tables, then you don’t have a many-to-many relationship in SSAS. You just have two dimensions, both of which have regular relationships with your measure group. In fact you don’t have enough tables for a many-to-many relationship: you would need at least four tables, because a m2m relationship needs a dimension connected to a fact table connected to another dimension connected to another fact table. Have you read this paper on m2m? http://www.sqlbi.com/articles/many2many/ It should explain how m2m relationships work.

      Chris

  4. Hi Chris,

    I would like know how SSAS clears these cache when a particular partition gets processed. Does it invalidate the cache at a measure group scope or just at a partition scope.

Leave a Reply to Chris WebbCancel reply