Storage Engine Caching, Measures and Measure Groups

Chris Webb | Analysis Services, Multidimensional | November 27, 2012

I’ve been doing some performance tuning work on SSAS Multidimensional recently that has forced me to look at some behaviour I’ve observed several times but never properly understood: what happens with Storage Engine caching when you are querying multiple measures in the same measure group. Here are some of my findings (thanks, as always, to Akshai and Marius for answering my questions on this) although this post only deals with a few basic scenarios…

Consider the following, quite basic cube built from Adventure Works. It has one measure group and two measures, Sales Amount and Tax Amount, that both have AggregateFunction Sum:

And a single Date dimension with the following attribute relationships:

If I run a Profiler trace, clear the cache and run the following query twice:

SELECT
{[Measures].[Sales Amount]}
ON 0,
[Date].[Year].[Year].MEMBERS
ON 1
FROM
[Measure Caching]

I can see that the first time the query is run it doesn’t hit the cache, while the second time it is run (in the second red box below) it does hit the Storage Engine cache:

This is as you’d expect. However, now look what happens when I run a query that returns the Tax Amount measure – which was not in the original query – without clearing the cache:

SELECT
{[Measures].[Tax Amount]}
ON 0,
[Date].[Year].[Year].MEMBERS
ON 1
FROM
[Measure Caching]

Even though this is the first time I’ve queried for this measure since the cache was cleared, this query still hits the cache. This is because when you query for one measure, the SSAS Storage Engine will retrieve data for all other measures in the same measure group for the granularity of data requested.
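To make that behaviour concrete, here is a minimal Python sketch of a toy cache. It only illustrates the idea described above, assuming that a cache entry is keyed by the granularity requested and holds every measure in the measure group. The class name ToyStorageEngineCache, its methods and the MEASURE_GROUP set are invented for this example and say nothing about how the SSAS Storage Engine is really implemented.

```python
# Toy model only: NOT how the SSAS Storage Engine is actually implemented.
from dataclasses import dataclass, field

# Hypothetical stand-in for the measure group in the post (additive measures only).
MEASURE_GROUP = {"Sales Amount", "Tax Amount"}


@dataclass
class ToyStorageEngineCache:
    # Granularities (frozensets of attribute names) already read from disk.
    cached: set = field(default_factory=set)

    def query(self, measure: str, granularity: frozenset) -> str:
        """Return "cache" or "disk" depending on where the data would come from."""
        if measure not in MEASURE_GROUP:
            raise KeyError(measure)
        if granularity in self.cached:
            return "cache"
        # A disk read fetches every measure in the group at this granularity,
        # so the entry is keyed by granularity alone, not by measure.
        self.cached.add(granularity)
        return "disk"

    def clear(self) -> None:
        """Simulate clearing the cache before a test run."""
        self.cached.clear()


year = frozenset({"Year"})
cache = ToyStorageEngineCache()
results = [
    cache.query("Sales Amount", year),  # first run after clearing the cache
    cache.query("Sales Amount", year),  # second run of the same query
    cache.query("Tax Amount", year),    # different measure, same granularity
]
print(results)  # ['disk', 'cache', 'cache']
```

In this toy model the third query is a cache hit only because the entry is stored per granularity rather than per measure, which mirrors the pattern seen in the Profiler trace.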

This means that the AggregateFunction property of a measure is significant here. If I add a new measure to the cube with AggregateFunction set to Count instead of Sum:

I see the same thing happening, i.e. queries that request data for Sales Amount or Tax Amount also warm the SE cache with values for Internet Sales Count. This is because a query for Internet Sales Count can be answered with data of the same granularity as a query for Sales Amount. However, if I add a new measure called Last Sales Amount with AggregateFunction Last Non-Empty:

And then clear the cache, and run the two following queries one after the other:

SELECT
{[Measures].[Sales Amount]}
ON 0,
[Date].[Year].[Year].MEMBERS
ON 1
FROM
[Measure Caching]

SELECT
{[Measures].[Last Sales Amount]}
ON 0,
[Date].[Year].[Year].MEMBERS
ON 1
FROM
[Measure Caching]

I can see that the first query does not warm the cache for the second query – both queries go to disk:

Why is this happening? Why isn’t the cache being used? A clue lies in the Query Subcube Verbose event for both queries. For the first query, using Sales Amount, the following granularity of data is being requested:

Dimension 0 [Date] (0 0 0 *) [Date]:0 [Month]:0 [Quarter]:0 [Year]:*

Whereas the second query, using Last Sales Amount, requests this granularity:

Dimension 0 [Date] (* 0 0 *) [Date]:* [Month]:0 [Quarter]:0 [Year]:*

Both queries have Years on rows, but because Last Sales Amount is semi-additive the values returned are actually from the Year and Date granularity. So, when the semi-additive measure is requested in the second query the data needed for it is not in the Storage Engine cache: the first query requested data at the Year granularity only.
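The comparison between the two subcube requests can also be done mechanically. The following Python sketch parses the two Query Subcube Verbose lines quoted above and lists which attributes are marked with an asterisk in each. The helper name granular_attributes is made up for this example, and the parsing assumes every attribute appears as [Name]:value with no spaces in the value, as in these two lines; real traces that contain slices on specific members may need more careful handling.

```python
import re

# The two Query Subcube Verbose lines quoted in the post.
SALES_AMOUNT = "Dimension 0 [Date] (0 0 0 *) [Date]:0 [Month]:0 [Quarter]:0 [Year]:*"
LAST_SALES_AMOUNT = "Dimension 0 [Date] (* 0 0 *) [Date]:* [Month]:0 [Quarter]:0 [Year]:*"


def granular_attributes(line: str) -> set:
    """Return the names of the attributes marked '*' in one subcube line.

    Hypothetical helper: assumes each attribute is written as [Name]:value
    and that the value contains no spaces.
    """
    pairs = re.findall(r"\[([^\]]+)\]:(\S+)", line)
    return {name for name, value in pairs if value == "*"}


first = granular_attributes(SALES_AMOUNT)
second = granular_attributes(LAST_SALES_AMOUNT)

print(sorted(first))           # ['Year']
print(sorted(second))          # ['Date', 'Year']
print(sorted(second - first))  # ['Date']
```

The set difference shows that the second request needs the Date attribute, which the first request did not ask for. That is consistent with the second query going to disk.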

From what I understand, the logic governing this behaviour is very complex and the exact query plan that gets generated will depend on the overall design of your cube, the AggregateFunction used for the measures in each measure group (measures with measure expressions are going to work in a similar way to semi-additive measures) and the queries you’re running. However, it is useful to be aware of this kind of behaviour when designing and tuning SSAS cubes. For example, if you have a large number of measures (tens or even a hundred) in the same measure group, it could be worth splitting them out into separate measure groups to improve performance, especially if some measures are never queried together; you would need to test this thoroughly first, though. This behaviour would also be relevant in cases where you’re designing aggregations manually.

Published by Chris Webb

My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.

Published November 27, 2012

7 thoughts on “Storage Engine Caching, Measures and Measure Groups”

David Shroyer says:

November 28, 2012 at 12:17 am

Interesting. I wonder if that is why there is the “warning” that if you have similar measure groups they should be combined into one? This would provide better query performance because all measures are cached at once.

1. Chris Webb says:
  
  November 28, 2012 at 9:12 am
  
  That’s part of the reason, I think, though, as I said, in some cases there could be a performance hit for having a large number of measures in one measure group. As always, it depends.
  
rlharmon says:

November 30, 2012 at 3:22 pm

Thanks for the insight.
So, the pattern would be to put your highly used measures together and/or related measures together, and then the less used measures. Would you then think that breaking the semi-additive measures apart makes sense?

It seems any calculated measures should be grouped together as well?

Thanks

1. Chris Webb says:
  
  November 30, 2012 at 3:27 pm
  
  Yes, putting measures that are frequently used together in the same measure group might be a good idea. You’d want to test to be sure, though. This doesn’t apply to calculated measures, however: they are only ever calculated when they are explicitly requested.
  
Rajesh says:

December 6, 2012 at 7:34 am

Hi Chris,

I am worried about an SSAS design which I recently encountered. Let me explain the scenario clearly.

I have 3 tables: 1) Products 2) SKU 3) SkuTargetEnvironments.
The relationship between Products and SkuTargetEnvironments is a one-to-many relationship in the database.

The Products table has the data for all the apps related to versions of phones.

Productid SkuID ProductName CreatedTime Other Columns
1 10 FreeCell 12-Nov-2012
1 12 FreeCell 13-Nov-2012
1 13 FreeCell 15-Nov-2012
1 14 FreeCell 19-Nov-2012
2 15 OneWay 23-Nov-2012
3 16 AngryBirdsAndEggs 28-Nov-2012

The Sku table has all the details related to a SKU. It has SkuID, ProductID, SkuTargetEnvironmentID and other attributes.

SkuID ProductID SKuTargetEnvironmentID Other Columns
10 1 50
12 1 51
13 1 52
14 1 53
15 2 54
16 3 55

The SkuTargetEnvironment table has the actual details of the phone app.

SkuTargetEnvironmentID LocaleName Platform ResolutioName Other Columns
50 English Android WXVGA
51 Spanish Andriod HD
52 Espanol Andriod HD-WXVGA
53 French AndriodS HD
54 English OpenSource HD
55 Spanish Mango WXVGA

As you can see from the above data, Products and SKU have a one-to-many relationship, and SKU to SkuTargetEnvironment also has a one-to-many relationship.
Now my goal is to get the product count for a Platform or Resolution. The user may select one or more values from any of the attributes in the SkuTargetEnvironment table.
I created a DSV named query which has all the ProductIDs, SkuIDs and SkuTargetEnvironmentIDs, and named it “FactProductSKUs”. This fact table has the following data:

ProductID SKUIDs SKuTargetEnvironmentID
1 10 50
1 12 51
1 13 52
1 14 53
2 15 54
3 16 55

I made two dimensions, one for the Sku table and the other for the SkuTargetEnvironment table, and related them to “FactProductSKUs”.
Now, to get the count of products for a Platform or Locale, I created a measure group with a DistinctCount measure on “FactProductSKUs”, and I am able to get the desired output when filtering by Locale or Platform.

Now my question: would you call the relationship between Products and SkuTargetEnvironment a many-to-many relationship from the cube’s perspective? If so, how am I able to get the count of products without implementing a many-to-many relationship in the cube? If not, can you tell me how to identify a many-to-many relationship in a cube, based on the above example?

Can you please suggest a solution for my problem?

1. Chris Webb says:
  
  December 10, 2012 at 3:50 pm
  
  Hi Rajesh,
  
  If you only have three tables, then you don’t have a many-to-many relationship in SSAS. You just have two dimensions, both of which have regular relationships with your measure group. In fact you don’t have enough tables for a many-to-many relationship: you would need at least four tables, because a m2m relationship needs a dimension connected to a fact table connected to another dimension connected to another fact table. Have you read this paper on m2m? http://www.sqlbi.com/articles/many2many/ It should explain how m2m relationships work.
  
  Chris
  
Vikram says:

November 27, 2018 at 8:57 am

Hi Chris,

I would like to know how SSAS clears this cache when a particular partition gets processed. Does it invalidate the cache at measure group scope or just at partition scope?
