Many-to-Many Relationships and Partition Slices

As you’re probably aware one of the benefits of partitioning a measure group is that it can improve query performance: for example, if you have partitioned your measure group by year and you run a query that only requests data for a particular year, then SSAS should only query the partition that holds the data for the year you’ve requested. There are however several scenarios where SSAS will scan partitions you would not expect it to scan – the ones I knew about up to now are all listed in the “Unexpected Partition Scans” section in this excerpt from “Expert Cube Development” – and I’ve just come across another scenario where this happens.

I was working with a customer the other week that had implemented Dave Fackler’s approach to handling multiple calendars using many-to-many relationships, as detailed in these two blog posts:
http://davefackler.blogspot.com/2008/05/handling-multiple-calendars-with-m2m.html
http://davefackler.blogspot.com/2008/06/handling-multiple-calendars-with-m2m.html

Now there’s absolutely nothing wrong with the design that Dave describes here – it’s a classic application for many-to-many relationships – and there aren’t any better ways of meeting this requirement. However, what I noticed when looking in Profiler at the customer’s cube was that all of the queries I was writing were resulting in reads on all of the partitions in the main measure group, even when I expected only one partition should be hit, and this was slowing the queries down a lot.

You can recreate the same problem in the version of the Adventure Works that Dave posts on his blog. Take the following query that uses the regular Adventure Works Date dimension and does not use the m2m relationship:

select {[Measures].[Reseller Sales Amount]} on 0,
{[Date].[Calendar].[Month].&[2004]&[3]}
on 1
from [Adventure Works]

When run on a cold cache, in Profiler you can see that it only results in the Reseller Sales 2004 partition being hit in the Reseller Sales measure group:

image

This is exactly as you’d expect. However if you request the same month (March 2004) via the Calendar dimension, which is connected to the Reseller Sales measure group via a m2m relationship and the Date dimension, you see all the partitions on Reseller Sales are hit:

select {[Measures].[Reseller Sales Amount]} on 0,
{[Calendar].[Year-Quarter-Month].[Month].&[2000200409]}
on 1
from [Adventure Works]

image

Not so good. But it turns out that this is an unavoidable side-effect of using many-to-many relationships: a filter on a many-to-many dimension (in this case the Calendar dimension) does not result in a filter being applied on the intermediate dimension (the Date dimension) to the measure group (Reseller Sales). Greg Galloway came across this some time ago and filed an issue on Connect which confirms that this is by design.

Given that I doubt this behaviour will be changed any time soon, the takeaway is that when you’re planning your partitioning strategy you should think twice about partitioning using a dimension that is used as an intermediate dimension in a many-to-many relationship. Otherwise you will lose all of the performance benefits of partitioning when your queries use that many-to-many relationship…

9 thoughts on “Many-to-Many Relationships and Partition Slices

  1. Did you consider denormalizing DimCalendar (many-to-many) dimension? it will definately not give as much performance improvement as partition filtering but could have been some relief.

    1. Yes that is one option to avoid this problem, but in order to do this you’d need to add new columns to your date dimension table every time you wanted to add a new hierarchy, which isn’t great.

      In any case, the real point of the post was to show the behaviour of m2m relationships with partitions so I didn’t discuss workarounds for this specific scenario.

  2. As the customer concerned – thanks for spotting this before we went any further with it.

    I can confirm that we reverted to a single Date Dim with many columns. We only foresee using 3 or 4 calendars max so not too traumatic. We are making use of DisplayFolder both as a visual tidy and a grouping ID for our applications but attribute names are still untidy, eg. YrWk – Cal 1, YrWk – Cal 2. Partitioning by some other dimension does not tie in with the ETL in our warehouse.

  3. This is an old post, but thought I would ask anyway. I’m seeing the opposite occurring with SSAS 2012. If the intermediate table is not partitioned and a full process is completed, the target fact will only return values in the partitions that were fully processed. It’s almost like the many to many relationship is not able to see partitions of the target table unless they are processed after the intermediate tables has been processed.

Leave a Reply to HarveyCancel reply