Optimising SSAS Many-To-Many Relationships By Adding Redundant Dimensions

The most elegant way of modelling your SSAS cube doesn’t always give you the best query performance. Here’s a trick I used recently to improve the performance of a many-to-many relationship going through a large fact dimension and large intermediate measure group…

Consider the following cube, built from the Adventure Works DW database and showing a many-to-many relationship:

image

The Fact Internet Sales measure group contains sales data; the Product, Date and Customer dimensions are what you would expect; Sales Order is a fact dimension with one member for each sales transaction and therefore one member for each row in the fact table that Fact Internet Sales is built from. Each Sales Order can be associated with zero to many Sales Reasons, and the Sales Reason dimension has a many-to-many relationship with the Fact Internet Sales measure group through the Fact Internet Sales Reason measure group. Only the Sales Order dimension connects directly to both the Fact Internet Sales Reason and Fact Internet Sales measure groups.

There’s nothing obviously wrong with the way this is modelled – it works and returns the correct figures – and the following query shows how the presence of the many-to-many relationship means you can see the Sales Amount measure (from the Fact Internet Sales measure group) broken down by Sales Reason:

[sourcecode language=”text” padlinenumbers=”true”]
select
{[Measures].[Sales Amount]} on 0,
non empty
[Sales Reason].[Sales Reason].[Sales Reason].members
on 1
from m2m1
where([Date].[Calendar Year].&[2003],
[Product].[Product Category].&[3],
[Customer].[Country].&[United Kingdom])
[/sourcecode]

 

image

However, to understand how we can improve the performance of a many-to-many relationship you have to understand how SSAS resolves the query internally. At a very basic level, in this query, SSAS starts with all of the Sales Reasons and then, for each one, finds the list of Sales Orders associated with it by querying the Fact Sales Reason measure group. Once it has the list of Sales Orders for each Sales Reason, it queries the Fact Internet Sales measure group (which is also filtered by the Year 2003, the Product Category Clothing and the Customer Country UK) and sums up the value of Sales Amount for those Sales Orders, getting a single value for each Sales Reason. A Profiler trace shows this very clearly:

image

The Resource Usage event gives the following statistics for this query:

READS, 7

READ_KB, 411

WRITES, 0

WRITE_KB, 0

CPU_TIME_MS, 15

ROWS_SCANNED, 87299

ROWS_RETURNED, 129466

Given that the Sales Order dimension is a large one (in this case around 60000 members – and large fact dimensions are quite common with many-to-many relationships) it’s likely that one Sales Reason will be associated with thousands of Sales Orders, and therefore SSAS will have to do a lot of work to resolve the relationship.

In this case, the optimisation comes with the realisation that in this case we can add the other dimensions present in the cube to the Fact Sales Reason measure group to try to reduce the number of Sales Orders that each Sales Reason is resolved to. Since Sales Order is a fact dimension, with one member for each sales transaction, then since each sales transaction also has a Date, a Product and a Customer associated with it we can add the keys for these dimensions to the fact table on which Fact Sales Reasons is built and join these dimensions to it directly:

image

This is not an assumption you can make for all many-to-many relationships, for sure, but it’s certainly true for a significant proportion.

The Product, Date and Customer dimensions don’t need to be present for the many-to-many relationship to work, but adding a Regular relationship between them and Fact Internet Sales Reason helps SSAS speed up the resolution of the many-to-many relationship when they are used in a query. This is because in the original design, in the test query the selection of a single member on Sales Reason becomes a selection on all of the Sales Orders that have ever been associated with that Sales Reason; with the new design, the selection of a single member on Sales Reason becomes a selection on a combination of Dates, Customers, Products and Sales Orders – and since the query itself is also applying a slice on Date, Customer and Product, this is a much smaller selection than before. For the query shown above, with the new design, the Resource Usage event now shows:

READS, 11

READ_KB, 394

WRITES, 0

WRITE_KB, 0

CPU_TIME_MS, 0

ROWS_SCANNED, 47872

ROWS_RETURNED, 1418

The much lower numbers for ROWS_SCANNED and ROWS_RETURNED shows that the Storage Engine is doing a lot less work. For the amount of data in Adventure Works the difference in query performance is negligible, but in the real world I’ve seen this optimisation make a massive difference to performance, resulting in queries running up to 15 times faster.

Don’t forget that there are many other ways of optimising many-to-many relationships such as the those described in this white paper. Also, if you have a large fact dimension, if it does not need to be visible to the end user and is only needed to make the many-to-many relationship work, you can reduce the overhead of processing it by breaking it up into multiple smaller dimensions as described here.

9 thoughts on “Optimising SSAS Many-To-Many Relationships By Adding Redundant Dimensions

  1. Great article Chris.

    I’m wondering, with a similar m2m design, if only part of the sales have a sales reason, is there a better way to make an Unknown/”No reason” member visible for the sales reason dimension than by loading all the sales data to the fact sales reason even though a majority of the data will not have a sales reason. I actually thought that SSAS would figure it out automatically. If I load fact sales reason with just sales that have a sales reason the reason subtotals are there and the grand total is correct, it´s only missing the variance as the unknown member.

    Thanks,
    Andri

  2. Hi Chris,

    This post is great. I have a question on the same topic but with a different design.

    Scenario:
    Dimension Usage:
    Two Fact Tables F1 and F2. 5 Dimension Tables – D1 to D5
    F1 is in a regular relationship with D1 and D2.
    F2 is in a regular relationship with D3, D4 and D5.
    F1 also needs to be sliced by D3. F1 is related to D3 by a many-to-many relationship with F2.
    F1 F2
    D1 D1F1
    D2 D2F1
    D3 F2 (M-M) D3F2
    D4 D4F2
    D5 D5F2

    Problem:
    Measures – F1.M1 and F2.M2 are placed side by side and only D3 is chosen for slicing the cube is fast.
    If D1, D2 and D4, D5 are chosen individually or together with F1.M1 and F2.M2, the cube gets stuck.
    Is this the reason – F1 is not related to D4 and D5. F2 is not related to D1 and D2?
    But users will put D1, D2, D4 and D5 alongside as well (even when dimensions are not related to fact).

    Possible solution:
    Create many-to-many relationships on F1 for D4 and D5 using F2.
    Create many-to-many relationships on F2 for D1 and D2 using F1.
    Is this feasible and a good use of dimension usage performance wise.
    F1 F2
    D1 D1F1 F1 (M-M)
    D2 D2F1 F1 (M-M)
    D3 F2 (M-M) D3F2
    D4 F2 (M-M) D4F2
    D5 F2 (M-M) D5F2

    Uday

  3. HI Chris,

    I have design question.
    Scenario:
    Dimension Usage:
    Two Fact Tables F1 and F2. 6 Dimension Tables – D1 to D5
    F1 is in a regular relationship with D1 and D2.
    F2 is in a regular relationship with D3, D4 and D5.
    F1 also needs to be sliced by D3. F1 is related to D3 by a many-to-many relationship with F2.
    F1 F2
    D1 D1F1
    D2 D2F1
    D3 F2 (M-M) D3F2
    D4 D4F2
    D5 D5F2

    Problem:
    Measures – F1.M1 and F2.M2 are placed side by side and only D3 is chosen for slicing the cube is fast.
    If D1, D2 and D4, D5 are chosen individually or together with F1.M1 and F2.M2, the cube gets stuck.
    Is this the reason – F1 is not related to D4 and D5. F2 is not related to D1 and D2?
    But users will put D1, D2, D4 and D5 alongside as well (even when dimensions are not related to fact).
    In the current design, doing this, the cube gets stuck due to missing relationships
    Possible solution:
    Create many-to-many relationships on F1 for D4 and D5 using F2.
    Create many-to-many relationships on F2 for D1 and D2 using F1.
    Is this feasible and a good use of dimension usage performance wise.
    F1 F2
    D1 D1F1 F1 (M-M)
    D2 D2F1 F1 (M-M)
    D3 F2 (M-M) D3F2
    D4 F2 (M-M) D4F2
    D5 F2 (M-M) D5F2

    Uday

    1. Hi Uday, what do you mean by “stuck” exactly? Are the queries slow because of the duplicated values that result when you put a dimension next to a measure where there is no relationship? If so, changing the IgnoreUnrelatedDimensions property on the measure groups might help here. Otherwise, adding the extra many-to-many relationships might help, but I would want to know why the queries are slow before suggesting any changes.

      1. Hi Chris,

        Stuck means when cube is browsed through excel and values are dropped. The excel gets stuck. I am not sure why. As of now I have not set the IgnoreUnrelated Dimensions property. I am unable to figure it out the reason. Of course a meausre is being sliced by a dimension when there is no relationship. Could you throw some light on it.

        Also, just by chance, I happened to stumble upon the creation of extra many-to-many relationships for unrelated dimensions and slice the measures by these dimensions. The cube pivot in excel works fast.

        But is this a good design having so many M-M relationships

        Uday

      2. I still don’t understand what you mean by ‘stuck’, I’m afraid. Are you saying that values disappear? If so, it sounds like there is something else wrong with your cube.

Leave a Reply