It’s a fairly common occurrence to have several attributes in a dimension describing the same thing: products can have long and short descriptions, for example, or you might want to display dates in different formats, or show the full names of states or just their abbreviations. AS doesn’t give you any functionality to specifically handle this – something I’ve heard from people who’ve used other OLAP tools is that it would be great to set up ‘aliases’ for attributes, rather like how you can set up translations – so you just have to design it into your dimension. One thing I’ve noticed, however, is that when you’re designing attribute relationships to cater for this scenario it’s tempting to do so in a particular way that leads to problems later on; at least I’ve done this in the past and seen several other people do the same thing, so I thought it was worth blogging about.
Here’s an example Time dimension based on the Adventure Works relational data:
We’ve got a simple Year-Quarter-Month-Date structure, but we want to see the same structure once with a short description (eg ‘2004’ for the year) and once with a long description (eg ‘Calendar Year 2004’). There are two natural user hierarchies that reflect these two choices, but notice the way the attribute relationships are designed: they take no account of the fact that there’s a one-to-one relationship between the two sets of attributes. If you use BIDS Helper’s excellent ‘Visualise Attribute Lattice’ functionality you can see the relationships make a fork shape:
What’s wrong with this? Well, rather than answer that question, let’s look at my preferred alternative:
Here’s the accompanying lattice:
You can see that instead of a fork shape we’ve now got a long chain, and the one-to-one relationships I mentioned earlier are now in place; remember that this doesn’t affect the way the dimension appears to the end user at all. Incidentally, you might wonder like I did if the Cardinality property of a relationship should be changed from ‘Many’ to ‘One’ for one-to-one relationships: it’s probably worth doing but according to BOL it doesn’t have any effect at the moment:
http://msdn2.microsoft.com/en-us/library/ms176124.aspx
The advantages of this design are as follows:
- More efficient autoexists. In the fork approach, if you crossjoin Year with Month Long Desc AS will have to resolve that relationship through the Date attribute; in the chain approach because there’s a direct relationship between the two working out which members on Month Long Desc exist with Year is going to be faster. You’ll probably only notice this on fairly large dimensions though. UPDATE – see Mosha’s comment below, this isn’t correct.
- In general, I reckon that the aggregation design wizard will produce better designs with the chain approach. Again I’m not sure how much difference there will really be, but when I ran the wizard on a test cube containing just my fork dimension it came up with three aggregations at Month/Quarter Long Desc; Month Long Desc/Quarter; and Year. When I did the same thing on a test cube containing just my chain dimension I got four aggregations at Month; Quarter; Year; and All. In the latter case, of course, the Calendar Long Desc hierarchy can benefit from aggregations built on the non-Long Desc attributes, and overall the aggregation coverage seems better. Results, however, may well vary in the real world…
- Scoping calculations becomes much, much easier with the chain approach rather than the fork approach. Imagine that you want to implement something like a Total-to-Date calculation, which sums up a measure from the beginning of Time to the currentmember on Time. Naturally you want it to work on both your user hierarchies, but you’ll find that with the fork approach it becomes fiendishly difficult to get your scoping correct – it’s a lot harder than you think, and in fact I’ve not come up with a satisfactory solution. With the chain approach however there’s a really elegant solution: because of strong hierarchies, every member you select on the Calendar hierarchy automatically forces the currentmember on the Calendar Long Desc hierarchy to the equivalent: so, selecting ‘2004’ on Calendar changes the currentmember on Calendar Long Desc to ‘Calendar Year 2004’. This means you can scope all your calculations on the Calendar Long Desc user hierarchy and they’ll automatically work on the Calendar hierarchy; for example, here’s how to do the TTD calc:
Create Member CurrentCube.Measures.TTD as null;
Scope(Measures.TTD);
Scope([ForkTime].[Calendar].Members, [ForkTime].[Calendar Long Desc].members);
This=sum({null:[ForkTime].[Calendar].currentmember}, [Measures].[Sales Amount]);
End Scope;
End Scope;
(Thanks are due to Richard Tkachuk for answering some emails on these issues)
> More efficient autoexists. In the fork approach, if you crossjoin Year with Month Long Desc AS will have to resolve that relationship through the Date attribute; in the chain approach because there\’s a direct relationship between the two working out which members on Month Long Desc exist with Year is going to be faster. You\’ll probably only notice this on fairly large dimensions though.
Chris – are you sure about this ? I think what you say here is only right, if there is a user hierarchy defined which goes through all 6 attributes. It should be very easy to check – just do the autoexist you described, and look for the Query Dimension Subcube event in the profiler – by looking at the granularity it should become evident what grain did SE go to.
> In general, I reckon that the aggregation design wizard will produce better designs with the chain approach.
I reckon that aggregation design is a gamble 🙂 Perhaps what\’s more important here is that in the chain approach the SE datacaches will work better than for forked approach, thanks to the strong hierarchies decoding, but to take advantage of it again you will have to create user hierarchy going through all 6 attributes.
Lastly your example with Total-To-Date is somewhat oversimplified since you didn\’t include any other interesting attributes. In Time dimension it is impossible to avoid some forks. The approach that I always recommend for Total-To-Time is to run the SUM over the lowest attribute of the dimension (measure group granularity really, but usually it is the same as key) and create measure as semiadditive LastChild – this will give the best performance and correct answers.
Chris,
this is a common issue but as Mosha says, it\’s very uncommon having only a hierarchy that can be mapped on the other one. I often have users that want a hierarchy "by week" and another one "by month". In these cases, you cannot solve the issue of having potential crossing aggregations between those two hierarchies, even if in the reality nobody will use them.
Months ago I suggested to add a property that give some hint to the aggregation designer: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=277428
The need for that feature came in a cube with two parent-child hierarchies that are never used together – I\’d like to tell this to the aggregation designer, but today I\’m unable to do it.
Such a feature would reduce the space of possible aggregations that aggregations designer has to consider… and this would produce more efficient aggregations, I think.
Marco Russo
http://www.sqlbi.eu
http://sqlblog.com/blogs/marco_russo
Mosha, you\’re right about the autoexists. I think I misinterpreted something Richard said.
I think using a Time dimension and a TTD as an example is a bit misleading. I agree forks are almost always unavoidable, but what I wanted to show was that it was actually possible to scope a calculation across both hierarchies and have it work with the chain example, whereas with the fork example it\’s really nasty to even get the scoping correct.