Breaking up large dimensions

One clever trick I learned on the newsgroup a few years ago was from someone called dxd, who wrote it up in this post and others in the same thread:
It describes how to break up a single, large dimension that you need in the cube but which users don’t themselves want to view most of the time (typically this would be a degenerate/fact dimension). In the AS2K world this was useful for getting multi-select to work with distinct count calculations; in AS2005, of course, distinct counts already work with multi-select but I recently found a new application for this technique which I thought I’d share.
 
I was doing a PoC in a scenario similar to the following: imagine a data warehouse recording purchases in a supermarket, with two fact tables. The first fact table contains data on the whole transaction, with a transaction id as the primary key, other dimensions like Customer and Store, and a measure recording the value of the whole transaction; the second contains each individual purchase in the transaction, and has all the same dimensions as the first fact table but also includes a Product dimension. The users wanted to run queries like ‘Show me the total value of all transactions which contain Product X’, so it was clearly a distinct sum problem: it needed a many-to-many relationship between the Product dimension and the first fact table, with the second fact table acting as the intermediate measure group.
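In relational terms, what the users were asking for is a ‘distinct sum’: each qualifying transaction’s value must be counted exactly once, however many of its purchase rows match the product. Here’s a minimal SQL sketch of the requirement, using hypothetical table and column names (FactTransaction, FactPurchase, DimProduct and so on):

```sql
-- Total value of all transactions containing Product X. Each
-- transaction's value is summed once, even if several purchase
-- rows in that transaction match. All names are illustrative.
SELECT SUM(t.TransactionValue) AS TotalTransactionValue
FROM FactTransaction AS t
WHERE t.TransactionID IN
(
    SELECT p.TransactionID
    FROM FactPurchase AS p
    INNER JOIN DimProduct AS d
        ON d.ProductKey = p.ProductKey
    WHERE d.ProductName = 'Product X'
);
```

The many-to-many relationship is what lets the cube answer this kind of question without double-counting transaction values.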
 
Unfortunately, the only way to be sure of this working properly was to link the two fact tables together using the transaction id – but there were hundreds of millions of transactions, so building a MOLAP dimension was out of the question and I wasn’t sure that a ROLAP dimension would perform well enough. Then I remembered the approach in the newsgroup post above and realised that I could break up the transaction dimension into three identical dimensions of 1,000 members each (one for each three-digit chunk of the id). It’s quite easy to visualise how this works. Imagine you have a transaction with the following id:
123456789
You could express this as three different dimensions with keys of 123, 456 and 789. And of course, since each of these three dimensions was identical, I only needed to build it once and could use role-playing dimensions for the other two. I added them to the cube, made them invisible, added them to both fact tables and bingo – I had the dimensions I needed to make the many-to-many relationship work.
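As a rough sketch of how the keys could be derived, assuming an integer transaction id of up to nine digits, three named calculations in the data source view along these lines would do the job (all table and column names here are hypothetical); the same three expressions would be added to both fact tables so that the role-playing dimensions join consistently:

```sql
-- Split a nine-digit transaction id into three small keys with at
-- most 1,000 distinct values each, e.g. 123456789 -> 123 / 456 / 789.
SELECT
    TransactionID,
    TransactionID / 1000000       AS TransactionKeyHigh, -- 123
    (TransactionID / 1000) % 1000 AS TransactionKeyMid,  -- 456
    TransactionID % 1000          AS TransactionKeyLow   -- 789
FROM FactTransaction;
```

Together the three keys identify a transaction just as uniquely as the original id, but no single dimension ever has more than a thousand members.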
 
Performance resolving the many-to-many relationship seemed very good when I looked at the queries I ran in Profiler. Unfortunately I ran into the problem that Mark Hill talks about here:
…and overall performance of the cube wasn’t great (I assumed I’d messed up my partition definitions), but if I had used a ROLAP transaction dimension instead I’m pretty sure that the cube would have been unusable.
 
Thinking some more about other applications, I wonder whether this could be used to work around the problems that are becoming evident with drillthrough in AS2005. See
and
I think this deserves some further investigation… 

21 thoughts on “Breaking up large dimensions”

  1. Chris,
    This reminded me of something else I read about role-playing dimensions… The Project REAL documentation also specifies that role-playing dimensions aren’t materialized or aggregated. Do you know this to be true?
    Mark
    http://mgarner.wordpress.com

  2. Your comment was the first I heard of this, but you’re right – the Project REAL docs do mention this. It sounds like I would have got better performance building three identical physical dimensions from the same table in the data source view. Which begs the question – what’s the point of having role-playing dimensions then?

  3. Hi Chris,
    How are you doing? I have a question regarding a very large single dimension. I have split the dimension as per the idea given at http://groups.google.co.uk/group/microsoft.public.sqlserver.olap/browse_frm/thread/d9fc3b8f5bf297a6/f11d54495319e57a?q=dxd+large+dimension&rnum=1#f11d54495319e57a
    It works fine if there is a single transaction for which we want to get the result, or for the complete list of transactions. How can we get the results for a set of transactions or for a range of transactions?
    Please do let me know if you need any further information in this regard.
    Any suggestions are most welcome.
     
    Thanks!
    Rakesh
     

  4. I have a feeling that the right way to do this would be with a ROLAP dimension for the transaction id. The performance is definitely an unknown – but it would be worth trying out… I think we can keep the rest of the cube and dimensions MOLAP, and make this particular dimension ROLAP. What do you say?

  5. My gut feeling is that ROLAP would perform worse – but I agree, it would be worth testing it out.

  6. I\’m having problems getting role-palying dimensions to work. I have a time dimension with years from 2000 to 2050. But the InvoiceDate is only between 2000 and 2006, ShipmentDate is betwen 2005 and 2007. Yet the Invoice Date and ShipmentDate Roleplaying Dimensions shows all years 2000 – 2050. In AS2000 i had them as separate Dimensions and they worked perfect. It seems to me that the role-palying dimensions are not really smart enough to do \’inner\’ joins, instead they only do \’outer\’ joins and show ALL members in the base dimension. Is this true, ?

  7. Yes, role-playing dimensions will show all members in the base dimension. If you only want to display certain dates you’re probably better off building separate dimensions.

  8. Hi,

    I have a query related to MDX. One of my dimensions contains a lot of data, and when I use that dimension in MDX it takes a long time, which makes my report render slowly.

    Can you please suggest some ideas?

      1. A large dimension on its own won’t necessarily cause performance problems. There’s no way of telling you what’s causing your problem without a lot more information, unfortunately. I suggest you read the performance tuning chapters from a book I co-wrote called “Expert Cube Development”, which you can get for free here:
        http://cwebbbi.wordpress.com/2010/03/23/query-performance-tuning-chapter-from-%E2%80%9Cexpert-cube-development%E2%80%9D-available-online/
        It should help you work out what the real problem is.

  9. Hi Chris! Your article is a bit old now. I really like the technique you described here. Would you still recommend using it?

  10. Hi, Chris! What would you advise for large/huge dimensions (10 million members and more) in SSAS MD and Tabular? I would appreciate any links to best practices!

    1. I don’t think there are any general best practices. Very large dimensions can be a problem for many different reasons in SSAS MD and Tabular, so a lot will depend on what you want to do exactly.

      1. ABC/RFI analysis of customers/products, for example. Both of these dimensions contain several million members.

  11. I try to avoid M2M 🙂 and I don’t use it in this case. It was really very slow to query: MDX dynamic sets and DAX calculations (and the Tabular model was even slower than MDX). ABC by profit combined with ABC by quantity gives 9 combinations, which form 3 groups – best, medium and low – and these 3 groups are then used as the dimension. I’m thinking of doing the calculations in the DWH. Incidentally, the same calculation done in Qlik was faster.

    1. Unfortunately large dimensions are almost always slow in MDX – it will depend on how you write the MDX of course, but I know from experience that there is a limit to how well operations over sets containing millions of members will perform. I’m surprised that DAX is even slower than MDX though. If you can do the calculation in the DWH that’s probably the best option.
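       For what it’s worth, a minimal sketch of what doing the ABC calculation in the DWH might look like, classifying customers by their cumulative share of profit (FactSales, CustomerKey and Profit are hypothetical names; the resulting letter can then be loaded as a small attribute on the Customer dimension):

       ```sql
       -- Hedged sketch: ABC classification by profit computed in the
       -- warehouse, so the cube only sees a three-member attribute
       -- instead of set operations over millions of members.
       WITH CustomerProfit AS
       (
           SELECT CustomerKey, SUM(Profit) AS TotalProfit
           FROM FactSales
           GROUP BY CustomerKey
       )
       SELECT
           CustomerKey,
           CASE
               WHEN SUM(TotalProfit) OVER (ORDER BY TotalProfit DESC
                        ROWS UNBOUNDED PRECEDING)
                    <= 0.80 * SUM(TotalProfit) OVER () THEN 'A'
               WHEN SUM(TotalProfit) OVER (ORDER BY TotalProfit DESC
                        ROWS UNBOUNDED PRECEDING)
                    <= 0.95 * SUM(TotalProfit) OVER () THEN 'B'
               ELSE 'C'
           END AS AbcClassByProfit
       FROM CustomerProfit;
       ```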
