One Cube vs Multiple Cubes

One of the questions discussed in the book that Marco, Alberto and I wrote last year, “Expert Cube Development with SSAS 2008” (available in all good bookshops, folks!), was whether, if you have multiple fact tables, you should create one big cube with multiple measure groups or multiple cubes each with a single measure group. While I still stand by what we wrote then, I recently took part in an interesting debate on this subject on the MSDN Forum with Akshai Mirchandani from the dev team about the pros and cons of each approach, where some new details came to light:

http://social.technet.microsoft.com/Forums/en-US/sqlanalysisservices/thread/681e59bd-93ca-4a91-9f26-8ed96e825553

Here are the main points you need to consider when deciding whether to use the single cube approach or the multiple cube approach. In favour of the multiple cube approach:

  • Having multiple, smaller cubes may result in faster query performance than one large cube in some cases, especially if your fact tables have very different dimensionality. This was true in SSAS 2005, and while it’s less obvious in 2008 it’s apparently still there. This was what I’d previously not been sure about: I’d heard rumours about this, and seen it happen in some cases myself with 2005 – although in other cases when I’d tested this out I’d seen no difference in performance – and I wasn’t sure what the situation was with 2008. You’d need to test the two approaches yourself on your cubes and queries to be sure. Here’s what Akshai says on the matter:

    if you keep adding lots of dimensions to a cube, then the virtual space of the cube grows — it does not add to the storage cost, but it does hurt formula engine performance in some scenarios because the cell coordinates are based on the number of attributes in the cube space. Increasing the number of attributes in the cube space will start costing performance in lots of small ways and result in performance regressions. Adding lots of unrelated measure groups would result in you adding lots of unrelated dimensions to the cube space and cause a performance slowdown — if you had 10 measure groups and they all shared lots of common dimensions, then one cube makes the most sense…

    …As I already explained… it affects the sizes of data structures inside the formula engine that are based on the number of attributes in the cube space. When those data structures get larger, there is an incremental cost that can add up (depending on your calculations and query patterns).

    For example, you see the Query Subcube Verbose events in Profiler — they show you the subcubes that are used for accessing measure groups. There are similar subcubes that are used for calculating formulas and cell coordinates — all those subcubes get wider and wider as you start adding more attributes into the cube. The cost of accessing and indexing those data structures is what we’re talking about here. If adding new measure groups doesn’t require adding new attributes/dimensions, then there is no problem…

    We had measured the difference before 2005 shipped for some real customer cubes and found there was a noticeable performance improvement from splitting up into multiple cubes…

  • While it is possible to apply dimension security to the Measures dimension, it is much easier to allow or deny access to a whole cube with the multiple cube approach than it is to apply security to all the measures in a measure group with the single cube approach (see the sketch after this list).
  • Having multiple, simpler cubes can be much more user friendly than one monster cube with loads of dimensions and measure groups. If you have Enterprise Edition you can of course use Perspectives to counter this, but if you are using Standard Edition then Perspectives aren’t available.
  • Maintenance can be easier and less disruptive with multiple cubes: if you need to make changes to a cube while users are querying it, you might end up invalidating users’ connections and dropping caches. With one cube the chances of this disruption affecting more users increases.
  • It’s easier to scale out with multiple cubes: if you find your server is maxing out, you can simply buy another server and distribute your cubes equally between the two. With a single cube approach you end up having to look at (admittedly not that much) more complex scale-out scenarios like network load balancing.
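
To illustrate the security point above, here is a rough sketch (with hypothetical measure names) of the allowed member set you would define for the Measures dimension on a role’s Dimension Data security tab in the single cube approach. Every measure the role is allowed to see has to be listed, whereas with the multiple cube approach you would simply not grant the role access to the other cube at all:

    // Allowed member set for dimension security on the Measures dimension
    // (hypothetical measure names)
    {
        [Measures].[Internet Sales Amount],
        [Measures].[Internet Order Quantity]
    }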

On the other side, here are the arguments in favour of the single cube approach:

  • If you ever need to work with data from two fact tables in the same query or calculation, or if you think you might ever need to in the future, you should go with the single cube approach. The two options for cross-cube querying, linked measure groups and the LookUpCube MDX function, should both be avoided (see the sketch after this list). Linked measure groups are a pain to manage, carry a slight query performance overhead, and can result in the same MDX calculations being duplicated across the original cube and the cube containing the linked measure group (which means maintenance becomes more difficult). The LookUpCube function is probably the worst MDX function to use in a calculation from a performance point of view and should be avoided at all costs. So a single cube is the only feasible option.
  • Even if your users tell you they will not ever need to analyse data from two fact tables in the same query, be prepared for them to change their minds. In my experience, SSAS projects have a tendency to grow in complexity over time, and cubes that start out simple in a first release often grow lots of new functionality as time goes on – and the more successful the project, the quicker things get complicated. As soon as your users see what’s possible with SSAS they will start to have new, more ambitious ideas about the kind of analysis they want to do with their data, and it’s very likely that they will realise they do need to do cross-measure-group queries and calculations. If you started out on the multiple cube approach and then this happens you will have no choice but to use linked measure groups, and as I said this can make maintenance difficult; using the single-cube approach from the start means you won’t have this problem.
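
As a rough illustration of the cross-cube point above (all cube, measure and hierarchy names here are hypothetical): in a single cube, a calculation spanning two measure groups is just an ordinary calculated member in the MDX Script, whereas with two separate cubes the same logic forces you into LookUpCube, which re-evaluates its expression in the other cube for every cell and does not carry the current query context across unless you build the coordinates into the expression yourself:

    -- Single cube design: Sales and Budget are measure groups in the same cube,
    -- so the calculation is trivial
    CREATE MEMBER CURRENTCUBE.[Measures].[Sales vs Budget] AS
        [Measures].[Sales Amount] - [Measures].[Budget Amount],
        FORMAT_STRING = "Currency";

    -- Multiple cube design: the same calculation in the Sales cube has to reach
    -- into a separate Budget cube with LookUpCube, building the coordinates it
    -- needs as a string and paying a heavy performance price on every cell
    CREATE MEMBER CURRENTCUBE.[Measures].[Sales vs Budget] AS
        [Measures].[Sales Amount]
        - LookUpCube(
              "Budget",
              "([Measures].[Budget Amount],"
              + [Date].[Calendar Year].CurrentMember.UniqueName + ")"
          );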

My personal preference is to use the single cube approach by default, and then move to multiple cubes if there are pressing reasons to do so, for example if query performance is a problem. This might seem a bit strange given the number of reasons I’ve given for the multiple cube approach, but frankly the need to support cross-measure-group querying and calculations trumps them all. As I said, if you need to do it (and 99% of the time you will), or you even half suspect you might need to do it sometime in the future, you have to go with the single cube approach. That said, I know other people are more inclined to the multiple cube approach than I am and to a certain extent it’s a matter of taste.

47 thoughts on “One Cube vs Multiple Cubes”

  1. Nice article :) For me, the decisive factor is the number of shared dimensions we have between the fact tables – which determines whether we could be building any cross-measure-group calculations. This is IMHO not related to current user need – it is more a property of the data itself: whether it is related enough to make sense in the same report/query.

  2. There could be cases where the measures share some of the dimensions, but those measures might not make sense for cross analysis. If the measures share all of the dimensions, I agree with Chris that they might be used for cross analysis, if not now then maybe in the future. But I would hesitate to say the same about measures which don’t share all the dimensions. And it is in such cases that you would start to think about splitting your cubes for better performance.

  3. Having said that, I would say that this issue is like a double-edged sword: you might just as easily harm performance if you don’t do this correctly, as Chris rightly said.

  4. Another reason for 2 cubes could be to have measure groups at different granularity levels. Or you have similar dimensions, with one dimension that should be aggregatable for one measure group and not aggregatable for another measure group.

  5. Chris, there is one other aspect (without being destructive): in companies like mine, where more than 6 developers in 2 countries are working together, it is hard enough to work with one solution in one cube catalog (I believe you remember all the discussions we had while you were with us), so I think working with one big cube would be impossible in the end, wouldn’t it? But on the other hand – it still looks like Analysis Services isn’t a product designed for developer teams……

  6. Good point, Dietmar – yes, having more than one developer on a cube is practically impossible. However, to avoid any problems you’d need to have multiple databases, not just multiple cubes…

    1. Of course you are right Chris – and then we are back in trouble with redundancy and the problem of keeping all the conformed dimensions consistent…

  7. Viktor, I don’t really understand/agree with your point about granularity – you know you can join a dimension to two measure groups in the same cube at different granularities?

  8. Nice article Chris.
    Like you I tend to default to a single cube, but often I find it’s beneficial to create one cube per department or business function. This makes security a breeze, and each cube can contain the combination of dimensions and facts relevant to each department.

    It often means that some measure groups are duplicated in multiple cubes, so it’s only a viable approach if space and processing time allow.

  9. Great post Chris

    On the other hand my case is a bit different. I’m just starting to be involved in a project where the currently proposed design, because of security, is to have up to 200 (identical) cubes!

    Am I correct to assume that the overhead of managing all these cubes would counterbalance any other type of advantage here?

    Using dynamic security (for example as proposed in your book) should make it possible to have one single cube…
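
    As a rough sketch of the dynamic security pattern being referred to (all dimension, measure and attribute names here are hypothetical), a role’s allowed member set for, say, the Customer attribute could be driven by a hidden bridge measure group that maps user logins to the customers they are allowed to see, instead of creating one cube per user group:

        // Allowed member set for the [Customer].[Customer] attribute in a role,
        // driven by a hidden user-to-customer bridge measure group (names hypothetical)
        NonEmpty(
            [Customer].[Customer].[Customer].MEMBERS,
            (
                StrToMember("[User].[User].&[" + UserName() + "]"),
                [Measures].[Bridge Row Count]
            )
        )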

  10. Reblogged this on David Laplante and commented:
    Recently, I was having a discussion on whether a given cube implementation should, or should not, have been implemented as multiple cubes instead of just one. This blog post from Chris Webb pretty much sums up my thinking. I tend to go the “1 cube route” by default for the ease of use. But I do like the arguments for both cases.

  11. What about using single cubes then creating a cube combining all the single cubes together using “linked” measure groups?

    1. I would not recommend using linked measure groups – there is a slight performance hit using them, and more importantly you end up duplicating calculations between the original cube and the cube with the linked measure groups in, which can make maintenance difficult.

      1. Thanks for the response.

        What about only placing the calculations in the “linked measure group” cube?

        For example, we have 15 cubes, 14 regular and one cube linking together the 14 other cubes. We have the 14 cubes hidden from the end user and place all the calculations in that master cube.

        15 total dimensions, only four of them are common across the 14 measure groups. The end user is smart enough to know what to slice by in the master cube.

        Thanks again!

      2. You could just put the calcs in the linked measure group cube, but then what would be the point of building separate cubes? If the calcs aren’t there in the individual cubes then users will tend to use the linked measure group cube and you might as well have just created one cube in the first place.

  12. In Analysis Services, I am using shared dimensions and have 2 cubes in one database. But today I could only see one cube; I have checked the security and everything looks OK to me. Can anyone help me please? Recently I have made some changes to the cube but don’t know where to start…

  13. I have two cubes with shared dimensions. When I process one cube (after processing its dimensions) the other cube is unavailable – any solutions, please?

    1. You’re probably doing a full process on the dimensions in your first cube – when you do a full process on a dimension, all cubes that use that dimension also need to be fully processed. Have you looked at other processing options, such as Process Update?

      1. Thanks for the response.
        Yes, it’s done using Process Update.

        Thanks again !

  14. Hi Chris,
    It’s a really good discussion and set of suggestions in your blog on the multiple cube vs single cube topic… I have a similar issue in my current project: we have built a large single cube which contains 30 dimensions and 7 cubes, but the problem we are currently facing is that we are using self-service reporting in Excel and SharePoint Power View functionality for the users, so when a user tries to create a report they are not sure which fact table columns are related to a specific dimension, and they get Cartesian product results. All these fact tables use a common dimension. Our concern is that we are planning to use multiple Tabular models, so do you think this is the right approach or should we go for the multiple cube approach? For example, if we use multiple cubes and we add a new attribute to a shared dimension’s source table for one cube, do we have to refresh the DSV and cube for the other cubes which use the same dimension? How difficult is it to manage the dimensions and facts? Please give us some ideas.
    thanks

    1. Hi Diptesh, have you considered using Perspectives to make your cube look like lots of smaller cubes? That might be the best solution.

  15. We will think about it… but can you restrict a particular group of users to see only those perspectives which are related to them when they are browsing the cube?

    1. No, you can’t. You can use security to make sure those users can’t see any data if they do look at a particular perspective though.

  16. If you have different subject areas where the kind of user load you expect is different and the latency requirements are different, does it make sense to go with a single cube?

    Do we need to take any other considerations into account for the Tabular model?

    1. User load and latency should only be considerations if you think you might need to scale out.

      For Tabular, lots of things are different! That question would require more thought and probably another blog post.

  17. Does the one big cube recommendation still hold in the modern world of tabular models and Power BI? You can still merge data together using CUBEVALUE-type formulas in Excel, and in SSRS via SSIS packages which join data sources, albeit with a more static presentation layer. Plus Power BI allows you to easily bring all of your data together in a single dashboard, although I realise it can’t be combined on the same visual.

    We are going down the route of 4 or 5 smaller more targeted cubes, mostly for usability and scalability reasons. But we will combine data in these cubes from the same data warehouse and also take advantage of Dashboards in Power BI to bring the data together.

    We have had a bad experience with one massive multidimensional cube – trying to shoehorn round data into square holes led to hierarchies 15 levels deep and lots of invalid data intersections, which made it really difficult to use.

    1. Yes, I think it does still hold *in general* – trying to combine data using CubeValue() etc is a bit of a hack. It’s hard to comment on the problems you’ve had without seeing your cube, but it sounds like in your case multiple cubes could be the right way to go though.

  18. Hi Chris, does this still apply nowadays for SQL Server 2016?

    In my scenario these points are clear:
    1. there will be multiple cubes per process (sales, purchase) to manage access security
    2. there will be a need for cross-cube analysis

    So, I would approach this by starting from one master cube and then creating several per-process cubes using linked measure groups. What is the difference between that and mashing up the several per-process cubes into one master cube using linked measure groups?

    Sincerely

    1. Yes, this still applies for SSAS 2016 Multidimensional. Linked measure groups have lots of problems associated with them: difficult maintenance, poor performance etc. I strongly recommend that you create a single cube with multiple measure groups and then use perspectives to make it look like the small cubes you need.

      1. On the other hand we’ll have some issues when deploying changes: this will disrupt all the reports 🙁 Also, implementing security cannot be done by means of perspectives – therefore multiple cubes are needed :/
        In your experience, what choices did you make?

        Thank you

      2. No, you don’t need multiple cubes to apply security. Instead of allowing/denying access to cubes, you can use dimension security on measures for example.

  19. Hi Chris,

    I am new to Power BI and the SSAS Tabular model, so I need your kind help with some clarifications please:

    1. Power BI does not allow you to write MDX or DAX queries against an SSAS Tabular model directly over a live connection?

    2. How do you see columnstore indexes vs the SSAS Tabular model?

    Kind regards,

    1. Hi Waheed,

      1) No, it doesn’t unfortunately
      2) This is a subject that a lot of people have covered in depth, but generally speaking I say this: the two can’t be compared easily. Columnstore indexes make SQL queries go faster; SSAS is much more than fast queries though, it is all about providing a semantic layer so that your users can get to data without needing to know how to write queries.

  20. Hi Chris,

    Glad to hear back from you so quickly!!

    With reference to question no. 2,

    Can’t we use a relational DWH as a semantic layer for users to get to data without writing queries?

    Kind regards,

  21. Hi Chris,

    I am getting the error "Can't determine relationships between the fields".

    It happens when I am in a live connection to a Tabular model and try to put a field in a visual from another table which is not directly linked to my fact table but is linked via another table.

    So my question is: does Power BI only understand a star schema in a Tabular model?

    Kind regards,
