Designing Effective Aggregations in AS2005

Aggregations are, as we all know, the key to getting the best performance out of your cube. In AS2K it was usually enough to run the Aggregation Design Wizard to get a reasonable aggregation design; in my experience with AS2005 it’s much harder to achieve good results – a lot of people run the wizard and find that performance is still poor. So what I thought I’d do is detail the process I go through when I’m designing aggregations…

  1. The first step in getting good performance is making sure your dimension design is as clean as possible. The commonest thing I see when I do performance tuning on a cube is that there are loads of useless attributes in the dimensions which have been created by the cube design wizard. Let's take a time dimension as an example: imagine you had a table with four columns, timeid (containing the surrogate key), month, quarter and year, where month is the lowest level of granularity. In a lot of cases the wizard will translate this into four attributes, one based on each column, but of course you don't need an attribute based on the timeid column: no user is ever going to want to analyse data by a surrogate key value. Since timeid is in fact the key value corresponding to the month name given in the month column, what you should do in this case is delete the month attribute, set the NameColumn property of the timeid attribute to point to the month column and rename the timeid attribute to Month; its KeyColumns property will stay pointing to the timeid column. As a result you have one attribute where you used to have two, a lot less complexity in the 'space' of your cube (which will make designing aggregations easier) and a more efficient design.
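In ASSL terms, the end result is a single attribute whose key binding points at the surrogate key column and whose name binding points at the month column. A rough sketch of the relevant fragment follows; the table and column IDs here are hypothetical and will depend on your own data source view:

```xml
<!-- Sketch of a dimension attribute keyed on the surrogate key (timeid)
     but displaying the month name. TableID/ColumnID are placeholders. -->
<Attribute>
  <ID>Time ID</ID>
  <Name>Month</Name>
  <KeyColumns>
    <KeyColumn>
      <DataType>Integer</DataType>
      <Source xsi:type="ColumnBinding">
        <TableID>dbo_DimTime</TableID>
        <ColumnID>timeid</ColumnID>
      </Source>
    </KeyColumn>
  </KeyColumns>
  <NameColumn>
    <DataType>WChar</DataType>
    <Source xsi:type="ColumnBinding">
      <TableID>dbo_DimTime</TableID>
      <ColumnID>month</ColumnID>
    </Source>
  </NameColumn>
</Attribute>
```

You don't need to edit this by hand, of course; setting the KeyColumns and NameColumn properties in the dimension editor produces the same thing.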
  2. Set the AttributeHierarchyEnabled property of any attributes which you don’t want users to analyse by to False. An example would be an address attribute, which you might want to display as an AS2K-style member property, but you would never expect a user to drag onto a grid or slice by on its own. Again, this reduces the ‘space’ of the cube and makes designing aggregations easier by reducing the number of attributes that need to be considered during design.
  3. Next, I always see if I can design in one-to-many relationships between attributes where possible. Going back to the time dimension example, it could be that in the quarter column of your dimension table you have values like 'Q1' and 'Q2'. These tell you what quarter any given month is in, but don't tell you what year the quarter is in, so there are no one-to-many relationships between year and quarter or between quarter and month. If you add a column concatenating the quarter with the year, i.e. so you get values like 'Q1 1999' and 'Q2 2000', and build your quarter attribute from the new column, one-to-many relationships will exist; if your users still want to analyse by quarter independently of year then you can still build an attribute from the original column too. Quite why this is a good thing will become clear later on.
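If you'd rather not change your ETL, one way to get the concatenated value is a computed column on the dimension table (you could equally do this as a named calculation in the data source view). A sketch in T-SQL, assuming a dbo.DimTime table with quarter and year columns; the names are hypothetical:

```sql
-- Hypothetical DimTime table where [quarter] holds 'Q1'..'Q4' and [year]
-- holds e.g. 1999. The computed column gives each quarter a year-qualified
-- name, so a one-to-many relationship from year to quarter now exists.
ALTER TABLE dbo.DimTime
ADD quarter_of_year AS ([quarter] + ' ' + CAST([year] AS varchar(4)));
```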
  4. Model attribute relationships between attributes where possible. Attribute relationships are, as I've said on several occasions in the past, the single most important thing to consider when designing a dimension, and because their importance is not flagged anywhere in the UI or BOL I find they're widely neglected. Setting them correctly is extremely important for aggregation design, and also for allowing AS to use aggregations at lower levels of granularity when your query doesn't hit an aggregation directly. For more information about attribute relationships and how to set them, I recommend taking a look at the TechEd presentation on AS performance I linked to in my last post http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!906.entry, BIN 316. You should also read my post on the dangers of modelling redundant attribute relationships: http://cwebbbi.spaces.live.com/Blog/cns!7B84B0F2C239489A!619.entry
  5. Model all the user hierarchies you think you'll need. If you've set your attribute relationships you'll (hopefully) find that a fair proportion of them are natural hierarchies, i.e. a one-to-many relationship exists between each level. This is probably a good place to link to Elizabeth Vitt's excellent post on influencing aggregation candidates: http://www.sqlskills.com/blogs/liz/2006/07/03/InfluencingAggregationCandidates.aspx . Before you read this post any further I suggest you read, learn and inwardly digest her post. The only thing I disagree with her about is possibly a typo: at the end of her post she says "scenario 6 and scenario 2 provide the best solutions from an aggregation candidate point of view" – I think she meant to say scenario 6 and scenario 4, since scenario 4 has attribute relationships set whereas scenario 2 doesn't.
  6. Set the AggregationUsage property on each attribute appropriately. Elizabeth's post above is very good on what this does; the only thing I have to add is that in many cases I find that relying on natural user hierarchies alone to influence aggregation design isn't enough. For example, I find that on a Time dimension users almost always analyse data at the month attribute level even when higher levels of granularity on that dimension exist; in these cases I set the AggregationUsage property to Full for the month attribute, and this ensures that every aggregation designed by a wizard includes the month attribute. If you don't do this then you might find you run the wizard and no aggregations are built at month, and so when your users start querying they get no benefit from the aggregations you've built. Similarly, I find it's a good idea to set AggregationUsage to Unrestricted on any attribute hierarchies you're exposing to users which you expect to be queried extensively, and to set AggregationUsage to None on any attributes which are part of user hierarchies (especially at higher levels) but which you nevertheless don't expect to be queried that much.
  7. Run the Aggregation Design Wizard. What I usually do before this is deploy and fully process my cube with no aggregations; you can then run the wizard and design aggregations, then process the aggregations separately by doing a Process Index. This way you can see what the performance of your cube is like before aggregations have been built and get a better feel for how much of an impact they're making on performance; you can then throw your aggregations away, design some new ones and do another Process Index (which is generally fairly quick) if you need to. When you run the wizard, on the 'Specify Object Counts' step, where possible click Count to make sure that the values in the Estimated Count column are as up-to-date as possible. More importantly, if you're partitioning your cube, you need to make sure that the values in the Partition Count column reflect the number of members on each attribute that will exist in each partition: for example, if you're partitioning by month, you will want to tell AS that there's only one month in each partition. These counts are, I believe, used during the aggregation design process to estimate the size of an aggregation, and if you don't specify correct values then the algorithm may incorrectly think a useful aggregation is much bigger than it actually would be and not build it as a result. On the 'Set Aggregation Options' step of the wizard, the rule in AS2K was to choose the 'Performance Gain Reaches' option and set it to stop at 30%. This is still good advice, but more and more in AS2005 I find myself using the 'Estimated Storage Reaches' option instead: in an AS2005 cube there are a lot of very small aggregations that can be built, and by stopping at 30% Performance Gain you may find that you have still only built a few Kb of aggregations.
If you select the ‘I click Stop’ option and watch the design grow until the estimated size is ridiculously large (maybe over a couple of Gb) you can then get a feeling for how many small aggregations can be built; you can then stop it, reset the aggregations and then restart using either the ‘Performance Gain’ or ‘Storage Reaches’ option set to an appropriate level.
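The deploy-then-Process Index cycle doesn't have to be done through the UI; it can be run as an XMLA command from a query window. A minimal sketch of the Process command (the database and cube IDs below are placeholders for your own object IDs):

```xml
<!-- Rebuilds indexes and aggregations only, leaving the fact data alone,
     so trying out a new aggregation design is relatively quick. -->
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>MyDatabase</DatabaseID>
    <CubeID>MyCube</CubeID>
  </Object>
  <Type>ProcessIndexes</Type>
</Process>
```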
  8. Design aggregations manually if necessary. In AS2K you only needed to design aggregations manually in very rare cases, but I find myself doing it much more frequently in AS2005, typically when I have a particular query or RS report which needs tuning. There is apparently a tool in the pipeline which will help with this task (I've heard several people mention it on the newsgroup) but I've not seen it yet and I don't know when it will be released; in the meantime, designing aggregations manually means hacking the XMLA definition of a measure group. When you design aggregations using one of the wizards, the end result is an 'aggregation design' which gets saved on the measure group; each partition then has a property which refers to the id of the aggregation design on the measure group it's associated with. In SQL Management Studio, to see an aggregation design, you need to connect to AS in the Object Explorer pane, expand your cube so you can see the measure groups in it, then right-click on a measure group and select 'Script Measure Group as/ALTER to/New Query Editor Window'. In the resulting XMLA, if you collapse the Measures, Dimensions and Partitions nodes you should be able to see the AggregationDesign node; if you collapse the Dimensions node underneath it you'll see an Aggregations collection containing the list of aggregations in the aggregation design. It's pretty easy to work out what's going on: each aggregation has a name and id, which by default is a number in hexadecimal, and it consists of a collection of dimensions. If there's an aggregation designed which is at a lower granularity than the root of the dimension then there'll be an Attributes collection, inside which will be one or more Attribute objects. An aggregation is just fact data summarised up to a certain level, and when you see an attribute mentioned in an aggregation definition you know that the aggregation contains data summarised up to that level of granularity.
To design your own aggregations manually all you need to do is copy the last aggregation in the list, paste it into the definition at the end of the list, update its id and name properties and add or remove attributes to set its granularity; to determine what granularity you need to build your aggregation at you’ll find it useful to run a Profiler trace and look at the Query Subcube and Query Subcube Verbose events which Mosha describes in more detail here: http://sqljunkies.com/WebLog/mosha/archive/2006/01/05/cache_prefetching.aspx. Assuming you’re editing an ALTER TO script you can then just run the command in SQLMS and run a Process Index to get all your aggregations, including the new one, built.
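To make that concrete, here's a sketch of what a hand-added aggregation might look like (all the names and IDs here are hypothetical). This one summarises data to the Month attribute on the Time dimension; the Product dimension, having no Attributes collection, is left at its All level:

```xml
<!-- A hand-written aggregation inside the Aggregations collection.
     Data is summarised to the Month granularity on Time; any dimension
     listed without an Attributes collection stays at its All level. -->
<Aggregation>
  <ID>Aggregation 12</ID>
  <Name>Aggregation 12</Name>
  <Dimensions>
    <Dimension>
      <CubeDimensionID>Time</CubeDimensionID>
      <Attributes>
        <Attribute>
          <AttributeID>Month</AttributeID>
        </Attribute>
      </Attributes>
    </Dimension>
    <Dimension>
      <CubeDimensionID>Product</CubeDimensionID>
    </Dimension>
  </Dimensions>
</Aggregation>
```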
  9. Run the Usage Based Optimisation Wizard. In theory, Usage-Based Optimisation should be the icing on the cake as far as performance tuning goes. All you should need to do is turn on the query log (right-click on the server name in SQLMS and look at the server properties to do this) so that the queries your users are actually running are captured; then you just run the wizard, select which queries you want tuned, and extra aggregations are built appropriately. This is how it worked in AS2K, but the other week I discovered that in AS2005, when you run the wizard it overwrites any aggregations you have previously designed on the cube without giving you the option to add to the ones you've already got, so if you're not careful you could end up tuning one set of queries and find that other ones start running slower. To stop this happening, what you'd need to do is script your measure groups before running the wizard to save your existing aggregations, then after the wizard has completed try to merge the two sets of aggregations together manually. This isn't exactly ideal, so I've opened a Connect issue to try to get it changed: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=221844 Feel free to vote on this issue to get it fixed!
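The captured queries end up in a relational table, OlapQueryLog by default (the table name and connection are configurable in the server properties), so before running the wizard it's worth eyeballing what's actually in there. A sketch, assuming the default table name:

```sql
-- Which subcube requests are hit most often and which run slowest.
-- The Dataset column holds the vector of 1s and 0s recording which
-- attributes each request was at.
SELECT MSOLAP_ObjectPath, Dataset,
       COUNT(*) AS Hits, AVG(Duration) AS AvgDuration
FROM dbo.OlapQueryLog
GROUP BY MSOLAP_ObjectPath, Dataset
ORDER BY AvgDuration DESC;
```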
  10. Check to see whether aggregations are actually being used by your queries. You can do this very easily by running a Profiler trace that includes the 'Get Data From Aggregation' event (found under 'Query Processing', and only visible if you check 'Show all events'). It's worth remembering that even when your query hits an aggregation directly it may still perform poorly: there are a lot of other factors in play, such as calculations, which can affect query performance.

So anyway, there are my thoughts on designing aggregations. If I’ve forgotten anything or got anything wrong, or you’ve got any of your own tips to add, please leave me a comment! Hopefully the AS2005 version of the "Analysis Services Performance Guide" which I understand is coming soon will contain all of this information and more, but in the meantime I hope you find it useful.

UPDATE: James Snape has a useful blog post describing what all those 1s and 0s mean in Profiler – something I probably should have explained myself:

22 thoughts on “Designing Effective Aggregations in AS2005”

  1. Hi Chris, have you tried to move UBO results from a productive server into a test server?
    We could not get this approach up and running…

  2. Excellent post Chris. I live by the Partition Aggregation Utility in AS2000 and was considering writing a similar app but since it's in the pipeline, nah.
    We've been seeing bad query performance on our large SSAS2005 cubes – we have a single cube with two very large dimensions (five million and three million members, commonly cross joined) and about 12mm fact rows coming in per day. Having the design wizard set up the aggs just didn't cut it. I need precise control over the aggregations.
    Like your ideas on setting aggs manually in 05, haven't put much thought into it myself.

  3. Hi Chris,
    Excellent article!
    question around the aggregation design…
    I have a problem with the "modified attribute relationship". Which property is changed and where? Because I have optimized my rigid/flexible options in my dimensions but the aggregation design doesn't bold attributes not in a user defined hierarchy until I change the AggregationUsage of this attribute in the cube to "unrestricted".
    Also, what is the impact of selecting "unrestricted" at the dimension level instead of the attribute level? doing this will apply the unrestricted option for all of the attributes in the dimension?

  4. Jerome,
    I\’m not sure I understand your first question – attributes not in a natural user hierarchy won\’t be considered as candidates for aggregation unless AggregationUsage is set to Unrestricted. To answer your second question, you can\’t set the AggregationUsage property at the dimension level, it only exists on an attribute (unless I\’ve missed it somewhere).

  5. Hi Again,
    you are right, at the dimension level the option is "AllMemberAggregationUsage" and at the attribute level the property is "AggregationUsage"; Both properties propose the same options.
    Have you some tips and comment on these 2 properties?
    For my first question, its more related to the Eliz. post. But I think there is something wrong in the comments and we must use the aggregationusage property to add it in the design.
    Last question: Can you confirm the best options for my time dimension. I have a user hierarchy based on year/month/day; Most of my webpages use the year level; Another common used attribute is the week day; So to be efficient, does I have to setup the aggregationusage to unrestricted to both year and week day? (others attributes set to default) or set it to "Full"?

  6. Oh yes, sorry. To be honest I've never noticed the AllMemberAggregationUsage property before…! So as a result I'd need to have a play with it before I can see what it does.
    When Elizabeth refers to modified attribute relationships, she means changing them (where possible) from the relationships generated by the dimension wizard to an optimised state, which is what I talk about in step #4. Take a look at the Powerpoint deck I link to in the post, it has a good explanation of what to do and where.
    To answer your last question, yes, you could set the Year and Day attributes to Unrestricted but that doesn't mean that all your aggregations would include the Year or Day level – some might not. What I would do is set the Day level to Full, see what performance is like, and if necessary save your aggregation design, change Day to Unrestricted and Year to Full, rerun the aggregation design wizard and then merge the old aggregation design with the new one.

  7. Hey Chris,
    The AllMemberAggregationUsage is to tell AS whether to create an aggregation on the All member of a dim (or not as the case may be).  The property settings are the same (except maybe the default is smarter being Unrestricted) but it looks like there could be an issue with it not quite working (http://support.microsoft.com/kb/912390).

  8. Thanks for the guides.
    I\’ll do some tests with these different aggregation options.
    But I have found another point to discuss with you…
    I have a measure group with 4 partitions (1 by year)
    3 partitions are filled, the last one is empty. (1 partition = 40 million of rows)
    I design my aggregations for the 4 partitions at the same time, then I test. Excellent result (1 second for my test query).
    Now I fill the latest partition and process everything and I keep the current aggregation design (the AMO browser says that partition 4 uses the aggregation design)
    The performance drops (30 seconds)!!! and SSAS query the whole cube and no aggregation are asked anymore and the query is not splitted among each partition!!!
    If I recreate the aggregations from scratch to make sure that SSAS will have everything correct. Ok, adding a huge amount of data required this, but I'm surprised that SSAS is no longer able to query correctly the partitions. Having bad aggregations ok, but bad partition access?
    any idea?

  9. Thanks for the excellent information, Chris.  You wouldn't believe how much this has helped me, as I'm quite new to BI.
    I do have one question.  Specifically regarding step 3, I have a date dimension with attributes Day, Month, Quarter and Year.  My Month & Quarter data has only integers stored in the relational db (i.e. month 1 thru 12, quarter 1 thru 4.)  From your post I'm creating a new field for the Quarter: 2006 Q1.  My question is what should I be making the data look like for a new month column?  It seems like it would either be something like "Q1 January" or does it need to relate all the way back to year like "2006 Q1 January"?
    Any help would be appreciated.  Thanks,

  10. Hi Craig,
    I would use something like January 2006 – the Q1 information isn't necessary to disambiguate the data, and no-one in real life talks about Q1 January, do they?

  11. Hi Chris,
    First of all I would say that your post is really helpful during agg design and thanks for that. I have a question regarding designing aggregations and selecting candidates. Consider we have only two Dims, viz. Dim Geography [Country/State/City] and Dim Time [Year/Qtr/Month/Day]. In case most of the queries are at State/Day levels (let's say 50%), some are at State/Month (say 20%) and a few (say 10%) are at Country/Month levels, what kind of aggregation usage property for these attributes should be set?
    Now as per my understanding we will consider State and day as "FULL" aggregation candidates.
    My question is that when we are specifying "Day" as aggregation candidate, should we have aggs above day level as any queries which involve levels above "Day" should use aggregated values for "Day" and sum them to get "Month" and so on.. similarly as we have "state" as part of aggs, should we explicitly make "country" as part of aggs or its agg. usage should be none?

  12. Hi Sameer,
    I\’d agree that State would seem the obvious candidate for a Full AggregationUsage; perhaps you could create multiple aggregations at Day and Month level. But if you only had two dimensions in the measure group anyway I\’d wonder how much gain you\’d get from a State/Day aggregation as it wouldn\’t be that much smaller than the underlying fact table. 

  13. Hi Chris
    Thanks for your input. Actually I don't have only two dims as part of that partition, but many others as well. This was just a simple scenario taken to understand the aggregation usage property. I would appreciate it if you could try to suggest agg. usage properties for all attributes of these two dims if I want the aggs to include only [day/state]. (Pls. take a look at the question in my previous comment.)
    Thanks a lot!

  14. Hi Chris.
    In your article, you described a problem with 'Usage Based Optimisation Wizard overwrites existing aggregations'.  I followed your link to vote on this feedback, but the status on this issue is 'Closed/Fixed' by 30/07/2007. Do you know if it is possible to download a fix for this now?

  15. Hi,
    I am currently using aggregation design wizard and one of my partitions is giving me this results:
    Maximum optimization % = 32%
    Number of aggregations = 1146
    Space occupied = 186M
    Is this normal? So many aggregations for so little optimization and space?

  16. It\’s strange, certainly, but not impossible. \’Optimisation\’ is something of a misleading term: it doesn\’t mean your queries will run 32% faster, it just means AS has built 32% of the possible aggregations it thinks it could have built. Anyway 1146 aggregations is way too many, and I would recommend never building more than 100 and usually a lot less. 

  17. Hi Chris, I followed your presentation last month at the SQL Server Day event in Mechelen, Belgium. Great one, as it was exactly what I was struggling with at my "larger" projects: AS performance. It really helped me a lot and gave me a lot of confirmation in what I am doing. Though I still have one question left… More specifically, it goes about issue nbr 2 of the post above. What if disabling hierarchies is not possible? (For building local cubes, I need my hierarchies to be enabled, otherwise my sorting does not work.) So I was wondering, what about the AttributeHierarchyOptimizedState property? Does this need to be set to FullyOptimized for sorting attributes or not? If not, what is the impact? Thanks a lot in advance, Tom
