Partitions and Aggregations

Something that’s easy to forget (at least I’m always forgetting it) when creating an aggregation design is the impact that partitioning can have on it. Aggregations are only created on a per-partition basis – you can’t aggregate across partitions – so that means, for example, when you’re partitioning by Month then there’s little point building aggregations at granularities higher than Month; after all, if you only have one Month of data in your partition and you aggregate that data to the Month, Quarter or Year levels then the figures for each aggregation will be identical! In fact building aggregations above the granularity of Month can be a bit of a waste: an aggregation at Month granularity can be used by queries at Quarter and Year, but an aggregation for the same partition at Year granularity that is in all other respects identical can only be used by queries at the Year granularity.

The best thing to do to ensure this doesn’t happen is to set the AggregationUsage property to None for all attributes above the granularity that you’re partitioning by, and also to make sure your partition row counts are set correctly, before you run either of the Aggregation Design wizards. This should ensure that your aggregation design properly reflects your partitioning strategy.

One of the questions I’ve always meant to research further on this topic is whether partitioning could hurt query performance, given that it effectively prevents certain aggregations being built. Taking the example we’ve been using so far, what happens if you’re partitioning by Month and you’ve got a slow query at Year granularity that you’d like to build an aggregation for – would partitioning by Year rather than Month, so you could build a true Year-level aggregation, make sense? Well, I’ve just had a chance to test this out on a fairly large cube (with approx 100 million rows in the main fact table, and Product and Customer dimensions with 1 million+ members on their key attributes) and interestingly partitioning seems to make no difference at all. I created two measure groups, one partitioned by Month and one partitioned by Year, and then created one aggregation on the former at Month/Product and one aggregation on the latter at Year/Product. I then ran a query that returned the top 10 Products at Year on each measure group and the performance was identical. Clearly this is not something to be worried about, thank goodness…

[Thanks are due to Eric Jacobsen of the SQLCat team for talking through some of these issues after the BI Conference earlier this year. He accurately predicted that partitioning wouldn’t hurt performance because the AS query engine is very efficient at merging the data caches of partitions.]

7 thoughts on “Partitions and Aggregations

  1. Chris,
      I read ur blog a lot and was reading the article about partitions.  We have cubes that are fairly small…~1 gig.  We have placed aggregations on them and used the BIDS helper with the usage based optimization and gained some performance results.
      First,  is there anything else MAJOR that we can do on the SSAS side?  Secondly, from your perspective will partitioning help greatly and is it worth trying or is this a 5-10% gain ?  I have read all the major white papers and from what I gather Partitions/Aggregations are the two biggest wins for performance and the rest of the stuff (i.e. memory tuning, attribute properties) have minimal effect on performance.
    Thanks in advance,
    Bart Czernicki

  2. Hi Bart,
     
    Partitions and aggregations are the key to good performance from the AS storage engine. If your queries don\’t include any calculations of any kind, then partitions and aggregations are going to be the key to performance tuning although hardware does have an impact too (especially IO performance, but also processor and memory are important when you have many concurrent users). Even after you\’ve build aggregations using the wizards, I would recommend downloading and installing BIDSHelper (http://www.codeplex.com/bidshelper) and taking a look at the aggregations that have been built, and also looking in Profiler to see whether those aggregations are being used by your queries. Partitions and aggregations have more of a noticeable impact when you get above a few million rows of data but they are always worth trying.
     
    The other major topic for tuning is the tuning of the MDX used in calculations and queries. Mosha has blogged a lot on this and this can have a very dramatic impact on performance, especially when the calculations are complex and performance is poor for the volume of data. It\’s still something of a black art, though, to know what kind of MDX performs best in each scenario.
     
    HTH,
     
    Chris

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      An aggregation is like the result of a GROUP BY query in SQL run on your fact table, calculated when the cube is processed and stored inside SSAS, which is used by SSAS to make queries run faster.

  3. Hi Chris
    When you say “you can’t aggregate across partitions”, do you mean the slicer property that seperates the partitions or just the fact that the where clause seperates the partitions of a single fact table? Like UTC> 2010 AND utc 2011 AND UTC < 3/2013
    and say aggregatons are build based on months.

    Thanks much

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Hi Sarah,

      What I mean is that aggregations can only contain data from a single partition. Whether you set the Slice property on a partition or not is irrelevant here; if you divide your measure group up into partitions, then when you build aggregations you will find that aggregations are built on a per partition basis.

      HTH,

      Chris

Leave a ReplyCancel reply