What happens when you do a Process Update on a dimension?

Over the last few days I’ve been involved in an interesting thread on the SSAS forum regarding what happens when you do a Process Update on a dimension. It’s a topic that is not widely understood, and indeed I’ve not known all the details until today, but it’s nonetheless very important: one of the commonest performance-related problems I see in my consultancy work is partitions that have aggregations designed for them, but where those aggregations aren’t in a processed state because a Process Update has been run on one or more dimensions. Anyway, just now Akshai Mirchandani from the dev team posted a really good overview of what actually happens when you run a Process Update on that thread, so I thought I’d copy here to ensure it gets a wider audience:

Here is a quick summary of what happens when you do ProcessUpdate:

1. After the dimension has been updated, the server analyzes the changes that occurred to the dimension. In 2005, this analysis was pretty simple and would often incorrectly detect that major changes had occurred that required clearing of indexes and aggregations. In 2008, this code was improved such that it more often would realize that nothing significant has occurred. It’s a fairly small (but useful) optimization — I guess nobody thought it was worth documenting!

2. Based on this analysis, the server will decide whether or not indexes and aggregations need to be cleared. If no (e.g. because records were only added and not deleted/updated), then the partitions won’t be affected.

3. If indexes/aggregations need to be cleared, then the server will check if ProcessAffectedObjects was enabled — if yes, then instead of clearing the indexes/aggregations it will rebuild the indexes/aggregations.

4. The act of clearing the indexes/aggregations also shows up as "partition processing operations" in Profiler — that’s one of the things that has been confusing some of you.

5. When aggregations are cleared, only the flexible aggregations need to be cleared because we’re guaranteed by the rigid relationships that the members cannot have moved and therefore the rollups cannot have changed. However, indexes can still have changed and therefore you may still see the partition processing jobs kick off to clear the indexes.

6. ProcessIndexes and ProcessClearIndexes take care of building both bitmap indexes (aka map) and aggregations — the context is that both aggregations and bitmap indexes are generically considered "indexes".

Really the main takeaway here is that if you ProcessUpdate a dimension, you should strongly consider either doing ProcessAffectedObjects or an explicit ProcessIndexes on the affected partitions so that bitmap indexes and flexible aggregations get rebuilt. The advantage of explicitly doing ProcessIndexes is that you can bring your cube online earlier and have the indexes/aggregations get processed more lazily in the background — a number of customers prefer to do that because their processing windows are too small to wait for the indexes to get processed.

Also related to this topic, I thought I’d also highlight a great post by Darren Gosbell where he shows how to find out if your aggregations are processed or not:

http://geekswithblogs.net/darrengosbell/archive/2008/12/02/ssas-are-my-aggregations-processed.aspx

20 thoughts on “What happens when you do a Process Update on a dimension?

  1. Hi I have deployed PPS2007 in a test lab. All went well and without issues. From the Administration Console I configured everything without issues. However, when I went to Excel and tried to use the Report Wizard to create a report, I am asked to select the required model which I do without issues. Then I click Next, where I am asked to select the dimension and Hierarchy which I select OK. Then when I click the memeber selection field I get an error as follows: "Error getting the database context for the model site[2035d974-eeea-4143-9231-7d76089fb68f]can someone help please?regards

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Can you explain what you mean exactly?

  2. Christopher Ross – Sydney, Australia – I'm simply a problem solver. and Somehow that led me to the wonderful world of BI. One of the greatest things in life is to equip others for success, and that's just what a good BI consultant does. We provide organisations with the tools to turn the raw power of data in to the harnessed energy of information. If the people and businesses I serve can see that I've been a part of helping them make great decisions, save time and make money then my days are made!
    Chris Ross says:

    Would setting the processing mode to lazyaggregations for measure groups related to a dimension that gets the processupdate command be helpful?

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      It would mean that there’s one less thing to have to do explicitly, but in general I don’t like the idea of lazy aggregations – I like to be in control of what’s happening and I don’t like the idea of unpredictable performance.

      1. Christopher Ross – Sydney, Australia – I'm simply a problem solver. and Somehow that led me to the wonderful world of BI. One of the greatest things in life is to equip others for success, and that's just what a good BI consultant does. We provide organisations with the tools to turn the raw power of data in to the harnessed energy of information. If the people and businesses I serve can see that I've been a part of helping them make great decisions, save time and make money then my days are made!
        Christopher Ross says:

        I can relate, thx

  3. Greg Coopman – Independent SQL Server Business Intelligence Contractor from Dallas, Texas. Go to http://www.sqlcoop.com for more information.
    sqlcoop says:

    Can you run Process Cube Default to rebuild any of the aggregations or indexes that were dropped as a result of the dimension update process?

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Yes, you can.

  4. Hi Chris,

    First of all, thank you for all your very helpfull articles! Yours as well as Marco Russo’s and many other’s have been life savers many many times for me.

    I have a question regarding specifically the subject of this post.

    I have a cube with 3 measure groups, all partitioned by a “snapshot” dimension.
    My fact data is versionned and so have EFFECTIVE_DATE and EXPIRY_DATE.
    Snapshots are set dates of effectivity for my fact data. This way the user can compare what was effective today (“CurrentSnapshot”) versus a week ago (“CurrentSnapshot-7”)

    The 3 measure groups are Facts for “Reservation Sales” (really an aggregated view of detailed sales), “Reservation Item Sales” (Sales for individual Reservation Items) and “Reservation Detailed Sales” (Sales for each reservation – no Reservation Item Breakdown – by type of sales: Airport Taxes, Commission, etc)

    Partitions on snapshots are made so that I have x amount of “Moving Snapshots”: CurrentSnapshot (today), CurrentSnapshot-1 (yesterday), CurrentSnapshot-7(last week) and so on.

    I also have “fixed” snapshots which contain fact data that is effective at the 1st of each month. It is concerning what is happening to these partitions that I have questions.

    The data in these fixed partitions never change and so we do not process them in our SSIS Script (based on one of your post on processing through Script Task and AMO 🙂 )

    However, even if we do not process these partitions, they still get “processed” in some form or another. I’m guessing they have their indexes dropped since the time taken for each fixed partition is small (between 5 and 15 seconds). The thing is that we now have 22 fixed partitions x 3 measure groups making 66 partitions. If each of these take 10 seconds, we easily have over 10 minutes of processing on these partitions.

    Key members for each dimension related to these measure groups do not change and do not get deleted. However, it is very possible that another attribute in the dimension gets changed (All dimensions are SCD1).

    And my question 🙂
    Since the key member attributes do not change and do not get deleted, aren’t the fixed partitions supposed to stay exactly the same when we “ProcessUpdate” the cube’s dimensions? Are are the indexes being dropped (assuming that this is what’s happenning)? Or is it normal that ALL partitions get touched when a process update occurs on a dimension ?

    Tanks in advance for taking the time to reply! Let me know if you need more details

      1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
        Chris Webb says:

        If there are no aggregations, then it must be the indexes of the attributes where data has changed that are being rebuilt. Some things to try to improve processing performance would be to set the RelationshipType property of your attribute relationships to Rigid where appropriate, setting the AttributeHierarchyOptimizedState to NotOptimized on attributes that won’t be used for querying often (or even better, setting AttributeHierarchyEnabled to false or even deleting the attributes), and everything Andrew Calvett recommends here:
        http://blog.calvett.co.uk/2013/02/09/the-anatomy-of-a-process-update-part-1/
        http://blog.calvett.co.uk/2013/02/21/the-anatomy-of-a-process-update-part-2/

      2. *Curiously, I could not reply to your reply so I am doing it here*

        Thanks! That helps alot!

        We have dimensions that come from data in MDS and I know that one user is changing some attributes in a few entities and eventually those are brought back to our dimensions so this makes a lot of sense to me now. I was Under the impression that indexes were only for leaf levels attributes and then any index for rollups were managed from inside the dimensions. I never really looked at all the internal files for SSAS. I see that I have a lot of .map files (81 000) so as I understand it now, I have a great potential of having to recalculate the indexes. Especially since the dimensions are SCD1.

        I have also come upon the “Delete Unused Indexes” feature from BIDS Helper that I never gave much attention to. I am running a SSSAS Query Trace for a week and then I will run it to see what that wizard suggests I set NotOptimized.

        Thanks again for replying so quickly!

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      No, there’s no need to do a Process Index on a dimension after a Process Update.

  5. Hi Chris,
    For processing dimensions and Fact data we have steps in Sql Server Agent Job.
    1) First Dimensions are processed with “Process Update”
    2) Next Step is processing only one particular partition with “Process Full” option. Though we have more than 5 partitions only one is processed.
    So my doubt here is whether the Indexes will be recreated when Partition is processed with “Process Full” option.

    Currently we are facing Performance issues when cube is queried by users.
    So could you please suggest whether the processing is fine or do we need to change the processing here?
    Many Thanks in advance

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Hi Thiyag, indexes are created when you do a Process Full. However doing a Process Update on a dimension may also delete indexes and aggregations on other partitions, not just the one you are processing. You need to add a third step here and do a Process Default on the entire cube to make sure all indexes and aggregations are built.

      1. Hi Chris,
        So it should be like process update on dimensions
        Processfull on the partition of measure group
        ProcessDefault on the entire cube?

      2. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
        Chris Webb says:

        I recommend a ProcesDefault on the entire database, not just the cube.

Leave a ReplyCancel reply