Over the last few days I’ve been involved in an interesting thread on the SSAS forum regarding what happens when you do a Process Update on a dimension. It’s a topic that is not widely understood, and indeed I’ve not known all the details until today, but it’s nonetheless very important: one of the commonest performance-related problems I see in my consultancy work is partitions that have aggregations designed for them, but where those aggregations aren’t in a processed state because a Process Update has been run on one or more dimensions. Anyway, just now Akshai Mirchandani from the dev team posted a really good overview on that thread of what actually happens when you run a Process Update, so I thought I’d copy it here to ensure it gets a wider audience:
Here is a quick summary of what happens when you do ProcessUpdate:
1. After the dimension has been updated, the server analyzes the changes that occurred to the dimension. In 2005, this analysis was pretty simple and would often incorrectly detect that major changes had occurred that required clearing of indexes and aggregations. In 2008, this code was improved such that it more often would realize that nothing significant has occurred. It’s a fairly small (but useful) optimization — I guess nobody thought it was worth documenting!
2. Based on this analysis, the server will decide whether or not indexes and aggregations need to be cleared. If no (e.g. because records were only added and not deleted/updated), then the partitions won’t be affected.
3. If indexes/aggregations need to be cleared, then the server will check if ProcessAffectedObjects was enabled — if yes, then instead of clearing the indexes/aggregations it will rebuild the indexes/aggregations.
4. The act of clearing the indexes/aggregations also shows up as "partition processing operations" in Profiler — that’s one of the things that has been confusing some of you.
5. When aggregations are cleared, only the flexible aggregations need to be cleared because we’re guaranteed by the rigid relationships that the members cannot have moved and therefore the rollups cannot have changed. However, indexes can still have changed and therefore you may still see the partition processing jobs kick off to clear the indexes.
6. ProcessIndexes and ProcessClearIndexes take care of building both bitmap indexes (aka map) and aggregations — the context is that both aggregations and bitmap indexes are generically considered "indexes".
Really the main takeaway here is that if you ProcessUpdate a dimension, you should strongly consider either doing ProcessAffectedObjects or an explicit ProcessIndexes on the affected partitions so that bitmap indexes and flexible aggregations get rebuilt. The advantage of explicitly doing ProcessIndexes is that you can bring your cube online earlier and have the indexes/aggregations get processed more lazily in the background — a number of customers prefer to do that because their processing windows are too small to wait for the indexes to get processed.
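To make the takeaway concrete, here is what a Process Update with ProcessAffectedObjects looks like as an XMLA batch. This is a sketch only: the database and dimension IDs are placeholders that you’d replace with your own, and ProcessAffectedObjects is set as an attribute on the Batch element.

```
<!-- Sketch: MyDatabase/MyDimension are placeholder IDs -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
       ProcessAffectedObjects="true">
  <Process>
    <Object>
      <DatabaseID>MyDatabase</DatabaseID>
      <DimensionID>MyDimension</DimensionID>
    </Object>
    <Type>ProcessUpdate</Type>
  </Process>
</Batch>
```

The alternative mentioned above is to run this without the ProcessAffectedObjects attribute, bring the cube online, and then issue a separate ProcessIndexes command against the affected partitions later.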
Also related to this topic, I thought I’d also highlight a great post by Darren Gosbell where he shows how to find out if your aggregations are processed or not:
http://geekswithblogs.net/darrengosbell/archive/2008/12/02/ssas-are-my-aggregations-processed.aspx
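Darren’s technique is based on the DISCOVER_PARTITION_STAT schema rowset, which returns one row per aggregation in a partition along with its size. A sketch of the kind of DMV query involved is below; all the object names are placeholders, and because this rowset requires restrictions it has to be wrapped in SystemRestrictSchema. Broadly, an aggregation showing a size of zero indicates it is not in a processed state — see Darren’s post for the full details.

```
SELECT *
FROM SystemRestrictSchema($system.discover_partition_stat,
    DATABASE_NAME = 'MyDatabase',
    CUBE_NAME = 'MyCube',
    MEASURE_GROUP_NAME = 'MyMeasureGroup',
    PARTITION_NAME = 'MyPartition')
```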
Hi, I have deployed PPS2007 in a test lab. All went well and without issues. From the Administration Console I configured everything without issues. However, when I went to Excel and tried to use the Report Wizard to create a report, I am asked to select the required model, which I do without issues. Then I click Next, where I am asked to select the dimension and hierarchy, which I select OK. Then when I click the member selection field I get an error as follows: “Error getting the database context for the model site [2035d974-eeea-4143-9231-7d76089fb68f]”. Can someone help please? Regards
Hi Hassan, this isn’t really the right place for this question. Try posting on http://social.technet.microsoft.com/Forums/en-US/ppsmonitoringandanalytics/threads Chris
Hi, Chris. I can’t understand what will happen if ProcessUpdate mode is selected when a fact table contains missing dimension members.
Can you explain what you mean exactly?
Would setting the processing mode to LazyAggregations for measure groups related to a dimension that gets the ProcessUpdate command be helpful?
It would mean that there’s one less thing to have to do explicitly, but in general I don’t like the idea of lazy aggregations – I like to be in control of what’s happening and I don’t like the idea of unpredictable performance.
I can relate, thx
Can you run Process Cube Default to rebuild any of the aggregations or indexes that were dropped as a result of the dimension update process?
Yes, you can.
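As a sketch, a Process Default on a cube can be issued with an XMLA batch like the one below (the IDs are placeholders). Process Default brings an object to a fully processed state, so it rebuilds only the indexes and aggregations that are currently missing rather than reprocessing the data itself.

```
<!-- Sketch: MyDatabase/MyCube are placeholder IDs -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>MyDatabase</DatabaseID>
      <CubeID>MyCube</CubeID>
    </Object>
    <Type>ProcessDefault</Type>
  </Process>
</Batch>
```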
Hi Chris,
First of all, thank you for all your very helpful articles! Yours, as well as Marco Russo’s and many others’, have been life savers many, many times for me.
I have a question regarding specifically the subject of this post.
I have a cube with 3 measure groups, all partitioned by a “snapshot” dimension.
My fact data is versioned and so has EFFECTIVE_DATE and EXPIRY_DATE columns.
Snapshots are set dates of effectivity for my fact data. This way the user can compare what was effective today (“CurrentSnapshot”) versus a week ago (“CurrentSnapshot-7”).
The 3 measure groups are Facts for “Reservation Sales” (really an aggregated view of detailed sales), “Reservation Item Sales” (Sales for individual Reservation Items) and “Reservation Detailed Sales” (Sales for each reservation – no Reservation Item Breakdown – by type of sales: Airport Taxes, Commission, etc)
Partitions on snapshots are made so that I have x amount of “Moving Snapshots”: CurrentSnapshot (today), CurrentSnapshot-1 (yesterday), CurrentSnapshot-7(last week) and so on.
I also have “fixed” snapshots, which contain fact data that is effective on the 1st of each month. My questions concern what is happening to these partitions.
The data in these fixed partitions never changes and so we do not process them in our SSIS script (based on one of your posts on processing through a Script Task and AMO 🙂 ).
However, even if we do not process these partitions, they still get “processed” in some form or another. I’m guessing they have their indexes dropped, since the time taken for each fixed partition is small (between 5 and 15 seconds). The thing is that we now have 22 fixed partitions x 3 measure groups, making 66 partitions. If each of these takes 10 seconds, we easily have over 10 minutes of processing on these partitions.
Key members for each dimension related to these measure groups do not change and do not get deleted. However, it is very possible that another attribute in the dimension gets changed (All dimensions are SCD1).
And my question 🙂
Since the key member attributes do not change and do not get deleted, aren’t the fixed partitions supposed to stay exactly the same when we “ProcessUpdate” the cube’s dimensions? Or are the indexes being dropped (assuming that this is what’s happening)? Or is it normal that ALL partitions get touched when a Process Update occurs on a dimension?
Thanks in advance for taking the time to reply! Let me know if you need more details.
I forgot to mention that the fixed partitions do not have aggregations defined.
If there are no aggregations, then it must be the indexes of the attributes where data has changed that are being rebuilt. Some things to try to improve processing performance would be to set the RelationshipType property of your attribute relationships to Rigid where appropriate, setting the AttributeHierarchyOptimizedState to NotOptimized on attributes that won’t be used for querying often (or even better, setting AttributeHierarchyEnabled to false or even deleting the attributes), and everything Andrew Calvett recommends here:
http://blog.calvett.co.uk/2013/02/09/the-anatomy-of-a-process-update-part-1/
http://blog.calvett.co.uk/2013/02/21/the-anatomy-of-a-process-update-part-2/
*Curiously, I could not reply to your reply so I am doing it here*
Thanks! That helps a lot!
We have dimensions that come from data in MDS, and I know that one user is changing some attributes in a few entities; eventually those changes are brought back to our dimensions, so this makes a lot of sense to me now. I was under the impression that indexes were only for leaf-level attributes and that any indexes for rollups were managed from inside the dimensions. I never really looked at all the internal files for SSAS. I see that I have a lot of .map files (81,000), so as I understand it now, I have a great potential of having to recalculate the indexes, especially since the dimensions are SCD1.
I have also come upon the “Delete Unused Indexes” feature from BIDS Helper that I never gave much attention to. I am running an SSAS query trace for a week, and then I will run it to see which attributes that wizard suggests I set to NotOptimized.
Thanks again for replying so quickly!
Hi Chris, do you recommend doing a Process Index on a dimension after a dimension update?
No, there’s no need to do a Process Index on a dimension after a Process Update.
Hi Chris,
For processing dimensions and Fact data we have steps in Sql Server Agent Job.
1) First Dimensions are processed with “Process Update”
2) The next step is processing only one particular partition with the “Process Full” option. Though we have more than 5 partitions, only one is processed.
So my doubt here is whether the indexes will be recreated when the partition is processed with the “Process Full” option.
Currently we are facing Performance issues when cube is queried by users.
So could you please suggest whether the processing is fine or do we need to change the processing here?
Many Thanks in advance
Hi Thiyag, indexes are created when you do a Process Full. However, doing a Process Update on a dimension may also delete indexes and aggregations on other partitions, not just the one you are processing. You need to add a third step here and do a Process Default on the entire cube to make sure all indexes and aggregations are built.
Hi Chris,
So it should be like: Process Update on the dimensions,
Process Full on the partition of the measure group,
Process Default on the entire cube?
I recommend a Process Default on the entire database, not just the cube.
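Putting that whole sequence together, a single XMLA batch along these lines would cover all three steps. This is only a sketch: every ID below is a placeholder, and you would repeat the first Process command for each dimension you update.

```
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- Step 1: Process Update the dimension(s) -->
  <Process>
    <Object>
      <DatabaseID>MyDatabase</DatabaseID>
      <DimensionID>MyDimension</DimensionID>
    </Object>
    <Type>ProcessUpdate</Type>
  </Process>
  <!-- Step 2: Process Full on the one partition that receives new data -->
  <Process>
    <Object>
      <DatabaseID>MyDatabase</DatabaseID>
      <CubeID>MyCube</CubeID>
      <MeasureGroupID>MyMeasureGroup</MeasureGroupID>
      <PartitionID>CurrentPartition</PartitionID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
  <!-- Step 3: Process Default on the whole database to rebuild any
       indexes/aggregations dropped by step 1 -->
  <Process>
    <Object>
      <DatabaseID>MyDatabase</DatabaseID>
    </Object>
    <Type>ProcessDefault</Type>
  </Process>
</Batch>
```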