I’ve spent the last week working with a fairly large cube – several fact tables have hundreds of millions of rows in – and, as always on an SSAS project, there are occasions when you have no choice but to reprocess the whole cube. Sitting round waiting for processing to complete can be pretty tedious and, while I’m sure we’ve all worked out how to plan our work so that processing can take place overnight, during lunch or a meeting, it’s always good to know some tricks to make processing faster.
There’s a good set of tips on how to optimise processing in the SSAS Operations Guide, but this post is about something not mentioned in that white paper because it’s a scenario that shouldn’t happen in production – although it’s very common in development. In my case I’ve been building a cube on top of a half-finished data warehouse where the data is still quite dirty. That means that during processing there are lots of key errors, and while they get fixed I’m handling them in my processing by selecting the ‘Convert to Unknown’ option. What I’ve half-known for a long time and only properly investigated this week, though, is the impact that reporting these key errors can have on processing times.
Here’s an extreme example using Adventure Works. I’ve created a new cube based on the FactInternetSales table and added the Customer dimension to it, but deliberately joined the Customer dimension on to the Order Date column in FactInternetSales to simulate a lot of key errors.
If I then do a ProcessFull with the following error configuration:
ie with Key Error Action set to Convert To Unknown, Ignore Errors Count selected, and Key Not Found and Null Key Not Allowed both set to Report And Continue, when I do a full process all the errors are logged in the UI and it takes around 23 seconds to finish:
If, on the other hand, you set Key Not Found and Null Key Not Allowed to Ignore Error, as follows:
…a full process takes under a second:
Now this is an extreme example, of course, and in a production system you should not have any key errors anyway, and if you do you definitely do not want to ignore these key errors. But if you’re developing a cube and you know that key errors exist but you don’t care about them (for the moment), you can save a massive amount of time by not reporting these errors. In my case cube processing went down from around 3 hours to 45 minutes. Which is a good thing if you want to get on with your work faster, but I suppose is a bad thing if it interrupts your web browsing/coffee drinking/chatting or whatever you do to pass the time while processing’s taking place.