Uncategorized

Processing Dimensions By Attribute and By Table

It was recently pointed out to me (by Hugo Lopes and Paulo Faria of Enabler in Portugal, who I had the pleasure of working with recently) that setting the ProcessingGroup property of a dimension can have quite a big impact on the amount of time it takes to process a dimension. By default this property is set to By Attribute, which means that each attribute in the dimension is processed separately using its own SQL Select statement which returns the distinct values for each member on the attribute. Here’s an example query used to process an attribute based on the Adventure Works Product dimension:

SELECT
DISTINCT
[dbo_DimProduct].[Color] AS [dbo_DimProductColor0_0]
FROM [dbo].[DimProduct] AS [dbo_DimProduct]

If you set it to By Table, however, in some circumstances (when your dimension is built from a single table?) only one SQL Select statement per table in the dimension structure is issued, reading all the data in, with AS presumably finding the distinct values itself.

When do you use which setting? I suppose it depends on a lot of things (performance of your data source, size and structure of your dimension, whether you’re processing other objects at the same time) so it’s going to be one of those things you’ll have to experiment with. Obviously most dimensions process extremely quickly anyway, but if you’ve got a large dimension which you need to process as quickly as possible this might come in handy.

3 thoughts on “Processing Dimensions By Attribute and By Table

  1. Ive noticed that for new dimensions the default seems to be \’by attribute\’ but for dimensions converted from AS2000 using the upgrade wizard the setting seems to be \’by table\’.
     
    The side effect of this became visible in one large dimension (4 million rows) with many attributes 30+ which fails to process when set to \’by table\’.  Some time into the processing it suffers a write error in one of the temp files its processing and judging by the file growth of these it may be hitting an OS limition on file size.
     
    Processing these large dimensions with many attributes \’by attribute\’ does not appear to suffer this same limitation.
     
     

  2. Trying to resurrect an old post, but be careful when you set your processing by table if your dimension is build using a snowflake schema. Your processed dimension will not have any missing members, i.e. the table query will have inner joins only.
    Processing by attributes is essentially a logical equivalent to left outer joins in a table query.

Leave a ReplyCancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.