Optimising Distinct Count

Great blog entry from Denny Lee on optmising distinct count measures here:
 
I’d seen a ppt deck of a presentation he did at PASS a few years ago where he first set out these ideas, but it looks like he’s now updated them for AS2005.
 
On a similar topic, I’ve been working with a customer recently where we needed to optimise a calculated measure which did a count on a distinct list of customers (>1 million in the dimension) who met a certain criteria. The filter couldn’t be designed into the cube or dimension completely so it had to be done with a filter function, and similar to what Denny describes here I was seeing in Profiler that the data was being read of the disk pretty quickly but it was the filter/count that was taking the time because only one CPU was being utilised. Unfortunately I found out that there’s no way to design parallelism into the formula engine in the same way that Denny’s technique above designs parallelism into the work the storage engine has to do – apparently AS prevents two queries executing at the same time unless one of them is a Cancel, and despite experimenting with sprocs to do this I had no luck because none of the code in the formula engine is thread safe. I was able to pass the raw data into a sproc using SetToArray and do the work outside the formula engine but that performed even slower. Hohum…

Leave a ReplyCancel reply