Optimising GENERATE() type operations

I need to get back to answering more questions on newsgroups – it’s the best way of learning, or at least remembering stuff you’ve learnt in the past and since forgotten. Take, for instance, the following thread I was involved with today:
 
It reminded me of some very similar queries I worked on a few years ago, and although the example in the thread above is on AS2K the techniques involved are still relevant on AS2005. Take the following Adventure Works query, which is an approximation of the one in the thread:

WITH

SET MYROWS AS

GENERATE

(

NONEMPTY

([Customer].[Customer Geography].[Full Name].MEMBERS, [Measures].[Internet Sales Amount])

,TAIL(

NONEMPTY([Customer].[Customer Geography].CURRENTMEMBER * [Date].[Date].[Date].MEMBERS, [Measures].[Internet Sales Amount])

,1)

)

SELECT

[Measures].[Internet Sales Amount] ON 0,

MYROWS ON 1

FROM

[Adventure Works]

 

What we’re doing here is finding the last date that each customer bought something. Using the TAIL function within a GENERATE might be the obvious thing to do here, but in fact it isn’t the most efficient way of solving the problem: on my machine, with a warm cache, it runs in 16 seconds whereas the query below which does the same thing only takes 6 seconds:

WITH SET MYROWS AS

FILTER(

NONEMPTY

(

[Customer].[Customer Geography].[Full Name].MEMBERS

* [Date].[Date].[Date].MEMBERS

, [Measures].[Internet Sales Amount])

AS

MYSET,

NOT(MYSET.CURRENT.ITEM(0) IS MYSET.ITEM(RANK(MYSET.CURRENT, MYSET)).ITEM(0))

)

SELECT [Measures].[Internet Sales Amount] ON 0,

MYROWS ON 1

FROM

[Adventure Works]

What I’m doing differently here is rather than iterating through each Customer finding the set of dates when each Customer bought something and then finding the last one, I’m saying give me a set of tuples containing all Customers and the Dates they bought stuff on and then using a FILTER to go through and find the last Date for each Customer by checking to see if the Customer mentioned in the current tuple is the same as the Customer in the next tuple in the set – if it isn’t, then we’ve got the last Date a Customer bought something. Obviously operations like this within a GENERATE are something to be avoided if you can.

One thought on “Optimising GENERATE() type operations

  1. And here is small improvement over this technique – it eliminates call to Rank:
     

    WITH
    SET MYROWS AS
    Filter
    (
    NonEmpty
    (
    [Customer].[Customer Geography].[Customer].MEMBERS
    *
    [Date].[Date].[Date].MEMBERS

    ,[Measures].[Internet Sales Amount]
    ) AS MYSET
    ,(NOT
    MYSET.Current.Item(0) IS MYSET.Item(MYSET.CurrentOrdinal).Item(0))
    )
    SELECT
    [Measures].[Internet Sales Amount] ON 0
    ,MYROWS ON 1
    FROM [Adventure Works];

Leave a ReplyCancel reply