MDX

Avoiding an expensive TOPCOUNT

An interesting MDX problem came up the other day, the solution to which I thought would be worth sharing. It involved a query which for a set of months showed the top n products plus a member which showed the sum of all other products outside the top n – the Foodmart 2000 equivalent is something like this:

WITH
SET MYDATES AS ‘{[Time].[1997].[Q1].[1]:[Time].[1997].[Q2].[5]}’
MEMBER PRODUCT.OTHERS AS ‘([Product].[All Products], TIME.CURRENTMEMBER) – SUM(TOPCOUNT([Product].[PRODUCT NAME].MEMBERS, 3, (MEASURES.[UNIT SALES], TIME.CURRENTMEMBER)), (MEASURES.[UNIT SALES], TIME.CURRENTMEMBER))’
SELECT MEASURES.MEMBERS ON 0,
GENERATE(
MYDATES,
CROSSJOIN(
{TIME.CURRENTMEMBER},
{[Product].[All Products], TOPCOUNT([Product].[PRODUCT NAME].MEMBERS, 3, (MEASURES.[UNIT SALES], TIME.CURRENTMEMBER))
, PRODUCT.OTHERS}
) )
ON 1
FROM SALES

The question was, could the query be optimised? What struck me was that in the query the top n was worked out in two places: first in the rows axis, then in the calculated member. And doing a TOPCOUNT is usually a very expensive operation. So how could you make sure you only did it once? At first I thought you might be able to declare a set containing the top n inside the GENERATE statement and then reference it from the calculated member, but I couldn’t make that work. String manipulation turned out to be the key and the Foodmart 2000 version of the optimised query is as follows:

WITH
SET MYDATES AS ‘{[Time].[1997].[Q1].[1]:[Time].[1997].[Q2].[5]}’
MEMBER MEASURES.MYTOPSTR AS ‘SETTOSTR(TOPCOUNT([Product].[PRODUCT NAME].MEMBERS, 3, (MEASURES.[UNIT SALES], TIME.CURRENTMEMBER)))’
MEMBER PRODUCT.OTHERS AS ‘[Product].[All Products] – SUM(STRTOSET(CSTR(MEASURES.MYTOPSTR)), (MEASURES.[UNIT SALES], TIME.CURRENTMEMBER))’
SELECT MEASURES.MEMBERS ON 0,
GENERATE(
MYDATES,
CROSSJOIN(
{TIME.CURRENTMEMBER},
{[Product].[All Products],
STRTOSET(CSTR(MEASURES.MYTOPSTR))
, PRODUCT.OTHERS}
) )
ON 1
FROM SALES

What I did was create a calculated measure which returned the top n set in string form; using STRTOSET you could then reference it from both the rows axis and the OTHERS calculated member. The reason this second query is faster than the first – and the guy who came to me with the problem says that the second version runs in almost a quarter of the time – is that the values returned by calculated members are cached so the second time that Measures.MyTopStr is called it doesn’t need to do the topcount again.

The one drawback to this approach is something that is common to all MDX that relies on STRTOSET and other similar functions – if you are trying to turn a very large set into a string, you may exceed the maximum string length and get an error. However if, as in this example, you’re only dealing with sets containing a few members you’ll never run into this problem. 

One thought on “Avoiding an expensive TOPCOUNT

  1. Hi Chris,Don\’t know whether you\’ve been following this thread on the SQL Server OLAP newsgroup, but I tried to create a calculated measure, to capture a set as a string. I used the AdventureWorks DW cube, but the query timing slowed greatly versus a Named Set (which was working when declared within the query). Any ideas on improving the performance, or other approaches?Thanks, – Deepakhttp://groups.google.com/group/microsoft.public.sqlserver.olap/browse_frm/thread/51673105feb39eaf/9d9bb820e3d6ad2c#9d9bb820e3d6ad2c>>Newsgroups: microsoft.public.sqlserver.olap From: "alanr" <alanrl…@yahoo.com>Date: 13 Oct 2005 09:06:25 -0700 Subject: visual rank calculated member on server is very very slow Greetings! I am struggling with the performance of a visual ranking calculation defined on the server. I based this on posts from Tom Chester and others. I am using the September CTP (which as a side note seems to have eliminated the need to do order the set before ranking?). Excel 2003 pivot table is client. Note that the member won\’t work with OWC because the axis doesn\’t have the measure on it. …>>

Leave a ReplyCancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.