Tuning SSRS-Generated MDX Parameter Queries

Sometimes you’ll find yourself in the position of building SSRS reports where you have parameters with a large number of available values. Using the Adventure Works cube as an example, if you were to drag the Customer attribute from the Customer dimension onto the filter area of the Query Designer for a simple query and check the Parameters box like so:

image 

…you’d end up with a parameter where you can choose any customer to filter on – and there are 18485 customers on that hierarchy.

If you right-click on your data source in the Report Data pane (in BIDS in SSAS 2008) and check the Show Hidden Datasets option, you can see the MDX query that BIDS generates to return the list of available values for the parameter query:

image 

Here’s what the query will look like for the Customers hierarchy for the Customer dimension:

WITH
MEMBER [Measures].[ParameterCaption]
AS [Customer].[Customer].CURRENTMEMBER.MEMBER_CAPTION
MEMBER [Measures].[ParameterValue]
AS [Customer].[Customer].CURRENTMEMBER.UNIQUENAME
MEMBER [Measures].[ParameterLevel]
AS [Customer].[Customer].CURRENTMEMBER.LEVEL.ORDINAL
SELECT
{[Measures].[ParameterCaption]
, [Measures].[ParameterValue]
, [Measures].[ParameterLevel]}
ON COLUMNS
, [Customer].[Customer].ALLMEMBERS
ON ROWS
FROM [Adventure Works]

On my laptop this query executes in just over 1.5 seconds. Not bad, you might think, for a query that returns a fairly large number of rows. But we can do better!

This query returns all customers on rows and three columns: the caption of each member, the unique name, and the ordinal of the level (which is used for indenting the caption of each member in the dropdown list for the parameter, so you can easily distinguish between members on different levels). These values are returned as calculated members, but they can also be obtained as member properties and this is the key to tuning the query. So, if you create a new OLEDB connection to the cube (ie you don’t use the built-in Analysis Services connection type but you create a data source that connects to the cube using the OLEDB connection type)…

image

…and then create a new dataset with the following query:

WITH
MEMBER MEASURES.DUMMY AS NULL
SELECT
{MEASURES.DUMMY}
ON COLUMNS
, [Customer].[Customer].ALLMEMBERS
DIMENSION PROPERTIES UNIQUE_NAME, MEMBER_CAPTION, LEVEL_NUMBER
ON ROWS 
FROM [Adventure Works]
CELL PROPERTIES VALUE

You’ll find you get almost exactly the same data back, although this time the query returns in around 0.2 seconds.

A few things need to be noted here. First of all, although MDX allows you to put an empty set on columns, in the query above I had to create a dummy calculated measure that returned null because otherwise the query didn’t return any rows from the OLEDB data source. Secondly, in the new query the All Member unique name and caption come out as nulls – that’s normal behaviour for flattened rowsets (which is what you get when you run queries through an OLEDB connection), unfortunately, and again something we’re going to have to work around ourselves. Thirdly, we also need to create a column with indented member names – the original parameter dataset did this using a SSRS expression in a calculated field – although in this case, where there’s only one level underneath the all member, we could probably skip this and not hurt usability.

To trap the nulls and make sure the All Customers member appears as a parameter option, you can use a calculated field on the new dataset with an expression like this:

=iif(
Fields!Customer_Customer_Customer_UNIQUE_NAME.Value is Nothing
, "[Customer].[Customer].[All Customers]"
, Fields!Customer_Customer_Customer_UNIQUE_NAME.Value)

And to generate the indented captions you can use an expression like this:

=iif(
Fields!Customer_Customer_Customer_UNIQUE_NAME.Value is Nothing
, "All Customers"
, " " + Fields!Customer_Customer_Customer.Value)

You then need to delete the original parameter dataset, point the report parameter to the new dataset and bind these two columns to it value and label fields. And lo and behold, you have a report that runs just over a second faster than it did before. This might seem like a lot of hassle to go through for such a small gain, but if you have more than one large parameter the time savings will add up and your users will notice the difference.

BI User Group, London, May 20th

Just a quick post to say I’ll be hosting and speaking at a BI user group event in London on May 20th. My session will be on ‘Implementing Common Business Calculations in DAX” and there’ll be one other session from someone still to be confirmed. More details and registration here:

http://sqlserverfaq.com/events/223/Business-Intelligence-Implementing-common-business-calcs-using-DAX-in-PowerPivot-Chris-Webb.aspx

Counting Returning Customers in DAX

As promised, I’m going to be putting up a few more DAX examples here – although they may not demonstrate any new and interesting concepts, I’m still learning the language and blogging is a good way for me to practise implementing common calculations.

So, today’s problem is: how do we find the number of distinct customers who bought something in the current time period and who have also bought something in the past? That’s to say we want to count the number of returning customers, as opposed to completely new customers who have never bought anything from us before. This is, basically, a variation on the problem of how to get a distinct count in DAX that Marco has already dealt with comprehensively here, but with aspects of a time intelligence calculation (a topic which is well covered here). I’ll be using two tables from Adventure Works to illustrate this: FactInternetSales and DimDate.

What we need to start off with is find the set of distinct customers who bought something in the current time period. The following simple DAX expression finds this set and gives me the distinct count:

=COUNTROWS(DISTINCT(FactInternetSales[CustomerKey]))

We now need to filter these customers so that we only return the ones who bought something in the range of dates from the very first date we have data for, up to the day before the first date in the current time range. This can be accomplished with the DatesBetween, FirstDate and DateAdd functions. With the DatesBetween function, if you pass a Blank value to the first parameter it will give you the first date you have data for as your start date (see here for another example of this); for the end date in the range, we find the first date in the current date range with FirstDate, then get the day before using DateAdd:

DATESBETWEEN(
DimDate[FullDateAlternateKey]
,BLANK()
, DATEADD(FIRSTDATE(DimDate[FullDateAlternateKey]), -1, DAY)
)

We then need to use this date range inside the Filter function as follows:

=CALCULATE(COUNTROWS(DISTINCT(FactInternetSales[CustomerKey]))
, FILTER(DISTINCT(FactInternetSales[CustomerKey])
, CALCULATE(SUM(FactInternetSales[SalesAmount]),
DATESBETWEEN(DimDate[FullDateAlternateKey]
,BLANK()
, DATEADD(FIRSTDATE(DimDate[FullDateAlternateKey]), -1, DAY))
, ALL(DimDate))>0))

However, there’s one last problem to solve: on the first date in the time dimension using the DateAdd function in this way will return a Blank value, and using Blank as both a start and an end date will result in us getting all the dates in the column back from the DatesBetween function. This means that for this first date we get the same value as the distinct count because we are finding the set of all customers who bought something on that first date and applying a filter to see whether these customers bought something on any date, which of course they did:

image

So we need to use the IF function to check if using DateAdd to get the previous date in this way returns a Blank, and if it does to return a Blank. Here’s the final expression:

=IF(
DATEADD(FIRSTDATE(DimDate[FullDateAlternateKey]), -1, DAY)=BLANK()
, BLANK()
,CALCULATE(COUNTROWS(DISTINCT(FactInternetSales[CustomerKey]))
, FILTER(DISTINCT(FactInternetSales[CustomerKey])
, CALCULATE(SUM(FactInternetSales[SalesAmount]),
DATESBETWEEN(DimDate[FullDateAlternateKey]
,BLANK()
, DATEADD(FIRSTDATE(DimDate[FullDateAlternateKey]), -1, DAY))
, ALL(DimDate))>0)))

image

One last point: even though I’ve written this expression on the RC version of PowerPivot, I still need to use ALL(DimDate) on the last line I am joining FactInternetSales and DimDate using the OrderDateKey column, which is an integer surrogate key. This wouldn’t be necessary if I was joining the two tables using a datetime column. Personally I think this is going to confuse no end of people and is a bit rubbish – especially because, in many data warehouses, the only way to join a fact table and a time dimension table is using an integer surrogate key; even worse, if you try to bring a datetime column from a time dimension table down onto the fact table using a calculated column, you’ll get a circular reference error. Hopefully this will get fixed in the next release…

UPDATE: I’ve found a better way of doing this in DAX. Instead of using the Filter function, which can be a bit slow, you can just use the filter arguments of the Calculate function itself. Here’s the new version:

=IF(
CALCULATE(COUNTROWS(), DATEADD(FIRSTDATE(DimDate[FullDateAlternateKey]), -1, DAY))=0
, BLANK()
, CALCULATE(
COUNTROWS(DISTINCT(vwFactInternetSalesWithDate[CustomerKey]))
,DISTINCT(vwFactInternetSalesWithDate[CustomerKey])
, DATESBETWEEN(DimDate[FullDateAlternateKey]
,BLANK()
, DATEADD(FIRSTDATE(DimDate[FullDateAlternateKey]), -1, DAY)
)))

Here, the two filter arguments I’m using are:

  1. DISTINCT(vwFactInternetSalesWithDate[CustomerKey]), to filter by the customers who bought something in the current time period, and
  2. DATESBETWEEN(DimDate[FullDateAlternateKey], BLANK(),  DATEADD(FIRSTDATE(DimDate[FullDateAlternateKey]), -1, DAY)), to filter by the set of all dates up to the day before the first date in the current period

ANDing these two arguments together gives us a context that contains only rows that have a customer that bought in the current time period and dates up to the current time period. It’s a little harder to understand what’s going on here but a lot more elegant, I think.

 

%d bloggers like this: