Using Linear Regression to Calculate Growth

A few blog entries back I showed the MDX I used to calculate a seasonally-adjusted total in my chalk talk at the BI conference. This is useful but if we’re looking for a calculation that we can use for the Trend property of a KPI it’s not the whole story – we still need to find a way of expressing how much a value is growing or shrinking over time. Although previous period growth calculations are a lot more useful with seasonally-adjusted values, we can use simple linear regression (and it has to be simple because, as I said, I’m no statistician) to do a better job.

The starting point for understanding how to use linear regression in MDX is (surprise, surprise) Mosha’s blog entry on the subject:

http://sqljunkies.com/WebLog/mosha/archive/2004/12/21/5689.aspx

However, the function that’s going to be most useful here is the linregslope function. If we’re looking at the values in our time series and trying to find a line of best fit for those values with the equation y=ax+b, linregslope returns the value of a in that equation, i.e. the gradient: when the value of x increases by 1, y increases by the value of a. Here’s an example of how to use it:

with member measures.gradient as
linregslope(
lastperiods(3, [Date].[Calendar].currentmember) as last3
, [Measures].[Internet Sales Amount]
,rank([Date].[Calendar].currentmember, last3)
)

select {[Measures].[Internet Sales Amount], measures.gradient} on 0,
[Date].[Calendar].[Month].members on 1
from [Adventure Works]

The trick with using this function in MDX with a time series is to be able to work out what values you want to pass in for the x axis. Here I’ve used the lastperiods function to get a set containing the current member on the Calendar hierarchy, the previous member on the Calendar hierarchy and the member before that, in the first parameter of the function; at the same time I’ve declared a named set and then used that with a rank function in the third parameter to return the values 1, 2 and 3 for each of these three members.
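To make the arithmetic behind the query concrete, here is a minimal Python sketch of what a least-squares slope over a rolling three-period window works out to when the x values are simply 1, 2 and 3. The function names and the sales figures are made up for illustration, and the handling of a window with fewer than two points (returning None) is an assumption of the sketch, not a statement about how linregslope behaves.

```python
def linreg_slope(ys, xs=None):
    """Least-squares slope a of the line y = a*x + b fitted to the points (xs, ys)."""
    if xs is None:
        # Mirror the rank() trick: the x values are just 1, 2, 3, ...
        xs = list(range(1, len(ys) + 1))
    n = len(ys)
    if n < 2 or len(xs) != n:
        raise ValueError("need at least two points and matching x/y lengths")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def rolling_slopes(values, window=3):
    """Slope over the current period and the (window - 1) periods before it."""
    slopes = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        # Assumption: a single point has no slope, so report None.
        slopes.append(linreg_slope(chunk) if len(chunk) >= 2 else None)
    return slopes


# Hypothetical monthly sales figures, not taken from Adventure Works.
monthly_sales = [100.0, 120.0, 140.0, 130.0]
print(rolling_slopes(monthly_sales))  # [None, 20.0, 20.0, 5.0]
```

In the last window (120, 140, 130) the slope drops to 5 even though the current value is still above the value two periods back, which is the kind of smoothing a simple previous period comparison does not give.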

This gets us the slope, then, but I was thinking it would be better to express this value as a percentage – but of what? The current period’s value? Or one of the preceding two periods’ values? I have to admit I don’t know which would be correct. Can someone help me out here? Please leave a comment.
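The choice of base does change the number, as a short Python sketch shows. It expresses the same slope as a percentage of each of the three candidate periods and makes no claim about which one is statistically appropriate; the function name, labels and figures are all hypothetical.

```python
def slope_as_percentages(window, slope):
    """Express a slope as a percentage of each period's value in a 3-period window.

    window is ordered oldest to newest, e.g. [two periods back, previous, current].
    Periods with a zero value are skipped to avoid dividing by zero.
    """
    labels = ["two periods back", "previous period", "current period"]
    return {
        label: 100.0 * slope / value
        for label, value in zip(labels, window)
        if value != 0
    }


# Hypothetical window with a fitted slope of 20 per period.
result = slope_as_percentages([100.0, 120.0, 140.0], 20.0)
for label, pct in result.items():
    print(f"{label}: {pct:.1f}%")
```

With these figures the same slope reads as 20.0%, 16.7% or 14.3% depending on the base, so whichever base is chosen needs to be applied consistently across periods.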

Modelling Goals and Thresholds in Measure Groups

Before I carry on with my chalk talk series, I have to own up to something: I didn’t actually want to present on the topic of KPIs, and when I found out that I was going to have to talk on the subject I fired off a few emails to people who spend more time with KPIs than I do to ask them if they could suggest some interesting things to talk about. One of these people was Nick Barclay, co-author of ‘The Rational Guide to Business Scorecard Manager 2005’ (which I shall be reviewing very soon – it’s a good book), and he pointed out that while all the examples of KPIs he’d seen hard-coded goals and thresholds into the MDX code, this was not a good thing: users want to change their values all the time and ideally you’d want to be able to let them do this themselves. Why not store these values in a measure group, allow users to change the values using writeback, and then use these values within the KPI definition somehow?

Actually modelling how the values should be stored in measure groups was very straightforward. In my demo I showed two fact tables, one for the Goals and one for the Thresholds, with one measure each. I also created a KPI dimension for both of them to allow multiple goals and thresholds for different KPIs to be stored in the same measure group; for the Goal measure group I added the Date dimension at the granularity of Calendar Year (so there was a column in the fact table containing year names) and for the Threshold fact table I also created a Threshold dimension. This Threshold dimension contained one member for each threshold to be used: Very Bad, Quite Bad, OK, Quite Good and Very Good; there was also a numeric column containing the values -1, -0.5, 0, 0.5 and 1 which represented the numeric values each threshold gets normalised to and which I assigned to the ValueColumn property of my sole attribute when I built the dimension.

Once this was done and the measure groups were added to the Adventure Works cube, I showed some ways to allocate the Goal values down from the Year granularity at which they were stored. Here’s the scoped assignment for the simple allocation, which splits the values equally by the number of time periods in the year, so for example each month shows 1/12 of the year total:

SCOPE([Measures].[Goal]);
SCOPE([Date].[Calendar Semester].[Calendar Semester].MEMBERS, [Date].[Date].MEMBERS);
THIS=[Measures].[Goal]/
COUNT(
DESCENDANTS(
ANCESTOR([Date].[Calendar].CURRENTMEMBER,[Date].[Calendar].[Calendar Year])
, [Date].[Calendar].CURRENTMEMBER.LEVEL
)
);
END SCOPE;
END SCOPE;
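The arithmetic of that assignment can be sketched in a few lines of Python: the year's goal is divided by the number of members the year has at the level being looked at. The calendar counts and the goal figure are hypothetical, and the sketch assumes a regular calendar with 2 semesters, 4 quarters and 12 months per year.

```python
# Assumed number of descendants of one year at each level of the hierarchy.
PERIODS_PER_YEAR = {"semester": 2, "quarter": 4, "month": 12}


def allocate_equally(year_goal, level):
    """Split a yearly goal evenly across the periods at the given level."""
    count = PERIODS_PER_YEAR[level]
    return year_goal / count


year_goal = 1_200_000.0  # hypothetical goal stored at Calendar Year granularity
for level in PERIODS_PER_YEAR:
    print(level, allocate_equally(year_goal, level))
```

Every month gets the same 100,000 here regardless of seasonality, which is the limitation that a weighted allocation addresses.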

Here’s the code for doing the weighted allocation by the previous year’s Internet Sales Amount measure values:

SCOPE([Measures].[Goal]);
SCOPE([Date].[Calendar Semester].[Calendar Semester].MEMBERS, [Date].[Date].MEMBERS);
THIS=[Measures].[Goal]
*
(
(PARALLELPERIOD([Date].[Calendar].[Calendar Year],1,[Date].[Calendar].CURRENTMEMBER), [Measures].[Internet Sales Amount])
/
(ANCESTOR([Date].[Calendar].CURRENTMEMBER,[Date].[Calendar].[Calendar Year]).PREVMEMBER, [Measures].[Internet Sales Amount])
);
END SCOPE;
END SCOPE;
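As a rough Python sketch of the same weighting idea: each period's share of the goal is its sales in the same period of the previous year divided by the previous year's total. The quarter names and figures are invented, and the sketch assumes the previous year's total is simply the sum of its periods (as it would be for an additive measure).

```python
def allocate_by_prior_year(year_goal, prior_year_sales):
    """Split a yearly goal in proportion to the previous year's sales per period.

    prior_year_sales maps a period name to the sales recorded in the
    parallel period of the previous year.
    """
    prior_total = sum(prior_year_sales.values())
    if prior_total == 0:
        raise ValueError("previous year has no sales to weight by")
    return {
        period: year_goal * sales / prior_total
        for period, sales in prior_year_sales.items()
    }


# Hypothetical previous-year sales by quarter.
prior = {"Q1": 100.0, "Q2": 200.0, "Q3": 300.0, "Q4": 400.0}
print(allocate_by_prior_year(2000.0, prior))
# {'Q1': 200.0, 'Q2': 400.0, 'Q3': 600.0, 'Q4': 800.0}
```

The allocated values still add up to the yearly goal, but a period with no sales in the previous year receives nothing, which may or may not be what users expect.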

A few things to note here:

  • In both cases, because I’ve set IgnoreUnrelatedDimensions to false on the measure group, to get the year’s Goal measure value I can simply reference [Measures].[Goal] – the values for the year are copied down automatically to the attributes below the granularity attribute.  
  • Although normally when you assign a value to a regular measure with an additive aggregation function the assigned value gets aggregated up, when you assign to a regular measure below the granularity attribute of a dimension no aggregation happens, similar to what you get with a calculated measure.
  • The assignment SCOPE([Date].[Calendar Semester].[Calendar Semester].MEMBERS, [Date].[Date].MEMBERS) means ‘scope on everything from the Date attribute (I’ve included the whole attribute here, All Member and the leaf level – everything on a dimension exists with either the All Member or the leaf members of the key attribute) up to and including the Calendar Semester attribute but no higher’.

Moving onto the thresholds, we need to find a way to apply the threshold values we’ve got in our measure group to the measure we’re interested in. Here’s a calculated member definition that does this:

CREATE MEMBER CurrentCube.[Measures].[Internet Sales To Goal Status] AS
TAIL(
{[Threshold].[Threshold].&[1],
FILTER(
[Threshold].[Threshold].&[2]:null
, ([Measures].[Threshold],[KPI].[KPI].&[1])
<
([Measures].[Internet Sales Amount]/[Measures].GOAL)
)}
,1).ITEM(0).MEMBERVALUE
;

What I’m doing here is creating a set that always contains the first member of the Threshold dimension, the ‘Very Bad’ member, and then filtering the set that contains every other threshold to return the members whose Threshold measure value is less than Internet Sales Amount divided by Goal. I then get the last member in that set, which represents the threshold with the highest value that is less than that ratio, and use the MemberValue function to get the normalised value (the value between -1 and 1) that I assigned to that member.
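The selection logic is easier to follow with concrete numbers, so here is a minimal Python sketch of it. The normalised values (-1 to 1) are the ones described for the Threshold dimension; the lower bounds on the sales-to-goal ratio are hypothetical stand-ins for whatever users write back to the Threshold measure group, and the sketch assumes those bounds increase in member order.

```python
# (threshold name, lower bound on sales / goal, normalised status value)
# The bounds are made-up examples; 'Very Bad' needs none because it is always kept.
THRESHOLDS = [
    ("Very Bad", None, -1.0),
    ("Quite Bad", 0.5, -0.5),
    ("OK", 0.8, 0.0),
    ("Quite Good", 1.0, 0.5),
    ("Very Good", 1.2, 1.0),
]


def goal_status(sales, goal):
    """Return the normalised value of the highest threshold the ratio exceeds."""
    ratio = sales / goal
    # Always keep the first member, then keep every other threshold whose
    # bound is strictly less than the ratio (the FILTER step).
    kept = [THRESHOLDS[0]] + [t for t in THRESHOLDS[1:] if t[1] < ratio]
    # The last surviving member is the one taken by TAIL(..., 1).
    return kept[-1][2]


print(goal_status(30.0, 100.0))   # -1.0 (no bound exceeded, falls back to Very Bad)
print(goal_status(90.0, 100.0))   # 0.0  (exceeds the Quite Bad and OK bounds)
print(goal_status(150.0, 100.0))  # 1.0  (exceeds every bound)
```

Because the comparison is strict, a ratio exactly equal to a bound stays in the band below it: with these bounds, sales equal to goal come out as OK rather than Quite Good.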

BIDSHelper beta released

Via Darren Gosbell, news that the beta of BIDSHelper has been released.
 
You can download BIDSHelper from Codeplex.
 
BIDSHelper contains so many cool features that make your life as an AS developer easier it’s untrue. The MDX Script updating feature, which bypasses the waits of up to 30 minutes that I’ve sometimes seen, is worth the download alone. Kudos to Darren and Greg Galloway for their work on this.