My chalk talk yesterday at the BI Conference went…. ok. Unfortunately, despite having asked for projector so I could show my demo code it turned out the screen I got was pretty small so only the people in the front few rows could see anything. So I told everyone that I would put the code up on my blog so they could study the code in more detail and this is the first such post.

One of the points I wanted to make when discussing how to calculate trend values for KPIs in MDX is that we really need to think about the formulas we’re implementing to make sure they actually provide useful results. Unfortunately the business people we talk to often have even less idea than we do about what kind of calculations actually make sense: there’s a real need for input from someone with a statistical background, and I only wish I knew more about this subject. For example if your sales vary a lot by season (you might be selling ice cream, so sales are always going to be much higher in summer than in winter) there’s no point looking at previous period growth to determine whether your sales are growing or not because the strong seasonal component will mask the long term trend.

As I said, for this presentation I looked at a book I’ve had hanging around on my shelf for years which is basically a stats textbook for managers and MBA students: "Quantitative Methods for Decision Makers" by Mik Wisniewski. It explains some simple algorithms and techniques for analysing data that are very relevant for BI practitioners and I took the method for calculating a seasonally-adjusted total and translated it into an MDX query on Adventure Works as an illustration. Here it is:

with

–Calculate a 12-month moving average from 6 months up to and including

–the current member to 6 months after

member measures.movingaverage as

avg(

[Date].[Calendar].currentmember.lag(5)

:

[Date].[Calendar].currentmember.lead(6)

, [Measures].[Internet Sales Amount])

–Centre the underlying trend, by taking the average of the moving average

–for the current member and the next member

member measures.trend as

avg(

{[Date].[Calendar].currentmember,

[Date].[Calendar].currentmember.nextmember}

, measures.movingaverage)

–Find the deviation by dividing Internet Sales by Trend

member measures.deviation as

[Measures].[Internet Sales Amount] / measures.trend

–Find the average deviation for any given month

member measures.averagemonthlydeviation as

avg(

exists(

[Date].[Calendar].[Month].members

, {[Date].[Month of Year].currentmember})

, measures.deviation)

–Find the sum of these average deviations for each month

–then divide by 12 to find out by how much each month’s average deviation

–needs to be adjusted by

member measures.adjustedcomponent as

(sum(head([Date].[Calendar].[Month].members,12), measures.averagemonthlydeviation)

/ 12)

–Adjust the monthly deviation by the value we’ve just found

member measures.adjusteddeviation as

measures.averagemonthlydeviation / measures.adjustedcomponent

–Finally find the seasonally adjusted Internet Sales Amount

member measures.seasonallyadjustedsales as

[Measures].[Internet Sales Amount] / measures.adjusteddeviation

–Run the query

select {[Measures].[Internet Sales Amount], measures.trend, measures.deviation

, measures.averagemonthlydeviation,measures.adjustedcomponent

, measures.adjusteddeviation, measures.seasonallyadjustedsales}

on 0,

[Date].[Calendar].[Month].members

*

[Date].[Month of Year].[Month of Year].members

on 1

from

[Adventure Works]

This works on the following theory. What we want to do is take the Internet Sales Amount measure and remove the seasonal variations so we can view the underlying trend. If we assume that there are three factors in play with our sales, the long term trend, seasonal variations and random factors (for example a health scare in July could kill our ice cream sales that month) then we can say that Sales = Trend * Seasonal Variations * Random Factors. The random factors we can’t do anything about, but taking this formula we can say that Trend = Sales / (Seasonal Variations * Random Factors). To get here, first we estimate the trend by doing a twelve month moving average which hopefully will smooth out those seasonal variations; we then find the deviation from the average for each month and then find the average deviation for each month (eg I can say then that January sales are on average 10% lower than usual, Julys are on average 25% higher than usual). For a whole year we’d like these deviations to cancel each other out and add up to 12; of course they don’t so we now need to find what they do add up to then divide by 12, then divide each month’s average by this value. We can then take this value and divide Internet Sales Amount by it to find the seasonally-adjusted value, the combination of the trend and the random factors.

Hopefully I haven’t screwed up in my interpretation of how it works; it certainly isn’t the most robust or optimal bit of MDX I’ve written either but it shows what you could do if you wanted.

Good presentation, Chris. It was definitley more in depth than I expected. I heard alot of good opinions from others as well. I don\’t think anyone expected you to go into linear regression. Thanks for the code.

It\’s definitely a different tangent than I\’ve been working on sales normalization. I tend not to work in dollars because price changes, discounts, levels, location differentials all conspire to obscure the underlying unit flying the door. If you\’re at TechEd 2007, I would love to chat over a drink or at the TLC on my recent work using a Diff Eq model with exponential decay – Read: I do a ton of heavy lifting in the ETL phase with T-SQL to make some of these calculations not reside in MDX (which is painful from a computational time perspective) on negating seasonality.Thanks for the code – TTYL,Mark

Hi Mark,

I\’m not going to be at TechEd unfortunately, but I\’d be very interested in hearing more about what you do. I agree that MDX is not the place to do these calculations if you can help it.

Hi Chris,I tried to contact you via email but I am not sure if the message has arrived to you…I am trying to adapt your example to my Mondrian schema without lucky and I would thank you if you could help me with one of the steps:I use a cube with the next dimension -I have translated the name of the dimensions, etc. from spanish to english for better understanding-:<Dimension type="TimeDimension" name="Period"> <Hierarchy name="From 2004" hasAll="true" allMemberName="From 2004" primaryKey="id"> <Table name="dim_date"> </Table> <Level name="Year" column="year_name" ordinalColumn="year_id" type="String" uniqueMembers="true" levelType="TimeYears" hideMemberIf="Never"> </Level> <Level name="Quarter" column="quarter_name" ordinalColumn="quarter_id" type="String" uniqueMembers="false" levelType="TimeQuarters" hideMemberIf="Never"> </Level> <Level name="Month" column="month_name" ordinalColumn="month_id" type="String" uniqueMembers="false" levelType="TimeMonths" hideMemberIf="Never"> </Level> </Hierarchy> </Dimension>And one measure:<Measure name="Tonnes" column="weight" aggregator="sum" visible="true"> </Measure>To calculate the quarter seasonally-adjusted value I need to do some operations… one of them is the average of the sum of all the first quarters, etc.I tried to follow this example:–Find the average deviation for any given month member measures.averagemonthlydeviation as avg( exists( [Date].[Calendar].[Month].members , {[Date].[Month of Year].currentmember}) , measures.deviation)In my own schema:<CalculatedMember name="Average quarter deviation" formula="Avg(Exists([Period].[Quarter].Members, {[Period].[Quarter].CurrentMember}), [Measures].[Deviation])" dimension="Measures" visible="true">But it doesn\’t works… because it doesn\’t sum every first quarters before the average, etc.These are all the steps:<CalculatedMember name="4 quarter moving average" formula="Avg([Period].[Quarter].CurrentMember.Lag(2):[Period].[Quarter].CurrentMember.Lead(1), [Measures].[Tonnes])" dimension="Measures" visible="true"> </CalculatedMember> <CalculatedMember name="Underlying trend" formula="Avg({[Period].[Quarter].CurrentMember, [Period].[Quarter].CurrentMember.NextMember}, [Measures].[4 quarter moving average])" dimension="Measures" visible="true"> </CalculatedMember> <CalculatedMember name="Deviation" formula="[Measures].[Tonnes] / [Measures].[Underlying trend]" dimension="Measures" visible="true"> </CalculatedMember> <CalculatedMember name="Average quarter deviation" formula="Avg(Exists([Period].[Quarter].Members, {[Period].[Quarter].CurrentMember}), [Measures].[Desviation])" dimension="Measures" visible="true"> </CalculatedMember> <CalculatedMember name="Adjusted" formula="(Sum(Head([Period].[Quarter].Members, 4), [Measures].[Average quarter deviation]) / 4)" dimension="Measures" visible="true"> </CalculatedMember> <CalculatedMember name="Adjusted average deviation" formula="[Measures].[Average deviation] / [Measures].[Adjusted]" dimension="Measures" visible="true"> </CalculatedMember> <CalculatedMember name="Seasonally-Adjusted Values" formula="[Measures].[Tonnes] / [Measures].[Adjusted average deviation]" dimension="Measures" visible="true">Regards,

Hi Eduardo,No, I didn\’t get your email. You can find my contact details on http://www.crossjoin.co.uk, and I\’ll keep my eyes open for something from you.Chris

Hi Chris,I have sent you the email agian… yesterday we had some troubles with our servers. Thank you.Eduardo