Solving the ‘Events in Progress’ problem in MDX, Part 2 – Role Playing Measure Groups

In my last post I described a simple solution to the ‘events in progress’ problem in SSAS and MDX, which nonetheless had one drawback: it involved loading all the data from the same fact table into two different measure groups in the same cube, which potentially doubles the time taken to process the cube and its size on disk. I used the same technique in my recent post on improving the performance of currency conversion, and indeed it’s a technique I have used successfully with several customers in the past; but it seems rather inelegant, so is there a way we can avoid doing it and have only one measure group? It doesn’t seem to be possible at first glance, and I actually opened a Connect suggestion a while ago asking for this functionality (‘role playing measure groups’) to be implemented. I was having a good week last week, though, and at the same time as solving the ‘events in progress’ problem I came up with a solution to this problem too…

Here’s what I did, using the ‘events in progress’ problem as an illustration:

First of all, I created and processed a simple cube called ‘EventsInProgressSource’ with a single measure group, a single Count measure, and two dimensions, Order Date and Ship Date, which have regular relationships joining on the OrderDateKey and ShipDateKey columns of the fact table.


I then created a second cube, ‘EventsInProgress2’, using the ‘Create Empty Cube’ option in the Cube Wizard.

I then started the New Linked Object wizard and copied everything from the ‘EventsInProgressSource’ cube to the ‘EventsInProgress2’ cube.

This resulted in a cube with one linked measure group and two linked dimensions.

Now here’s the fun bit. I deployed and processed the cube, closed BIDS and went to SQL Server Management Studio. There I scripted the Internet Sales Facts linked measure group in ‘EventsInProgress2’ to an XMLA Create statement, manually updated the XMLA by adding a 1 to the end of the name and ID of both the measure group itself and the Line Item Count measure, and then executed the script against ‘EventsInProgress2’ (an abridged sketch of the edited script is shown below). This created a second, identical linked measure group – something that, again, BIDS doesn’t let you do.

I then reopened BIDS and connected direct to the cube in online mode (I could also have reimported the project back into BIDS), went to the Dimension Usage tab, and deleted the relationships between Ship Date and the first linked measure group and between Order Date and the second linked measure group, so that Order Date was related only to the first measure group and Ship Date only to the second.
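Abridged, the edited script looked something like the following. This is only a sketch: I’ve assumed the SSAS database is also called ‘EventsInProgress2’, and everything not shown here (the source binding, the measure group’s dimensions and so on) stays exactly as SSMS scripted it.

<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ParentObject>
    <DatabaseID>EventsInProgress2</DatabaseID>
    <CubeID>EventsInProgress2</CubeID>
  </ParentObject>
  <ObjectDefinition>
    <MeasureGroup>
      <!-- the appended 1 is the only change to the ID and Name that SSMS scripted -->
      <ID>Internet Sales Facts1</ID>
      <Name>Internet Sales Facts1</Name>
      <Measures>
        <Measure>
          <!-- the measure gets the same treatment -->
          <ID>Line Item Count1</ID>
          <Name>Line Item Count1</Name>
          <!-- remaining measure properties as scripted -->
        </Measure>
      </Measures>
      <!-- source binding, measure group dimensions etc. as scripted -->
    </MeasureGroup>
  </ObjectDefinition>
</Create>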

I then added another Date dimension and set up referenced relationships (which had to be non-materialised) between it and each measure group, via the Date attributes of the Ship Date and Order Date dimensions.

With this done we have achieved our goal: we have the same fact table appearing twice in the same cube as two different measure groups with different dimensionality, but we are only processing the data once. The last step in solving the ‘events in progress’ problem is to add what is essentially the same MDX as last time to the cube:

CREATE MEMBER CURRENTCUBE.MEASURES.OrderedToDate AS
    // running total of line items ordered, up to and including the current date,
    // from the measure group related to the Date dimension via Order Date
    SUM(NULL: [Date].[Calendar].CURRENTMEMBER,
        ([Measures].[Line Item Count]));

CREATE MEMBER CURRENTCUBE.MEASURES.ShippedToPrevDate AS
    // running total of line items shipped, up to and including the previous date,
    // from the measure group related to the Date dimension via Ship Date
    SUM(NULL: [Date].[Calendar].CURRENTMEMBER.PREVMEMBER,
        ([Measures].[Line Item Count1]));

CREATE MEMBER CURRENTCUBE.MEASURES.OrderedNotShipped AS
    // everything ordered to date that had not shipped by the previous date
    MEASURES.OrderedToDate - MEASURES.ShippedToPrevDate;

And we get the results we want out of the cube.
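For example, a query along these lines returns all three calculations for every date (I’m assuming here that the new Date dimension’s key attribute is called Date, as it is in Adventure Works):

SELECT {MEASURES.OrderedToDate, MEASURES.ShippedToPrevDate, MEASURES.OrderedNotShipped} ON 0,
[Date].[Date].[Date].MEMBERS ON 1
FROM [EventsInProgress2]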

Now I haven’t tested this with anything other than the Adventure Works data, and there are some areas where I’d like to do more testing:

  • Non-materialised referenced relationships don’t always perform that well. Materialised referenced relationships aren’t allowed with linked measure groups, though, so possibly using a m2m relationship to join the Date and Order Date/Ship Date dimensions might be an alternative.
  • There are various ways of calculating a total-to-date in MDX and I’ve not spent any time working out if the version I’ve shown above is the most efficient.
  • There is going to be an overhead from querying a linked measure group rather than a regular measure group, and also probably an overhead from querying the same source measure group twice rather than two different measure groups, but I don’t know how significant they are.

If anyone out there does decide to try this at scale with their own data, please leave a comment and tell me about your experiences.

20 thoughts on “Solving the ‘Events in Progress’ problem in MDX, Part 2 – Role Playing Measure Groups”

  1. Chris Webb says:

    Two minutes after I posted this, Marco came and asked me the obvious question: did I really need to do all this, and couldn’t I use the LinkMember function instead? I should have included some benchmarks…

    If we take the ‘EventsInProgressSource’ cube and add the following calculations (which give us the results we want, but use LinkMember):

    create member currentcube.measures.test1 as
    sum(null:[Order Date].[Calendar].currentmember,[Measures].[Line Item Count]);

    create member currentcube.measures.test2 as
    sum(null:
    linkmember([Order Date].[Calendar].currentmember.prevmember, [Ship Date].[Calendar])
    ,([Measures].[Line Item Count], [Order Date].[Calendar].[All Periods]));

    create member currentcube.measures.test3 as
    measures.test1 - measures.test2;

    Then run the following query:

    select measures.test3 on 0,
    [Order Date].[Date].[Date].members on 1
    from [EventsInProgressSource]

    The query runs in around 4.4 seconds on a cold cache on my machine. If we run the same query using the approach described above, it runs in 1.4 seconds. This is the overhead of using LinkMember…

  2. Chris Webb says:

    Oh, and as far as I can see, the approach used in this post performs the same as the duplicated measure group approach described in part 1, after a very limited amount of testing.

  3. Would SQL Server benefit from something like Materialized Views as in Oracle? That way, you don’t have to duplicate your data and storage. I am not sure about the processing time though.

    1. Chris Webb says:

      Using m2m relationships is what Richard Tkachuk proposed in his (now unavailable, unfortunately) paper on this subject – in fact, he went one step further: to work around the problem that a large number of different intervals makes the intermediate measure group unfeasibly large, he compressed its contents. That complicates the ETL a lot, but it may well perform better than my approach in some scenarios. Which one works best will depend on your data, but the advantage of my approach is that there’s no extra ETL development or processing involved 🙂

      1. Ah, I didn’t know Richard had already mentioned this. I will blog about your approach and the one Richard and I proposed (linking to Richard’s and your blog pages for the credits), just to outline the two.

        In short: yes, as always, it depends on the situation 🙂 The advantage of using the m2m approach is that its solution lies in the dimensional model, and is therefore more transparent and less technology-dependent.

      1. Chris Webb says:

        Yes, that’s the article.

  4. Hi Chris,

    after reading your two posts I also started thinking about the problem of “Events in Progress” and came up with the following:
    for the relatively simple scenario of just counting orders that are ordered but not yet shipped at a given date, you could “unpivot” your fact table as

    SELECT
    OrderDate AS Date,
    1 AS OrderCount
    FROM MyFactTable
    UNION ALL
    SELECT
    ShipDate AS Date,
    -1 AS OrderCount
    FROM MyFactTable

    the rest is similar to your calculations, except that you only need one calculated member:

    CREATE MEMBER CURRENTCUBE.MEASURES.OrderedNotShipped AS
    SUM({NULL:[Date].[Date].currentmember}, [Measures].[OrderCount]);

    though this also doubles your data …

  5. Chris,

    This is an interesting approach. We solved this problem by denormalizing the ship date into the order line record. If an order line does not have a ship date then it gets mapped to the “Unknown” member; then the calculation becomes (order lines) - ([Ship Date].&[Unknown], order lines). It may be an oversimplification, and it has its own set of drawbacks, but it scales very well for us.

    1. We have used this approach for imported and forecasted data. Those two measure groups get used in every cube so we set them up in their own cube and linked the measure groups. Seems to scale very well. I will try to pull together some statistics to quantify the performance.

  6. I appreciate this is old now, but I’ve used this approach several times with dates, very successfully. I’m now having to implement it down to hours (it’s bed occupancy in a hospital scenario) and have degraded the data so I can provide a key at DateHour level, which seems to be working OK, but it feels untidy. I wondered if I could use two separate dimensions, Date and Hour, and cheat somehow with a reference dimension. Has anyone done something similar?

  7. Hey Chris, if LINKMEMBER is that costly, why not use STRTOMEMBER as shown below?

    CREATE MEMBER CURRENTCUBE.MEASURES.TEST2 AS
    SUM(
    NULL:STRTOMEMBER("[Ship Date].[Calendar].&[" + [Order Date].[Calendar].CurrentMember.Member_Key + "]"),
    ([Measures].[Line Item Count], [Order Date].[Calendar].[All Periods])
    );

    Also, you could use a physical measure and a SCOPE statement to calculate the OrderedNotShipped value, to allow users to drill through from Excel if need be.
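    A sketch of that idea, assuming a physical measure called OrderedNotShipped has already been added to the measure group (the name and the expression are just the ones from the post, so treat this as illustrative):

    SCOPE([Measures].[OrderedNotShipped]);
        // overwrite the physical measure's value with the events-in-progress calculation
        THIS = MEASURES.OrderedToDate - MEASURES.ShippedToPrevDate;
    END SCOPE;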

    1. Chris Webb says:

      STRTOMEMBER is usually even worse for performance than LINKMEMBER, especially on older versions of SSAS.
