I wouldn’t admit to this in public (at least to an audience of normal people, rather than the Microsoft BI geeks reading here), but over the last few years I’ve been obsessed with solving the ‘events in progress’ problem in SSAS and MDX. I’ve tackled it successfully in PowerPivot and DAX (see here and here) but I always thought there was no good solution in SSAS, and certainly other people such as Richard Tkachuk who have written about the same issue have come to a similar conclusion. But… last week I found one! So in this blog post I’ll outline the basic approach, and in my next post I’ll detail an even better, if more complex to implement, solution.

The two blog posts linked to above describe the problem in detail, but here’s a quick overview. In the Internet Sales fact table in Adventure Works each row represents a line item on an invoice, and each line item has an Order Date (ie the date the order was received) and a Ship Date (ie the date the order was shipped to the customer). We would like to know, on any given date, how many invoice line items are part of orders that have been received but not yet shipped and so have an Order Date before that date and a Ship Date after that date.

The key to being able to solve this problem in SSAS and MDX is how you model the data – my previous difficulties were down to the fact that I had the obvious way of modelling it, with one measure group having regular relationships with an Order Date and a Ship Date dimension, fixed in my head. However if you take a completely different approach the problem becomes easy, and here’s my worked solution using the Adventure Works data:

First of all I created my cube with one Date dimension and two measure groups, both based on the Internet Sales fact table:

Since BIDS won’t let you build two measure groups from the same table in the DSV, I created a named query that duplicated the Internet Sales fact table and used that as the basis of the Ship Dates measure group. Both the Order Dates Count and Ship Dates Count have AggregateFunction set to Count, and the Date dimension joins to Order Dates on OrderDateKey and Ship Dates on ShipDateKey.

I then created the following calculated measures:

CREATE MEMBER CURRENTCUBE.MEASURES.OrderedToDate AS

SUM(NULL: [Date].[Calendar].CURRENTMEMBER,

([Measures].[Order Dates Count]));

CREATE MEMBER CURRENTCUBE.MEASURES.ShippedToPrevDate AS

SUM(NULL: [Date].[Calendar].CURRENTMEMBER.PREVMEMBER,

([Measures].[Ship Dates Count]));

CREATE MEMBER CURRENTCUBE.MEASURES.OrderedNotShipped AS

MEASURES.OrderedToDate – MEASURES.ShippedToPrevDate;

The first, OrderedToDate, gives the total number of line items ordered from the beginning of time up to the current date. The second, ShippedToPrevDate, gives the number of line items shipped from the beginning of time up to the date before the current date. If we subtract the second value from the first, as we do in the OrderedNotShipped measure, we can therefore find the number of line items have been ordered but not yet shipped on any given date:

And it’s pretty fast, too – certainly much faster than any other solution I’ve tried before in SSAS. There’s one obvious catch here though, and that is that you have to load the data from your fact table into your cube twice. What if you have a really large fact table, and doubling your processing time and storage in this way is not an option? Well, stay tuned for part two when I’ll show you a trick to avoid doing this and only use one measure group.

Well done!

After all, also you will join us in the “Model first” mantra…

When facing a similar problem concerning student enrollment counts based on a start and end date, we were forced to pre-aggrigate on each possible day in a cube to accomplish this. Our approach really bloated the cube, I’m thinking this could really help with some performance issues, Thanks and looking forward to part 2!

Nice. Lots of problems get solved eeasier in the model than in MDX 🙂

Can you add one more dimension with two members – Shipped and Ordered and add one extra row for each order – when it ships? Then you can have one date dimension too. Of course you get more fact rows, but less measures and dimensions (dimensions as opposed to the original design). Just thinking here – haven’t actually tried it….

You are being too modest Boyan… Is this not the same thing? http://www.bp-msbi.com/2010/10/avoiding-multiple-role-playing-date-dimensions/

Yup, similar…

Hmm, I might have tried this approach but forgotten… As you say, it still leaves you with duplicate data and but the MDX would probably be as easy. However the real point of this post was to build up to my next one and show how you could solve the problem with no duplication of data…!

Fair enough! Can’t wait :))

I have been trying to use this solution as model for a problem I am having. I have a table with transactions that have start and end dates and a customer ID. I want to know how many distinct customers are active during any time period. I can use your really cool technique to know how many transactions are active, but the distinct customer part is throwing me for a loop. Any ideas?

Actually, this should be much easier than the technique I describe here. You just need to create a distinct count measure on your customer id column, and then in the where clause of your query include a set containing the date range you want to look at. Here’s an example from Adventure Works:

select {[Measures].[Customer Count]} on 0,

[Product].[Category].members on 1

from

[Adventure Works]

where({[Date].[Calendar Year].&[2002]:[Date].[Calendar Year].&[2004]})

I was wanting the user to be able to pivot (using Excel or something similar) using any part of the date hierarchy.

You can always put your date hierarchy on the filter and multiselect the date range

Sorry for cluttering up your blog. I guess I am just missing a step. I was looking to your solution here to figure out how many of those transactions were active across multiple periods (break down by months or quarters). It works great to know how many transactions are active, but I want to know how many of my customers have active transactions I get lost. I do not want to double count customers who have more than one transaction active during any period. Thanks for all the quick replies. I really enjoy this blog.

No problem. They key is to use a distinct count measure – you don’t need a second measure group or anything else – and that will avoid double counting customers.

Let me ask it this way and see if you give the same answer. In your blog example, you can quickly figure out how many items have been ordered and not sent. How would you modify it to find out how many customers have items that have been ordered but not sent?

Thanks again for the responses. I feel like I should buy a second copy of your books now 🙂

Ah, I see. Unfortunately I don’t have an answer though… I’ll think about it!

Hi Chris,

Great approach to counting events-in-progress. I have the same situation as Brad G. I need to do a distinct count and show it in the Excel based on StartDate and End Date. Using the MDX query and specifying the range in the Where clause works well and is fast, but how to create a calculated member?

Brad, did you solve your problem? Appreciate a response.

Artur Anbild

Hi Chris, I have the same problem as Brad. The distinct count won’t work with this approach. Is there a good way to handle it yet? Thanks a lot.

Sorry, I haven’t got round to thinking about it properly – it’s not straightforward, for sure. You could certainly do it with sets similar to the approach here: http://cwebbbi.wordpress.com/2010/10/08/counting-new-and-returning-customers-in-mdx/ but if you have a lot of customers, performance might not be great.

I had a problem with LinkMember function applied on role playing dimensions(Date) when that role playing dimension was related with another dimension by many-to-many relationship.Besides performance issues related with the LinkMember function the results were not accurate.

So,I decided create a solution without using LinkMember based on yours solution with a little trick, instead of creating a named query I create a distinct count measure on the same fact table wich automatically creates a new measure group and after i changed the aggregation to Count.

The solution stays more cleaner, but obviously the catch remains.

Meanwhile, I read the Part II and i’m considering try it.

Thanks

What a nice design trick, it looks exactly what we need for our student enrollment problem. We tried to put deltas into the fact table and that worked fine for the counts, but the dimensions could not be properly utilized so we couldn’t slice properly. This looks like it will solve that problem. Thanks for the article!

how are the measures order dates count and sales dates count created? what aggragate, table and attributes are being used

The details are in the post: they’re created from the FactInternetSales table with aggregate function Count.