Solving the ‘Events in Progress’ problem in MDX, Part 2 – Role Playing Measure Groups

In my last post I described a simple solution to the ‘events in progress’ problem in SSAS and MDX, which nonetheless had one drawback: it involved loading all the data from the same fact table into two different measure groups in the same cube, which potentially doubles the time taken to process the cube and its size on disk. I used the same technique in my recent post on improving the performance of currency conversion, and indeed it’s a technique I have used successfully with several customers in the past; but it seems rather inelegant, so is there a way we can avoid doing it and have only one measure group? It doesn’t seem to be possible at first glance, and I actually opened a Connect suggestion a while ago asking for this functionality (‘role playing measure groups’) to be implemented. I was having a good week last week, though, and at the same time as solving the ‘events in progress’ problem I came up with a solution to this problem too…

Here’s what I did, using the ‘events in progress’ problem as an illustration:

First of all, I created and processed a simple cube called ‘EventsInProgressSource’ with a single measure group, a single Count measure, and two dimensions, Order Date and Ship Date, which have regular relationships joining on the OrderDateKey and ShipDateKey columns of the fact table.


I then created a second cube, ‘EventsInProgress2’, using the ‘Create Empty Cube’ option in the Cube Wizard.

I then started the New Linked Object wizard and copied everything from the ‘EventsInProgressSource’ cube to the ‘EventsInProgress2’ cube.

This resulted in a cube with one linked measure group and two linked dimensions.

Now here’s the fun bit. I deployed and processed the cube, closed BIDS and went to SQL Server Management Studio. There I scripted the Internet Sales Facts linked measure group in ‘EventsInProgress2’ to an XMLA Create statement, manually updated the XMLA by adding a 1 to the end of the name and ID of both the measure group itself and the Line Item Count measure, and then executed the script against ‘EventsInProgress2’ (an abridged sketch of the edited script is shown below). This created a second, identical linked measure group – something that, again, BIDS doesn’t let you do.

I then reopened BIDS and connected direct to the cube in online mode (I could also have reimported the project back into BIDS), went to the Dimension Usage tab, and deleted the relationships between Ship Date and the first linked measure group and between Order Date and the second linked measure group, so that Order Date was related only to the first measure group and Ship Date only to the second.
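Abridged, the edited script looked something like the following. This is only a sketch: I’ve assumed the SSAS database is also called ‘EventsInProgress2’, and everything not shown here (the source binding, the measure group’s dimensions and so on) stays exactly as SSMS scripted it.

<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ParentObject>
    <DatabaseID>EventsInProgress2</DatabaseID>
    <CubeID>EventsInProgress2</CubeID>
  </ParentObject>
  <ObjectDefinition>
    <MeasureGroup>
      <!-- the appended 1 is the only change to the ID and Name that SSMS scripted -->
      <ID>Internet Sales Facts1</ID>
      <Name>Internet Sales Facts1</Name>
      <Measures>
        <Measure>
          <!-- the measure gets the same treatment -->
          <ID>Line Item Count1</ID>
          <Name>Line Item Count1</Name>
          <!-- remaining measure properties as scripted -->
        </Measure>
      </Measures>
      <!-- source binding, measure group dimensions etc. as scripted -->
    </MeasureGroup>
  </ObjectDefinition>
</Create>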

I then added another Date dimension and set up referenced relationships (which had to be non-materialised) between it and each measure group, via the Date attributes of the Ship Date and Order Date dimensions.

With this done we have achieved our goal: we have the same fact table appearing twice in the same cube as two different measure groups with different dimensionality, but we are only processing the data once. The last step in solving the ‘events in progress’ problem is to add what is essentially the same MDX as last time to the cube:

CREATE MEMBER CURRENTCUBE.MEASURES.OrderedToDate AS
    // running total of line items ordered, up to and including the current date,
    // from the measure group related to the Date dimension via Order Date
    SUM(NULL: [Date].[Calendar].CURRENTMEMBER,
        ([Measures].[Line Item Count]));

CREATE MEMBER CURRENTCUBE.MEASURES.ShippedToPrevDate AS
    // running total of line items shipped, up to and including the previous date,
    // from the measure group related to the Date dimension via Ship Date
    SUM(NULL: [Date].[Calendar].CURRENTMEMBER.PREVMEMBER,
        ([Measures].[Line Item Count1]));

CREATE MEMBER CURRENTCUBE.MEASURES.OrderedNotShipped AS
    // everything ordered to date that had not shipped by the previous date
    MEASURES.OrderedToDate - MEASURES.ShippedToPrevDate;

And we get the results we want out of the cube.
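For example, a query along these lines returns all three calculations for every date (I’m assuming here that the new Date dimension’s key attribute is called Date, as it is in Adventure Works):

SELECT {MEASURES.OrderedToDate, MEASURES.ShippedToPrevDate, MEASURES.OrderedNotShipped} ON 0,
[Date].[Date].[Date].MEMBERS ON 1
FROM [EventsInProgress2]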

Now I haven’t tested this with anything other than the Adventure Works data, and there are some areas where I’d like to do more testing:

  • Non-materialised referenced relationships don’t always perform that well. Materialised referenced relationships aren’t allowed with linked measure groups, though, so possibly using a m2m relationship to join the Date and Order Date/Ship Date dimensions might be an alternative.
  • There are various ways of calculating a total-to-date in MDX and I’ve not spent any time working out if the version I’ve shown above is the most efficient.
  • There is going to be an overhead from querying a linked measure group rather than a regular measure group, and also probably an overhead from querying the same source measure group twice rather than two different measure groups, but I don’t know how significant they are.

If anyone out there does decide to try this at scale with their own data, please leave a comment and tell me about your experiences.

20 thoughts on “Solving the ‘Events in Progress’ problem in MDX, Part 2 – Role Playing Measure Groups”

  1. Chris Webb says:

    Two minutes after I posted this, Marco came and asked me the obvious question: did I really need to do all this, and couldn’t I use the LinkMember function instead? I should have included some benchmarks…

    If we take the ‘EventsInProgressSource’ cube and add the following calculations (which give us the results we want, but use LinkMember):

    create member currentcube.measures.test1 as
    sum(null:[Order Date].[Calendar].currentmember,[Measures].[Line Item Count]);

    create member currentcube.measures.test2 as
    sum(null:
    linkmember([Order Date].[Calendar].currentmember.prevmember, [Ship Date].[Calendar])
    ,([Measures].[Line Item Count], [Order Date].[Calendar].[All Periods]));

    create member currentcube.measures.test3 as
    measures.test1 - measures.test2;

    Then run the following query:

    select measures.test3 on 0,
    [Order Date].[Date].[Date].members on 1
    from [EventsInProgressSource]

    The query runs in around 4.4 seconds on a cold cache on my machine. If we run the same query using the approach described above, it runs in 1.4 seconds. This is the overhead of using LinkMember…

  2. Chris Webb says:

    Oh, and as far as I can see, the approach used in this post performs the same as the duplicated measure group approach described in part 1, after a very limited amount of testing.

  3. Would SQL Server benefit from something like Materialized Views as in Oracle? That way, you don’t have to duplicate your data and storage. I am not sure about the processing time though.

    1. Chris Webb says:

      Using m2m relationships is what Richard Tkachuk proposed in his (now unavailable, unfortunately) paper on this subject – in fact, he went one step further: to work around the problem that a large number of different intervals makes the intermediate measure group unfeasibly large, he compressed its contents. That complicates the ETL a lot, but it may well perform better than my approach in some scenarios. Which one works best will depend on your data, but the advantage of my approach is that there’s no extra ETL development or processing involved 🙂

      1. Ah, I didn’t know Richard had already mentioned this. I will blog about your approach and the one Richard and I proposed (linking to Richard’s and your blog pages for the credits), just to outline the two.

        In short: yes, as always, it depends on the situation 🙂 The advantage of using the m2m approach is that its solution lies in the dimensional model, and is therefore more transparent and less technology-dependent.

      1. Chris Webb says:

        Yes, that’s the article.

  4. Hi Chris,

    after reading your two posts I also started thinking about the problem of “Events in Progress” and came up with the following:
    for the relatively simple scenario of just counting orders that are ordered but not yet shipped at a given date, you could “unpivot” your fact table as

    SELECT
    OrderDate AS Date,
    1 AS OrderCount
    FROM MyFactTable
    UNION ALL
    SELECT
    ShipDate AS Date,
    -1 AS OrderCount
    FROM MyFactTable

    the rest is similar to your calculations, except that you only need one calculated member:

    CREATE MEMBER CURRENTCUBE.MEASURES.OrderedNotShipped AS
    SUM({NULL:[Date].[Date].currentmember}, [Measures].[OrderCount]);

    though this also doubles your data …

  5. Chris,

    This is an interesting approach. We solved this problem by denormalizing the ship date into the order line record. If an order line does not have a ship date then it gets mapped to the “Unknown” member; then the calculation becomes (order lines) - ([Ship Date].&[Unknown], order lines). It may be an oversimplification, and it has its own set of drawbacks, but it scales very well for us.

    1. We have used this approach for imported and forecasted data. Those two measure groups get used in every cube so we set them up in their own cube and linked the measure groups. Seems to scale very well. I will try to pull together some statistics to quantify the performance.

  6. I appreciate this is old now, but I’ve used this approach several times with dates, very successfully. I’m now having to implement it down to hours (it’s bed occupancy in a hospital scenario) and have degraded the data so I can provide a key at DateHour level, which seems to be working OK, but it feels untidy. I wondered if I could use two separate dimensions, Date and Hour, and cheat somehow with a reference dimension. Has anyone done something similar?

  7. Hey Chris, if LINKMEMBER is that costly, why not use STRTOMEMBER as shown below?

    CREATE MEMBER CURRENTCUBE.MEASURES.TEST2 AS
    SUM(
    NULL:STRTOMEMBER("[Ship Date].[Calendar].&[" + [Order Date].[Calendar].CurrentMember.Member_Key + "]"),
    ([Measures].[Line Item Count], [Order Date].[Calendar].[All Periods])
    );

    Also, you could use a physical measure and a SCOPE statement to calculate the OrderedNotShipped value, to allow users to drill through from Excel if need be.
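    A sketch of that idea, assuming a physical measure called OrderedNotShipped has already been added to the measure group (the name and the expression are just the ones from the post, so treat this as illustrative):

    SCOPE([Measures].[OrderedNotShipped]);
        // overwrite the physical measure's value with the events-in-progress calculation
        THIS = MEASURES.OrderedToDate - MEASURES.ShippedToPrevDate;
    END SCOPE;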

    1. Chris Webb says:

      STRTOMEMBER is usually even worse for performance than LINKMEMBER, especially on older versions of SSAS.
