
Calculating The Value Of Overdue Invoices Using Many-To-Many Relationships in SSAS Multidimensional

Recently I had to solve the same, rather complex, problem for three different customers within the space of two weeks, a variation of the ‘events-in-progress’ problem I’ve blogged about a lot in the past. It’s this one: how can you calculate the value of your overdue invoices on any given date? It’s important to stress that we are not talking about the invoices that are overdue today – we want to be able to pick any date in the past and find out the value of invoices that were overdue at that point in time.

Let’s imagine you have a fact table containing payments against invoices: each row has an invoice number, the date of the payment, the payment amount, and the date that the invoice is due. A positive value in the Amount column indicates that this is the opening amount of the invoice; negative values in the Amount column are payments against the invoice. Multiple payments can be made against an invoice before it is fully closed. Here’s some example data:


Looking at the rows highlighted for invoice 5, you can see in the upper box that there is a positive amount of £35 shown on January 1st 2014 – this is the date that the invoice was opened, and £35 is the full value of the invoice. The invoice is due on January 10th 2014. In the lower box you can see there were four separate payments of £5, £5, £10 and £15 before the invoice was fully paid off on January 12th 2014.
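
For reference, here's a minimal T-SQL sketch of the rows for invoice 5. The InvoiceNumber column name, the use of DATE columns (the sample database may well use surrogate keys for its PaymentDate and DueDate columns) and the individual payment dates are my assumptions – only the opening row and the final payment date are stated above:

CREATE TABLE dbo.FactInvoice
(
    InvoiceNumber INT NOT NULL,
    PaymentDate DATE NOT NULL,
    DueDate DATE NOT NULL,
    Amount MONEY NOT NULL
);

INSERT INTO dbo.FactInvoice (InvoiceNumber, PaymentDate, DueDate, Amount)
VALUES
    (5, '2014-01-01', '2014-01-10', 35.00),  -- invoice opened for its full value
    (5, '2014-01-05', '2014-01-10', -5.00),  -- payment (assumed date)
    (5, '2014-01-08', '2014-01-10', -5.00),  -- payment (assumed date)
    (5, '2014-01-11', '2014-01-10', -10.00), -- payment (assumed date)
    (5, '2014-01-12', '2014-01-10', -15.00); -- final payment, invoice fully paid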

Now, if you were to do a total-to-date in MDX (similar to what I describe here) it would be pretty easy to calculate the outstanding amount on all invoices on any given date. However the requirement here is not only to do that, but to break the value down so that you can see what the total value for overdue invoices and non-overdue (which I’m going to call backlog in this post) invoices is on any given date. This is tricky to do because we have to take the Due Date of each invoice into account as well as the Payment Date. The desired output for invoice 5 is this:

As you can see in this PivotTable, for invoice 5 £25 was still outstanding on January 9th 2014, but this is shown as backlog because this is before the due date of January 10th. On January 11th, one day after the due date, the remaining outstanding amount of £15 is shown as overdue. Of course, we also want to be able to calculate the correct values for all invoices:

One way of solving this problem would be to calculate the overdue and backlog values for each date that each invoice is open in your ETL, and store these values in a snapshot fact table. This works, and will give you the best possible query performance, but it has two major drawbacks: first, it makes your ETL much slower, and second it makes your fact table much larger. This post describes how you can calculate the overdue and non-overdue values on any given date using many-to-many relationships instead, without having to blow out the fact table.

To go along with the fact table (called FactInvoice) shown above, I have a date dimension table called DimDate (I have deliberately reduced the number of rows here to the dates I have data for, for reasons that will become clear soon):

I have a dimension table containing all of the possible numbers of days that an invoice can be overdue or not overdue by (it has one less than double the number of rows of the date dimension table), plus a second column classifying each row as ‘Backlog’ or ‘Overdue’:
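
If you're rebuilding the sample yourself, one way to generate this table is directly from DimDate. Here's a sketch – the DimOverdueDays, OverDueDays and OverdueType names are my assumptions, as is the rule that the due date itself still counts as Backlog:

-- DATEDIFF(dd, DueDate, AsOfDate) ranges from -(N-1) to +(N-1) for N dates,
-- which is why this table has one less than double the number of rows of DimDate
SELECT DISTINCT
    DATEDIFF(dd, a.FullDate, b.FullDate) AS OverDueDays,
    CASE WHEN DATEDIFF(dd, a.FullDate, b.FullDate) > 0 THEN 'Overdue' ELSE 'Backlog' END AS OverdueType
INTO dbo.DimOverdueDays
FROM dbo.DimDate AS a
CROSS JOIN dbo.DimDate AS b;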

… and an invoice dimension table that just contains the distinct invoice numbers called DimInvoice.

Now, let me explain how to build the SSAS cube.

Step 1

  • Build dimensions from all of the tables shown above, so you have dimensions called Date, Overdue Days and Invoice.
  • Build a cube with one measure group, based on FactInvoice, and create one measure with AggregateFunction Sum based on the Amount column in that table.
  • Add the Invoice dimension to the cube with a regular relationship. Add the Date dimension to the cube twice, as Payment Date and Due Date, with regular relationships on the PaymentDate and DueDate columns. The Dimension Usage tab should look like this:

Step 2

  • Create the following view in SQL Server:

    CREATE VIEW [dbo].[FactDateToPaymentDate]
    AS
    SELECT a.DateKey, b.DateKey AS PaymentDateKey
    FROM dbo.DimDate AS a
    INNER JOIN dbo.DimDate AS b ON a.DateKey >= b.DateKey

    This view returns all of the combinations of a given date and all of the dates up to and including that date (the example query after this list shows what it returns for a single date).

  • Add this view to the DSV and create a new measure group from it; you’ll need to create a measure here, but it can just be a Count measure.
  • Add the Date dimension to the cube once again, this time leaving the name as Date (you will now have three role-playing copies of the Date dimension in the cube).
  • Set up regular relationships between the Date and Payment Date dimensions and the new measure group, then a many-to-many relationship between Date and the Invoice measure group. This means that when you query the cube by the Date dimension, the many-to-many relationship will mean you see the sum of all Amounts whose payment date is up to and including the date selected. The Dimension Usage tab will look like this:
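
To see what this bridge view is doing, here's a hypothetical query (it joins back to DimDate on FullDate so it doesn't assume anything about the DateKey format) that lists the rows returned for January 5th 2014 – one row for each payment date from January 1st up to and including January 5th:

SELECT d.FullDate AS AsAtDate, p.FullDate AS PaymentDate
FROM dbo.FactDateToPaymentDate AS f
INNER JOIN dbo.DimDate AS d ON f.DateKey = d.DateKey
INNER JOIN dbo.DimDate AS p ON f.PaymentDateKey = p.DateKey
WHERE d.FullDate = '2014-01-05'
ORDER BY p.FullDate;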

Step 3

  • Create another view in SQL Server with this definition:

    CREATE VIEW [dbo].[FactDueDateToAsOfDate]
    AS
    SELECT a.DateKey AS DueDateKey, b.DateKey AS AsOfDateKey,
        DATEDIFF(dd, a.FullDate, b.FullDate) AS OverDueDays
    FROM dbo.DimDate AS a
    CROSS JOIN dbo.DimDate AS b

    Yes, I am cross joining the DimDate table with itself and yes, this could return a lot of rows. However you should find that the view is very fast to execute.

  • Add this view to the DSV and build another measure group from it, again with a single Count measure on it.
  • Set up a regular relationship between this new measure group and the Due Date dimension.
  • Add the Overdue Days dimension to the cube, give it a regular relationship with the new measure group and a many-to-many relationship with the Invoice measure group through the new measure group.
  • Add the Date dimension to the cube yet again, creating a new role-playing dimension called As Of Date, give it a regular relationship with the new measure group and a many-to-many relationship with the Invoice measure group through the new measure group. The Dimension Usage tab will now look like this:

The purpose of this measure group is this: if you select a date on the As Of Date dimension, you will be able to select ‘Overdue’ on the Overdue Days dimension and this will give you all of the dates on Due Date that were overdue on that date.
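
Expressed relationally, and purely as an illustration (the OverDueDays > 0 cutoff is my assumption about where the Overdue/Backlog boundary lies), this is the set of due dates the new measure group resolves for an as-of date of January 11th 2014:

SELECT due.FullDate AS DueDate, f.OverDueDays
FROM dbo.FactDueDateToAsOfDate AS f
INNER JOIN dbo.DimDate AS due ON f.DueDateKey = due.DateKey
INNER JOIN dbo.DimDate AS asof ON f.AsOfDateKey = asof.DateKey
WHERE asof.FullDate = '2014-01-11'
  AND f.OverDueDays > 0
ORDER BY due.FullDate;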

Step 4

  • You only want to have to select one date in your PivotTable, so create the following MDX calculated measures, which take your selection on the Date dimension and apply it to the As Of Date dimension too:

CREATE MEMBER CURRENTCUBE.MEASURES.[Open Amount Hidden] AS
([Measures].[Amount],
LINKMEMBER([Date].[Date].CURRENTMEMBER, [As Of Date].[Date])
), VISIBLE=FALSE;

CREATE MEMBER CURRENTCUBE.MEASURES.[Open Amount] AS
IIF(MEASURES.[Open Amount Hidden]=0, NULL, MEASURES.[Open Amount Hidden]);

  • Hide all measures apart from the calculated measures you’ve just created
  • Hide the Payment Date, Due Date and As Of Date dimensions

Conclusion

This is a very complex pattern, I know, and this is after I’ve simplified it a lot (if you need currency conversion as well then things get even worse) but I also know it’s extremely useful from a business point of view. Query performance is also reasonably good, at least in the places where I have implemented this.

You can download my sample SQL Server database and VS 2012 project here.

25 thoughts on “Calculating The Value Of Overdue Invoices Using Many-To-Many Relationships in SSAS Multidimensional”

  1. Chris,

    I remember working on an AR aging bucket problem a while back and if I’m not mistaken it seems to be similar to your challenge here. In most (if not all) financial reporting projects I have worked on, the finance folks tend to run reports based on when a transaction posted into their ERP system, since transactions that haven’t posted yet really don’t have much value in financial analysis. I tend to use the posting date of transactions as the default “Date” dimension for the cubes and thus the default date used in point-in-time analysis.

    With this in mind, I created two more dimensions in the cube (role-playing dimensions): Payment Due Date and Payment Clearance Date (both hidden from the cube user). Due dates for the original invoice posting are usually calculated based on the document date and the terms of payment. This information is usually available in financial ERP systems and is relatively easy to calculate during ETL. The original invoice posting (not a payment yet) will have a Payment Clearance Date of something like 01/01/1900 just to make sure it has a date (or you can take the UNKNOWNMEMBER approach too).

    Since the payment postings have the same invoice number as the original transactions, you will have all three sets of data for the payment transaction (posting date of the payment, due date of the original invoice and of course the payment clearance date). With this data, I’m able to introduce a new dimension – AR Bucket – with no relationships to any measure group, then define the relationship using MDX (just like the Business Intelligence wizard does with time series calculations). This dimension derives its values from a table that stores all the values of the AR Bucket – Current, 1 – 30, 31 – 60, 61 – 90, 91 – 120, 121+ and of course Default (meaningless but crucial for cases when the AR Bucket dimension is not selected in a report).

    With this approach you will be able to avoid a many to many relationship (executed on the fly and slower in nature) and if you create a custom aggregation on Date, Payment Due Date and Payment Clearance Date attributes, you can get very good performance.

    Below is sample MDX code that I used for the calculation of the “Current” and “1 – 30” AR Bucket values. Note that the key for my date dimension (the physical dimension used by all the role-playing dimensions) is an integer in the yyyymmdd format. This way it’s easy to use the STRTOMEMBER function on the other date dimensions based on what the user selected on the default dimension.

    /*---------------------------------------------
    Fiscal Year – Current AR Bucket
    ---------------------------------------------*/

    SCOPE
    (
    [AR Bucket].[AR Bucket].[Current]
    );

    THIS =

    IIF(
    ISEMPTY([AR Bucket].[AR Bucket].[Default]),
    NULL,
    AGGREGATE(
    {
    [AR Bucket].[AR Bucket].[Default]
    } *
    NONEMPTYCROSSJOIN(
    {
    NULL : CLOSINGPERIOD([Date].[Fiscal].[Day], [Date].[Fiscal].CurrentMember)
    },
    {
    STRTOMEMBER(
    '[Payment Due Date].[Fiscal].[Day].&[' + CLOSINGPERIOD([Date].[Fiscal].[Day], [Date].[Fiscal].CurrentMember).Member_Key + ']'
    ) : NULL
    },
    {
    STRTOMEMBER(
    '[Payment Clearance Date].[Fiscal].[Day].&[' + CLOSINGPERIOD([Date].[Fiscal].[Day], [Date].[Fiscal].CurrentMember).Member_Key + ']'
    ).NextMember : NULL,
    [Payment Clearance Date].[Fiscal].[Day].&[19000101]
    }
    )
    )
    );

    END SCOPE;

    /*---------------------------------------------
    Fiscal Year – 1 – 30 AR Bucket
    ---------------------------------------------*/

    SCOPE
    (
    [AR Bucket].[AR Bucket].[1 – 30]
    );

    THIS =

    IIF(
    ISEMPTY([AR Bucket].[AR Bucket].[Default]),
    NULL,
    AGGREGATE(
    {
    [AR Bucket].[AR Bucket].[Default]
    } *
    NONEMPTYCROSSJOIN(
    {
    NULL : CLOSINGPERIOD([Date].[Fiscal].[Day], [Date].[Fiscal].CurrentMember)
    },
    {
    STRTOMEMBER(
    '[Payment Due Date].[Fiscal].[Day].&[' + CLOSINGPERIOD([Date].[Fiscal].[Day], [Date].[Fiscal].CurrentMember).Member_Key + ']'
    ).Lag(30) :
    STRTOMEMBER(
    '[Payment Due Date].[Fiscal].[Day].&[' + CLOSINGPERIOD([Date].[Fiscal].[Day], [Date].[Fiscal].CurrentMember).Member_Key + ']'
    ).Lag(1)
    },
    {
    STRTOMEMBER(
    '[Payment Clearance Date].[Fiscal].[Day].&[' + CLOSINGPERIOD([Date].[Fiscal].[Day], [Date].[Fiscal].CurrentMember).Member_Key + ']'
    ).NextMember : NULL,
    [Payment Clearance Date].[Fiscal].[Day].&[19000101]
    }
    )
    )
    );

    END SCOPE;

    1. Chris Webb says:

      Thanks James, yes, this is a different solution to the same problem. However I think the m2m will perform better than a pure MDX-based solution. I’ve used this kind of approach for creating buckets before and never been completely satisfied with it – especially when there are other calculations, such as currency conversion, in play.

      BTW you shouldn’t be using NonEmptyCrossjoin() – it’s deprecated. See http://msdn.microsoft.com/en-us/library/ms144797.aspx

      1. I agree pure MDX solutions can be significantly slower, but if you look closely at my solution, I’m not really doing much dynamically. I’m only using STRTOMEMBER to map the date selected to slice the role-playing dimensions accordingly. Most of these functions return a single member to be used in the simplest set operation (the colon operator – “:”). Additionally, each calculation can easily utilize custom attribute-level aggregates based on usage (reports), e.g. month + product category + sales region + etc. The issue with many-to-many is that every query has to utilize the grain-level attributes of the bridge table at run time (required to speed up queries that hit many-to-many related dimensions) regardless of whether the respective key attributes in the many-to-many are needed on the report or not.

        The solution above was implemented in quite a large enterprise solution that had Currency Intelligence, Time Intelligence, Account Intelligence and two Type 5 SCDs, yet report performance was quite impressive given the complexity.

      2. Chris Webb says:

        I think the only way to know which one would perform better would be to implement both on the same data. However I still think that mine will perform best 🙂 I know that m2m carries a performance overhead, but since you are already forcing the granularity down to the date level in your code using ClosingPeriod(), that is exactly what the m2m relationships in my technique are doing. Both techniques would be able to use the same aggregations.

  2. I think this is a very complex solution to a very simple problem. Why not just a view on the fact table, something like:

    select InvoiceId,
    iif(PaymentDate > DueDate, Amount, 0) as Overdue
    from fact_table

    1. Best possible query performance
    2. ETL is not slower, because you’re loading the same data you already have.
    3. Fact table has the same size.

    … maybe cube processing takes longer, but I don’t think so; even so, user query performance is the #1 goal for me.

    1. I agree with Chris. This is not quite that simple. To be able to use the solution you laid out, you would need a separate fact table composed of a record for each Invoice posting and each queryable date and, using Kimball’s bus architecture, link that measure group back to the original fact’s measure group via conforming dimensions. For every 10 years of queryable dates, this new fact table would require at least 3,652 records for every Invoice posting fact record in the current fact. This would result in an explosion of data that easily gets out of control and significantly impedes any performance gains you have with pre-aggregated data. Also your ETL may get really complicated if you have to deal with intercompany invoice reversals. SAP handles this in a rather sophisticated manner and spins off compensating transactions (read: new fact records), reducing the degree of complexity one would need in ETL, but since we introduced a new fact table here, we would have to add similar sophistication to our new fact table.

      To summarize, the challenge here would be best resolved dynamically as Chris did. The focus, I believe, would be to try and optimize his solution from an infrastructure standpoint if query response time is ever an issue, for example by placing the intermediate measure groups for the many-to-many relationships that are joined at run time on solid state drives to boost performance.

    1. Chris Webb says:

      No, this is not the same thing at all. Your solution can only give the amount overdue at a single point in time, the point that your fact table is calculated. However an invoice that is overdue today may not have been overdue last week. What my solution does (and the reason it is complex) is tell you how much was overdue on any given date, not just today.

  3. Nick Singleton says:

    Chris – I need help with a similar problem. The difference is that I don’t have a DueDate. I’d like to dynamically put the AR in buckets of days such as [0 – 29], [30 – 59], … so that the user could see the AR in these buckets for any given historical date. Any ideas on how to go about this?

    1. Chris Webb says:

      Can you tell me exactly how you want to work out which bucket each invoice goes into?

      1. Nick Singleton says:

        I think in the FactDateToPaymentDate there would need to be a column for the buckets. For example, if the AR aging is 31 days, the row would have the [30 – 59] bucket. Does that make sense?
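
        Something along these lines, perhaps – a purely hypothetical variation on the FactDateToPaymentDate view (the view name, bucket boundaries and the assumption that DateKey order matches date order are all mine, not from the post):

        CREATE VIEW dbo.FactDateToPaymentDateBucketed
        AS
        SELECT
            a.DateKey,
            b.DateKey AS PaymentDateKey,
            CASE
                WHEN DATEDIFF(dd, b.FullDate, a.FullDate) <= 29 THEN '0 - 29'
                WHEN DATEDIFF(dd, b.FullDate, a.FullDate) <= 59 THEN '30 - 59'
                WHEN DATEDIFF(dd, b.FullDate, a.FullDate) <= 89 THEN '60 - 89'
                ELSE '90+'
            END AS AgingBucket
        FROM dbo.DimDate AS a
        INNER JOIN dbo.DimDate AS b ON a.DateKey >= b.DateKey;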

      2. Chris Webb says:

        Yes, that’s a very common way of handling this type of problem. You’d need to calculate which bucket each invoice was in during your ETL though, not in SSAS/MDX. The difference between the approach you describe and the one in this post is that if you hard code the bucket on the fact table then an invoice can only ever be in one bucket, whereas in the approach in my post an invoice can be in different buckets depending on the as-of date you choose to look at.

  4. Thanks for sharing this solution Chris, I just implemented it and it’s working as expected.

    Would you use scopes or some other technique to display the status of the last day of the month/quarter/year at the corresponding level when using a date hierarchy?
    I’m looking for the “last child” behavior when using a date hierarchy with the Open Amount measure.

    Thanks,
    Andri

    1. Chris Webb says:

      I would use semi-additive measures if I was using Enterprise Edition – it’s the fastest and easiest option.

  5. Nick Singleton says:

    I’m trying to figure out how to implement this with SSAS Tabular. I know many-to-many isn’t supported, but I’m wondering if there is a DAX work-around.

  6. Hi, do you have any ideas on how we can create buckets for the amounts, like the number of invoices that have an amount of $10 overdue? Buckets like [1-10], [11-20] and so on.

  7. Thanks Chris, I have implemented something like this already, and it works well for smaller volumes of data, but when the volume is huge (millions of invoices) the MDX performance is really bad – no response, in fact. So I was looking for other design options instead of doing this in MDX.
    The dynamic set is killing the server; I realized that a dynamic set is not a good option here.
    Here is the MDX used:
    CREATE dynamic SET CURRENTCUBE.Invoices as (filter([Invoice].[InvoiceID].[All].children,
    ([Invoice].[InvoiceID],[Date of observation].[Hierarchie Date].currentmember,
    [Measures].[Number of Invoices])=1));

    scope([Invoice Amounts Bucket].[1-10] ,[Measures].[Number of Invoices]);
    this =
    sum(Invoices ,iif(
    ([Measures].[Invoice Amount] >=1 and [Measures].[Invoice Amount] <= 10)
    ,1,null));
    end scope;

    I can provide the real structure I'm working on now. Please let me know.

    1. Chris Webb says:

      Hi Suresh, no MDX-based approach (whether it uses dynamic sets or not) will ever perform well on large amounts of data. Have you tried using the approach exactly as I describe it in this post? It’s the only way you’re likely to get reasonable performance.

      1. Yes, I did implement it as per the link you provided, but I have a lot of members for the InvoiceID attribute (millions), so the MDX is unresponsive… Is there an efficient way to restrict the number of members that are being evaluated by the scope each time?

  8. Chris Webb says:

    No, there isn’t – SCOPE statements aren’t evaluated at query time, so they are static.
