Sorry for the relative quiet on here recently, I’ve been caught up with helping to organise SQLBits and other stuff… But anyway, this week I’m working with a company who asked me to tackle what must be a very common business problem: they have a fact table where one row represents an invoice they have generated, and they need to know *at any point in time* what the average age of all their unpaid invoices is, and monitor whether this value goes up or down. The point of this is, of course, is that if the average age of outstanding invoices starts to go up over time then it means that their customers are getting slower to pay them and they may potentially have a cashflow problem.

The solution is a variation on the ‘events in progress’ model that I blogged about recently but with an interesting twist that I thought was worth describing. First of all, we need to prepare some data so we can reproduce the scenario in Adventure Works – let’s pretend that the FactInternetSales table actually contains invoice data. We’ll need to massage the data a bit for two reasons:

- In Adventure Works if you look at the OrderDate, ShipDate and DueDate columns there’s always exactly the same number of days between the dates, which doesn’t make for a great demo
- In the real world some invoices will be unpaid when the fact table is loaded, others will be paid. We need to be able to handle both scenarios properly.

Once again, we need two fact tables. The first of these will contain all of our invoices, and for this I used the following named query in the Adventure Works DSV:

SELECT CustomerKey, OrderDateKey

FROM dbo.FactInternetSales

The second will be a fact table that contains information on invoices that have been paid:

SELECT

CustomerKey,

OrderDateKey,

CASE WHEN SalesTerritoryKey > 5 THEN ShipDateKey ELSE DueDateKey END AS PaidDateKey,

CASE WHEN SalesTerritoryKey > 5 THEN 7 ELSE 12 END AS DaysToPayment

FROM dbo.FactInternetSales

WHERE (PromotionKey < 3)

A few things to notice in this second query:

- I’ve filtered the fact table to remove a very small number of rows where PromotionKey is less than 3 – this will give us a few unpaid invoices to make the numbers more interesting.
- I’ve created a new date key called PaidDateKey to make the invoices have different payment lengths.
- Because these are paid invoices, we know how long they took to be paid and we can precalculate this and put it in the fact table in the DaysToPayment column.

We can now build the cube. It needs two measure groups, one based on each fact table, and we can add the Customer dimension too. Three measures need to be built: one count measure on each measure group and a sum measure on the DaysToPayment column. The Date dimension joins to the OrderDateKey on the All Invoices measure group, and to the PaidDateKey on the Paid Invoices measure group.

Now for the MDX. The first thing we need to do is find the total number of invoices generated to date and the total number of invoices that have been paid to date (and also to make sure that these totals aggregate up properly through the time dimension); we can then subtract the second from the first to find the total number of unpaid invoices:

CREATE MEMBER CURRENTCUBE.MEASURES.[Total Invoices To Date] AS NULL; SCOPE(MEASURES.[Total Invoices To Date]); SCOPE([Date].[Date].[Date].MEMBERS); THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER, [Measures].[All Invoices Count]); END SCOPE; SCOPE([Date].[Date].[All Periods]); THIS = (TAIL(EXISTING [Date].[Date].[Date].MEMBERS, 1).ITEM(0).ITEM(0) , MEASURES.[Total Invoices To Date]); END SCOPE; END SCOPE; CREATE MEMBER CURRENTCUBE.MEASURES.[Total Paid Invoices To Date] AS NULL; SCOPE(MEASURES.[Total Paid Invoices To Date]); SCOPE([Date].[Date].[Date].MEMBERS);

THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER, [Measures].[Paid Invoices Count]); END SCOPE; SCOPE([Date].[Date].[All Periods]); THIS = (TAIL(EXISTING [Date].[Date].[Date].MEMBERS, 1).ITEM(0).ITEM(0) , MEASURES.[Total Paid Invoices To Date]); END SCOPE; END SCOPE; CREATE MEMBER CURRENTCUBE.MEASURES.[Unpaid Invoices Count] AS MEASURES.[Total Invoices To Date] - MEASURES.[Total Paid Invoices To Date];

This is the easy bit. In order to find out what the average of the number of days the unpaid invoices have actually been unpaid, we need to find the number of days that each unpaid invoice has been open for, sum that up, and divide it by the total number of unpaid invoices. And to find this sum of the number of days that all unpaid invoices are open, we need to do the following:

- Find the total up to yesterday of the [Unpaid Invoices Count] measure. This gives the total number of days that every invoice has ever been open.
- Then sum the total to date of the Days to Payment measure we created on the Paid Invoices fact table, to find the total number of days that all paid invoices were ever open
- Then subtract this second figure from the first, which will give us the sum of the number of days all currently unpaid invoices have been open

Here’s the code:

CREATE MEMBER CURRENTCUBE.MEASURES.[Sum of Ages of Invoices To Date] AS NULL; SCOPE(MEASURES.[Sum of Ages of Invoices To Date]); SCOPE([Date].[Date].[Date].MEMBERS); THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER.PREVMEMBER , MEASURES.[Unpaid Invoices Count]); END SCOPE; SCOPE([Date].[Date].[All Periods]); THIS = (TAIL(EXISTING [Date].[Date].[Date].MEMBERS, 1).ITEM(0).ITEM(0) , MEASURES.[Sum of Ages of Invoices To Date]); END SCOPE; END SCOPE; CREATE MEMBER CURRENTCUBE.MEASURES.[Sum of Ages of Paid Invoices To Date] AS NULL; SCOPE(MEASURES.[Sum of Ages of Paid Invoices To Date]); SCOPE([Date].[Date].[Date].MEMBERS); THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER, [Measures].[Days To Payment]); END SCOPE; SCOPE([Date].[Date].[All Periods]); THIS = (TAIL(EXISTING [Date].[Date].[Date].MEMBERS, 1).ITEM(0).ITEM(0) , MEASURES.[Sum of Ages of Paid Invoices To Date]); END SCOPE; END SCOPE; CREATE MEMBER CURRENTCUBE.MEASURES.[Difference of Sum of Ages] AS MEASURES.[Sum of Ages of Invoices To Date] - MEASURES.[Sum of Ages of Paid Invoices To Date];

Finally, we can calculate the average like so:

CREATE MEMBER CURRENTCUBE.MEASURES.[Average Age of Unpaid Invoices] AS IIF(MEASURES.[Unpaid Invoices Count]=0, NULL, MEASURES.[Difference of Sum of Ages]/MEASURES.[Unpaid Invoices Count]) , FORMAT_STRING='#,#.00';

It’s pretty complex I’ll admit, but it does perform well. Here’s what the results look like in Excel:

I only wish I could come up with a way of getting some of my customers to pay this quickly…!

Hi Chris,

Could you tell me the approximate time it took for the query to come out on your laptop? I also have a similar requirement and was checking if the performance would be within the acceptable limits.

Cheers,

Jason

It was pretty much instant, but I have a good laptop and there are only 60000 records in the FactInternetSales table. It’s probably about as fast as any other solution though.

Hi Chris,

Nice solution, I can see a real business use for this. On a similar note, if you have access to the financial data, you can calculate the DSO ( days sales outstanding ). This is a measure used in finance to get at a similar number, though you would need a fact snapshot to calculate over time. The calculation, which you can google, is simpler however. It would be interesting to see how they compare.

This may not address other applications for your solution, so thank you for posting.

Mark

Hi Mark

Can you post a link to the DSO calculations via MDX, I need to do this myself but am struggling to find one that fits the bill.

Thanks

Mike

Hi Mark

Can you post a link to the DSO calculations via MDX, I need to do this myself but am struggling to find one that fits the bill.

Thanks

Mike

(duplicate to receive updates)

Hi Chris.

One question about mdx resolution for this part :

THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER, [Measures].[All Invoices Count]);

When the currentmember is a day, does it means ssas will retrieve data for each individual day and make the sum after? or the query engin is smart enough to combine daily aggregation (for the month of the currentmember) and monthly aggregation (for previous monthes)

My concern is of y have 20 years of data and choose date in 2011 it can have a lot of cells to read.

If it use only daily aggregation, does this optimisation make sense (the goal is to force it to use monthly aggregations and to use daily aggregation only fors days in the current month):

scope ([Date].[Date].[Date].members);

THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER.parent.prevmember, [Measures].[All Invoices Count])

+SUM([Date].[Date].CURRENTMEMBER.parent.firstchild:[Date].[Date].CURRENTMEMBER, [Measures].[All Invoices Count]);

end scope;

if your answer is “yes” , it means it can be more optimized if we go to search yearly aggregation.

thanks

Hi Nicolas,

Yes, it means SSAS will retrieve the data for each individual day and make the sum. SSAS does not take monthly, quarterly or yearly values but in actual fact there are two problems with doing this:

1) There could be calculations in place that mean resolving a date range to months, quarters and/or years would return incorrect results.

2) More importantly (and despite what you might think), in all recent versions of SSAS it’s almost always quicker to sum up a set of dates and not try to do the optimisation you’re thinking of. In the days of SSAS 2000 I did use this optimisation successfully (my second-ever blog post was on this subject: https://cwebbbi.wordpress.com/2004/12/31/tuning-ytd-style-calculations/) but today an expression like THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER, [Measures].[All Invoices Count]); is going to be faster. I know this seems strange but you have to trust me 🙂

Chris

Hi Chris and thanks for reply so quickly

yesterday i try to implement it on a support ticket tracking cube, the goal is to calculate backlog (works fine) and to calculate average delay to close tickets.

On the first year perf was good but if i some some date range far from the beginning response times are poor (like 15/20 sec on a core I7 + 8Go ram + ssd, it is ssas 2K8R2, aggregation are designed, only 3 dimensions and less than 50 000 rows for 6 years of data)

i implemented my workaround and go back under 1 sec whatever the date range i pick.

I will try to put mdx studio and check what happens on both queries and provide some materials.

Nicolas

Hi Chris, this was very helpful to a current project I am working on. I do have a question and I was hoping that you might have come accross a similar scenario. I have two dates, a StartDate and a CancellationDate. I need to find the total number of accounts and where the StartDate is less than a given date and the Cancellation Date is greater than the given date. I have tried several variation of the mdx and don’t get the numbers that I should have. Do you have any suggestions?

I can’t really make any suggestions without seeing your cube and MDX, I’m afraid – this sounds like a classic events-in-progress problem, so the MDX in this and earlier posts should work. What kind of problems are you seeing with the numbers?

Here is the SQL code I run that gets me the numbers I should have in the cube:

Start Date is 7/16/2012 and end date is 7/22/2012

Select COUNT(*) as Verified from ods.Account

where UtilityAcceptanceDate @EndDate

I get 1,047,594 when I run the query in SQL which is the correct number

When I look at the Cube and pull in the week I get 3,715 if I expand the week and add dates I get approximately 586,000 for each day. Not sure why this is working like it is I have a standard cube structure. I am temporarily pulling in the table twice and joining it to the date dimension one on UtilityAcceptanceDate and on for UtilityCancelDate. Below is my MDX:

CALCULATE;

CREATE MEMBER CURRENTCUBE.[Measures].[AcceptedCount]

AS null;

Scope(Measures.[AcceptedCount]);

Scope([Date].[Date].members);

This = sum(null:[Date].[Date].currentmember,[Measures].[AcceptanceCounts] );

End scope;

end scope;

CREATE MEMBER CURRENTCUBE.[Measures].[SalesCount]

AS null;

Scope(Measures.[SalesCount]);

Scope([Date].[Date].members);

This = sum(null:[Date].[Date].currentmember, [Measures].[SalesCounts]);

End scope;

end scope;

CREATE MEMBER CURRENTCUBE.[Measures].NonVerified

AS [Measures].[AcceptedCount]-[Measures].[SalesCount],

VISIBLE = 1;

CREATE MEMBER CURRENTCUBE.[Measures].[NonVerified2]

AS [Measures].[AcceptanceCounts]-[Measures].[SalesCounts],

VISIBLE = 1 ;

CREATE MEMBER CURRENTCUBE.[Measures].[LostCount]

AS null;

Scope(Measures.[LostCount]);

Scope([Date].[Date].members);

This = sum(null:[Date].[Date].currentmember, [Measures].[LostAccounts]);

End scope;

end scope;

CREATE MEMBER CURRENTCUBE.[Measures].Verified

AS [Measures].[AcceptedCount]-[Measures].[LostCount],

VISIBLE = 1;

Thanks for looking at this…

Sorry the SQL should be:

Select COUNT(*) as Verified from ods.Account

where UtilityAcceptanceDate @EndDate

Have you got many missing dates, ie rows where one or both of the date values is null, or has a default value a long time in the past, or uses the unknown member? That could be one cause of this problem.

Any records with nulls get a date of 12/31/2070

And are there a lot of these records? Could this be part of the explanation? Actually, I’m not sure it is, but it’s worth investigating. Do the AcceptedCount and SalesCounts measures return reasonable numbers?

The acceptedcount and the SalesCounts match to what I should have. I am still way over with the verified by about 900,000 records.

It’s going to be difficult to debug this via comments, I think. Feel free to send me an email with more detail and some screenshots; you can find my contact details at http://www.crossjoin.co.uk.

I Chris

I have a similar problem and I hope that you can help me. I have a fact table with flights from clients. Each line is a flight of a client. What I need is the average days between flights for all clients on a selected period. If the client only made one flight then it should be zero.

Can you help?

Thanks

I think this is actually a different, easier problem. What I’d do here is, in the ETL, calculate the days since the customer’s last flight and store it in a measure in the fact table. You can then create a sum measure from it in the cube, and then to get the average create a second measure that counts the number of flights and divides the sum by the count.

Hi Chris,

how about extending this to show Aging Buckets 0-30, 31-60 , over 60?

It would be possible (see http://tomislav.piasevoli.com/2009/08/24/implementing-buckets-in-olap/) but I suspect performance may be bad…

Very interesting article showing an outstanding technique. Many many thanks for this article Chris. So for example, in excel, the formula for calculating the “Average Age of Unpaid Invoices” would be:

“Sum of Days the Unpaid Invoices Remained Unpaid” divided by “Number of Unpaid Invoice

is this correct ?

Thank you for your help.

Basically, yes.