Allocation in Power Query, Part 2

Last week’s post on allocation in Power Query caused quite a lot of interest, so I thought I would follow it up with a post that deals with a slightly more advanced (and more realistic) scenario: what happens if the contracts you are working with don’t all start on the same date?

Here’s the table of data that is the starting point for my examples:


I’ve made two changes:

  • I’ve added a contract name to serve as a primary key so I can uniquely identify each contract in the table. Several people asked me why I added index columns to my tables after my last post and this is why: without a way of uniquely identifying contracts I might end up aggregating values for two different contracts that happen to have the same number of months, contract amount and start date.
  • I’ve added a contract start date column which contains the date that the contract starts on, which is always the first day of a month.

Now let’s imagine that you want to make each monthly payment on the last day of the month. You need to take each contact and, for each monthly payment generate a row containing the date that is the last day of the month, containing the allocated payment amount.

Once again, having have opened the Query Editor the first step is to calculate the amount of the monthly payment using a custom column that divides Contract Amount by Months in Contract. This is shown in the Allocated Amount column:


Now to generate those monthly payment rows. Since this is reasonably complex I decided to declare a function to do this called EndsOfMonths inside the query, as follows:

= (StartDate, Months) =>
List.Transform(List.Numbers(1, Months), each Date.AddDays(Date.AddMonths(StartDate, _ ), -1))

This function takes the start date for contract and the number of months, and:

  • Uses List.Numbers() to create a list containing numbers from 1 to the number of months in the contract. For example if there were three months in the contract, this would return the list {1,2,3}
  • This list is then passed to List.Transform(), and for each item in the list it does the following:
    • Adds the given number of months to the start date, then
    • Subtracts one day from that date to get the payment date, which will be the last day of the month it is in

Calling this function on each row of the table in a new custom column (called Payment Date here) gives you a list of the payment dates for each contract:


All that you need to do then is to click on the Expand icon next to the Payment Date column header and make sure each column has the correct type, and you have your output for loading into the Excel Data Model:


Here’s the code for the query:


    //Load source data from Excel table

    Source = Excel.CurrentWorkbook(){[Name="Contract"]}[Content],

    //Add custom column for Allocated Amount

    InsertedCustom1 = Table.AddColumn(Source, "Allocated Amount", 

        each [Contract Amount]/[Months In Contract]),

    //Declare function for returning a list of payment dates

    EndsOfMonths = (StartDate, Months) => 

        List.Transform(List.Numbers(1, Months), 

            each Date.AddDays(Date.AddMonths(StartDate, _ ), -1)),

    //Call this function for each contract in a new custom column

    InsertedCustom = Table.AddColumn(InsertedCustom1, "Payment Date", 

        each EndsOfMonths([Contract Start Date], [Months In Contract]) ),

    //Expand the list

    #"Expand Payment Date" = Table.ExpandListColumn(InsertedCustom, "Payment Date"),

    //Set column data types

    ChangedType = Table.TransformColumnTypes(#"Expand Payment Date",

        {{"Contract Start Date", type date}, 

        {"Payment Date", type date}, {"Allocated Amount", type number}, 

        {"Contract Amount", type number}, {"Months In Contract", type number}})



There’s one more thing to do though. Since the Contract table contains real dates, it’s a very good idea to have a separate Date table in the Excel Data Model to use with it. I’ve already blogged about how to use a function to generate a Date table in Power Query (as has Matt Masson, whose version adds some extra features) and in that function (called CreateDateTable) can be reused here. Here’s a query that returns a Date table starting at the beginning of the year of the earliest start date in the contract table and ends at the end of the year of the last payment date:

    //Aggregate the table to find the min contract start date

    //and the max payment date

    GroupedRows = Table.Group(Contract, {}, 

    {{"Min Start Date", each List.Min([Contract Start Date]), type datetime}, 

    {"Max Payment Date", each List.Max([Payment Date]), type datetime}}),

    //Find the first day of the year of the min start date    

    #"Start Date" = DateTime.Date(Date.StartOfYear(GroupedRows{0}[Min Start Date])),

    //Find the last day of the year of the max payment date

    #"End Date" = DateTime.Date(Date.EndOfYear(GroupedRows{0}[Max Payment Date])),

    //Call CreateDateTable with these parameters

    DateTable = CreateDateTable(#"Start Date", #"End Date"),

    //Change data types

    ChangedType = Table.TransformColumnTypes(DateTable,{{"MonthNumberOfYear", type number}

    , {"DayOfWeekNumber", type number}})




You can now build a PivotTable to show the payments allocated over the correct ranges:


The sample workbook can be downloaded here.

Ninth Blog Birthday

This is the ninth anniversary of the first post on my blog, and every year at this time I take a moment to reflect on what’s happened in the last twelve months in my professional life and in the world of Microsoft BI.

Without a doubt 2013 has been the year of Power BI. It’s true we’ve had PowerPivot Power Pivot for a while now, but in my opinion the combination of Excel 2013, Power Query, Power Map, Power BI Sites and Q&A is a much stronger proposition for customers interested in self-service BI; I’ve already blogged at great length about what I think are the strengths and weaknesses of Power BI (see here, here and here) so I won’t repeat myself here. As you would expect Microsoft marketing has gone into overdrive to promote it (have you entered the Power BI competition yet?) and it will be in 2014 that we see whether Power BI is a success or not. What will success look like though, if it comes? It won’t be lots of customers lining up to buy Power BI in they way they’ve bought traditional BI solutions, I think: instead it will be organisations that have already signed up for Office 365 being upsold to Power BI based on their existing commitment to the Office platform. This presents a number of challenges to someone like me who makes a living as an independent consultant and trainer.

At the moment more than 90% of my consultancy income comes from SSAS and MDX, but the overall percentage of my time that I spend doing consultancy has reduced over the last few years to about 60%. This is partly the result of SSAS and MDX skills becoming more widespread; partly due to the fact that I’ve been promoting my public and private training more aggressively; and possibly due to fewer new SSAS projects kicking off. In the future I expect this trend to continue. Just how much consultancy will be necessary in the world of self-service Power BI solutions remains to be seen, but it’s going to be less than is necessary for corporate SSAS solutions and the rates will probably be lower too.

For the same reason, though, the demand for all forms of training for Power BI will almost certainly be much greater. That’s why I’ve been scheduling more public training courses through Technitrain; why I’ve signed up to write a book on Power Query next year; and why I’ve started recording video training courses with Project Botticelli (there’s a new video available there, by the way, on set and member functions). If I’m honest I prefer doing consultancy to training and I don’t think you can be a really good trainer if you don’t have a substantial amount of practical experience gained from consultancy, so I’m going to have to make a special effort to maintain a balance between the two.

Speaking at conferences and user groups is an aspect of my work that I’ve always really enjoyed, and I’m pleased to say that I’ll be speaking at the PASS BA Conference next year for example. I’m also still involved with SQLBits but please, please don’t ask me when the next SQLBits will be – we don’t have anything to announce yet but I can assure you we are working on it and I promise there will be one in 2014. I won’t be speaking at quite so many events as I have done in the past however. I travel a lot for work and this makes it hard to justify spending even more time away from my family, especially at weekends, so I’ve made a conscious decision to cut down on my speaking engagements. The thing is that the number of SQL Server events has increased a lot in the last couple of years and this has led to an increased number of invitations to speak, and I’m one of those people who finds it hard to say no when someone asks me to do something. I’m just going to have to be a bit more choosy from now on, and concentrate on events close to home, events that coincide with business trips and online sessions.

All that remains is to wish you all a happy and prosperous 2014, and to thank you for reading my blog! This time next year I’ll have been blogging for ten years, and that’s a scary thought…

Now(), The Formula Engine Cache And The Where Clause

Back in 2009 I blogged about how the use of the Now() function inside calculated members prevents the results of those calculations being cached for longer than the lifetime of a query. It might be worth rereading that post before you carry on to get some background on the problem here:

Today I had an interesting email from a customer (thank you, Per!) showing me something I didn’t know: that using the Now() function in the Where clause of a query has the same effect.

Here’s the repro. On the MDX Script of the Adventure Works cube, I created the following calculated measure:

CREATE MEMBER CURRENTCUBE.MEASURES.TEST AS [Measures].[Internet Sales Amount] + 1;

Now consider the following query:


{[Date].[Day Name].&[1]}

ON 1

FROM [Adventure Works]


STRTOMEMBER("[Date].[Calendar Year].&[" + "2003"  +  "]")



The first time the query is run you can see in Profiler the SSAS Storage Engine retrieving values; the second time it’s run you can see the values for MEASURES.TEST being returned from the Formula Engine cache.

Now consider this second query:


{[Date].[Day Name].&[1]}

ON 1

FROM [Adventure Works]


STRTOMEMBER("[Date].[Calendar Year].&[" + cstr(Year(Now())-10)  +  "]")



It returns the same result as the previous query (at least so long as the current year is 2013), but you can see in Profiler that the second time the query is run, once again the Storage Engine cache is queried and the Formula Engine cache is not used:


So clearly the use of the Now() function in the Where clause is enough to prevent the use of global scope by the Formula Engine cache, and some limited testing suggests the same applies for other non-deterministic functions like Username() – which is hardly surprising. Something to watch out for, then…

Technitrain 2014 Course Schedule: Power BI, Power Pivot, SSAS, MDX and more

I’m pleased to announce that all of the courses I’m running in London next year are now live on the Technitrain site. I’m teaching a lot of them, of course, but I’ve also got quite an impressive array of other trainers who’ll be teaching too: Allan Hirt, Klaus Aschenbrenner, Jamie Thomson, Bob Phillips, Andy Leonard and Thomas Kejser. If you’re looking for top quality SQL Server education then you know where to come!

Here’s the full list of courses:


BI Survey 13

Just a quick post to say that the BI Survey 13 is now open and would like your feedback about the BI tools you’re using. Here’s the link:

It’s worth doing because you’ll get a summary of the results and will get entered in a draw for some Amazon vouchers; and of course you get to make sure that lots of nice things are said about the Microsoft BI stack! I’ll blog about the results later in the year when they come out – the findings are always very interesting.

PASS Summit 2011–Day 3 Keynote

The third and final keynote at the PASS Summit, and this morning I’ve been given a space on the official blogger table at the keynote! Actually this just means I’ve got a table to rest my laptop on and access to a power strip, but it’s an honour nonetheless.

There are several things that I saw yesterday that are worth mentioning. Probably the most interesting session was from Cathy Dumas about Tabular: among other things she demoed a DAX Editor plugin for BIDS SQL Server Data Tools that is going to make everyone’s life soooo much easier; it will give us something like an MDX Script editor, intellisense, colour coding and so on. She has blogged about it here and I can’t wait for it to be made available. Also I came across the Data Explorer team site and blog; if you are interested in getting to play with it when it’s ready then you get your email address added to the invite list.

Anyway, back to the keynote and today it’s PASS Summit favourite Dr David DeWitt covering Big Data. It’s not a standard marketing session, more of a lecture, and all the better for it; DeWitt is a very talented speaker and more importantly takes a balanced approach to describing the SQL vs NoSQL debate. Interesting points to note:

  • He thinks that the arrival of NoSQL is not a paradigm shift, in the way that the move from hierarchical databases to relational databases was. The assertion that SQL is not dead, not surprisingly, goes down well with this audience.
  • Hadoop. I’m not even going to try to summarise this section of the talk but it is an excellent introduction to how it works, and if you’re even vaguely interested in Hadoop (which you should be given Thursday’s announcements) then you need to watch this – I believe will be available to view on demand somewhere (the slide deck is here). It is, honestly, the best explanation of all this I’ve ever seen and there are several thousand people here in this room who agree with me…
  • He does a comparison of Hive vs Parallel Data Warehouse on the same queries, same data and same hardware, and shows that PDW can outperform Hive by up to 10x. This demonstrates that a parallel database still has advantages over a NoSQL approach as far as performance goes in many cases, although of course each has its own strengths and weaknesses and performance isn’t the only consideration.

This was not only an enthralling and educational talk, but it was also great marketing from Microsoft’s point of view. You can announce Hadoop for Windows to a room full of SQL Server types and however many whizzy demos you do, and however much woo-hooing goes on, if we don’t really understand the technology we’ll go back to our day jobs and ignore it. On the other hand, teach us what the technology actually does and you’ll get us interested enough to try it out for ourselves and maybe even use it on a project.

Finally, if you’re at the Summit today come and find me at the Birds of a Feather lunch on the SSAS Performance Tuning table, or later on the BI Expert Pod this afternoon.

‘Events in Progress’ Part 3 – Average Age of Unpaid Invoices

Sorry for the relative quiet on here recently, I’ve been caught up with helping to organise SQLBits and other stuff… But anyway, this week I’m working with a company who asked me to tackle what must be a very common business problem: they have a fact table where one row represents an invoice they have generated, and they need to know at any point in time what the average age of all their unpaid invoices is, and monitor whether this value goes up or down. The point of this is, of course, is that if the average age of outstanding invoices starts to go up over time then it means that their customers are getting slower to pay them and they may potentially have a cashflow problem.

The solution is a variation on the ‘events in progress’ model that I blogged about recently but with an interesting twist that I thought was worth describing. First of all, we need to prepare some data so we can reproduce the scenario in Adventure Works – let’s pretend that the FactInternetSales table actually contains invoice data. We’ll need to massage the data a bit for two reasons:

  • In Adventure Works if you look at the OrderDate, ShipDate and DueDate columns there’s always exactly the same number of days between the dates, which doesn’t make for a great demo
  • In the real world some invoices will be unpaid when the fact table is loaded, others will be paid. We need to be able to handle both scenarios properly.

Once again, we need two fact tables. The first of these will contain all of our invoices, and for this I used the following named query in the Adventure Works DSV:

SELECT        CustomerKey, OrderDateKey
FROM            dbo.FactInternetSales

The second will be a fact table that contains information on invoices that have been paid:

CASE WHEN SalesTerritoryKey > 5 THEN ShipDateKey ELSE DueDateKey END AS PaidDateKey,
CASE WHEN SalesTerritoryKey > 5 THEN 7 ELSE 12 END AS DaysToPayment
FROM            dbo.FactInternetSales
WHERE        (PromotionKey < 3)

A few things to notice in this second query:

  • I’ve filtered the fact table to remove a very small number of rows where PromotionKey is less than 3 – this will give us a few unpaid invoices to make the numbers more interesting.
  • I’ve created a new date key called PaidDateKey to make the invoices have different payment lengths.
  • Because these are paid invoices, we know how long they took to be paid and we can precalculate this and put it in the fact table in the DaysToPayment column.

We can now build the cube. It needs two measure groups, one based on each fact table, and we can add the Customer dimension too. Three measures need to be built: one count measure on each measure group and a sum measure on the DaysToPayment column. The Date dimension joins to the OrderDateKey on the All Invoices measure group, and to the PaidDateKey on the Paid Invoices measure group.

Now for the MDX. The first thing we need to do is find the total number of invoices generated to date and the total number of invoices that have been paid to date (and also to make sure that these totals aggregate up properly through the time dimension); we can then subtract the second from the first to find the total number of unpaid invoices:


SCOPE(MEASURES.[Total Invoices To Date]);

        THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER, [Measures].[All Invoices Count]);

    SCOPE([Date].[Date].[All Periods]);
        THIS = (TAIL(EXISTING [Date].[Date].[Date].MEMBERS, 1).ITEM(0).ITEM(0)
                , MEASURES.[Total Invoices To Date]);



SCOPE(MEASURES.[Total Paid Invoices To Date]);

        THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER, [Measures].[Paid Invoices Count]);

    SCOPE([Date].[Date].[All Periods]);
        THIS = (TAIL(EXISTING [Date].[Date].[Date].MEMBERS, 1).ITEM(0).ITEM(0)
                , MEASURES.[Total Paid Invoices To Date]);


MEASURES.[Total Invoices To Date] - MEASURES.[Total Paid Invoices To Date];


This is the easy bit. In order to find out what the average of the number of days the unpaid invoices have actually been unpaid, we need to find the number of days that each unpaid invoice has been open for, sum that up, and divide it by the total number of unpaid invoices. And to find this sum of the number of days that all unpaid invoices are open, we need to do the following:

  • Find the total up to yesterday of the [Unpaid Invoices Count] measure. This gives the total number of days that every invoice has ever been open.
  • Then sum the total to date of the Days to Payment measure we created on the Paid Invoices fact table, to find the total number of days that all paid invoices were ever open
  • Then subtract this second figure from the first, which will give us the sum of the number of days all currently unpaid invoices have been open

Here’s the code:


SCOPE(MEASURES.[Sum of Ages of Invoices To Date]);

                , MEASURES.[Unpaid Invoices Count]);

    SCOPE([Date].[Date].[All Periods]);
        THIS = (TAIL(EXISTING [Date].[Date].[Date].MEMBERS, 1).ITEM(0).ITEM(0)
                , MEASURES.[Sum of Ages of Invoices To Date]);



SCOPE(MEASURES.[Sum of Ages of Paid Invoices To Date]);

        THIS = SUM(NULL:[Date].[Date].CURRENTMEMBER, [Measures].[Days To Payment]);

    SCOPE([Date].[Date].[All Periods]);
        THIS = (TAIL(EXISTING [Date].[Date].[Date].MEMBERS, 1).ITEM(0).ITEM(0)
                , MEASURES.[Sum of Ages of Paid Invoices To Date]);


MEASURES.[Sum of Ages of Invoices To Date] - MEASURES.[Sum of Ages of Paid Invoices To Date];


Finally, we can calculate the average like so:

IIF(MEASURES.[Unpaid Invoices Count]=0, NULL,
MEASURES.[Difference of Sum of Ages]/MEASURES.[Unpaid Invoices Count])
, FORMAT_STRING='#,#.00';


It’s pretty complex I’ll admit, but it does perform well. Here’s what the results look like in Excel:


I only wish I could come up with a way of getting some of my customers to pay this quickly…!