A New Events-In-Progress DAX Pattern

I’ve been working on a very complex SSAS Tabular implementation recently, and as a result I’ve learned a few new DAX tricks. The one that I’m going to blog about today takes me back to my old favourite, the events-in-progress problem. I’ve blogged about it a lot of times, looking at solutions for MDX and DAX (see here and here), and for this project I had to do some performance tuning on a measure that uses a filter very much like this.

Using the Adventure Works Tabular model, the obvious way of finding the number of Orders on the Internet Sales table that are open on any given date (ie where the Date is between the dates given in the Order Date and the Ship Date column) is to write a query something like this:

EVALUATE

ADDCOLUMNS (

    VALUES ( 'Date'[Date] ),

    "OpenOrders",

    CALCULATE (

        COUNTROWS ( 'Internet Sales' ),

        FILTER( 'Internet Sales', 'Internet Sales'[Ship Date] > 'Date'[Date] ),

        FILTER( 'Internet Sales', 'Internet Sales'[Order Date] <= 'Date'[Date] )

    )

)

ORDER BY 'Date'[Date]

On my laptop this executes in around 1.9 seconds on a cold cache. However, after a bit of experimentation, I found the following query was substantially faster:

EVALUATE

ADDCOLUMNS (

    VALUES ( 'Date'[Date] ),

    "OpenOrders",

    COUNTROWS(

        FILTER(

            'Internet Sales',

            CONTAINS(

                DATESBETWEEN('Date'[Date]

                    , 'Internet Sales'[Order Date]

                    , DATEADD('Internet Sales'[Ship Date],-1, DAY))

                , [Date]

                , 'Date'[Date]

            )

        )

    )

)

ORDER BY 'Date'[Date]

On a cold cache this version executes in just 0.2 seconds on my laptop. What’s different? In the first version of the calculation the FILTER() function is used to find the rows in Internet Sales where the Order Date is less than or equal to the Date on rows, and where the Ship Date is greater than the Date. This is the obvious way of solving the problem. In the new calculation the DATESBETWEEN() function is used to create a table of dates from the Order Date to the day before the Ship Date for each row on Internet Sales, and the CONTAINS() function is used to see if the Date we’re interested in appears in that table.

I’ll be honest and admit that I’m not sure why this version is so much faster, but if (as it seems) this is a generally applicable pattern then I think this is a very interesting discovery.

Thanks to Marco, Alberto and Marius for the discussion around this issue…

UPDATE: Scott Reachard has some some further testing on this technique, and found that the performance is linked to the size of the date ranges. So, the shorter your date ranges, the faster the performance; if you have large date ranges, this may not be the best performing solution. See https://twitter.com/swreachard/status/349881355900952576

UPDATE: Alberto has done a lot more research into this problem, and come up with an even faster solution. See: http://www.sqlbi.com/articles/understanding-dax-query-plans/

30 thoughts on “A New Events-In-Progress DAX Pattern

  1. Chris, I was running your scripts against AdventureWorksDW2012 and getting unintended results. I think that the numbers in the OpenOrders column are not the number of Open Orders, but the total number of days between OrderDay and ShipDay. The Grand Total that I get, running your first script is 422,786 and the Grand Total of the second script is 440,504.

    So, I decided to create a PivotTable. First, I created a Relashionhip between FactInternetSales[OrderDate] and DimDate[FullDateAlternateKey]. Then I added a new calculated column to the FactInternetSales table, OrdersPending:=If([ShipDate]-[OrderDate]>0,1,0). Adding those two columns to the PivotTable gives me the correct number of Pending Orders for every date and a Grand Total of 60,398, which is the number of rows of the FactInternetSales table, given that there in no row with [OrderDate] = [ShipDate]. I also added another calculated column, TotalDays:=[ShipDate]-[OrderDate].

    Then I used the same logic of ([ShipDate]-[OrderDate]>0) on your script. I added two more columns. [TotalDays] to calculate the sum of Pending Days for each date. [AverageDays] to calculate the average Pending Days for the Orders Pending on each date. The number of days between [ShipDate] and [OrderDate] is always 7 for the whole FactInternetSales table. The [OpenOrders] Grand Total is 60,398 and [TotalDays] Grand Total is 422,786.

    To test the script a bit more I added two more calculated columns to insert into the pivot table, ShipDate1:=If([OrderDateKey]0),’FactInternetSales'[OrderDate])),
    “TotalDays”,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales'[ShipDate] – ‘FactInternetSales'[OrderDate])>0),’FactInternetSales'[ShipDate] – ‘FactInternetSales'[OrderDate])),
    “AverageDays”,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales'[ShipDate] – ‘FactInternetSales'[OrderDate])>0),’FactInternetSales'[ShipDate] – ‘FactInternetSales'[OrderDate])) /
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales'[ShipDate] – ‘FactInternetSales'[OrderDate])>0),’FactInternetSales'[OrderDate])),

    “OpenOrders1”,
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales'[ShipDate1] – ‘FactInternetSales'[OrderDate])>0),’FactInternetSales'[OrderDate])),
    “TotalDays1”,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales'[ShipDate1] – ‘FactInternetSales'[OrderDate])>0),’FactInternetSales'[ShipDate1] – ‘FactInternetSales'[OrderDate])),
    “AverageDays1”,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales'[ShipDate1] – ‘FactInternetSales'[OrderDate])>0),’FactInternetSales'[ShipDate1] – ‘FactInternetSales'[OrderDate])) /
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales'[ShipDate1] – ‘FactInternetSales'[OrderDate])>0),’FactInternetSales'[OrderDate]))
    )
    ORDER BY ‘DimDate'[FullDateAlternateKey]

  2. Chris, the second part of my comment was not pasted correctly, I don’t know why, but this is the second part:

    To test the script a bit more I added two more calculated columns to insert into the pivot table, ShipDate1:=If([OrderDateKey]0),’FactInternetSales'[OrderDate])),
    “TotalDays”,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales'[ShipDate] – ‘FactInternetSales'[OrderDate])>0),’FactInternetSales'[ShipDate] – ‘FactInternetSales'[OrderDate])),
    “AverageDays”,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales'[ShipDate] – ‘FactInternetSales'[OrderDate])>0),’FactInternetSales'[ShipDate] – ‘FactInternetSales'[OrderDate])) /
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales'[ShipDate] – ‘FactInternetSales'[OrderDate])>0),’FactInternetSales'[OrderDate])),

    “OpenOrders1”,
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales'[ShipDate1] – ‘FactInternetSales'[OrderDate])>0),’FactInternetSales'[OrderDate])),
    “TotalDays1”,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales'[ShipDate1] – ‘FactInternetSales'[OrderDate])>0),’FactInternetSales'[ShipDate1] – ‘FactInternetSales'[OrderDate])),
    “AverageDays1”,
    CALCULATE(SUMX(FILTER(‘FactInternetSales’,(‘FactInternetSales'[ShipDate1] – ‘FactInternetSales'[OrderDate])>0),’FactInternetSales'[ShipDate1] – ‘FactInternetSales'[OrderDate])) /
    CALCULATE(COUNTX(FILTER(‘FactInternetSales’,(‘FactInternetSales'[ShipDate1] – ‘FactInternetSales'[OrderDate])>0),’FactInternetSales'[OrderDate]))
    )
    ORDER BY ‘DimDate'[FullDateAlternateKey]

  3. Chris, I pasted a second time but the text is not right. Parts of the comment and the code are missing. When I pasted both times, the text looked right, but when I press the button “Post Comment” the results are not right. What should I do to get it right?

      • Hi David,

        I have to admit I’ve not tested this properly as a measure (so I don’t know what values it returns for Grand Totals), but it does return the correct values at the day level in the query I gave. Maybe part of the problem is that you’re not using the official Adventure Works Tabular model and building your own?

        Chris

  4. Chris, I just realized that I was not getting my math correctly. If we have 60,398 orders on the FactInternetSales table and it takes on average 7 days for every order to be shipped, then we have every order repeating for 7 days on average. That give us 60,398 * 7 = 422,786 days in total. So, your script is correct and I was on the wrong track. Sorry for that.

    Leaving my previous script aside, I added some lines to your script to calculate the average days that took to ship the orders that were pending on each day. I think that this time I got this right.

    EVALUATE
    ADDCOLUMNS (
    VALUES ( ‘DimDate'[FullDateAlternateKey] ),
    “OpenOrders”,
    CALCULATE (
    COUNTROWS ( ‘FactInternetSales’ ),
    FILTER( ‘FactInternetSales’, ‘FactInternetSales'[OrderDate] <= 'DimDate'[FullDateAlternateKey] ),
    FILTER( 'FactInternetSales', 'DimDate'[FullDateAlternateKey] < 'FactInternetSales'[ShipDate] )),
    "AverageDays",
    CALCULATE(
    SUMX( 'FactInternetSales', 'FactInternetSales'[shipDate] – 'FactInternetSales'[OrderDate] ),
    FILTER( 'FactInternetSales', 'FactInternetSales'[OrderDate] <= 'DimDate'[FullDateAlternateKey] ),
    FILTER( 'FactInternetSales', 'DimDate'[FullDateAlternateKey] < 'FactInternetSales'[ShipDate] )) /
    CALCULATE (
    COUNTROWS ( 'FactInternetSales' ),
    FILTER( 'FactInternetSales', 'FactInternetSales'[OrderDate] <= 'DimDate'[FullDateAlternateKey] ),
    FILTER( 'FactInternetSales', 'DimDate'[FullDateAlternateKey] < 'FactInternetSales'[ShipDate] ))
    )
    ORDER BY 'DimDate'[FullDateAlternateKey]

  5. Hello Chris, i have another question and was wondering if you could point me in the direction to start searching for a solution, I have a case similar to this Events in Progress scenario however, I need the COUNTROWS() function to be run on a table generated by a Summarize function.

    – I have Item Inventory table which have detailed movement of all items and I have a measure on this table that calculates the stock for any given day
    – I created a summarize function that shows (Day – Item – Quantity ) and I’m only showing rows that have 0 stock
    – I am trying to do an aggregation on the result of the summarize function to show me the no of days I had items out of stock per item.

    I understand I am getting into too much details and you are probably very busy but I would really appreciate if you can point me to the direction I can use to resolve the issue, been trying to resolve it for a couple of weeks with no luck at all :(

    Thanks a lot

  6. Chris I can’t thank you enough, your advice got it working :) :)

    Your blog and examples has made me accomplish today what I’ve been trying to finish for the past 8 days. Thank you

  7. Chris I’m wondering if you have any guidance for implementing this as a measure. You’re initial post was on this solution was very easy to apply to a model I’ve built but I’d like to know if this new method (or Alberto’s variation) would speed things up. However, I’m not totally sure how to make it all work as a measure. Any help would be appreciated.

      • In the previous solution you essentially built two measures. The running total of active values then subtracted the running total of inactive values to get the current events-in-progress at any point in time. I’m not sure how this method gets implemented as a measure. The query above seems to me to function more as a calculated column. For whatever reason I just can’t seem to wrap my head around converting the logic to a calculated measure in my fact table that returns the same values as the earlier solution.

  8. Chris,

    Your content is always much appreciated. I have a presumably simple question which I have not been able to track down an answer for. It is a simple “count all rows between dates”, very similar to your example. Two tables ; Customers ; id, fromdate, todate And Calendar ; date, week, month, year.

    So, user will want to trend and calculate # of customers in any which way, meaning count by week, month, year over year, etc.

    I guess I will need the same setup as you, unrelated tables. I have about 200k customers in the table over 5 years, so data volumes are not too bad.

    Would appreciate input. Cheers

    • I guess the first question you’d need to answer is how the users would choose their From and To dates. Would you want to make them separate tables, and separate selections, or would you want to make the selection on just one table?

      • Thanks for the reply.

        The users will choose period from time table, and a measure will show how many customers were active at that time. This is not customers per se, rather subscriptions with a from date and a to date if it was ended. If a user selects a date, the sql query for returning a count for that date would be extremely simple. I know however that once you start selecting a group of dates, for example month, the business requirements needs to be clear on how it should behave.

      • Yes, but I probably wasn’t clear about my question – do you want your users to makes two selections, a “from” date and a “to” date, from two different tables, or do you want them to make the selection as a range of dates from a single table?

      • Definitely as a selection from one date table. Storing data this way is quite common for a lot of the sources involved, and I am hoping there is an elegant and powerful solution to it with tabular.

      • Maybe I can fill in a bit.

        Creating a calculated column on date table :

        =CALCULATE(COUNTROWS(‘Subscription’);FILTER(Subscription;Subscription[FromDate]>=Date[Date] && Subscription[ToDate]=Date[Date])) — Returns correct number for each day, but is of course not the number I am looking for.

        So, one filter works fine.

        ToDate are blank for active subscriptions, but I have set them to the last date of date table to not have more factors than necessary for solving this basic problem.

        Another requirement is that the user would want to slice/filter the data based on columns in the Subscription table, for example type or product.

        Coming from the multidimensional side where this would be modeled quite differently is maybe not helping me much either. :)

      • Like a previous user here, wordpress removed some of my post.

        CALCULATE(COUNTROWS(‘Subscription’);FILTER(Subscription;Subscription[FromDate]>=Date[Date] && Subscription[ToDate]=Date[Date])) – Returns correct number for each day, but is of course not the number I am looking for.

      • I think WordPress has lost some of your code again :-)

        Anyway, I get the idea. Calculated columns aren’t going to do what you want; you need to create a measure here. I think what you’re missing is in the Filter() function you’d need to find the first and last dates in the selection made by the user (so if the user selected a year, you’d want to find the first and last dates in the year and then find the number of subscriptions active in that year). Therefore you’d need to use the FirstDate and LastDate functions to on your Date[Date] column.

  9. Wow…I have been scouring the Internet for this solution for over a week and I have Marcus Russo to thank me for pointing me in the right direction to you solution. I can see exactly how I can fit this in with my dilemma.
    Thanks Chris ;-)

  10. Hi Chris,

    I want to Get the SUM(EmployeeHeadcount) from EffectiveDT And ExpireDT Dates in Fact Tables Passing Date table as Reference in Tabular Cube

    Iam Trying with the Below DAX in Tabular Cubes..Still not getting the Expected results..
    First DAX —
    A:=CALCULATE(SUM(FactEmployee[EmployeeCount]),
    FILTER (
    FactEmployee,
    FactEmployee[TransEffectiveDT] = MAX(‘Date'[FullDT])
    ))
    With the Above DAX.. The Data is not Filtering Based on EffectiveDT And ExpireDT..Its Giving only the Values on Selected date Alone.Which works as Equals to =
    Second DAX —
    Test:=CALCULATE(SUM(FactEmployee[HeadCount]),
    FILTER (
    ALL(‘FactEmployee’),
    FactEmployee[TransEffectiveDT] = MAX (‘Fiscal Date'[FullDT])
    ))

    This one the data is Filtering, However Its giving the same Values for every Dimensions Attributes Selected..

    Example — Select Date(FullDt) Selected as 2015-05-02..Data is Filtering Between EffectiveDT And ExpireDT as 192000..However When i Drag and Drop of Employee Status Column from Employee Tabble Or EffectiveDT And ExpireDT From factTable..Its showing Y as 192000 and N as 192000…Not sure what is Breaking..Kindly Help me to get Measure Based on EffectiveDT And ExpireDT for any given date selected..

    Thank you

  11. The Above DAX bit Changed Automatically when i posted..Aim re posting the DAX..
    First DAX —
    A:=CALCULATE(SUM(FactEmployee[EmployeeCount]),
    FILTER (
    FactEmployee,
    FactEmployee[TransEffectiveDT] = MAX(‘Date'[FullDT])
    ))

    Second DAX —
    Test1:=CALCULATE(SUM(FactEmployee[HeadCount]),
    FILTER (
    ALL(‘FactEmployee’),
    FactEmployee[TransEffectiveDT] = MAX (‘Fiscal Date'[FullDT])
    ))

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s