Using DateDiff() To Calculate Time Intervals In DAX

One of the most useful new additions to DAX in Excel 2016 and the Power BI Designer is the DateDiff() function. It does exactly what you would expect: calculate the amount of time in between two dates, and express that value as either seconds, minutes, hours, days, weeks, months, quarters or years.

Here’s a very simple table of dates:

image

With this table loaded into the Power BI Designer, you can add new calculated columns to the table by clicking the New Column button on the ribbon. Here are two calculated column definitions that give the number of days and the number of years between the Start Date and the End Date on each row:

DayDurationColumn = 
DATEDIFF(MyTable[Start Date], MyTable[End Date], DAY)

YearDurationColumn = 
DATEDIFF(MyTable[Start Date], MyTable[End Date], YEAR) 

The output is pretty much what you’d expect:

image

It is of course also possible to create measures that use the DateDiff() function to, for example:

YearDurationMeasure = 
DATEDIFF(
FIRSTDATE(MyTable[Start Date]), 
LASTDATE(MyTable[End Date]), 
YEAR)

image

All very straightforward, then, and much easier than having to calculate these values yourself.

Using SelectColumns() To Alias Columns In DAX

A few years ago I wrote this post on how to alias columns in a table in DAX, using a combination of AddColumns() and Summarize(). The good news is that in Excel 2016/the Power BI Designer/SSAS Tabular 2016 there’s a new DAX function specifically for this purpose: SelectColumns(). Here’s an example of how it can be used:

Imagine you have the following source table, called Products:

image

You can write a DAX query to get all the rows and columns from this table like so:

EVALUATE Products

Here’s the output of that query in DAX Studio (and remember, DAX Studio can connect to data loaded into the Power BI Designer, which is what I’m doing here):

image

You can alias the columns in this table using SelectColumns() very easily, like so:

EVALUATE
SELECTCOLUMNS (
    Products,
    "Column One", Products[Product],
    "Column Two", Products[Colour]
)

Here’s the output:

image

The syntax for SelectColumns() is straightforward: the first parameter is a table expression, and after that there are pairs of parameters consisting of:

  • A new column name
  • An expression returning a column from the table given in the first parameter

As you can see in the output of the query above, I’ve renamed the Product column “Column One” and the Colour column “Column Two”.

This means I can now crossjoin a table with itself without needing to worry about conflicting column names, like so:

EVALUATE
CROSSJOIN (
    Products,
    SELECTCOLUMNS (
        Products,
        "Column One", Products[Product],
        "Column Two", Products[Colour]
    )
)

image

One other interesting thing to note about SelectColumns() is that it allows you to do projection in a DAX query easily – as Marco notes here, it was possible before but it wasn’t pleasant. For example, the query:

EVALUATE
SELECTCOLUMNS (
    Products,
    "Just Colour", Products[Colour]
)

Returns:

image

Notice how there are three rows in the output here and that the value Green occurs twice. If you’re a true DAX afficionado, you might get excited about that.

Thoughts On All The Recent Power BI/SQL Server 2016 BI/Excel 2016 News

The last few weeks have seen more Microsoft BI-related announcements in a short time than I can ever remember before. Some of them I’ve blogged about; most I’ve at least tweeted. For good summaries of what’s coming for Power BI, on-premises SQL Server BI and Excel 2016 I can recommend the following posts by other people, all of which are worth reading:

http://www.jenunderwood.com/2015/05/14/sql-server-bi-2016/

http://www.jenunderwood.com/2015/04/23/april-microsoft-bi-world-news/

http://byobi.com/blog/2015/05/ssas-related-enhancements-in-sql-server-2016/

https://gqbi.wordpress.com/2015/05/14/bi-nsight-excel-2016-power-bi-updates-including-new-data-sources-azure-sql-data-warehouse/

https://gqbi.wordpress.com/2015/05/07/bi-nsight-sql-server-2016-power-bi-updates-microsoft-azure-stack/

Even then I’m not sure everything has been covered, and because new stuff is coming thick and fast (custom regions in Power Map! DirectQuery/ROLAP in the cloud with Power BI connecting to Azure SQL Database!) it’s hardly worth trying. However, I do think this is as good a point as any to work out what I think about all this activity and where Microsoft is heading.

SSAS Multidimensional Improvements

I’m well past the stage of feeling angry about the neglect of SSAS Multidimensional over the past few years, and I’m genuinely grateful that it’s getting some investment rather than nothing at all. That said, I’m not sure which customers asked for Netezza support or DBCC – they aren’t things I’ve ever needed. The promised performance improvements are where I expect the real value to be, and on their own they will probably give existing customers reason enough to upgrade to 2016. It would have been nice to get even one new feature from this list though.

SSAS Tabular Improvements

As expected, the Tabular engine in SSAS 2016 gets a lot of new stuff for free because of its shared heritage with other Power BI tools. My feeling is that uptake of Tabular has been slower than it should have been because 2012 was, frankly, a bit v1.0 with all the immaturity that implies, and there haven’t been any substantial improvements since then. With 2016, though, it looks like Tabular will take a great leap forward and as a result be seen as a much more capable platform. There will certainly be fewer reasons to choose Multidimensional over Tabular, although for applications that require complex calculations (such as financial applications) Multidimensional will still have the upper hand. The more reasons I have to love Tabular, the less I’ll worry about the lack of new features in Multidimensional.

Power Query And The Corporate/Self-Service BI Crossover

As regular readers of this blog may have noticed, I like Power Query a lot and I’m pleased to see that it has extended its reach into corporate BI. Power Query as a data source for SSAS will be important for scenarios where Power Pivot models are upgraded to server-side solutions; I don’t think it will be a good idea to use Power Query if you’re building an SSAS solution from scratch though. Power Query in SSIS was another predictable development and one which should make it easier to work with certain data sources (such as Excel files); the existing ability to publish the output of an SSIS package as an OData feed using the Data Streaming Destination, which can then be consumed by Power Query, could open up some interesting scenarios where a user builds a data set in Power Query and publishes it via SSIS for consumption by other Power Query users.

It’s the promised integration of Power Query and SSRS that excites me most though. I asked for it here and it looks like my wish has been granted! As well as providing access to a wider range of data sources and a common ‘get data’ experience with other tools, I think it will be the key to making SSRS and in particular Report Builder the self-service BI tool that so many customers want it to be. Report Builder has struggled with two problems since it first appeared: first, make it easier for users to lay out a nice-looking report on a canvas, something that the current version does a reasonable job of I think; and second, make it easy for non-technical users (who, for example, might have little or no SQL knowledge) to get data from data sources for their reports – this is where it has not succeeded in the past, and where Power Query could make all the difference. Power Query, among other things, is a solid, user friendly, SQL generation tool. This, plus the fact that SSRS will be updated for all modern browsers and get new visualisations and report themes etc, means that the vast number of existing SSRS customers will have a lot of good reasons to upgrade to 2016, and when they do they’ll also find it easy to integrate with the rest of Power BI.

Power BI: Will Anyone Buy It?

It’s very easy for Microsoft BI fanboys like me to get all worked up by the constant drip feed of tweets about new Power BI features. An impartial observer will point out that some of these features, like the ability to change the colours of your charts in Power View, are actually things we should be embarrassed at not having already. Nonetheless I think it’s fair to say that Microsoft are doing a good job of getting its core customers excited about Power BI and there’s also a lot of evidence that people outside this core at, at least, curious, so from a marketing perspective everything’s going well.

Even if the marketing is good, that will only get Power BI evaluated. Those evaluations will only turn into purchases if the product itself is up to the task. Microsoft set itself an extremely difficult task when it decided to change the direction of Power BI and deliver a respectable version 1.0 this year; the impressive speed that new features are arriving at suggests that they will manage it. When this product is put side-by-side with competing tools it will have some advantages – Power Query is excellent, the Power Pivot engine is fast and can handle all kinds of complex calculations – but will inevitably appear immature in other respects such as visualisation. I think the limit on the amount of data that can be held in a single data model, either on the desktop or in the cloud, is also something that will be a problem for those of us who are used to building server-side SSAS solutions that can hold all the data the user ever needs to see. Maybe DirectQuery/ROLAP on SQL Azure and perhaps Azure SQL Data Warehouse will make this irrelevant? Overall though in my opinion the version of ‘new’ Power BI that will RTM later this year will be seen as more than good enough from a technical standpoint, and if this rate of change is maintained for version 2.0 then it will be something special.

I also think that the focus on building APIs and connectors to other web services is a really clever move. There are a lot of other vendors out there who don’t want to build their own BI functionality, and if Microsoft can convince them to use Power BI that will bring a lot of customers on board. Even at this early stage it looks like Microsoft is doing a good job of recruiting these vendors (SQL Sentry for example, but there are many others) as well as getting other teams inside Microsoft (like Visual Studio Online) to do the same. Close integration with new Microsoft services like Azure Stream Analytics and Azure SQL Data Warehouse should have a similar effect, although less pronounced given that these new services will have few users initially.

While I admit the divorce from Excel was the right thing to do in the circumstances, I still find that I prefer working in Excel over the Power BI Dashboard Designer. Maybe that’s partly due to habit, but Power View still has a long way to go before it has the flexibility of Excel PivotTables and especially cube formulas. That’s why I think Marco Russo’s campaign to create an API for the Dashboard Designer and to support external connections from Excel and other tools is so important. If you haven’t voted already, please do so now! This would be a killer feature in that it would allow you to continue to build reports in Excel (maybe 32-bit) while still making use of new features in the engine. It would give use all the good things we have today with the Excel Power add-ins and more. It would also, as Marco points out, be another reason for third party vendors to use the Power BI platform.

The final factor to consider is price. Making the Dashboard Designer free is important, because it’s not just a Dashboard Designer but a complete, standalone desktop self-service BI solution in itself. Many customers will use it as such without buying a Power BI subscription – that is, if they know that is an option. The free/$9.99 cloud subscription model is also very attractive, and all in all the new pricing model is a refreshing change from the nightmare that ‘old’ Power BI licensing was. I wonder if there will be any particular incentives (financial or otherwise) for partners to sell or recommend Power BI to their customers? If not,there probably should be.

Conclusion

Overall, I’m happier with the direction that Microsoft BI is going in than I have been for a long time. Power BI now seems like it has some momentum behind it, and that it is a coherent product rather than a collection of (individually impressive) tools bound into Excel that, for one reason or another, customers couldn’t use to their full potential. We’ll have to see whether it does become a commercial success or not but I think it has a good chance of doing so now. Excel 2016 also has some welcome improvements, even if it is now the ‘slow track’ for self-service BI; the more users discover Power Pivot and Power Query via Excel 2013 and soon 2016, the more likely it is that they’ll start using the rest of the Power BI stack.

Meanwhile it seems like at last there is at last a serious commitment to improve the on-premises SQL Server BI stack on the part of Microsoft. Some time ago I wrote a post on why corporate BI and self-service BI are both necessary and I still stand by what I said there; it’s also clear that a lot of customers, especially enterprise customers and especially in Europe, are not yet ready to put their most valuable data in the cloud. Microsoft has the chance to be one of the few vendors with great self-service and corporate BI stories, and great on-premises and cloud BI stories. Also, given that today’s SQL Server BI customers are the most likely to become tomorrow’s Power BI customers, keeping them happy in the medium term while Power BI matures should be a priority.

Let’s see where we are this time next year…?

Nested Calculations In Power Query

Quite often, in Power Query, you want to ‘nest’ calculations and transformations – apply them not across the whole table, but repeat the same calculation or transformation across multiple groups of rows within that table. Let me give you an example…

Take the following input table:

image

Imagine you wanted to add a column showing the rank of each row by Sales. In Power Query you would just need to:

  1. Load the data
  2. Sort the table in descending order by Sales
  3. Add an index column starting at 1, which is the rank

You would end up with the following output:

image

…and here’s the M code, all of which is generated by the UI:

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    #"Sorted Rows" = Table.Sort(Source,{{"Sales", Order.Descending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Rank", 1, 1)
in
    #"Added Index"

Now imagine you wanted to rank Sales within each month. That’s to say you want to apply the same steps as shown in the previous query but for each month individually to get an output like this:

image

One way to tackle this, and problems like it, is to do the following. First, do a Group By on the column you want to repeat the calculation over, in this case Month, and use the All Rows aggregation operation. This will result in a table with one row for each month and a column containing nested tables, as shown below:

image

Each one of these tables contains the rows from the original table for the month.

You can then take your original transformation and turn it into a function, either in a separate query or as a step in your current query. Here’s an example of how the query above can be turned into a function that takes a table and returns a table with a rank column added:

(tabletorank as table) as table =>
     let
      SortRows = Table.Sort(tabletorank,{{"Sales", Order.Descending}}),
      AddIndex = Table.AddIndexColumn(SortRows, "Rank", 1, 1)
     in
      AddIndex

 

Next, you need to pass each nested table to this function. You could do that in a calculated column, but the most elegant way I think is by using the Table.TransformColumns() function which takes a function and applies it to every value in a column (see here for another example of how to use it).

Finally, you get the final output by clicking on the Expand icon in the AllRows column and then choosing to expand all the columns in the nested table except the ones you originally grouped on:

image

image

Here’s the full M code:

let
    //Get data from Excel
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    //Group by Month
    Grouped = Table.Group(Source, {"Month"}, {{"AllRows", each _, type table}}),
    //Declare a function that adds a Rank column to a table
    RankFunction = (tabletorank as table) as table =>
     let
      SortRows = Table.Sort(tabletorank,{{"Sales", Order.Descending}}),
      AddIndex = Table.AddIndexColumn(SortRows, "Rank", 1, 1)
     in
      AddIndex,
    //Apply that function to the AllRows column
    AddedRank = Table.TransformColumns(Grouped, {"AllRows", each RankFunction(_)}),
    //Expand the tables in the AllRows column again
    ExpandAgain = Table.ExpandTableColumn(AddedRank, "AllRows", 
       {"Product", "Sales", "Rank"}, {"Product", "Sales", "Rank"})
in
    ExpandAgain

You can download the example workbook here.

This pattern could also be applied to other types of calculation that need to be nested, for example running totals or shares.

Improving Power Query Calculation Performance With List.Buffer()

I saw an interesting post the other day on the Power Query Technet forum which showed how the List.Buffer() function could be used to improve calculation performance. This is something I’d seen hinted at in other places so I thought it was worth a bit of investigation.

Consider the following query:

let
    //Connect to SQL Server
    Source = Sql.Database("localhost", "adventure works dw"),
    //Get first 2000 rows from FactInternetSales
    dbo_FactInternetSales = Table.FirstN(
          Source{[Schema="dbo",Item="FactInternetSales"]}[Data], 
          2000),
    //Remove unwanted columns
    RemoveColumns = Table.SelectColumns(
          dbo_FactInternetSales,
          {"SalesOrderLineNumber", "SalesOrderNumber","SalesAmount"}),
    //Get sorted list of values from SalesAmount column
    RankValues = List.Sort(RemoveColumns[SalesAmount], Order.Descending),
    //Calculate ranks
    AddRankColumn = Table.AddColumn(RemoveColumns , "Rank", 
          each List.PositionOf(RankValues,[SalesAmount])+1)
in
    AddRankColumn

 

It gets the first 2000 rows from the FactInternetSales table in the Adventure Works DW database, removes most of the columns, and adds a custom column that shows the rank of the current row based on its Sales Amount.

image

On my laptop it takes around 35 seconds to run this query – pretty slow, in my opinion, given the amount of data in this table.

However, using the List.Buffer() function in the RankValues step like so:

let
    //Connect to SQL Server
    Source = Sql.Database("localhost", "adventure works dw"),
    //Get first 2000 rows from FactInternetSales
    dbo_FactInternetSales = Table.FirstN(
          Source{[Schema="dbo",Item="FactInternetSales"]}[Data], 
          2000),
    //Remove unwanted columns
    RemoveColumns = Table.SelectColumns(
          dbo_FactInternetSales,
          {"SalesOrderLineNumber", "SalesOrderNumber","SalesAmount"}),
    //Get sorted list of values from SalesAmount column
    //And buffer them!
    RankValues = List.Buffer(List.Sort(RemoveColumns[SalesAmount], Order.Descending)),
    //Calculate ranks
    AddRankColumn = Table.AddColumn(RemoveColumns , "Rank", 
          each List.PositionOf(RankValues,[SalesAmount])+1)
in
    AddRankColumn

 

Makes the query run in just 2 seconds. The List.Buffer() function stores the sorted list of values used to calculate the rank in memory which means it will only be evaluated once; in the original query it seems as though this step and those before it are being evaluated multiple times. Curt Hagenlocher’s comment (on this thread) on what List.Buffer() does for a similar calculation is telling:

The reason for this is that M is both functional and lazy, so unless we buffer the output of List.Select, we’re really just building a query that needs to be evaluated over and over. This is similar to the Enumerable functions in LINQ, if you’re familiar with those.

Table.Buffer() and Binary.Buffer() functions also exist, and do similar things.

A few other points to make:

  • This is not necessarily the optimal way to calculate ranks in Power Query – it’s just an example of how List.Buffer() can be used.
  • In the first query above, query folding is not taking place. If it had been it’s likely that performance would have been better. Since using List.Buffer() explicitly prevents query folding from taking place, it could make performance worse rather than better because of this in many cases.
  • I’m 100% certain you’ll get much better performance for a rank calculation by loading the table to the Excel Data Model/Power Pivot and writing the calculation in DAX. You should only really do calculations like this in Power Query if they are needed for other transformations in your query.

Power Query Announcements At The PASS BA Conference

There were a couple of big (well, big if you’re a Power Query fan like me) announcements made today by Miguel Llopis at the PASS BA Conference:

  • Today Power Query is available only to people who have Excel Professional Plus or Excel standalone, but as of May a version of Power Query will be available on every Excel SKU. There will be some limitations around data sources that are supported if you don’t have Excel Professional Plus, but that’s ok – this change will make it much easier for people to learn about and use Power Query, and I’m really happy about that.
  • Other new features coming in the May update of Power Query include the ability to turn off prompts about native database queries (useful in this scenario, for example), OData v4.0 support, the ability to use alternative Windows credentials to run queries, and a couple of new transformations such as removing empty rows.
  • Excel 2016 – where Power Query is now native to Excel – will have support for creating Power Query queries using VBA and macro recording. I understand you won’t be able to edit individual steps in a query, but you’ll be able to create and delete queries programmatically and change where they load their data too.
  • Excel 2016 will also support undo/redo for Power Query and give you the ability to copy/paste queries (even from workbook to workbook).
  • There was a commitment that Power Query in Excel 2016 will keep getting updates on a regular basis, rather than get tied to the much slower Office release cycle, so it retains parity with the Power Query functionality in the Power BI Dashboard Designer.

All very cool stuff!

Power BI And Excel 2016 BI News

There have been quite a few Power BI and Office BI-related announcements over the last few weeks, and while I’ve tweeted about them (I’m @Technitrain if you’re not following me already) I though it would be a good idea to summarise them all in one post.

Power BI Announcements at Convergence and SQLBits

You’ve probably already seen the announcement today on the Power BI blog that Power BI is FINALLY available to those of us outside the USA:

http://blogs.msdn.com/b/powerbi/archive/2015/03/16/power-bi-preview-now-available-worldwide.aspx

At last! I’m sure MS had very good reasons why they couldn’t make the Power BI Preview available worldwide back in December, but this decision caused a lot of frustration in the MS BI community and I hope it’s not something that happens again. I can also confirm that the Power BI iPhone app is now available in the UK as well. The new data sources for Power BI that are coming soon – especially Google Analytics – will be very popular I think.

While I’m on the topic of Power BI, a few interesting nuggets about upcoming functionality emerged at SQLBits last week. Kasper mentioned that there will be some new DAX functions appearing in Power BI soon: Median, Percentile, DateDiff and XPNV. Presumably they will appear when we get the ability to create DAX measures and calculated columns in the Power BI Dashboard Designer. Also, following on from the bidirectional relationships functionality I blogged about earlier this year, there was the news that Power BI will also understand 1:1 relationships as well as 1:many, many:1 and many:many.

Office 2016 Preview BI Features

The Office 2016 preview went public today too:

http://blogs.office.com/2015/03/16/announcing-the-office-2016-it-pro-and-developer-preview/

There’s a great overview of what’s new for BI in Office 2016 here:

https://support.office.com/en-gb/article/Whats-new-in-Office-2016-Preview-4841f061-d019-45cc-af74-3e89c8cff1c4#data

The main points are:

  • Power Query is now a native feature of Excel 2016.
  • Power View works on SSAS Multidimensional (this is only going to work on the versions of SSAS Multidimensional that support DAX queries, ie SSAS 2014 or SSAS 2012 SP2)
  • New Excel forecasting functions
  • Time grouping functionality in PivotTables

I’ll be writing a more detailed blog on all of this at some point soon, once I know what’s officially public and what isn’t.

The Power Query announcement is interesting because, as things stand at the moment, we’ll be able to use full Power Query, Power Pivot and Power View functionality for free in the Power BI Dashboard Designer, but in Excel the same functionality is restricted to users of the Professional Plus SKUs. This is crazy, and I hope Microsoft makes the Power add-ins available for every SKU of Excel 2016. Have you signed the petition for this yet?

Power Map

Last week the Power Map team released a new video showcasing functionality from an upcoming release:

https://www.youtube.com/watch?v=aP-vZfC3Fd4&feature=youtu.be

Although there are no details about what is shown in the video, it certainly looks like the ability to use custom shapes (the main missing feature in Power Map up to now) will be coming soon.

PowerMap

Wow, psychedelic…

Surface Hub

Finally, BI is clearly one of the main use-cases of the new Surface Hub (see also this video):

SufaceHubPowerBI_small

I wonder if I can justify buying one for demo purposes?