Automatically Generating Date Dimension Tables In Excel 2016 Power Pivot

As you probably know, whenever you are doing any kind of date or time-based calculations in DAX you should always have a separate Date table in your Power Pivot model. There are a number of ways of building these tables (see, for example, my Power Query query here) but they are all a bit of a hassle – which is why it’s so cool that, in Excel 2016, you can get one built automatically inside the Power Pivot window.
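
To give an idea of the Power Query approach, here's a minimal sketch of an M query that builds a basic date table – note that this is not the exact query linked above, and the hard-coded date range and the columns are just examples:

let
    //hard-coded date range, for illustration only
    StartDate = #date(2013, 1, 1),
    EndDate = #date(2015, 12, 31),
    //generate one value for each day in the range
    DayCount = Duration.Days(EndDate - StartDate) + 1,
    DateList = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    DatesAsTable = Table.FromList(DateList, Splitter.SplitByNothing(), {"Date"}),
    ChangedType = Table.TransformColumnTypes(DatesAsTable, {{"Date", type date}}),
    //add whatever attribute columns you need
    InsertYear = Table.AddColumn(ChangedType, "Year", each Date.Year([Date]), Int64.Type),
    InsertMonth = Table.AddColumn(InsertYear, "Month Number", each Date.Month([Date]), Int64.Type)
in
    InsertMonth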

Consider the following table of sales data on an Excel worksheet:

[screenshot]

With this table loaded into the Data Model (and the Order Date column recognised as containing data of the Date data type), when you go to the Design tab in the Power Pivot window you’ll see the new Date Table button enabled:

[screenshot]

Clicking the New button will add a new date table, called Calendar, to the Data Model:

[screenshot]

This table is automatically marked as the Date Table in your model.

The table contains a continuous range of dates starting from the beginning of the year of the earliest date found in any column in any table in your Data Model, up to the end of the year containing the latest date found in any column in any table in your Data Model. Obviously, this means that your table could contain a very large date range if, for example, you have a Customer table containing a Date Of Birth column. Luckily, you also have the option of manually configuring the range of dates used by clicking the Update Range button:

[screenshot]

One other thing to point out is that the resulting table is a table like any other, so you can add, delete or rename columns as you wish. You should also be able to set the table back to its default state by using the Set Default menu option, but I couldn’t make that work (possibly it hasn’t been implemented yet – this post was written using the Excel 2016 Preview).

You can also make changes such as adding calculated columns, like the Month Year calculated column shown below:

[screenshot]

You can then click the Save Configuration button to save the current state of the table as your default. This means that the next time you create a new Date table in the same workbook, the table will include any customisations. However, these changes don't seem to be applied to Date tables created in new workbooks – maybe this will also change before RTM?

All in all, this is a very handy feature that will save Power Pivot modellers a lot of time. I wonder if it uses the new Calendar() or CalendarAuto() DAX functions under the covers?
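
For reference, here's what those two functions do when used on their own in a DAX query – a minimal sketch you can run in DAX Studio, one statement at a time (the date range is just an example):

-- returns a single-column table of dates between the two boundaries given
EVALUATE CALENDAR ( DATE ( 2015, 1, 1 ), DATE ( 2015, 12, 31 ) )

-- scans the model and returns complete years covering every date found in it,
-- much like the behaviour described above
EVALUATE CALENDARAUTO ()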

Using DateDiff() To Calculate Time Intervals In DAX

One of the most useful new additions to DAX in Excel 2016 and the Power BI Designer is the DateDiff() function. It does exactly what you would expect: it calculates the amount of time between two dates, and expresses that value as seconds, minutes, hours, days, weeks, months, quarters or years.

Here’s a very simple table of dates:

[screenshot]

With this table loaded into the Power BI Designer, you can add new calculated columns to the table by clicking the New Column button on the ribbon. Here are two calculated column definitions that give the number of days and the number of years between the Start Date and the End Date on each row:

DayDurationColumn = 
DATEDIFF(MyTable[Start Date], MyTable[End Date], DAY)

YearDurationColumn = 
DATEDIFF(MyTable[Start Date], MyTable[End Date], YEAR) 

The output is pretty much what you’d expect:

[screenshot]

It is of course also possible to create measures that use the DateDiff() function, for example:

YearDurationMeasure = 
DATEDIFF(
FIRSTDATE(MyTable[Start Date]), 
LASTDATE(MyTable[End Date]), 
YEAR)

[screenshot]

All very straightforward, then, and much easier than having to calculate these values yourself.

The Use And Abuse Of The MDX Freeze Statement

The other day, while helping a customer with some particularly nasty MDX scoped assignments, I realised that there weren't many good resources on the internet that explained how to use the MDX Freeze statement. It's something I see used quite often, but usually because some MDX calculations weren't giving the correct results and a developer found that putting a Freeze statement in fixed the problem – even if they didn't understand why it fixed the problem. So, in this post I'll explain what Freeze does, when you might want to use it, and when there are other, better alternatives.

First of all, the basics. Imagine you have a super-simple cube and that, apart from the Calculate statement, the only MDX you have on the Calculations tab in the cube editor is the following:

CREATE MEMBER CURRENTCUBE.MEASURES.M1 AS 1;

CREATE MEMBER CURRENTCUBE.MEASURES.M2 AS NULL;

SCOPE(MEASURES.M2);
    THIS = MEASURES.M1;
END SCOPE;

If you query the cube in Excel, you’ll see the following:

[screenshot]

No surprises here: we have created two calculated measures, M1 and M2, and then used a scoped assignment to set M2 to show the value of M1. It's important to understand that the scope statement has not copied the value of M1 into M2; it acts more like a pointer, so that M2 will always display the same value as M1 even if M1 subsequently changes. This means that when we add a second scope statement to the code that alters the value of M1, as follows:

CREATE MEMBER CURRENTCUBE.MEASURES.M1 AS 1;

CREATE MEMBER CURRENTCUBE.MEASURES.M2 AS NULL;

SCOPE(MEASURES.M2);
    THIS = MEASURES.M1;
END SCOPE;

SCOPE(MEASURES.M1);
    THIS = 2;
END SCOPE;

…you see the following in your PivotTable, where both M1 and M2 now return the value 2:

[screenshot]

This behaviour is the source of a lot of confusion! An assignment to one measure has indirectly changed the value of another measure, and in a real-world cube it can be very difficult to spot situations where this has happened – and, if you do spot one, to work out which other MDX statement caused it.

Each statement in the MDX Script of a cube adds an extra layer of calculations to it, called a calculation pass; this is true for all the calculations in the examples above. As new calculations are added, and new passes are created, the previous passes still exist and are still accessible. In the second example above, in the outermost calculation pass, the measure M2 returns the value 2 but at the previous calculation pass (as seen in the first example) it returned the value 1. The Freeze statement allows you to freeze the values returned by a subcube of cells at a given calculation pass, so that no future calculations will change those values.

Therefore, by taking our code and adding a Freeze statement to the first scoped assignment we can prevent the second scoped assignment changing the value of M2:

CREATE MEMBER CURRENTCUBE.MEASURES.M1 AS 1;

CREATE MEMBER CURRENTCUBE.MEASURES.M2 AS NULL;

SCOPE(MEASURES.M2);
    THIS = MEASURES.M1;
    FREEZE(THIS);
END SCOPE;

SCOPE(MEASURES.M1);
    THIS = 2;
END SCOPE;

Here’s the output now:

[screenshot]

Another very common way that scoped assignments can affect the value of a cell is through the aggregation of the results of a calculation. This blog post (one of the most popular I've ever written) explains how this behaviour can be used to implement calculations like currency conversions and weighted averages. In other cases, though, this aggregation of a calculation is an unwanted and unexpected side effect of a scope statement, and calculated values that you did want to be displayed get replaced with weird, meaningless values. The Freeze statement can be used to stop this happening, but in actual fact it's a much better idea to understand the cause of these problems and rewrite your calculations so that Freeze isn't necessary.

Now, imagine that in your cube you have a regular (ie not calculated) measure called Sales Amount that has its AggregateFunction property set to Sum, and that you have a fairly standard Date dimension with a Year attribute hierarchy. A PivotTable with Sales Amount on columns and Year on rows looks like this in Excel:

[screenshot]

If you add the following assignment to the cube, to change the value of the All Member on Year, the value of the Grand Total in the PivotTable (which is the All Member, even if that name isn’t shown) will be changed:

SCOPE([Date].[Year].[All], [Measures].[Sales Amount]);
    THIS = 123;
END SCOPE;

[screenshot]

If, on the other hand, you remove that previous assignment and replace it with an assignment on the year 2001:

SCOPE([Date].[Year].&[2001], [Measures].[Sales Amount]);
    THIS = 456;
END SCOPE;

You’ll see that not only has the value for Sales Amount for the year 2001 changed, but that the value of the All Member has been changed too: the All Member represents the aggregated total of all the years, so therefore if a year value has changed, the All Member value must change the reflect this:

[screenshot]

What happens if we try to combine the two previous scope statements?

SCOPE([Date].[Year].[All], [Measures].[Sales Amount]);
    THIS = 123;
END SCOPE;

SCOPE([Date].[Year].&[2001], [Measures].[Sales Amount]);
    THIS = 456;
END SCOPE;

In this case, the output is exactly the same as with the previous example (although the measure formatting has also been lost):

[screenshot]

This is because even though the first Scope statement successfully changed the value of the All Member, the aggregation of values triggered by the second Scope overwrote this value. Although you can’t see this happening in Excel, where you only see the values returned at the final calculation pass of the cube, the MDX Script Debugger can be used to see the values returned for a query at all the different passes so you can work out what’s going on.

The Freeze statement can be used to stop the second Scope from overwriting the first, like so:

SCOPE([Date].[Year].[All], [Measures].[Sales Amount]);
    THIS = 123;
    FREEZE(THIS);
END SCOPE;

SCOPE([Date].[Year].&[2001], [Measures].[Sales Amount]);
    THIS = 456;
END SCOPE;

[screenshot]

However, in my opinion it makes a lot more sense to change the order of the Scope statements so that the assignment to 2001 doesn’t overwrite the assignment to the All Member:

SCOPE([Date].[Year].&[2001], [Measures].[Sales Amount]);
    THIS = 456;
END SCOPE;

SCOPE([Date].[Year].[All], [Measures].[Sales Amount]);
    THIS = 123;
END SCOPE;

The end result is the same:

[screenshot]

Why do I prefer this approach to the use of Freeze? Two reasons:

  1. It works with natural MDX behaviour rather than trying to fight against it. In this case it’s just one line of code less, but in the real world it could result in a much greater reduction. It’s true that you have to put a lot of thought into the ordering of your calculations, but I don’t think you can get away from that. Using Freeze to make your calculations work properly without understanding why it’s needed results in much more complex code, often with duplicated calculations because Freeze still doesn’t give the desired results, and is frankly a bit of a hack.
  2. There are, or at least were, performance implications with the use of Freeze. In Analysis Services 2005 I saw a few cases where the use of Freeze contributed to poor query performance, and where reordering scope statements so that it was no longer necessary made performance better. I’m not sure whether this is still the case with SSAS 2014 but it may well be.

I see Freeze abused most often in financial cubes, when scope statements are used to define calculations on a chart of accounts hierarchy. Sometimes I have even seen the same calculation code appear in several places in the same MDX Script, just to make sure that the calculations always return the right result – all because the calculations on the chart of accounts dimension are aggregating up and overwriting each other. In this case the simple rule you have to remember is to always scope the calculations on the lowest level of the hierarchy first, then scope the calculations on the second-lowest level, and so on working your way up to the top of the hierarchy. This way you can be sure that your scope will never aggregate up and overwrite the result of another calculation.
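
Here's a hedged illustration of that rule – the account members and measure below are hypothetical, but the pattern is what matters. The calculation on the lower-level member is scoped first, so the aggregation it triggers can never overwrite the calculation on the higher-level member scoped after it:

//Gross Margin sits below Net Income on the hypothetical hierarchy,
//so its calculation is scoped first
SCOPE([Account].[Accounts].[Gross Margin], [Measures].[Amount]);
    THIS = ([Account].[Accounts].[Sales], [Measures].[Amount]) -
           ([Account].[Accounts].[Cost Of Sales], [Measures].[Amount]);
END SCOPE;

//scoped last: the aggregation triggered by the assignment above
//cannot overwrite this value
SCOPE([Account].[Accounts].[Net Income], [Measures].[Amount]);
    THIS = ([Account].[Accounts].[Gross Margin], [Measures].[Amount]) -
           ([Account].[Accounts].[Operating Expenses], [Measures].[Amount]);
END SCOPE;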

Apart from that, I also see Freeze used when a cube contains a Date Tool dimension that uses regular members instead of calculated members, in the way described here. Now there are a lot of good reasons to use regular members on a Date Tool dimension (it will work with all versions of SSAS and Excel for instance) but I have also seen a lot of cases where the fact that you are scoping calculations on regular measures, which may then get aggregated up accidentally, has caused a lot of problems – not only resulting in incorrect values appearing, but also making query performance worse. For that reason, nowadays I prefer to use calculated members on my Date Tool dimension rather than regular members.
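
For example – and this is just a sketch, assuming a hypothetical [Date Tool] dimension with a [Calculation] hierarchy whose real, default member is called Current Period – a previous-period calculation can be defined as a calculated member, which can never be accidentally aggregated up:

//calculated member on the hypothetical utility dimension:
//shifts the current month back by one
CREATE MEMBER CURRENTCUBE.[Date Tool].[Calculation].[Previous Period] AS
    ([Date Tool].[Calculation].[Current Period],
     ParallelPeriod(
         [Date].[Calendar].[Month],
         1,
         [Date].[Calendar].CurrentMember));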

Power Query/Excel 2016 VBA Examples

In Excel 2016, Power Query is no longer an Excel add-in but a native feature of Excel, and what’s more, you can now use VBA to create and manage Power Query queries.

I’ve found two sources of information about how to use VBA with Power Query in Excel 2016. First, there are some code samples on the Technet Gallery here:
https://gallery.technet.microsoft.com/VBA-to-automate-Power-956a52d1#content
…and Gil Raviv, a Program Manager at Microsoft, has also asked for feedback on this functionality on this thread:
https://social.technet.microsoft.com/Forums/en-US/1eac9c36-b6e4-48f0-a51a-fa92b24cf1d9/vba-and-power-query-in-excel-2016-preview-lets-get-started?forum=powerquery

Secondly, I was contacted recently by Tycho Grouwstra who shared with me some of the interesting work he has done using VBA and Power Query in the Excel 2016 Preview, and who has very kindly allowed me to blog about it here. His work is much more representative of how I think most people will want to use this feature.

Tycho sent me a .xlsm file containing all of the VBA code, which you can download here. Obviously the code only works in the Excel 2016 Preview, but you can still open the file and look at the code in Excel 2013. However, if you're worried about downloading a workbook with macros in it, I've extracted the code to a text document which you can see here. If you want to copy the code to use in your own workbook, you'll need to go to the VBA Editor, select Tools/References and add a reference to "Microsoft ActiveX Data Objects 6.1 Library".

[screenshot]

The VBA code includes examples of how to:

  • Delete all the Power Query queries in a workbook
  • Export/import the M code for all queries to/from another Excel workbook
  • Export/import the M code for all queries to text files
  • Refresh all the Power Query queries in the workbook
  • Load a query to an Excel table
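
To give a flavour of the object model, here's a minimal sketch of my own (it isn't Tycho's code, and the query name and M expression are invented) showing how a query can be created and enumerated from VBA:

Sub CreateAndListQueries()
    'create a new Power Query query from an M expression
    Dim newQuery As WorkbookQuery
    Set newQuery = ThisWorkbook.Queries.Add( _
        Name:="SampleQuery", _
        Formula:="let Source = {1, 2, 3} in Source")

    'print the name and M code of every query in the workbook
    Dim q As WorkbookQuery
    For Each q In ThisWorkbook.Queries
        Debug.Print q.Name & ": " & q.Formula
    Next q
End Sub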

A few bugs/features in the Preview are also pointed out, namely:

  • Imported queries don’t always show up in the Workbook Queries pane; the workaround is to close and reopen the workbook
  • Functions aren’t recognised as functions (ie they don’t have the fx icon) until you open the Query Editor and the Close & Load
  • Query groups aren't supported yet – which is a bit of an oversight in my opinion, but the forum thread linked to above indicates that unfortunately this won't be addressed before RTM
  • Loading the output of a query into an Excel table using the code given here doesn’t seem to have the same result as loading a query to a table in the worksheet using the Power Query UI: it creates a blue, rather than green, table that doesn’t always retain row order.

I can imagine a lot of serious Power Query users will create workbooks containing a library of their most useful queries and functions, and use VBA code to copy these queries and functions into new workbooks as and when necessary. We’ll have to wait and see what Microsoft’s plans for sharing Power Query queries are, whether they’ll go beyond what’s already been seen in Office 365 Power BI, whether they will be part of a bigger bundle of services and what the cost will be.
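
If that's how things pan out, the copying itself could be as simple as something like the following sketch (untested beyond the Preview, and it assumes both workbooks are already open):

Sub CopyQueriesBetweenWorkbooks(source As Workbook, target As Workbook)
    'copy the name, M code and description of every query across
    Dim q As WorkbookQuery
    For Each q In source.Queries
        target.Queries.Add q.Name, q.Formula, q.Description
    Next q
End Sub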

Incidentally, the sample workbook contains a lot of interesting, generally useful Power Query queries and functions written by Tycho and others, which are also available in the following GitHub repository: https://github.com/tycho01/pquery

Technitrain Courses For The Rest Of 2015

We’re almost halfway though the year and so I would draw your attention to some of the training courses we have coming up at Technitrain. Since you are the the discerning sort of person who reads my blog, then you are probably also going to be the discerning sort of person who is interested in the kind of expert-led Microsoft BI and SQL Server training that Technitrain runs! To get regular updates on new courses, you can sign up for our mailing list here.

Introduction to Power BI with Chris Webb, June 12th, Edinburgh

I know this is only next Friday but there are still places available. It’s a one-day precon for SQL Saturday Edinburgh BI Edition, which is well worth attending too. On this course I’ll give you a full introduction to Power BI, what it is, when to use it, and where it’s going.

SSIS Design Patterns with Andy Leonard, September 7th-10th, London

If you’ve already got some SSIS experience but want to find out about design patterns, best practices, BIML and new features in SSIS 2012/2014, this is the course for you.

Mission Critical SQL Server with Allan Hirt, September 28th-October 1st, London

Allan (http://sqlha.com/) is probably the number 1 expert on high availability for SQL Server, and this course will cover features such as AlwaysOn Availability Groups, true multi-site failover clustering instances, support for Windows Server Core, and what’s new in SQL Server 2016.

Introduction to MDX with Chris Webb, October 12th-14th, London

Yes, MDX isn’t cool or fashionable any more, but if you’re working with SSAS Multidimensional you still need to know it. On this course I’ll teach you all you need to know about MDX queries and calculations.

Practical Data Science – Azure Machine Learning, SQL Data Mining and R with Rafal Lukawiecki, October 19th-22nd, London

I’m particularly excited about this course, in part because the subject matter is so interesting (and so now) and in part because Rafal has such an amazing reputation as a trainer and a speaker, so I know it will be great. You can see a very detailed course agenda here.

Video Training

If you can’t make it over to the UK, don’t forget that a lot of my training material on MDX and SSAS has been recorded and is available on subscription at Project Botticelli, along with videos from Rafal, Marco Russo, Alberto Ferrari and others. Register with the discount code TECHNITRAIN2015 for a 10% discount.

Drawing Lines On Maps With Power Map And Power Query

Recently, I was working with a customer that wanted to be able to draw lines between two points in Power Map. At the time I thought the only way that it might be possible was by using Power Query to generate a table of data containing a series of points that were so close together that they looked like a single line, and then plot these points in Power Map (similar to what I show in the screenshot here). Soon after, the new custom regions functionality was released in Power Map (there’s no documentation I can find right now, but this blog post is reasonably detailed) and I wondered whether now it might be possible to draw lines. Unfortunately not: Power Map can now import SHP and KML files, but it doesn’t support all the features of KML – only polygons (and even then not all the features of polygons, although inner and outer boundaries work fine). I guess this is ok for the primary use-case of Power Map, which is plotting BI data on a map, but it would be nice to see more KML features supported so that Power Map can show richer supporting information for the data: things like arrows showing direction of travel, and so on.

Anyway, I then thought – why not use polygons to draw these lines? Again, I hit a slight problem: I wanted to generate the polygons for the lines in code, and Power Map can only import SHP or KML data from files. It would be really useful if we could use shape data stored in the Excel Data Model… but we can’t. However, it is possible to use Power Query to generate KML and then copy and paste this code into a file, which can then be imported into Power Map. So, just for the fun of it, I put together a proof-of-concept workbook containing Power Query queries to generate all the tables and KML code needed to draw lines between two places, and a Power Map tour that shows the output. Here’s what the end result looks like:

[screenshot]

You can download my example workbook that contains all the code, plus all the supporting files, here. You will need to update some file paths in the M code to get it all to work.

The starting point is two tables on the worksheet, one containing the single starting point for the lines, the other all of the destinations:

[screenshot]

There’s a lot of M code so I’m not going to include it in this post, but here’s an overview of what each query does:

  • GetLatLong is a function that calls the Bing Maps REST API to find the latitude and longitude for each place in the tables above. You will need your own Bing Maps account key if you want to use this code yourself – you can get one at https://www.bingmapsportal.com/
  • Starting Point and Ending Points simply load the data from the Excel tables
  • StartingPointLatLong gets the latitude and longitude of the starting point by calling GetLatLong
  • StartEndPoints gets the latitude and longitude of all the ending points by calling GetLatLong, adds custom columns to show the starting point name, latitude and longitude against each ending point, and loads the result to the Excel Data Model. You have to have some data in the Excel Data Model for Power Map to display the lines, and it’s important that Power Map can match the values in one column in this table with the names of objects in the KML file.
  • BasicPolygonTemplate loads a fragment of KML, containing the definition of a polygon, from a text file. This contains two ‘parameters’, @Name and @Coordinates, which will be overwritten using Text.Replace() later on when the actual KML is being generated.
  • GetCoordinateList is a function to draw a rectangular polygon that represents the line between the starting point and an ending point. I had a lot of fun trying to get the code for this working properly (I wish I could remember any of the trigonometry that I learned after the age of 13…) and I’m still not convinced the rectangles are properly rectangular, but they’re good enough.
  • KML generates the KML for all of the polygons. The output of this query must be copied from the Power Query query window into a text file with the .kml extension, for example Test.kml. There’s no need to load the output of this query to anywhere.
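
To give an idea of what the GetLatLong function might look like, here's a simplified sketch – the code in the workbook may well differ, and the key value is a placeholder you'd replace with your own:

//simplified sketch of a geocoding function using the Bing Maps REST API
(placeName as text) as record =>
let
    BingMapsKey = "InsertYourOwnKeyHere",
    Source = Json.Document(
        Web.Contents(
            "http://dev.virtualearth.net/REST/v1/Locations",
            [Query = [q = placeName, key = BingMapsKey]])),
    //the first result's coordinates come back as {latitude, longitude}
    Coordinates = Source[resourceSets]{0}[resources]{0}[point][coordinates],
    Result = [Latitude = Coordinates{0}, Longitude = Coordinates{1}]
in
    Result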

[screenshot]

With all of that done, you now need to open Power Map and create a new tour. Choose EndingPoint as the sole Geography column, then choose Custom Region (.kml, .shp) from the dropdown list below and click Yes to import custom regions.

[screenshot]

Select the .kml file you created earlier, containing the output of the KML Power Query query, and then click Import:

[screenshot]

Finally, change the visualisation type to Region and optionally add Ending Point to Category to make the lines different colours:

[screenshot]

And bingo, you see the lines:

[screenshot]

Support for custom regions is a massive step forward for Power Map in my opinion: rather than just being a toy for creating flashy demos, it's now able to handle a lot more real-world requirements. However, having some way of programmatically creating regions and shapes (either through Power Query as I've done here, or using VBA or some other API), being able to load shape data from the Excel Data Model, or even just being able to draw shapes on a map manually, would all be welcome. I'm no mapping expert, but I've come across a few frustrated MapPoint users (MapPoint was discontinued at the end of 2014) who would like to use Power Map but find that it can't do everything they need. The code in this post shows what's possible, but it's still way too complex for most users and hardly an elegant solution.

Using SelectColumns() To Alias Columns In DAX

A few years ago I wrote this post on how to alias columns in a table in DAX, using a combination of AddColumns() and Summarize(). The good news is that in Excel 2016/the Power BI Designer/SSAS Tabular 2016 there’s a new DAX function specifically for this purpose: SelectColumns(). Here’s an example of how it can be used:

Imagine you have the following source table, called Products:

[screenshot]

You can write a DAX query to get all the rows and columns from this table like so:

EVALUATE Products

Here’s the output of that query in DAX Studio (and remember, DAX Studio can connect to data loaded into the Power BI Designer, which is what I’m doing here):

[screenshot]

You can alias the columns in this table using SelectColumns() very easily, like so:

EVALUATE
SELECTCOLUMNS (
    Products,
    "Column One", Products[Product],
    "Column Two", Products[Colour]
)

Here’s the output:

[screenshot]

The syntax for SelectColumns() is straightforward: the first parameter is a table expression, and after that there are pairs of parameters consisting of:

  • A new column name
  • An expression returning a column from the table given in the first parameter

As you can see in the output of the query above, I’ve renamed the Product column “Column One” and the Colour column “Column Two”.

This means I can now crossjoin a table with itself without needing to worry about conflicting column names, like so:

EVALUATE
CROSSJOIN (
    Products,
    SELECTCOLUMNS (
        Products,
        "Column One", Products[Product],
        "Column Two", Products[Colour]
    )
)

[screenshot]

One other interesting thing to note about SelectColumns() is that it allows you to do projection in a DAX query easily – as Marco notes here, it was possible before but it wasn’t pleasant. For example, the query:

EVALUATE
SELECTCOLUMNS (
    Products,
    "Just Colour", Products[Colour]
)

…returns:

[screenshot]

Notice how there are three rows in the output here and that the value Green occurs twice. If you're a true DAX aficionado, you might get excited about that.
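
In case you're wondering why: unlike the old AddColumns()/Summarize() technique, SelectColumns() does not remove duplicate rows. If you do want distinct values, you should be able to wrap it in Distinct(), as in this sketch:

EVALUATE
DISTINCT (
    SELECTCOLUMNS (
        Products,
        "Just Colour", Products[Colour]
    )
)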