Automatically Generating Date Dimension Tables In Excel 2016 Power Pivot

As you probably know, whenever you are doing any kind of date or time-based calculations in DAX you should always have a separate Date table in your Power Pivot model. There are a number of ways of building these tables (see, for example, my Power Query query here) but they are all a bit of a hassle – which is why it’s so cool that, in Excel 2016, you can get one built automatically inside the Power Pivot window.

Consider the following table of sales data on an Excel worksheet:

Source

With this table loaded into the Data Model (and the Order Date column recognised as containing data of the Date data type), when you go to the Design tab in the Power Pivot window you’ll see the new Date Table button enabled:

Menu

Clicking on the New button will add a new date table to the Data Model, called Calendar:

CalendarBasic

This table is automatically marked as the Date Table in your model.

The table contains a continuous range of dates starting from the beginning of the year of the earliest date found in any column in any table in your Data Model, up to the end of the year containing the latest date found in any column in any table in your Data Model. Obviously, this means that your table could contain a very large date range if, for example, you have a Customer table containing a Date Of Birth column. Luckily, you also have the option of manually configuring the range of dates used by clicking the Update Range button:

Range
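The default range rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not the code Power Pivot actually runs; the function name `date_table_range` and the sample dates are made up for the example.

```python
from datetime import date, timedelta

def date_table_range(all_dates):
    """Return every date from 1 January of the earliest year found
    to 31 December of the latest year found."""
    first = date(min(all_dates).year, 1, 1)
    last = date(max(all_dates).year, 12, 31)
    return [first + timedelta(days=n) for n in range((last - first).days + 1)]

# Hypothetical dates: two order dates plus one date of birth
found = [date(2015, 3, 2), date(2015, 6, 1), date(1970, 6, 15)]
calendar = date_table_range(found)
print(calendar[0], calendar[-1], len(calendar))
```

A single old date, such as the date of birth here, is enough to stretch the table back by decades, which is why narrowing the range by hand can be worthwhile.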

One other thing to point out is that the resulting table is a table like any other, so you can add, delete or rename columns as you wish. You should also be able to set the table back to its default state by using the Set Default menu option, but I couldn’t make that work (possibly it hasn’t been implemented yet – this post was written using the Excel 2016 Preview).

If you do make changes like adding calculated columns, such as the Month Year calculated column shown below:

Calc

You can then click the Save Configuration button to save the current state of the table as your default. This means that the next time you create a new Date table in the same workbook, the table will include any customisations. However, these changes don’t seem to be applied in Date tables created in new workbooks – maybe this will also change before RTM?

All in all, this is a very handy feature that will save Power Pivot modellers a lot of time. I wonder if it uses the new Calendar() or CalendarAuto() DAX functions under the covers?

Using DateDiff() To Calculate Time Intervals In DAX

One of the most useful new additions to DAX in Excel 2016 and the Power BI Designer is the DateDiff() function. It does exactly what you would expect: it calculates the amount of time between two dates, and expresses that value in seconds, minutes, hours, days, weeks, months, quarters or years.

Here’s a very simple table of dates:

image

With this table loaded into the Power BI Designer, you can add new calculated columns to the table by clicking the New Column button on the ribbon. Here are two calculated column definitions that give the number of days and the number of years between the Start Date and the End Date on each row:

[sourcecode language='text'  padlinenumbers='true']
DayDurationColumn = 
DATEDIFF(MyTable[Start Date], MyTable[End Date], DAY)

YearDurationColumn = 
DATEDIFF(MyTable[Start Date], MyTable[End Date], YEAR) 
[/sourcecode]

The output is pretty much what you’d expect:

image
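If you want to sanity-check results like these outside DAX, here is a rough Python sketch. It assumes that DateDiff() counts the interval boundaries crossed between the two dates rather than whole elapsed periods, which is how the function is documented. The helper name `date_diff` is hypothetical and only three of the interval types are covered.

```python
from datetime import date

def date_diff(start, end, interval):
    """Count the DAY, MONTH or YEAR boundaries crossed going from start to end."""
    if interval == "DAY":
        return (end - start).days
    if interval == "MONTH":
        return (end.year - start.year) * 12 + (end.month - start.month)
    if interval == "YEAR":
        return end.year - start.year
    raise ValueError(f"unsupported interval: {interval}")

# Only one day apart, but a year boundary is crossed
print(date_diff(date(2014, 12, 31), date(2015, 1, 1), "DAY"))   # 1
print(date_diff(date(2014, 12, 31), date(2015, 1, 1), "YEAR"))  # 1
```

Under that assumption a difference of one year is not the same thing as 365 elapsed days, which is worth bearing in mind before using the YEAR interval for something like an age calculation.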

It is of course also possible to create measures that use the DateDiff() function, for example:

[sourcecode language='text' ]
YearDurationMeasure = 
DATEDIFF(
FIRSTDATE(MyTable[Start Date]), 
LASTDATE(MyTable[End Date]), 
YEAR)
[/sourcecode]

image

All very straightforward, then, and much easier than having to calculate these values yourself.

The Use And Abuse Of The MDX Freeze Statement

The other day, while helping a customer with some particularly nasty MDX scoped assignments, I realised that there weren’t many good resources on the internet that explained how to use the MDX Freeze statement. It’s something I see used quite often, but usually because some MDX calculations aren’t giving the correct results and a developer has found that putting a Freeze statement in has fixed the problem – even if they don’t understand why it has fixed the problem. So, in this post I’ll explain what Freeze does, when you might want to use it, and when there are other, better alternatives.

First of all, the basics. Imagine you have a super-simple cube and that, apart from the Calculate statement, the only MDX you have on the Calculations tab in the cube editor is the following:

[sourcecode language="text" padlinenumbers="true"]
CREATE MEMBER CURRENTCUBE.MEASURES.M1 AS 1;

CREATE MEMBER CURRENTCUBE.MEASURES.M2 AS NULL;

SCOPE(MEASURES.M2);
THIS = MEASURES.M1;
END SCOPE;
[/sourcecode]

If you query the cube in Excel, you’ll see the following:

image

No surprises here: we have created two calculated measures, M1 and M2, and then used a scoped assignment to set M2 to show the value of M1. It’s important to understand that the scope statement has not copied the value of M1 into M2, but acts more like a pointer so that M2 will always display the same value as M1 even if M1 subsequently changes. This means that when we add a second scope statement to the code that alters the value of M1, as follows:

[sourcecode language="text"]
CREATE MEMBER CURRENTCUBE.MEASURES.M1 AS 1;

CREATE MEMBER CURRENTCUBE.MEASURES.M2 AS NULL;

SCOPE(MEASURES.M2);
THIS = MEASURES.M1;
END SCOPE;

SCOPE(MEASURES.M1);
THIS = 2;
END SCOPE;
[/sourcecode]

You see the following in your PivotTable:

image

This behaviour is the source of a lot of confusion! An assignment to one measure has indirectly changed the value of another measure, and of course in a real-world cube it can be very difficult to spot situations where this has happened and, if you do spot one, to work out which other MDX has caused it.

Each statement in the MDX Script of a cube adds an extra layer of calculations to it, called a calculation pass; this is true for all the calculations in the examples above. As new calculations are added, and new passes are created, the previous passes still exist and are still accessible. In the second example above, in the outermost calculation pass, the measure M2 returns the value 2 but at the previous calculation pass (as seen in the first example) it returned the value 1. The Freeze statement allows you to freeze the values returned by a subcube of cells at a given calculation pass, so that no future calculations will change those values.

Therefore, by taking our code and adding a Freeze statement to the first scoped assignment we can prevent the second scoped assignment changing the value of M2:

[sourcecode language="text" firstline="7"]
CREATE MEMBER CURRENTCUBE.MEASURES.M1 AS 1;

CREATE MEMBER CURRENTCUBE.MEASURES.M2 AS NULL;

SCOPE(MEASURES.M2);
THIS = MEASURES.M1;
FREEZE(THIS);
END SCOPE;

SCOPE(MEASURES.M1);
THIS = 2;
END SCOPE;
[/sourcecode]

Here’s the output now:

image
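If it helps to think about passes in more familiar terms, here is a toy Python model of the M1/M2 example. It is emphatically not how Analysis Services is implemented: the tuple-based script format and the function `value_of` are invented for illustration, and the model covers only the pointer-like behaviour of an assignment and the pinning effect of Freeze described in this post.

```python
def value_of(script, measure, at_pass=None):
    """Value of `measure` as seen after the first `at_pass` statements of `script`."""
    if at_pass is None:
        at_pass = len(script)
    visible = script[:at_pass]
    # A Freeze pins the measure to whatever it returned just before the Freeze ran.
    for index, (kind, target, _) in enumerate(visible):
        if kind == "freeze" and target == measure:
            return value_of(script, measure, index)
    # Otherwise the latest assignment wins. A string expression is a reference to
    # another measure and is resolved at the current pass, like a pointer.
    for kind, target, expr in reversed(visible):
        if kind == "assign" and target == measure:
            return value_of(script, expr, at_pass) if isinstance(expr, str) else expr
    return None

without_freeze = [
    ("assign", "M1", 1),
    ("assign", "M2", "M1"),
    ("assign", "M1", 2),
]
with_freeze = [
    ("assign", "M1", 1),
    ("assign", "M2", "M1"),
    ("freeze", "M2", None),
    ("assign", "M1", 2),
]

print(value_of(without_freeze, "M2"))  # 2
print(value_of(with_freeze, "M2"))     # 1
```

Passing a smaller `at_pass` shows what a measure returned at an earlier pass, which is roughly what the earlier passes of a real MDX Script give you.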

Another very common way that scoped assignments can affect the value of a cell is through the aggregation of the results of a calculation. This blog post (one of the most popular I’ve ever written) explains how this behaviour can be used to implement calculations like currency conversions and weighted averages. However, in other cases, this aggregation of a calculation is an unwanted and unexpected side effect of a scope statement and calculated values that you did want to be displayed instead get replaced with weird, meaningless values. The Freeze statement can be used to stop this happening but in actual fact it’s a much better idea to understand the cause of these problems and rewrite your calculations so that Freeze isn’t necessary.

Now, imagine that in your cube you have a regular (i.e. not calculated) measure called Sales Amount that has its AggregateFunction property set to Sum, and that you have a fairly standard Date dimension with a Year attribute hierarchy. A PivotTable with Sales Amount on columns and Year on rows looks like this in Excel:

image

If you add the following assignment to the cube, to change the value of the All Member on Year, the value of the Grand Total in the PivotTable (which is the All Member, even if that name isn’t shown) will be changed:

[sourcecode language="text"]
SCOPE([Date].[Year].[All], [Measures].[Sales Amount]);
THIS = 123;
END SCOPE;
[/sourcecode]

image

If, on the other hand, you remove that previous assignment and replace it with an assignment on the year 2001:

[sourcecode language="text"]
SCOPE([Date].[Year].&[2001], [Measures].[Sales Amount]);
THIS = 456;
END SCOPE;
[/sourcecode]

You’ll see that not only has the value for Sales Amount for the year 2001 changed, but that the value of the All Member has been changed too: the All Member represents the aggregated total of all the years, so if a year value has changed, the All Member value must change to reflect this:

image

What happens if we try to combine the two previous scope statements?

[sourcecode language="text"]
SCOPE([Date].[Year].[All], [Measures].[Sales Amount]);
THIS = 123;
END SCOPE;

SCOPE([Date].[Year].&[2001], [Measures].[Sales Amount]);
THIS = 456;
END SCOPE;
[/sourcecode]

In this case, the output is exactly the same as with the previous example (although the measure formatting has also been lost):

image

This is because even though the first Scope statement successfully changed the value of the All Member, the aggregation of values triggered by the second Scope overwrote this value. Although you can’t see this happening in Excel, where you only see the values returned at the final calculation pass of the cube, the MDX Script Debugger can be used to see the values returned for a query at all the different passes so you can work out what’s going on.

The Freeze statement can be used to stop the second Scope from overwriting the first, like so:

[sourcecode language="text" highlight="3"]
SCOPE([Date].[Year].[All], [Measures].[Sales Amount]);
THIS = 123;
FREEZE(THIS);
END SCOPE;

SCOPE([Date].[Year].&[2001], [Measures].[Sales Amount]);
THIS = 456;
END SCOPE;
[/sourcecode]

image

However, in my opinion it makes a lot more sense to change the order of the Scope statements so that the assignment to 2001 doesn’t overwrite the assignment to the All Member:

[sourcecode language="text"]
SCOPE([Date].[Year].&[2001], [Measures].[Sales Amount]);
THIS = 456;
END SCOPE;

SCOPE([Date].[Year].[All], [Measures].[Sales Amount]);
THIS = 123;
END SCOPE;
[/sourcecode]

The end result is the same:

image

Why do I prefer this approach to the use of Freeze? Two reasons:

  1. It works with natural MDX behaviour rather than trying to fight against it. In this case it’s just one line of code less, but in the real world it could result in a much greater reduction. It’s true that you have to put a lot of thought into the ordering of your calculations, but I don’t think you can get away from that. Using Freeze to make your calculations work properly without understanding why it’s needed results in much more complex code, often with duplicated calculations because Freeze still doesn’t give the desired results, and is frankly a bit of a hack.
  2. There are, or at least were, performance implications with the use of Freeze. In Analysis Services 2005 I saw a few cases where the use of Freeze contributed to poor query performance, and where reordering scope statements so that it was no longer necessary made performance better. I’m not sure whether this is still the case with SSAS 2014 but it may well be.

I see Freeze abused most often in financial cubes, when scope statements are used to define calculations on a chart of accounts hierarchy. Sometimes I have even seen the same calculation code appear in several places in the same MDX Script, just to make sure that the calculations always return the right result – all because the calculations on the chart of accounts dimension are aggregating up and overwriting each other. In this case the simple rule you have to remember is to always scope the calculations on the lowest level of the hierarchy first, then scope the calculations on the second-lowest level, and so on working your way up to the top of the hierarchy. This way you can be sure that your scope will never aggregate up and overwrite the result of another calculation.
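The lowest-level-first rule can be illustrated with another toy Python model. Everything here is an assumption made for the sake of the example: the three-level account hierarchy, the member names, the numbers, and the simplification that an assignment to a member simply discards any value assigned earlier to its ancestors, so that they are re-aggregated from their children by plain summation.

```python
# Hypothetical hierarchy, stored as child -> parent
PARENT = {
    "Product Sales": "Revenue",
    "Services": "Revenue",
    "Revenue": "Net Result",
    "Costs": "Net Result",
}

def depth(member):
    """Number of ancestors above a member (0 for the top of the hierarchy)."""
    levels = 0
    while member in PARENT:
        member = PARENT[member]
        levels += 1
    return levels

def apply_script(assignments, leaf_values):
    """Apply (member, value) assignments in order and return a lookup function."""
    overrides = {}
    for member, value in assignments:
        overrides[member] = value
        # Toy version of aggregating up: every ancestor loses any value
        # that was assigned to it earlier in the script.
        ancestor = PARENT.get(member)
        while ancestor is not None:
            overrides.pop(ancestor, None)
            ancestor = PARENT.get(ancestor)

    def value(member):
        if member in overrides:
            return overrides[member]
        children = [child for child, parent in PARENT.items() if parent == member]
        if not children:
            return leaf_values[member]
        return sum(value(child) for child in children)

    return value

leaves = {"Product Sales": 10, "Services": 20, "Costs": -5}
top_down = [("Net Result", 123), ("Revenue", 50), ("Services", 7)]
bottom_up = sorted(top_down, key=lambda item: depth(item[0]), reverse=True)

print(apply_script(top_down, leaves)("Net Result"))   # 12
print(apply_script(bottom_up, leaves)("Net Result"))  # 123
```

With the assignments ordered from the deepest level upwards, all three survive; ordered from the top down, only the last one does.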

Apart from that, I also see Freeze used when a cube contains a Date Tool dimension that uses regular members instead of calculated members, in the way described here. Now there are a lot of good reasons to use regular members on a Date Tool dimension (it will work with all versions of SSAS and Excel for instance) but I have also seen a lot of cases where the fact that you are scoping calculations on regular measures, which may then get aggregated up accidentally, has caused a lot of problems – not only resulting in incorrect values appearing, but also making query performance worse. For that reason, nowadays I prefer to use calculated members on my Date Tool dimension rather than regular members.

Power Query/Excel 2016 VBA Examples

In Excel 2016, Power Query is no longer an Excel add-in but a native feature of Excel, and what’s more, you can now use VBA to create and manage Power Query queries.

I’ve found two sources of information about how to use VBA with Power Query in Excel 2016. First, there are some code samples on the Technet Gallery here:
https://gallery.technet.microsoft.com/VBA-to-automate-Power-956a52d1#content
…and Gil Raviv, a Program Manager at Microsoft, has also asked for feedback on this functionality on this thread:
https://social.technet.microsoft.com/Forums/en-US/1eac9c36-b6e4-48f0-a51a-fa92b24cf1d9/vba-and-power-query-in-excel-2016-preview-lets-get-started?forum=powerquery

Secondly, I was contacted recently by Kiara Grouwstra who shared with me some of the interesting work they have done using VBA and Power Query in the Excel 2016 Preview, and who has very kindly allowed me to blog about it here. Their work is much more representative of how I think most people will want to use this feature.

Kiara sent me a .xlsm file containing all of the VBA code, which you can download here. Obviously the code only works in the Excel 2016 Preview, but you can still open the file and look at the code in Excel 2013. However, if you’re worried about downloading a workbook with macros in it, I have extracted the code to a text document which you can see here. If you want to copy the code to use in your own workbook, you’ll need to go to the VBA Editor, select Tools/References and add a reference to “Microsoft ActiveX Data Objects 6.1 Library”.

image

The VBA code includes examples of how to:

  • Delete all the Power Query queries in a workbook
  • Export/import the M code for all queries to/from another Excel workbook
  • Export/import the M code for all queries to text files
  • Refresh all the Power Query queries in the workbook
  • Load a query to an Excel table

A few bugs/features in the Preview are also pointed out, namely:

  • Imported queries don’t always show up in the Workbook Queries pane; the workaround is to close and reopen the workbook
  • Functions aren’t recognised as functions (i.e. they don’t have the fx icon) until you open the Query Editor and then click Close & Load
  • Query groups aren’t supported yet – which is a bit of an oversight, in my opinion, but the forums thread linked to above indicates it won’t be addressed before RTM unfortunately
  • Loading the output of a query into an Excel table using the code given here doesn’t seem to have the same result as loading a query to a table in the worksheet using the Power Query UI: it creates a blue, rather than green, table that doesn’t always retain row order.

I can imagine a lot of serious Power Query users will create workbooks containing a library of their most useful queries and functions, and use VBA code to copy these queries and functions into new workbooks as and when necessary. We’ll have to wait and see what Microsoft’s plans for sharing Power Query queries are, whether they’ll go beyond what’s already been seen in Office 365 Power BI, whether they will be part of a bigger bundle of services and what the cost will be.

Incidentally, the sample workbook contains a lot of interesting, generally useful Power Query queries and functions written by Kiara and others, which are also available in the following GitHub repository: https://github.com/KiaraGrouwstra/pquery

Technitrain Courses For The Rest Of 2015

We’re almost halfway through the year, so I would like to draw your attention to some of the training courses we have coming up at Technitrain. Since you are the discerning sort of person who reads my blog, you are probably also the discerning sort of person who is interested in the kind of expert-led Microsoft BI and SQL Server training that Technitrain runs! To get regular updates on new courses, you can sign up for our mailing list here.

Introduction to Power BI with Chris Webb, June 12th, Edinburgh

I know this is only next Friday but there are still places available. It’s a one-day precon for SQL Saturday Edinburgh BI Edition, which is well worth attending too. On this course I’ll give you a full introduction to Power BI, what it is, when to use it, and where it’s going.

SSIS Design Patterns with Andy Leonard, September 7th-10th, London

If you’ve already got some SSIS experience but want to find out about design patterns, best practices, BIML and new features in SSIS 2012/2014, this is the course for you.

Mission Critical SQL Server with Allan Hirt, September 28th-October 1st, London

Allan (http://sqlha.com/) is probably the number 1 expert on high availability for SQL Server, and this course will cover features such as AlwaysOn Availability Groups, true multi-site failover clustering instances, support for Windows Server Core, and what’s new in SQL Server 2016.

Introduction to MDX with Chris Webb, October 12th-14th, London

Yes, MDX isn’t cool or fashionable any more, but if you’re working with SSAS Multidimensional you still need to know it. On this course I’ll teach you all you need to know about MDX queries and calculations.

Practical Data Science – Azure Machine Learning, SQL Data Mining and R with Rafal Lukawiecki, October 19th-22nd, London

I’m particularly excited about this course, in part because the subject matter is so interesting (and so now) and in part because Rafal has such an amazing reputation as a trainer and a speaker, so I know it will be great. You can see a very detailed course agenda here.

Video Training

If you can’t make it over to the UK, don’t forget that a lot of my training material on MDX and SSAS has been recorded and is available on subscription at Project Botticelli, along with videos from Rafal, Marco Russo, Alberto Ferrari and others. Register with the discount code TECHNITRAIN2015 for a 10% discount.

Drawing Lines On Maps With Power Map And Power Query

Recently, I was working with a customer that wanted to be able to draw lines between two points in Power Map. At the time I thought the only way that it might be possible was by using Power Query to generate a table of data containing a series of points that were so close together that they looked like a single line, and then plot these points in Power Map (similar to what I show in the screenshot here). Soon after, the new custom regions functionality was released in Power Map (there’s no documentation I can find right now, but this blog post is reasonably detailed) and I wondered whether now it might be possible to draw lines. Unfortunately not: Power Map can now import SHP and KML files, but it doesn’t support all the features of KML – only polygons (and even then not all the features of polygons, although inner and outer boundaries work fine). I guess this is ok for the primary use-case of Power Map, which is plotting BI data on a map, but it would be nice to see more KML features supported so that Power Map can show richer supporting information for the data: things like arrows showing direction of travel, and so on.

Anyway, I then thought – why not use polygons to draw these lines? Again, I hit a slight problem: I wanted to generate the polygons for the lines in code, and Power Map can only import SHP or KML data from files. It would be really useful if we could use shape data stored in the Excel Data Model… but we can’t. However, it is possible to use Power Query to generate KML and then copy and paste this code into a file, which can then be imported into Power Map. So, just for the fun of it, I put together a proof-of-concept workbook containing Power Query queries to generate all the tables and KML code needed to draw lines between two places, and a Power Map tour that shows the output. Here’s what the end result looks like:

image

You can download my example workbook that contains all the code, plus all the supporting files, here. You will need to update some file paths in the M code to get it all to work.

The starting point is two tables on the worksheet, one containing the single starting point for the lines, the other all of the destinations:

image

There’s a lot of M code so I’m not going to include it in this post, but here’s an overview of what each query does:

  • GetLatLong is a function that calls the Bing Maps REST API to find the latitude and longitude for each place in the tables above. You will need your own Bing Maps account key if you want to use this code yourself – you can get one at https://www.bingmapsportal.com/
  • Starting Point and Ending Points simply load the data from the Excel tables
  • StartingPointLatLong gets the latitude and longitude of the starting point by calling GetLatLong
  • StartEndPoints gets the latitude and longitude of all the ending points by calling GetLatLong, adds custom columns to show the starting point name, latitude and longitude against each ending point, and loads the result to the Excel Data Model. You have to have some data in the Excel Data Model for Power Map to display the lines, and it’s important that Power Map can match the values in one column in this table with the names of objects in the KML file.
  • BasicPolygonTemplate loads a fragment of KML, containing the definition of a polygon, from a text file. This contains two ‘parameters’, @Name and @Coordinates, which will be overwritten using Text.Replace() later on when the actual KML is being generated.
  • GetCoordinateList is a function to draw a rectangular polygon that represents the line between the starting point and an ending point. I had a lot of fun trying to get the code for this working properly (I wish I could remember any of the trigonometry that I learned after the age of 13…) and I’m still not convinced the rectangles are properly rectangular, but they’re good enough.
  • KML generates the KML for all of the polygons. The output of this query must be copied from the Power Query query window into a text file with the .kml extension, for example Test.kml. There’s no need to load the output of this query to anywhere.

image
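Since the M code isn’t reproduced here, the following Python sketch gives a flavour of the two key steps: turning a line into a thin rectangle, and substituting the result into a KML template. It is not a translation of the workbook’s queries. The template text, the function names, the `half_width` parameter and the approximate London and Paris coordinates are all assumptions, and latitude and longitude are treated as flat x/y coordinates, so the shapes will only be roughly rectangular on a real map.

```python
import math

# Hypothetical stand-in for the template text file with its two 'parameters'
KML_TEMPLATE = (
    "<Placemark><name>@Name</name><Polygon><outerBoundaryIs><LinearRing>"
    "<coordinates>@Coordinates</coordinates>"
    "</LinearRing></outerBoundaryIs></Polygon></Placemark>"
)

def line_as_rectangle(start, end, half_width=0.05):
    """Return the corners (lat, lon) of a thin rectangle running from start to end.
    The first corner is repeated at the end to close the ring."""
    (lat1, lon1), (lat2, lon2) = start, end
    dlat, dlon = lat2 - lat1, lon2 - lon1
    length = math.hypot(dlat, dlon)
    if length == 0:
        raise ValueError("start and end must be different points")
    # Offset perpendicular to the line, scaled to half the rectangle's width
    off_lat, off_lon = -dlon / length * half_width, dlat / length * half_width
    corners = [
        (lat1 + off_lat, lon1 + off_lon),
        (lat2 + off_lat, lon2 + off_lon),
        (lat2 - off_lat, lon2 - off_lon),
        (lat1 - off_lat, lon1 - off_lon),
    ]
    return corners + corners[:1]

def polygon_kml(name, corners):
    """Fill in the template; KML lists each point as longitude,latitude,altitude."""
    coordinates = " ".join(f"{lon:.6f},{lat:.6f},0" for lat, lon in corners)
    return KML_TEMPLATE.replace("@Name", name).replace("@Coordinates", coordinates)

corners = line_as_rectangle((51.5074, -0.1278), (48.8566, 2.3522))
print(polygon_kml("Paris", corners))
```

The name written into each placemark is what has to match the values in the geography column in the Data Model, so the sketch uses the destination name for it.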

With all of that done, you now need to open Power Map and create a new tour. Choose EndingPoint as the sole Geography column, then choose Custom Region (.kml, .shp) from the dropdown list below and click Yes to import custom regions.

image

Select the .kml file you created earlier, containing the output of the KML Power Query query, and then click Import:

image

Finally, change the visualisation type to Region and optionally add Ending Point to Category to make the lines different colours:

image

And bingo, you see the lines:

image

Support for custom regions is a massive step forward for Power Map in my opinion: rather than just being a toy for creating flashy demos it’s now able to handle a lot more real-world requirements. However, having some way of programmatically creating regions and shapes (either through Power Query as I’ve done here, or using VBA or some other API), being able to load shape data from the Excel Data Model, or even just being able to draw shapes on a map manually, would be welcome. I’m no mapping expert but I’ve come across a few frustrated MapPoint (which was discontinued at the end of 2014) users who would like to use Power Map but find that it can’t do everything that they need. The code in this post shows what’s possible but it’s still way too complex for most users and hardly an elegant solution.

Using SelectColumns() To Alias Columns In DAX

A few years ago I wrote this post on how to alias columns in a table in DAX, using a combination of AddColumns() and Summarize(). The good news is that in Excel 2016/the Power BI Designer/SSAS Tabular 2016 there’s a new DAX function specifically for this purpose: SelectColumns(). Here’s an example of how it can be used:

Imagine you have the following source table, called Products:

image

You can write a DAX query to get all the rows and columns from this table like so:

[sourcecode language="text" padlinenumbers="true"]
EVALUATE Products
[/sourcecode]

Here’s the output of that query in DAX Studio (and remember, DAX Studio can connect to data loaded into the Power BI Designer, which is what I’m doing here):

image

You can alias the columns in this table using SelectColumns() very easily, like so:

[sourcecode language="text"]
EVALUATE
SELECTCOLUMNS (
Products,
"Column One", Products[Product],
"Column Two", Products[Colour]
)
[/sourcecode]

Here’s the output:

image

The syntax for SelectColumns() is straightforward: the first parameter is a table expression, and after that there are pairs of parameters consisting of:

  • A new column name
  • An expression returning a column from the table given in the first parameter

As you can see in the output of the query above, I’ve renamed the Product column “Column One” and the Colour column “Column Two”.

This means I can now crossjoin a table with itself without needing to worry about conflicting column names, like so:

[sourcecode language="text"]
EVALUATE
CROSSJOIN (
Products,
SELECTCOLUMNS (
Products,
"Column One", Products[Product],
"Column Two", Products[Colour]
)
)
[/sourcecode]

image

One other interesting thing to note about SelectColumns() is that it allows you to do projection in a DAX query easily – as Marco notes here, it was possible before but it wasn’t pleasant. For example, the query:

[sourcecode language="text"]
EVALUATE
SELECTCOLUMNS (
Products,
"Just Colour", Products[Colour]
)
[/sourcecode]

Returns:

image

Notice how there are three rows in the output here and that the value Green occurs twice. If you’re a true DAX aficionado, you might get excited about that.
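For anyone who finds it easier to reason about this in a general-purpose language, the behaviour of SelectColumns() can be approximated in Python as below. The function `select_columns` and the three sample rows are hypothetical stand-ins (the real Products table is only shown as a screenshot), and the sketch models just two things: renaming, and the fact that projection keeps one output row per input row without removing duplicates.

```python
def select_columns(table, columns):
    """Return one output row per input row, holding only the named expressions."""
    return [{name: expr(row) for name, expr in columns.items()} for row in table]

# Hypothetical stand-in for the Products table
products = [
    {"Product": "Apples", "Colour": "Green"},
    {"Product": "Pears", "Colour": "Green"},
    {"Product": "Oranges", "Colour": "Orange"},
]

aliased = select_columns(
    products,
    {
        "Column One": lambda row: row["Product"],
        "Column Two": lambda row: row["Colour"],
    },
)
just_colour = select_columns(products, {"Just Colour": lambda row: row["Colour"]})

print(aliased[0])
print(just_colour)
```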

Working With International Date And Number Formats In Power Query

One problem that can really drive you crazy when working with data from text files is the many different formats that dates and numbers can be stored in around the world. For example, take a look at the contents of a tab-delimited file shown in the following screenshot:

image

There’s a single row of data and two columns, Date and Sales. What number and date are shown though? For me, living in the UK, the date shown is the 2nd of March 2015 (in the UK, like most of the world, we use the DD/MM/YYYY date format) and the number is one hundred thousand and two (in the UK we use the . sign as a decimal separator and the , as a thousands separator). However, if I was from the US and used the MM/DD/YYYY format I’d read the date as the 3rd of February 2015; and if I was from France and used a comma as a decimal separator, I’d read the number as a value just a tiny amount over one hundred. Of course, if you’re working in one country and getting data sent from another, which uses a different date or number format, you need to take all of these variations into account.

The good news is that Power Query has a really easy way of doing this for you, even if it’s not immediately obvious where this functionality is. You don’t need to change your Windows locale or split dates into their constituent parts and rearrange them in the correct order, or anything like that.

When you first load a csv file like the one shown above into Power Query, it will try to guess the data types of each column. Here’s the code generated by Power Query for this file:

[sourcecode language="text" padlinenumbers="true"]
let
Source = Csv.Document(
File.Contents(
"C:\Users\Chris\Documents\Power Query demos\PQDateNumberFormats.csv"),
null,"#(tab)",null,1252),
#"First Row as Header" = Table.PromoteHeaders(Source),
#"Changed Type" = Table.TransformColumnTypes(
#"First Row as Header",
{{"Date", type date}, {"Sales", Int64.Type}})
in
#"Changed Type"
[/sourcecode]

As you can see, it has created three steps in the query automatically:

  1. Source loads the data from the CSV file given and sees that it’s tab delimited
  2. First Row as Header uses the first row of data as the column headers
  3. Changed Type sets the Date column to be type date, and Sales to be an integer

image

To change how the column data types are interpreted though, you first need to delete the Changed Type step by clicking on the cross icon next to it, shown above. Then, right-click on the Date column, choose Change Type then Using Locale…

image

When you do this, the Change Type With Locale dialog appears:

image

A locale is simply a name for all of the rules for date and number formats and more associated with a particular language and region. So, for example, setting the Date column so that it is interpreted using the English (United States) locale, means that when the data is loaded into Power Query on my machine I see the date 3/2/2015.

image

What’s happened is that the csv file contains the date “2/3/2015”, I have told Power Query that the data source uses a US English format date, Power Query has then assumed this date is therefore in MM/DD/YYYY format, loaded the data for me and shown the date in my own locale (which is English (United Kingdom)) in DD/MM/YYYY format as 3/2/2015. The date itself hasn’t changed, just the way it is displayed.

Using the same technique to import the Sales column as a decimal value using the English (United States) locale like so:

image

…gives the following final result, where the comma is assumed to be a thousands separator:

image

Here’s the new code for the query:

[sourcecode language="text"]
let
Source = Csv.Document(
File.Contents(
"C:\Users\Chris\Documents\Power Query demos\PQDateNumberFormats.csv"),
null,"#(tab)",null,1252),
#"First Row as Header" = Table.PromoteHeaders(Source),
#"Changed Type with Locale" = Table.TransformColumnTypes(
#"First Row as Header",
{{"Date", type date}}, "en-US"),
#"Changed Type with Locale1" = Table.TransformColumnTypes(
#"Changed Type with Locale",
{{"Sales", type number}}, "en-US")
in
#"Changed Type with Locale1"
[/sourcecode]

However, if I change the locale used to set the types of these two columns to French (France), I see the following:

image

You will see that I have told Power Query to interpret the value “100,002” as a decimal number using a French format, where a comma is used as a decimal separator, and it has therefore imported and displayed it as 100.002 in the English (United Kingdom) format with a decimal point used as the decimal separator.
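To make the idea of a locale as a bundle of rules more concrete, here is a small Python sketch that parses the same two text values under different rules. The `LOCALE_RULES` table is hand-written for this example and covers only three locales, and it uses a plain space as the French thousands separator as a simplification; it is not how Power Query looks up culture information.

```python
from datetime import datetime

# Hand-written, simplified rules for illustration only
LOCALE_RULES = {
    "en-US": {"date": "%m/%d/%Y", "decimal": ".", "thousands": ","},
    "en-GB": {"date": "%d/%m/%Y", "decimal": ".", "thousands": ","},
    "fr-FR": {"date": "%d/%m/%Y", "decimal": ",", "thousands": " "},
}

def parse_date(text, locale_name):
    """Interpret a date string using the day/month order of the given locale."""
    return datetime.strptime(text, LOCALE_RULES[locale_name]["date"]).date()

def parse_number(text, locale_name):
    """Interpret a number string using the separators of the given locale."""
    rules = LOCALE_RULES[locale_name]
    cleaned = text.replace(rules["thousands"], "").replace(rules["decimal"], ".")
    return float(cleaned)

for name in ("en-US", "en-GB", "fr-FR"):
    print(name, parse_date("2/3/2015", name), parse_number("100,002", name))
```

The text in the file never changes; only the rules used to read it do, which is the same distinction Power Query makes between the locale of the source and the locale used for display.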

Thoughts On All The Recent Power BI/SQL Server 2016 BI/Excel 2016 News

The last few weeks have seen more Microsoft BI-related announcements in a short time than I can ever remember before. Some of them I’ve blogged about; most I’ve at least tweeted. For good summaries of what’s coming for Power BI, on-premises SQL Server BI and Excel 2016 I can recommend the following posts by other people, all of which are worth reading:

http://www.jenunderwood.com/2015/05/14/sql-server-bi-2016/

http://www.jenunderwood.com/2015/04/23/april-microsoft-bi-world-news/

http://byobi.com/blog/2015/05/ssas-related-enhancements-in-sql-server-2016/

https://gqbi.wordpress.com/2015/05/14/bi-nsight-excel-2016-power-bi-updates-including-new-data-sources-azure-sql-data-warehouse/

https://gqbi.wordpress.com/2015/05/07/bi-nsight-sql-server-2016-power-bi-updates-microsoft-azure-stack/

Even then I’m not sure everything has been covered, and because new stuff is coming thick and fast (custom regions in Power Map! DirectQuery/ROLAP in the cloud with Power BI connecting to Azure SQL Database!) it’s hardly worth trying. However, I do think this is as good a point as any to work out what I think about all this activity and where Microsoft is heading.

SSAS Multidimensional Improvements

I’m well past the stage of feeling angry about the neglect of SSAS Multidimensional over the past few years, and I’m genuinely grateful that it’s getting some investment rather than nothing at all. That said, I’m not sure which customers asked for Netezza support or DBCC – they aren’t things I’ve ever needed. The promised performance improvements are where I expect the real value to be, and on their own they will probably give existing customers reason enough to upgrade to 2016. It would have been nice to get even one new feature from this list though.

SSAS Tabular Improvements

As expected, the Tabular engine in SSAS 2016 gets a lot of new stuff for free because of its shared heritage with other Power BI tools. My feeling is that uptake of Tabular has been slower than it should have been because 2012 was, frankly, a bit v1.0 with all the immaturity that implies, and there haven’t been any substantial improvements since then. With 2016, though, it looks like Tabular will take a great leap forward and as a result be seen as a much more capable platform. There will certainly be fewer reasons to choose Multidimensional over Tabular, although for applications that require complex calculations (such as financial applications) Multidimensional will still have the upper hand. The more reasons I have to love Tabular, the less I’ll worry about the lack of new features in Multidimensional.

Power Query And The Corporate/Self-Service BI Crossover

As regular readers of this blog may have noticed, I like Power Query a lot and I’m pleased to see that it has extended its reach into corporate BI. Power Query as a data source for SSAS will be important for scenarios where Power Pivot models are upgraded to server-side solutions; I don’t think it will be a good idea to use Power Query if you’re building an SSAS solution from scratch though. Power Query in SSIS was another predictable development and one which should make it easier to work with certain data sources (such as Excel files); the existing ability to publish the output of an SSIS package as an OData feed using the Data Streaming Destination, which can then be consumed by Power Query, could open up some interesting scenarios where a user builds a data set in Power Query and publishes it via SSIS for consumption by other Power Query users.

It’s the promised integration of Power Query and SSRS that excites me most though. I asked for it here and it looks like my wish has been granted! As well as providing access to a wider range of data sources and a common ‘get data’ experience with other tools, I think it will be the key to making SSRS and in particular Report Builder the self-service BI tool that so many customers want it to be. Report Builder has faced two challenges since it first appeared: first, making it easy for users to lay out a nice-looking report on a canvas, something that the current version does a reasonable job of I think; and second, making it easy for non-technical users (who, for example, might have little or no SQL knowledge) to get data from data sources for their reports – this is where it has not succeeded in the past, and where Power Query could make all the difference. Power Query, among other things, is a solid, user-friendly SQL generation tool. This, plus the fact that SSRS will be updated for all modern browsers and get new visualisations and report themes etc, means that the vast number of existing SSRS customers will have a lot of good reasons to upgrade to 2016, and when they do they’ll also find it easy to integrate with the rest of Power BI.

Power BI: Will Anyone Buy It?

It’s very easy for Microsoft BI fanboys like me to get all worked up by the constant drip feed of tweets about new Power BI features. An impartial observer will point out that some of these features, like the ability to change the colours of your charts in Power View, are actually things we should be embarrassed at not having already. Nonetheless I think it’s fair to say that Microsoft is doing a good job of getting its core customers excited about Power BI and there’s also a lot of evidence that people outside this core are, at least, curious, so from a marketing perspective everything’s going well.

Even if the marketing is good, that will only get Power BI evaluated. Those evaluations will only turn into purchases if the product itself is up to the task. Microsoft set itself an extremely difficult task when it decided to change the direction of Power BI and deliver a respectable version 1.0 this year; the impressive speed that new features are arriving at suggests that they will manage it. When this product is put side-by-side with competing tools it will have some advantages – Power Query is excellent, the Power Pivot engine is fast and can handle all kinds of complex calculations – but will inevitably appear immature in other respects such as visualisation. I think the limit on the amount of data that can be held in a single data model, either on the desktop or in the cloud, is also something that will be a problem for those of us who are used to building server-side SSAS solutions that can hold all the data the user ever needs to see. Maybe DirectQuery/ROLAP on SQL Azure and perhaps Azure SQL Data Warehouse will make this irrelevant? Overall though in my opinion the version of ‘new’ Power BI that will RTM later this year will be seen as more than good enough from a technical standpoint, and if this rate of change is maintained for version 2.0 then it will be something special.

I also think that the focus on building APIs and connectors to other web services is a really clever move. There are a lot of other vendors out there who don’t want to build their own BI functionality, and if Microsoft can convince them to use Power BI that will bring a lot of customers on board. Even at this early stage it looks like Microsoft is doing a good job of recruiting these vendors (SQL Sentry for example, but there are many others) as well as getting other teams inside Microsoft (like Visual Studio Online) to do the same. Close integration with new Microsoft services like Azure Stream Analytics and Azure SQL Data Warehouse should have a similar effect, although less pronounced given that these new services will have few users initially.

While I admit the divorce from Excel was the right thing to do in the circumstances, I still find that I prefer working in Excel over the Power BI Dashboard Designer. Maybe that’s partly due to habit, but Power View still has a long way to go before it has the flexibility of Excel PivotTables and especially cube formulas. That’s why I think Marco Russo’s campaign to create an API for the Dashboard Designer and to support external connections from Excel and other tools is so important. If you haven’t voted already, please do so now! This would be a killer feature in that it would allow you to continue to build reports in Excel (maybe 32-bit) while still making use of new features in the engine. It would give us all the good things we have today with the Excel Power add-ins and more. It would also, as Marco points out, be another reason for third party vendors to use the Power BI platform.

The final factor to consider is price. Making the Dashboard Designer free is important, because it’s not just a Dashboard Designer but a complete, standalone desktop self-service BI solution in itself. Many customers will use it as such without buying a Power BI subscription – that is, if they know that is an option. The free/$9.99 cloud subscription model is also very attractive, and all in all the new pricing model is a refreshing change from the nightmare that ‘old’ Power BI licensing was. I wonder if there will be any particular incentives (financial or otherwise) for partners to sell or recommend Power BI to their customers? If not, there probably should be.

Conclusion

Overall, I’m happier with the direction that Microsoft BI is going in than I have been for a long time. Power BI now seems like it has some momentum behind it, and that it is a coherent product rather than a collection of (individually impressive) tools bound into Excel that, for one reason or another, customers couldn’t use to their full potential. We’ll have to see whether it does become a commercial success or not but I think it has a good chance of doing so now. Excel 2016 also has some welcome improvements, even if it is now the ‘slow track’ for self-service BI; the more users discover Power Pivot and Power Query via Excel 2013 and soon 2016, the more likely it is that they’ll start using the rest of the Power BI stack.

Meanwhile it seems like there is at last a serious commitment to improve the on-premises SQL Server BI stack on the part of Microsoft. Some time ago I wrote a post on why corporate BI and self-service BI are both necessary and I still stand by what I said there; it’s also clear that a lot of customers, especially enterprise customers and especially in Europe, are not yet ready to put their most valuable data in the cloud. Microsoft has the chance to be one of the few vendors with great self-service and corporate BI stories, and great on-premises and cloud BI stories. Also, given that today’s SQL Server BI customers are the most likely to become tomorrow’s Power BI customers, keeping them happy in the medium term while Power BI matures should be a priority.

Let’s see where we are this time next year…?

Nested Calculations In Power Query

Quite often, in Power Query, you want to ‘nest’ calculations and transformations – apply them not across the whole table, but repeat the same calculation or transformation across multiple groups of rows within that table. Let me give you an example…

Take the following input table:

image

Imagine you wanted to add a column showing the rank of each row by Sales. In Power Query you would just need to:

  1. Load the data
  2. Sort the table in descending order by Sales
  3. Add an index column starting at 1, which is the rank

You would end up with the following output:

image

…and here’s the M code, all of which is generated by the UI:

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    #"Sorted Rows" = Table.Sort(Source,{{"Sales", Order.Descending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Rank", 1, 1)
in
    #"Added Index"

Now imagine you wanted to rank Sales within each month. That’s to say you want to apply the same steps as shown in the previous query but for each month individually to get an output like this:

image

One way to tackle this, and problems like it, is to do the following. First, do a Group By on the column you want to repeat the calculation over, in this case Month, and use the All Rows aggregation operation. This will result in a table with one row for each month and a column containing nested tables, as shown below:

image

Each one of these tables contains the rows from the original table for the month.

You can then take your original transformation and turn it into a function, either in a separate query or as a step in your current query. Here’s an example of how the query above can be turned into a function that takes a table and returns a table with a rank column added:

(tabletorank as table) as table =>
    let
        SortRows = Table.Sort(tabletorank,{{"Sales", Order.Descending}}),
        AddIndex = Table.AddIndexColumn(SortRows, "Rank", 1, 1)
    in
        AddIndex

Next, you need to pass each nested table to this function. You could do that in a calculated column, but the most elegant way I think is by using the Table.TransformColumns() function which takes a function and applies it to every value in a column (see here for another example of how to use it).

Finally, you get the final output by clicking on the Expand icon in the AllRows column and then choosing to expand all the columns in the nested table except the ones you originally grouped on:

image
image

Here’s the full M code:

let
    //Get data from Excel
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    //Group by Month
    Grouped = Table.Group(Source, {"Month"}, {{"AllRows", each _, type table}}),
    //Declare a function that adds a Rank column to a table
    RankFunction = (tabletorank as table) as table =>
        let
            SortRows = Table.Sort(tabletorank,{{"Sales", Order.Descending}}),
            AddIndex = Table.AddIndexColumn(SortRows, "Rank", 1, 1)
        in
            AddIndex,
    //Apply that function to the AllRows column
    AddedRank = Table.TransformColumns(Grouped, {"AllRows", each RankFunction(_)}),
    //Expand the tables in the AllRows column again
    ExpandAgain = Table.ExpandTableColumn(AddedRank, "AllRows",
        {"Product", "Sales", "Rank"}, {"Product", "Sales", "Rank"})
in
    ExpandAgain

You can download the example workbook here.

This pattern could also be applied to other types of calculation that need to be nested, for example running totals or shares.
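
As a rough sketch of what that might look like for shares, here is the same pattern with the rank function swapped for one that calculates each row’s share of its month’s total Sales. The inline table is made-up data standing in for the Sales table in the workbook, and ShareFunction, tabletoshare and the Share column are names I have invented for this example:

[sourcecode language="text"]
let
    //Hypothetical inline data standing in for the Sales table
    Source = #table(
        type table [Month = text, Product = text, Sales = number],
        {
            {"Jan", "Apples", 10},
            {"Jan", "Pears", 30},
            {"Feb", "Apples", 20},
            {"Feb", "Pears", 20}
        }),
    //Group by Month, keeping all the rows for each month in a nested table
    Grouped = Table.Group(Source, {"Month"}, {{"AllRows", each _, type table}}),
    //Declare a function that adds a Share column to a table
    ShareFunction = (tabletoshare as table) as table =>
        let
            Total = List.Sum(tabletoshare[Sales]),
            AddShare = Table.AddColumn(tabletoshare, "Share", each [Sales] / Total, type number)
        in
            AddShare,
    //Apply that function to the AllRows column
    AddedShare = Table.TransformColumns(Grouped, {"AllRows", ShareFunction}),
    //Expand the tables in the AllRows column again
    ExpandAgain = Table.ExpandTableColumn(AddedShare, "AllRows",
        {"Product", "Sales", "Share"}, {"Product", "Sales", "Share"})
in
    ExpandAgain
[/sourcecode]

With this sample data the Share values within each month should add up to 1. Only the function changes: the Group By, Table.TransformColumns() and Expand steps are the same as in the ranking query.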