Excel 2016 PivotTable MDX Changes Lead To Big Query Performance Gains

Yesterday, the Excel team announced a bunch of performance improvements in Excel 2016 for PivotTables connected to Analysis Services (Multidimensional or Tabular), Power Pivot and Power BI. If you haven’t read the official blog post already, here it is:

https://blogs.office.com/2016/07/07/faster-olap-pivottables-in-excel-2016/

In that post, Microsoft point out that how much of a performance increase you get will depend on a number of factors. I guess they have to do this to manage expectations. However, I’m going to come right out and say that these changes are probably the most important thing to happen to Analysis Services or Power Pivot query performance in the last five years. If you are using Excel PivotTables with SSAS or Power Pivot for reporting, you should upgrade to Excel 2016 (you’ll need the Office 365 click-to-run version) just for these changes.

The key improvement that Microsoft have made here is to stop Excel bringing back unnecessary subtotal values in the MDX queries that it generates. This has been a gigantic problem for years and several people have blogged about it: I did here, and among other people Rui Quintino and Richard Lees have posts on the subject that are worth reading. A lot of my consultancy work involves SSAS and MDX performance tuning and I think I see around five to ten customers every year who are seriously affected by this – and I’m sure this is just the tip of the iceberg. Even if you don’t think you are affected, you may have users who are putting up with slow refresh times for Excel-based reports without your knowledge.

Let’s go into some technical detail, see what the original problem was and what has changed in Excel 2016. I’m going to provide some examples using Excel 2013/2016 against SSAS Multidimensional but as I said the same problems occur (although less severely) with SSAS Tabular and Power Pivot.

Consider the following Excel 2013 PivotTable connected to the Adventure Works cube:

image

image

It’s typical of the kind of thing an end user wants to build, and there are two things to point out:

  1. Notice there are three fields that have been placed on the Rows axis of the PivotTable: the Gender and Marital Status hierarchies from the Customer dimension, and the Category hierarchy from the Product dimension. It’s pretty common for users to put many more fields together on an axis when they want to display detailed data.
  2. The user has also turned off grand totals and subtotals in the PivotTable so that only the detailed data is shown. This often happens in combination with (1) because if the user only wants to see the detailed data the subtotals and grand totals are a distraction.

In this case the PivotTable layout has been switched to Tabular and the Repeat All Item Labels option is selected to make the PivotTable look like a table, but these are just aesthetic changes.

This PivotTable contains 12 rows, but if you look at the MDX query that is generated by Excel 2013 to populate it, it returns 36 rows of data (actually the MDX in this case returns 36 columns, but that’s a quirk of Excel) because the subtotals and grand total that Excel is not displaying are still being returned. Here’s the query:

SELECT 
NON EMPTY 
CrossJoin(
CrossJoin(
Hierarchize(
{DrilldownLevel(
{[Customer].[Gender].[All Customers]}
,,,INCLUDE_CALC_MEMBERS)}), 
Hierarchize(
{DrilldownLevel({[Customer].[Marital Status].[All Customers]}
,,,INCLUDE_CALC_MEMBERS)})), 
Hierarchize(
{DrilldownLevel({[Product].[Category].[All Products]}
,,,INCLUDE_CALC_MEMBERS)})) 
DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME 
ON COLUMNS  
FROM 
[Adventure Works] 
WHERE ([Measures].[Internet Sales Amount]) 
CELL PROPERTIES VALUE, FORMAT_STRING, 
LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

And here’s a sample of what it returns, with the unwanted values highlighted:

image

You can probably guess that these unwanted subtotals and grand totals make the query slower, and the more fields you put together on the rows or columns of a PivotTable, the more subtotals and grand totals are returned and the slower things get. PivotTables connected to SSAS or Power Pivot often become completely unusable with more than six or seven fields put together on the same axis, and the only workarounds before Excel 2016 were either to write custom MDX in a named set (I show how to do this here) or, if you are using SSAS Multidimensional, to try to use member properties instead of separate fields (as Richard Lees shows here), but neither is a really great alternative.
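To get a feel for how quickly the unwanted values pile up, here is a rough Python sketch of the arithmetic. It is not what Excel or SSAS actually execute: it just counts crossjoin combinations, assuming every combination has data. The member lists are hypothetical, chosen only so the counts match the 12 and 36 quoted above.

```python
from itertools import product
from math import prod

# Hypothetical member lists, picked only to reproduce the 12 and 36
# mentioned in the text; the real members come from the cube.
fields = {
    "Gender": ["Female", "Male"],
    "Marital Status": ["Married", "Single"],
    "Category": ["Accessories", "Bikes", "Clothing"],
}

def detail_only(fields):
    """Every combination of leaf members: what the PivotTable displays."""
    return list(product(*fields.values()))

def with_totals(fields):
    """The same crossjoin when each field also contributes its All member."""
    return list(product(*(["All"] + members for members in fields.values())))

def count_detail(sizes):
    """Number of detail combinations for fields with the given member counts."""
    return prod(sizes)

def count_with_totals(sizes):
    """Number of combinations once each field gains one extra All member."""
    return prod(n + 1 for n in sizes)

print(len(detail_only(fields)), len(with_totals(fields)))  # 12 36

# Seven fields of ten members each (made-up sizes): the gap widens fast.
sizes = [10] * 7
print(count_with_totals(sizes) - count_detail(sizes))  # extra tuples requested
```

In the three-field example two thirds of what comes back is never displayed, and each extra field multiplies the overhead again.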

Building the same PivotTable using Excel 2016, however, gives the following MDX:

SELECT 
NON EMPTY 
CrossJoin(
CrossJoin(
Hierarchize(
{[Customer].[Gender].[Gender].AllMembers}), 
Hierarchize(
{[Customer].[Marital Status].[Marital Status].AllMembers})), 
Hierarchize(
{[Product].[Category].[Category].AllMembers})) 
DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME 
ON COLUMNS  FROM [Adventure Works] 
WHERE ([Measures].[Internet Sales Amount]) 
CELL PROPERTIES VALUE, FORMAT_STRING, 
LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

Which returns just the data needed for the PivotTable:

image

What has changed? Focusing on just the Gender hierarchy, the difference between the two queries is that in Excel 2013 the selection on Gender is given using the DrilldownLevel() function on the All Member of the hierarchy. Isolated and put in its own query, the set expression used looks like this:

SELECT 
{[Measures].[Internet Sales Amount]}
ON COLUMNS,
DrilldownLevel(
{[Customer].[Gender].[All Customers]}
,,,INCLUDE_CALC_MEMBERS)
ON ROWS
FROM
[Adventure Works]

image

Notice how the All Member, used by Excel to return subtotals and grand totals, is returned by the set. Excel 2016 instead just asks for the members on the Gender level of the Gender hierarchy, not including the All Member. Again, put inside its own query, you can see what it returns:

SELECT 
{[Measures].[Internet Sales Amount]}
ON COLUMNS,
{[Customer].[Gender].[Gender].AllMembers}
ON ROWS
FROM
[Adventure Works]

image

Apart from the fact that no All Member is returned, it’s also a lot cleaner and easier to read.

I’ve been very excited about this change since I first heard it was happening, and I think it will benefit a lot of people. A big thanks to Alexander Lahuerta and the rest of the Excel team for making this happen!

Finding All Selected Items In A Slicer In Excel 2016 Using TextJoin()

When you are using slicers with an Excel PivotTable it’s often useful to be able to get a comma-delimited list of the items selected in that slicer for use in a report title. It’s not easy to do though, and in fact this is one of those topics that lots of people have blogged about over the years: here’s my MDX approach, here’s Erik Svensen’s post on using the new DAX ConcatenateX() function, and there are also posts by Rob Collie like this one. None of these techniques are ideal though: my personal favourite is the ConcatenateX() approach, but that only works with SSAS Tabular 2016 (and then only if you can create a measure on the model) or Power Pivot in Excel 2016, and not at all if you’re using SSAS Multidimensional or earlier versions of SSAS Tabular.

However, after discovering the new TextJoin() function in Excel 2016 the other week I realised that this would provide yet another way to solve this problem. Here’s a simple example using a PivotTable and slicer connected to a Power Pivot model:

image

The highlighted cell F3 showing a comma-delimited list of all the items selected in the slicer has the following Excel formula:

=TEXTJOIN(
    ", ",
    TRUE,
    IFERROR(
        CUBERANKEDMEMBER(
            "ThisWorkbookDataModel",
            Slicer_Product,
            ROW(
                INDIRECT("1:"&CUBESETCOUNT(Slicer_Product))
            )
        ),
        ""
    )
)

Important: this needs to be entered as an array formula, so instead of hitting Enter after typing in the formula you need to hit Ctrl+Shift+Enter. You’ll see the formula surrounded by braces {} in the formula bar when you do this:

image

image

This formula relies on the fact that the selection in a slicer (in the example above the slicer has the name Slicer_Product) can be treated the same as the output of the Excel CubeSet() function, which means that you can use the CubeSetCount() function to find the number of items selected and the CubeRankedMember() function to get the name of any single item in the selection. It also uses the Row()/Indirect() trick described here to create an array of numbers from 1 to the number of items selected in the slicer, which in turn provides the rank values to pass to the CubeRankedMember() function.
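If the nesting in the formula is hard to follow, the logic can be sketched in Python. This is only an illustration of the steps described above: the three helper functions are stand-ins I have written for the Excel functions of similar names, and the slicer selection is a hypothetical list.

```python
def cube_set_count(selection):
    """Stand-in for CUBESETCOUNT(): the number of items in the set."""
    return len(selection)

def cube_ranked_member(selection, rank):
    """Stand-in for CUBERANKEDMEMBER(): 1-based lookup, error if out of range."""
    if rank < 1 or rank > len(selection):
        raise IndexError(rank)
    return selection[rank - 1]

def text_join(delimiter, ignore_empty, values):
    """Stand-in for TEXTJOIN(): optionally drop empty strings, then join."""
    if ignore_empty:
        values = [v for v in values if v != ""]
    return delimiter.join(values)

def selected_items_label(selection):
    # ROW(INDIRECT("1:"&n)) produces the array 1..n of rank values
    ranks = range(1, cube_set_count(selection) + 1)
    names = []
    for rank in ranks:
        try:
            names.append(cube_ranked_member(selection, rank))
        except IndexError:
            names.append("")  # IFERROR(..., "") swaps any error for ""
    return text_join(", ", True, names)

# Hypothetical slicer selection
print(selected_items_label(["Bikes", "Clothing"]))  # Bikes, Clothing
```

The array formula does the same thing in one step: Excel evaluates CubeRankedMember() once per rank in the array and hands all the results to TextJoin().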

The beauty of this approach is that it works for Power Pivot and all versions of SSAS Tabular and Multidimensional, and doesn’t require any measures to be created on your models/cubes. It even works in Excel Online, so it will work inside Power BI, although it doesn’t seem to be possible to create array formulas in Excel Online yet so you need to create the formula on the desktop before you deploy. Of course you need the latest build of Excel 2016 for all this to work, and at the time of writing most people don’t have Excel 2016 and even if they do they probably won’t have a build (Version 16.0.6568.2025 or higher) with TextJoin() in it yet. But this will be a great solution in the distant future when everyone has Excel 2016, I promise!

You can download the sample Excel 2016 workbook here.

I also have to acknowledge the help of David Hager in writing this formula – we had a conversation about how TextJoin() behaves in array formulas in the comments of my earlier post and in doing so he provided the basic approach for me.

Automatically Generating Date Dimension Tables In Excel 2016 Power Pivot

As you probably know, whenever you are doing any kind of date or time-based calculations in DAX you should always have a separate Date table in your Power Pivot model. There are a number of ways of building these tables (see, for example, my Power Query query here) but they are all a bit of a hassle – which is why it’s so cool that, in Excel 2016, you can get one built automatically inside the Power Pivot window.

Consider the following table of sales data on an Excel worksheet:

Source

With this table loaded into the Data Model (and the Order Date column recognised as containing data of the Date data type), when you go to the Design tab in the Power Pivot window you’ll see the new Date Table button enabled:

Menu

Clicking on the New button will add a new date table to the Data Model, called Calendar:

CalendarBasic

This table is automatically marked as the Date Table in your model.

The table contains a continuous range of dates starting from the beginning of the year of the earliest date found in any column in any table in your Data Model, up to the end of the year containing the latest date found in any column in any table in your Data Model. Obviously, this means that your table could contain a very large date range if, for example, you have a Customer table containing a Date Of Birth column. Luckily, you also have the option of manually configuring the range of dates used by clicking the Update Range button:

Range
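The default range rule described above is easy to sketch in Python. This is only my reading of the behaviour, not how Power Pivot implements it, and the sample dates are made up.

```python
from datetime import date, timedelta

def calendar_range(all_dates):
    """Start of the earliest date's year through the end of the latest date's year."""
    start = date(min(all_dates).year, 1, 1)
    end = date(max(all_dates).year, 12, 31)
    return start, end

def calendar_rows(start, end):
    """One row per day, with no gaps, from start to end inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Hypothetical dates found in the model
order_dates = [date(2015, 3, 14), date(2015, 11, 2)]
dates_of_birth = [date(1950, 6, 30)]

sales_only = calendar_rows(*calendar_range(order_dates))
everything = calendar_rows(*calendar_range(order_dates + dates_of_birth))
print(len(sales_only), len(everything))
```

With only the order dates the table covers a single year; one Date Of Birth value from 1950 stretches it to cover 66 years, which is why the Update Range option matters.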

One other thing to point out is that the resulting table is a table like any other, so you can add, delete or rename columns as you wish. You should also be able to set the table back to its default state by using the Set Default menu option, but I couldn’t make that work (possibly it hasn’t been implemented yet – this post was written using the Excel 2016 Preview).

If you do make changes like adding calculated columns, such as the Month Year calculated column shown below:

Calc

You can then click the Save Configuration button to save the current state of the table as your default. This means that the next time you create a new Date table in the same workbook, the table will include any customisations. However, these changes don’t seem to be applied in Date tables created in new workbooks – maybe this will also change before RTM?

All in all, this is a very handy feature that will save Power Pivot modellers a lot of time. I wonder if it uses the new Calendar() or CalendarAuto() DAX functions under the covers?

Using DateDiff() To Calculate Time Intervals In DAX

One of the most useful new additions to DAX in Excel 2016 and the Power BI Designer is the DateDiff() function. It does exactly what you would expect: it calculates the amount of time between two dates and expresses that value in seconds, minutes, hours, days, weeks, months, quarters or years.

Here’s a very simple table of dates:

image

With this table loaded into the Power BI Designer, you can add new calculated columns to the table by clicking the New Column button on the ribbon. Here are two calculated column definitions that give the number of days and the number of years between the Start Date and the End Date on each row:

DayDurationColumn = 
DATEDIFF(MyTable[Start Date], MyTable[End Date], DAY)

YearDurationColumn = 
DATEDIFF(MyTable[Start Date], MyTable[End Date], YEAR) 

The output is pretty much what you’d expect:

image
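One subtlety worth keeping in mind: as far as I understand the documentation, DateDiff() counts the interval boundaries crossed between the two dates, not the number of complete intervals elapsed. Here is a rough Python sketch of that assumption for the DAY and YEAR intervals only; the function names are mine, not part of DAX.

```python
from datetime import date

def datediff_day(start, end):
    """Day boundaries crossed between two dates."""
    return (end - start).days

def datediff_year(start, end):
    """Year boundaries crossed: only the year numbers matter, not elapsed time."""
    return end.year - start.year

# One day apart, but a year boundary is crossed
print(datediff_day(date(2015, 12, 31), date(2016, 1, 1)))   # 1
print(datediff_year(date(2015, 12, 31), date(2016, 1, 1)))  # 1

# Almost a full year apart, but no year boundary is crossed
print(datediff_year(date(2015, 1, 1), date(2015, 12, 31)))  # 0
```

If you need whole elapsed years (for an age calculation, say), check the output against a few edge cases like these before relying on it.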

It is of course also possible to create measures that use the DateDiff() function, for example:

YearDurationMeasure = 
DATEDIFF(
FIRSTDATE(MyTable[Start Date]), 
LASTDATE(MyTable[End Date]), 
YEAR)

image

All very straightforward, then, and much easier than having to calculate these values yourself.

Using SelectColumns() To Alias Columns In DAX

A few years ago I wrote this post on how to alias columns in a table in DAX, using a combination of AddColumns() and Summarize(). The good news is that in Excel 2016/the Power BI Designer/SSAS Tabular 2016 there’s a new DAX function specifically for this purpose: SelectColumns(). Here’s an example of how it can be used:

Imagine you have the following source table, called Products:

image

You can write a DAX query to get all the rows and columns from this table like so:

EVALUATE Products

Here’s the output of that query in DAX Studio (and remember, DAX Studio can connect to data loaded into the Power BI Designer, which is what I’m doing here):

image

You can alias the columns in this table using SelectColumns() very easily, like so:

EVALUATE
SELECTCOLUMNS (
    Products,
    "Column One", Products[Product],
    "Column Two", Products[Colour]
)

Here’s the output:

image

The syntax for SelectColumns() is straightforward: the first parameter is a table expression, and after that there are pairs of parameters consisting of:

  • A new column name
  • An expression returning a column from the table given in the first parameter

As you can see in the output of the query above, I’ve renamed the Product column “Column One” and the Colour column “Column Two”.

This means I can now crossjoin a table with itself without needing to worry about conflicting column names, like so:

EVALUATE
CROSSJOIN (
    Products,
    SELECTCOLUMNS (
        Products,
        "Column One", Products[Product],
        "Column Two", Products[Colour]
    )
)

image

One other interesting thing to note about SelectColumns() is that it allows you to do projection in a DAX query easily – as Marco notes here, it was possible before but it wasn’t pleasant. For example, the query:

EVALUATE
SELECTCOLUMNS (
    Products,
    "Just Colour", Products[Colour]
)

Returns:

image

Notice how there are three rows in the output here and that the value Green occurs twice. If you’re a true DAX aficionado, you might get excited about that.
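For anyone who finds it easier to think in procedural terms, here is a small Python sketch of the behaviour described in this post: aliasing, crossjoining without name clashes, and projection that keeps duplicate rows. The table contents are hypothetical (only "three rows, Green twice" comes from the output above), and select_columns() and crossjoin() are my own stand-ins, not the DAX engine's implementation.

```python
# Hypothetical contents for the Products table
products = [
    {"Product": "Apples", "Colour": "Green"},
    {"Product": "Pears", "Colour": "Green"},
    {"Product": "Oranges", "Colour": "Orange"},
]

def select_columns(table, columns):
    """columns maps each new column name to an expression over the source row."""
    return [{name: expr(row) for name, expr in columns.items()} for row in table]

def crossjoin(left, right):
    """Every pairing of rows; column names must not clash between the tables."""
    rows = []
    for l in left:
        for r in right:
            clash = l.keys() & r.keys()
            if clash:
                raise ValueError(f"conflicting column names: {sorted(clash)}")
            rows.append({**l, **r})
    return rows

aliased = select_columns(products, {
    "Column One": lambda row: row["Product"],
    "Column Two": lambda row: row["Colour"],
})
print(len(crossjoin(products, aliased)))  # 9

# Projection: one output row per input row, duplicates are not removed
just_colour = select_columns(products, {"Just Colour": lambda row: row["Colour"]})
print([row["Just Colour"] for row in just_colour])  # ['Green', 'Green', 'Orange']
```

Calling crossjoin(products, products) in this sketch raises an error, which is the problem the aliasing solves.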

Documentation For New Excel 2016 DAX Functions

Microsoft has published documentation for the new DAX functions in the Excel 2016 preview here:

https://support.office.com/en-us/article/New-DAX-functions-for-Excel-2016-Preview-8192a787-aa91-4d7f-9a82-4e2c607e629a

There’s a lot of detail, including examples (although the ConcatenateX() page isn’t live at the time of writing – but I’ve blogged about that already), so it’s well worth reading through.

NaturalInnerJoin And NaturalLeftOuterJoin DAX Functions In Excel 2016

Continuing my series on new DAX functions in Excel 2016, here are two more: NaturalInnerJoin() and NaturalLeftOuterJoin(). Both do pretty much what you’d expect.

Consider the following two tables in an Excel worksheet, called ColourFruit and FruitPrice:

image

With these tables loaded into the Excel Data Model as linked tables, the next step is to create a relationship between the tables on the Fruit column:

image

Both functions take two tables from the Excel Data Model as parameters, and both only work if those two tables have an active relationship between them. Once you’ve created the relationship you can use these functions in a DAX query.

The queries

EVALUATE NATURALINNERJOIN ( ColourFruit, FruitPrice )

and

EVALUATE NATURALINNERJOIN ( FruitPrice, ColourFruit )

…both perform an inner join between the two tables on the Fruit column and both return the same table:

image

The query

EVALUATE NATURALLEFTOUTERJOIN ( ColourFruit, FruitPrice )

…returns:

image

The query

EVALUATE NATURALLEFTOUTERJOIN ( FruitPrice, ColourFruit )

…returns:

image

For NaturalLeftOuterJoin() the table given in the first parameter is on the left-hand side of the left outer join, so all rows from it are returned, whereas the table in the second parameter is on the right-hand side of the join so only the matching rows are returned.
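The difference between the two functions can be sketched in a few lines of Python. The table contents below are hypothetical, the join key is passed in explicitly rather than taken from a relationship, and the shared Fruit column is merged into one, so the exact column layout may differ from what DAX returns.

```python
# Hypothetical contents for the two tables
colour_fruit = [
    {"Colour": "Red", "Fruit": "Apple"},
    {"Colour": "Green", "Fruit": "Apple"},
    {"Colour": "Yellow", "Fruit": "Banana"},
    {"Colour": "Purple", "Fruit": "Grape"},
]
fruit_price = [
    {"Fruit": "Apple", "Price": 10},
    {"Fruit": "Banana", "Price": 5},
    {"Fruit": "Orange", "Price": 7},
]

def _index(table, key):
    """Group the rows of a table by their value in the key column."""
    lookup = {}
    for row in table:
        lookup.setdefault(row[key], []).append(row)
    return lookup

def natural_inner_join(left, right, key):
    """Only rows whose key value appears in both tables."""
    lookup = _index(right, key)
    return [{**l, **r} for l in left for r in lookup.get(l[key], [])]

def natural_left_outer_join(left, right, key):
    """Every left row; right-hand columns are blank (None) when nothing matches."""
    lookup = _index(right, key)
    right_cols = [c for c in (right[0] if right else {}) if c != key]
    rows = []
    for l in left:
        matches = lookup.get(l[key])
        if matches:
            rows.extend({**l, **r} for r in matches)
        else:
            rows.append({**l, **{c: None for c in right_cols}})
    return rows

print(len(natural_inner_join(colour_fruit, fruit_price, "Fruit")))       # 3
print(len(natural_left_outer_join(colour_fruit, fruit_price, "Fruit")))  # 4
print(len(natural_left_outer_join(fruit_price, colour_fruit, "Fruit")))  # 4
```

Swapping the arguments makes no difference to which rows the inner join returns, but it changes which table's unmatched rows survive the left outer join: Grape with a blank price in one direction, Orange with a blank colour in the other.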