Excel PivotTable Query Performance With Analysis Services, Power Pivot And Power BI (Revisited)

Back in 2016 I wrote the following blog post about changes to the way Excel 365 generated MDX queries for PivotTables connected to Analysis Services, Power Pivot/the Excel Data Model and Power BI datasets:

https://blog.crossjoin.co.uk/2016/07/08/excel-2016-pivottable-mdx-changes-lead-to-big-query-performance-gains/

I know it sounds boring and not something you need to worry about but trust me, this is important – these changes solved the vast majority of Excel PivotTable performance problems that I encountered when I was a consultant so you should read the above post before continuing.

Unfortunately, earlier this year these changes had to be partially rolled back because in some rare cases the queries generated returned incorrect results; this means that you may find that values for subtotals and grand totals are again being returned even when they aren’t being displayed. The good news is that you should still be able to get the improved performance with a few minor tweaks.

Using the example from the previous post, a PivotTable connected to SSAS MD and the Adventure Works cube (the queries generated for SSAS Tabular, Power Pivot and Power BI may be slightly different and slightly better but the basic problem is the same), if you build the following in Excel:

image

image

…even though the subtotals in the PivotTable are not displayed in Excel, the MDX query generated by Excel not only returns them but also returns others that are not needed at all – in fact 36 rows (although the query returns them as columns) of data rather than the 13 rows that are displayed. Here’s a screenshot of the results returned when you run the MDX query in SQL Server Management Studio:

image

image

Any time you see a row or column containing an All Member (in this case All Customers or All Products) you know you are looking at a subtotal or grand total.

[NB The easiest way to get the MDX query generated by your PivotTable is to install the OLAP PivotTable Extensions add-in https://olappivottableextensions.github.io/]

In this case, to get an efficient query, you need to explicitly turn off subtotals and grand totals for the PivotTable in Excel:

image

image

In the PivotTable the only difference you’ll see is that the grand totals are now not displayed:

image

But the query generated now only returns the values that are actually needed, and as a result will be a lot more efficient and potentially a lot faster. In this case, the query now only returns the 12 rows of data displayed:

image

image

Even with subtotals and grand totals turned off there are still some cases where unwanted values will be returned. Take the following PivotTable, where I have used the Calendar hierarchy from the Date dimension on columns and filtered it so I only see the three months in Q1 CY 2012:

image

image

The MDX query generated by Excel requests requests four extra columns with subtotals and a grand total that aren’t displayed:

image

There is a fairly simple workaround though. By changing how the PivotTable is constructed – in this case by not putting a hierarchy on columns but just the Month Of Year attribute, then adding slicers for Calendar Year and Calendar Quarter to control which months are displayed – you can get an efficient query. This version of the PivotTable:

image

image

…results in this query being generated, which only returns the required values:

image

To sum up, then, if you have a PivotTable that seems a bit slow:

  • Turn off subtotals and grand totals unless you really need to see them
  • Look at the MDX query being generated by Excel and see if it is still returning unnecessary subtotals and grand totals, and if it is try rebuilding the PivotTable to get the same results in a different way (for example by not drilling down on hierarchies as in the last example) to get a more efficient query

[All the queries in this post were generated by Excel 365 version 1910 build 12130.20238]

SSAS Tabular 2019, Calculation Groups And Migration From SSAS Multidimensional

With the release of CTP 2.3 of SQL Server 2019 today there was big news for Analysis Services Tabular developers: Calculation Groups. You can read all about them in detail in this blog post:

https://blogs.msdn.microsoft.com/analysisservices/2019/03/01/whats-new-for-sql-server-2019-analysis-services-ctp-2-3/

In my opinion this is the most important new feature in DAX since… well, forever. It allows you to create a new type of calculation – which in most cases will be a time intelligence like a year-to-date or a previous period growth – that can be applied to multiple measures; basically the same thing that we have been doing in SSAS Multidimensional for years with the time utility/shell/date tool dimension technique. It’s certainly going to solve a lot of problems for a lot of SSAS Tabular implementations, many of which have hundreds or even thousands of measures for every combination of base measure and calculation type needed.

I’m not going to repeat any of the detailed technical information in the blog post here, though. Instead the point I want to make is that this is very big news for SSAS Multidimensional users too. In the past couple of years many people who have existing SSAS Multidimensional implementations have thought about migrating to SSAS Tabular so they can take advantage of its new features or move to the cloud, and indeed many of them have already migrated successfully. However, up to now, the biggest blocker for those wanting to migrate from Multidimensional to Tabular has been the fact that some complex calculations that can be expressed in MDX cannot be recreated (or recreated easily and efficiently) in DAX, because DAX has not had an equivalent of calculated members not on the Measures dimension or the MDX SCOPE statement.

Calculation groups do not remove this roadblock completely, but they do remove the roadblock for a large group of existing SSAS Multidimensional users whose only complex calculation requirement is a time utility/shell/date tool dimension. As a result these SSAS Multidimensional users will now be able to migrate to SSAS Tabular 2019, Azure Analysis Services or Power BI if they want to. Only those people who have more exotic uses for calculated members not on the Measures dimension (which are not very common at all) and those who use SCOPE statements (a larger group – many people working with financial data use SCOPE statements heavily) will find that Multidimensional is still the correct platform for them.

Finding All Selected Items In An Excel Slicer Connected To SSAS, Power BI Or the Excel Data Model Using Dynamic Arrays

The big news in the world of Excel right now is the introduction of dynamic arrays. They’re only available in the Office 365 click-to-run version of Excel and, at the time of writing, only available to people on the Office Insiders programme, but eventually they’ll be available to anyone running Excel for Office 365 on their desktop. There are already lots of blog posts about them including this overview by Jon Acampora, and you probably also want to download Bill Jelen’s detailed mini-book on them here which is free for the rest of 2018. Now I’m not an Excel expert by any stretch of the imagination but I’m as excited as anyone else about them because they will be incredibly useful for anyone building reports using Excel cube functions against Analysis Services, the Excel Data Model/Power Pivot and Power BI. Bill Jelen’s book has a short section on this subject but the possibilities are limitless…

Here’s one example of how they can be used. A while ago I blogged about how to use a regular array formula and the TextJoin() Excel function to get all the selected items from a slicer. Dynamic arrays make this problem much easier to solve. Take the following table loaded into the Excel Data Model:

Capture1

Now, say you have a PivotTable built from this and a slicer (called Slicer_Fruit) connected to it:

Capture2

It’s possible to use the CubeSet() function to get the set of selected items in a slicer using the following formula:

=CUBESET("ThisWorkbookDataModel",Slicer_Fruit,"Slicer Set")

Assuming this formula is in cell H1, you can then get the number of items in this set using CubeSetCount():

=CUBESETCOUNT($H$1)

Assuming this is in cell H2, all you need to do to get a comma-delimited list of all the selected items in the slicer via this set is:

=
 TEXTJOIN(
  ", ", 
  TRUE, 
  CUBERANKEDMEMBER(
   "ThisWorkbookDataModel",
   $H$1, 
   SEQUENCE($H$2,1)
  )
 )

Capture3

Here it is in action:

demo

It works as follows:

  • The new Sequence() function is used to create a dynamic array of numbers from one to the number returned by the CubeSetCount() function.
  • The CubeRankedMember() gets the name of each of the items in the set using the numbers returned by Sequence()
  • TextJoin() then concatenates all of the names returned by CubeRankedMember() into a single comma-delimited list.

You can download the sample workbook here (remember it will only work if you have a version of Excel installed that includes dynamic arrays!).

Performance Problems With MDX Calculated Measures That Return Constants In Analysis Services Multidimensional

Recently I was working on an MDX calculation that calculated a value at a low granularity and aggregated the result – basically what I described in my blog post here. Here’s a simplified version of the calculation written on a cube built from Adventure Works data running on my local SSAS 2017 MD instance:

SCOPE([Measures].[Tax Amount]);
    SCOPE(
	[Customer].[Customer].[Customer].MEMBERS, 
	[Date].[Date].[Date].MEMBERS, 
	[Product].[Product].[Product].MEMBERS);
        THIS = [Measures].[Sales Amount]*0.08;
    END SCOPE;
END SCOPE;

All it does is take the value of the Sales Amount measure at the lowest granularities of the Customer, Date and Product dimensions, multiply it by 0.08 to find a tax value, and because [Tax Amount] is a real, non-calculated measure, the result of the calculation aggregates up through the cube. [I know that I don’t have to aggregate the result of this specific calculation but remember that this is a simplified example – in the real case I did have to write the calculation using Scope statements – and anyway the best way of handling a basic multiplication like this would be with a measure expression]

The performance was sub-second for my test query and I was happy, but then I realised that the same tax rate was being used in other calculations and may change in the future, so I thought I would store the value 0.08 in a calculated measure:

CREATE MEMBER CURRENTCUBE.[Measures].[Tax Rate] AS  
0.08;

SCOPE([Measures].[Tax Amount]);
    SCOPE(
	[Customer].[Customer].[Customer].MEMBERS, 
	[Date].[Date].[Date].MEMBERS, 
	[Product].[Product].[Product].MEMBERS);
        THIS = [Measures].[Sales Amount]*[Measures].[Tax Rate];
    END SCOPE;
END SCOPE;

Surely a simple change? But no, as soon as I did this my query ran for several minutes and memory usage went through the roof until the query was automatically cancelled:

image

Clearly the SSAS MD Formula Engine could optimise the version with the hard-coded constant value but could not optimise the version with the calculated measure. There was nothing in Profiler to indicate the calculation was being evaluated in cell-by-cell mode though.

So I tried another variation:

CREATE MEMBER CURRENTCUBE.[Measures].[Tax Rate] AS  
IIF([Measures].[Sales Amount]=0, NULL, 0.08);

SCOPE([Measures].[Tax Amount]);
    SCOPE(
	[Customer].[Customer].[Customer].MEMBERS, 
	[Date].[Date].[Date].MEMBERS, 
	[Product].[Product].[Product].MEMBERS);
        THIS = [Measures].[Sales Amount]*[Measures].[Tax Rate];
    END SCOPE;
END SCOPE;

This time the memory usage was completely flat but the query was still so slow had to be cancelled. Next, I thought I’d try setting the NON_EMPTY_BEHAVIOR property:

CREATE MEMBER CURRENTCUBE.[Measures].[Tax Rate] AS  
0.08
, NON_EMPTY_BEHAVIOR = [Measures].[Sales Amount];

SCOPE([Measures].[Tax Amount]);
    SCOPE(
	[Customer].[Customer].[Customer].MEMBERS, 
	[Date].[Date].[Date].MEMBERS, 
	[Product].[Product].[Product].MEMBERS);
        THIS = [Measures].[Sales Amount]*[Measures].[Tax Rate];
    END SCOPE;
END SCOPE;

And guess what? The query went back to being sub-second. Strange. Now the NON_EMPTY_BEHAVIOR property is very dangerous and this is a clear example of how it should not be set – the expression 0.08 is never going to return an empty value, regardless of the value of the [Sales Amount] measure. As a result I would be very wary of using this trick in production in case it ended up returning inconsistent results. It’s also worth noting that the following calculation, which is a correct use of NON_EMPTY_BEHAVIOR, is as slow as the other examples above:

CREATE MEMBER CURRENTCUBE.[Measures].[Tax Rate] AS  
IIF([Measures].[Sales Amount]=0, NULL, 0.08)
, NON_EMPTY_BEHAVIOR = [Measures].[Sales Amount];

Finally I created a dummy table in my DSV with one row and one column to hold the 0.08 value, and then created a real, non-calculated measure from the column. When I used this measure in my calculation then performance of my test query was also sub-second.

So, to sum up, it looks like hard-coding constant values in calculated measures is a bad idea, at least in cases like this, and either using the values themselves in your MDX calculations or creating a table and non-calculated measure specifically to hold the value is better for performance.

Analysis Services Import Performance Improvements In The August 2018 Release Of Power BI

While I was playing around with the new release (August 2018) of Power BI Desktop I noticed there was an undocumented change: similar to the OData improvements I blogged about here, there is a new option in the AnalysisServices.Database() and AnalysisServices.Databases() M functions that turns on a newer version of the MDX generation layer used by the Power Query engine. Like the OData improvements it is an option called Implementation=”2.0”, used like this:

AnalysisServices.Databases(
	"localhost", 
	[
		TypedMeasureColumns=true, 
		Implementation="2.0"
	]
)

…and also, as with the OData improvements, you will need to manually edit any existing M queries to take advantage of this.

In fact first heard about this option in a comment on this blog post back in January, but as I was told by the dev team that it hadn’t been tested properly I didn’t blog about it. However as it is now set by default in the M code generated by the Power Query Editor, I guess it’s ready for production use. I’m told it should improve the performance of M queries that import data from Analysis Services – and I would be very interested to hear from anyone who tests this about how much of an improvement they see.

I’ve done a little bit of testing myself and I can see there are indeed some differences in the MDX generated. For example, I created an M query that connected to the Adventure Works DW database and returned all combinations of customer and day name where the Internet Sales Amount measure is greater than 100. In the old version of the MDX generation layer (ie without Implementation=”2.0” set) the following MDX query is generated:

SELECT
    {
        [Measures].[Internet Sales Amount]
    }ON 0,
    SUBSET(
        NONEMPTY(
            CROSSJOIN(
                [Date].[Day Name].[Day Name].ALLMEMBERS,
                [Customer].[Customer].[Customer].ALLMEMBERS
            ),
            {
                [Measures].[Internet Sales Amount]
            }
        ),
        4096
    )
    PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON 1
FROM(
    SELECT
        FILTER(
            CROSSJOIN(
                [Date].[Day Name].[Day Name].ALLMEMBERS,
                [Customer].[Customer].[Customer].ALLMEMBERS
            ),
            (
                NOT((
                    ISEMPTY(
                        [Measures].[Internet Sales Amount]
                    )
                OR
                    ISEMPTY(
                        100
                    )
                ))
            AND
                ([Measures].[Internet Sales Amount] > 100)
            )
        )ON 0
    FROM [Adventure Works]
)CELL PROPERTIES VALUE

 

However with the Implementation=”2.0” option set, the following query is generated:

SELECT
    {
        [Measures].[Internet Sales Amount]
    }ON 0,
    SUBSET(
        NONEMPTY(
            FILTER(
                CROSSJOIN(
                    [Date].[Day Name].[Day Name].ALLMEMBERS,
                    [Customer].[Customer].[Customer].ALLMEMBERS
                ),
                (
                    NOT(ISEMPTY(
                        [Measures].[Internet Sales Amount]
                    ))
                AND
                    ([Measures].[Internet Sales Amount] > 100)
                )
            ),
            {
                [Measures].[Internet Sales Amount]
            }
        ),
        4096
    )
    PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON 1
FROM [Adventure Works] CELL PROPERTIES VALUE

The difference between the two is that the first query uses a subselect to do the filtering whereas the second does not; subselects in MDX are not necessarily bad, but I generally try to avoid using them unless I need to. There may be other differences in the way the MDX is generated in the new version but I haven’t had a chance to do any detailed testing.

More Details On The RowsetSerializationLimit Server Property For Analysis Services

The recent blog post on the Analysis Services team blog about new memory options contained information about some very interesting functionality that has just been added. The first of the new features I decided to try was the RowsetSerializationLimit server property, which restricts the number of rows returned by a query and which is intended to stop rogue users running queries that return very large amounts of data. It certainly works but there are two things to point out about it that aren’t immediately obvious.

First of all, an example of it in action. In Azure Analysis Services, using a database that contains one very simple table with one column and 5000 rows, the following DAX query returns all 5000 rows from the table:

EVALUATE 'MyTable'

image

To change the RowsetSerializationLimit server property, open SQL Server Management Studio, right-click on your server name in the Object Explorer pane and select Properties. Then go to the General page, check the Show Advanced (All) Properties box, and you’ll see the property listed under OLAP\Query\RowsetSerializationLimit:

image

The default value is –1, meaning no limit is placed on the number of rows returned by a rowset. Changing it to 100 and then running the above query results in the following error:

image

Executing the query …
Obtained object of type: Microsoft.AnalysisServices.AdomdClient.AdomdDataReader
The maximum number of rows (100) was reached.

A query that returns less than 99 rows, for example

EVALUATE TOPN(99,'MyTable')

…is ok though.

The important thing to understand is that although this applies to both DAX and MDX queries, it only applies to queries that are returned as rowsets (a tabular format) and not as cellsets (a multidimensional format). Most client tools that generate MDX, including Excel, use cellsets so this property will not apply to them unfortunately.

For example, the following MDX query run from a SQL Server Management Studio MDX query window (which also returns a cellset) on the same database runs ok and returns 5000 rows:

SELECT
{[Measures].[Number Of Rows]} ON 0,
[MyTable].[Column1].[Column1].MEMBERS ON 1
FROM
[Model]

 

image

However the same query run from Power BI Desktop to import data from the same database:

image

…runs, but returns only 100 rows and then an error:

image

Something else to point out is that this applies to all queries that return rowsets, even Discover requests and DMV queries. As a result, setting this property to a very low value can cause problems in tools like SQL Server Management Studio: for example if you set the property to 10 and you had 11 tables in a database, you would see an error when you tried to expand the Tables node of a database in the Object Explorer pane!

Even though this property only applies to rowsets I think it’s still going to be very useful, especially in scenarios where Power BI Desktop users are importing vast amounts of data from Analysis Services and causing major problems on the server.

Analysis Services Query Analyzer

Last week a new, free tool for analysing the performance of MDX queries on SSAS Multidimensional was released: Analysis Services Query Analyzer. You can get all the details and download it here:

https://ssasqueryanalyzer.github.io/

…and here’s a post on LinkedIn by one of the authors, Francesco De Chirico, explaining why he decided to build it:

https://www.linkedin.com/pulse/asqa-10-released-francesco-de-chirico/

I’ve played around with it a bit and I’m very impressed – it’s a really sophisticated and powerful tool, and one I’m going to spend some time learning because I’m sure it will be very useful to me. It’s an AddIn for SQL Server Management Studio and captures Profiler and Perfmon data for an MDX query when you run it, does so for a cold and warm cache, and then displays the results in a variety of graphs and charts. And it even has an MDX formatter built in!

image

image

image

You can find a full list of features here, and when you download the tool it comes with very detailed documentation. Definitely a must-have for all serious SSAS Multidimensional developers.

%d bloggers like this: