Deprecated And Discontinued Functionality In SSAS 2016

Some time ago I blogged about the deprecated and discontinued functionality in SSAS 2014, so I thought it would be a good idea to follow my last post on what’s new in SSAS 2016 Multidimensional with a discussion of what’s going or gone from it.

The same page that I linked to last time has been updated for 2016, and there are four more subpages with all the details. There’s nothing much interesting to say about the breaking changes (basically AMO has been rejigged) or behaviour changes (there’s no in-place upgrade for Tabular models using DirectQuery – you have to open the project and edit some settings) but the other two pages do have some news worthy of comment:

Discontinued Functionality

Here’s the official definition of a discontinued feature:

A discontinued feature is one that is no longer supported. It might also be physically removed from the product.

A few comments about what is now discontinued:

  1. The Non_Empty_Behavior property for calculated measures. To be honest, I’m happy to see this go: it doesn’t usually make much difference to performance and in most cases people use it incorrectly too. It really should be removed from Form View in the SSDT cube editor.
  2. COM assemblies. Note that this is not the same thing as .NET assemblies like the Analysis Services Stored Procedure Project! If you are using a custom MDX function implemented in a COM assembly you will probably find that the equivalent function implemented in a .NET assembly is a lot slower, but in my experience it’s almost always possible to avoid custom functions completely and use pure MDX – and this will give you the best query performance.
  3. I’m a bit sad to see the CalculationCurrentPass() and CalculationPassValue() functions die because you could do cool things like this with them, but I haven’t actually needed to use them for a long, long time.

Deprecated Functionality

Here’s the official definition of a deprecated feature:

A deprecated feature is a feature will be cut from the product in a future release, but is still supported and included in the current release to maintain backward compatibility. Typically, a deprecated feature is removed in a major release, often within two releases of the original announcement. For example, deprecated features announced in SQL Server 2012 are likely to be unsupported by SQL Server 2016.

A few comments on what’s been deprecated:

  1. The only thing that someone may possibly be using on the list of features that will not be supported in the next major release is linked dimensions; remote partitions and remote linked dimensions were always a very bad idea.
  2. The death of dimension writeback (note: not the same thing as writing values back to cells) is a bit of a shame: I never needed to use it, and most client tools didn’t support it, but it always struck me as one of those features that people might have used more if they had known about it.
  3. I see session cubes (as used by Excel PivotTables’ grouping functionality) will not be supported in a future release, which is probably a good thing given all the problems they cause. However I bet there are a lot of Excel workbooks out there that will be affected when this does go.
  4. Local cubes will also go in a future release, though I doubt anyone uses them anymore. That said, I think it would be really useful if there was something similar that allows you to create a Power Pivot model that was a local copy of an SSAS Tabular database, with the option to filter the data in it. Offline access is not so important these days but this would allow users to create their own customised Power Pivot models from a properly designed, central model rather than always having to start from scratch.
  5. Profiler for trace capture is also deprecated, and I discussed this in-depth here.

 

Excel 2016 PivotTable MDX Changes Lead To Big Query Performance Gains

Yesterday, the Excel team announced a bunch of performance improvements in Excel 2016 for PivotTables connected to Analysis Services (Multidimensional or Tabular), Power Pivot and Power BI. If you haven’t read the official blog post already, here it is:

https://blogs.office.com/2016/07/07/faster-olap-pivottables-in-excel-2016/

In that post, Microsoft point out that how much of a performance increase you get will depend on a number of factors. I guess they have to do this to manage expectations. However I’m going to come right out and say that these changes are probably the most important thing that has happened in the last five years for Analysis Services or Power Pivot query performance and if you are using Excel PivotTables with SSAS or Power Pivot for reporting, you should upgrade to Excel 2016 (you’ll need the Office 365 click-to-run version) just for these changes.

The key improvement that Microsoft have made here is to stop Excel bringing back unnecessary subtotal values in the MDX queries that it generates. This has been a gigantic problem for years and several people have blogged about it: I did here, and among other people Rui Quintino and Richard Lees have posts on the subject that are worth reading. A lot of my consultancy work involves SSAS and MDX performance tuning and I think I see around five to ten customers every year who are seriously affected by this – and I’m sure this is just the tip of the iceberg. Even if you don’t think you are affected, you may have users who are putting up with slow refresh times for Excel-based reports without your knowledge.

Let’s go into some technical detail, see what the original problem was and what has changed in Excel 2016. I’m going to provide some examples using Excel 2013/2016 against SSAS Multidimensional but as I said the same problems occur (although less severely) with SSAS Tabular and Power Pivot.

Consider the following Excel 2013 PivotTable connected to the Adventure Works cube:

image

image

It’s typical of the kind of thing an end user wants to build, and there are two things to point out:

  1. Notice there are three fields that have been placed on the Rows axis of the PivotTable: the Gender and Marital Status hierarchies from the Customer dimension, and the Category hierarchy from the Product dimension. It’s pretty common for users to put many more fields together on an axis when they want to display detailed data.
  2. The user has also turned off grand totals and subtotals in the PivotTable so that only the detailed data is shown. This often happens in combination with (1) because if the user only wants to see the detailed data the subtotals and grand totals are a distraction.

In this case the PivotTable layout has been switched Tabular and the Repeat All Item Labels option is selected to make the PivotTable look like a table, but these are just aesthetic changes.

This PivotTable contains 12 rows, but if you look at the MDX query that is generated by Excel 2013 to populate it, it returns 36 rows of data (actually the MDX in this case returns 36 columns, but that’s a quirk of Excel) because the subtotals and grand total that Excel is not displaying are still being returned. Here’s the query:

SELECT 
NON EMPTY 
CrossJoin(
CrossJoin(
Hierarchize(
{DrilldownLevel(
{[Customer].[Gender].[All Customers]}
,,,INCLUDE_CALC_MEMBERS)}), 
Hierarchize(
{DrilldownLevel({[Customer].[Marital Status].[All Customers]}
,,,INCLUDE_CALC_MEMBERS)})), 
Hierarchize(
{DrilldownLevel({[Product].[Category].[All Products]}
,,,INCLUDE_CALC_MEMBERS)})) 
DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME 
ON COLUMNS  
FROM 
[Adventure Works] 
WHERE ([Measures].[Internet Sales Amount]) 
CELL PROPERTIES VALUE, FORMAT_STRING, 
LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

And here’s a sample of what it returns, with the unwanted values highlighted:

image

You can probably guess that these unwanted subtotals and grand totals make the query slower, and the more fields you put together on the rows or columns of a PivotTable the greater the number of subtotals/grand totals are returned and slower things get. PivotTables connected to SSAS or Power Pivot often become completely unusable with more than six or seven fields put together on the same axis, and the only workarounds before Excel 2016 are either to write custom MDX in a named set (I show how to do this here) or if you are using SSAS Multidimensional try to use member properties instead of separate fields (as Richard Lees shows here), but neither are really great alternatives.

Building the same PivotTable using Excel 2016, however, gives the following MDX:

SELECT 
NON EMPTY 
CrossJoin(
CrossJoin(
Hierarchize(
{[Customer].[Gender].[Gender].AllMembers}), 
Hierarchize(
{[Customer].[Marital Status].[Marital Status].AllMembers})), 
Hierarchize(
{[Product].[Category].[Category].AllMembers})) 
DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME 
ON COLUMNS  FROM [Adventure Works] 
WHERE ([Measures].[Internet Sales Amount]) 
CELL PROPERTIES VALUE, FORMAT_STRING, 
LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

Which returns just the data needed for the PivotTable:

image

What has changed? Focusing on just the Gender hierarchy, the difference between the two queries is that in Excel 2013 the selection on Gender is given using the DrillDownLevel() function on the All Member of the hierarchy. Isolated and put in its own query, the set expression used looks like this:

SELECT 
{[Measures].[Internet Sales Amount]}
ON COLUMNS,
DrilldownLevel(
{[Customer].[Gender].[All Customers]}
,,,INCLUDE_CALC_MEMBERS)
ON ROWS
FROM
[Adventure Works]

image

Notice how the All Member, used by Excel to return subtotals and grand totals, is returned by the set. Excel 2016 instead just asks for the members on the Gender level of the Gender hierarchy, not including the All Member. Again, put inside its own query, you can see what it returns:

SELECT 
{[Measures].[Internet Sales Amount]}
ON COLUMNS,
{[Customer].[Gender].[Gender].AllMembers}
ON ROWS
FROM
[Adventure Works]

image

Apart from the fact that no All Member is returned, it’s also a lot cleaner and easier to read.

I’ve been very excited about this change since I first heard it was happening, and I think it will benefit a lot of people. A big thanks to Alexander Lahuerta and the rest of the Excel team for making this happen!

Monitoring SSAS Multidimensional Non Empty Filtering Using Profiler, Part 3

In Part 1 of this series I introduced the different types of non empty filtering that occur in Analysis Services Multidimensional and in Part 2 I showed how you can use monitor this activity using Profiler. In this, the final part of the series, I’m going to show some examples of how you can use this information while tuning MDX queries.

Let’s start by looking at the following query:

SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
NON EMPTY
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])

 

It returns 19004 rows – all of the combinations of Customer and Subcategory that have a value in the year 2003:

image

Here’s what you can see in Profiler:

image

There are two Non Empty operations here: the ProgressTotal column shows that first is the NON EMPTY statement on the rows axis; the second we can ignore because it’s the evaluation of the WHERE clause. The Duration column shows that the first Non Empty operation takes just 54ms and the query as a whole takes 1021ms.

Now, let’s make things a bit more complicated by adding an extra filter so we only see the Customer/Subcategory combinations where Internet Sales Amount is less than $10:

SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
NON EMPTY
FILTER(
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
,
[Measures].[Internet Sales Amount]<10)
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])

image

Here’s what Profiler shows:

image

The query now takes 2512ms. But why is it slower? The obvious assumption to make is that it’s the Filter() that has slowed things down, but it looks like the Filter() and the NON EMPTY are now being evaluated as a single operation because the first Non Empty operation in the trace is now taking 2408ms – the majority of the query duration.

Removing the NON EMPTY statement from the rows axis and putting the logic to filter out the customers with no sales into the Filter() function, like so:

SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
FILTER(
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
,
[Measures].[Internet Sales Amount]<10 
AND 
(NOT ISEMPTY([Measures].[Internet Sales Amount])))
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])

 

…only makes things worse, increasing query duration to 4139ms. This confirms our suspicion that Filter() is the problem here and that NON EMPTY can remove the empty customers faster than Filter() can.

The problem with the last query but one is that the NON EMPTY statement is being applied after the Filter(). Wouldn’t it be faster to remove the empty customers first and then filter out the ones where Internet Sales Amount is less than $10, so the slower Filter() can be applied over a smaller set?

There are two ways we can do this. First of all, we can use the NonEmpty() function instead of the NON EMPTY statement to remove the empty customers. NonEmpty() is not faster than the NON EMPTY statement per se, but it does allow us to change the order that the different types of filtering are applied here, and that can make all the difference to performance. Here’s a new version of the query:

SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
FILTER(
NONEMPTY(
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS,
[Measures].[Internet Sales Amount]),
[Measures].[Internet Sales Amount]<10)
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])

image

Query duration is now down to 217ms and the first Non Empty operation is only 57ms.

There’s another way of doing this. For MDX geek-points you could use the ultra-obscure HAVING clause in your query to do the filtering after the NON EMPTY, like so:

SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
NON EMPTY
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
HAVING
[Measures].[Internet Sales Amount]<1000
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])

From what I can see, the HAVING clause performs a few milliseconds faster than the previous query – measurable but not something a user would notice. I also tested a variation on Mosha’s classic calculated measure approach for Count/Filter optimisation but that performed worse than the two previous queries.

Monitoring SSAS Multidimensional Non Empty Filtering Using Profiler, Part 2

In part 1 of this series I introduced all of the different types of non empty filtering that can occur in Analysis Services and MDX. In this post I’ll show you how you can monitor each of these types of non empty filtering using Profiler using the queries from part 1 as examples.

Profiler events

The three Profiler events we’re going to be most concerned with here are:

  • Calculate Non Empty Begin, which is raised when the engine starts a non empty filter
  • Calculate Non Empty Current, which will be raised one or more times when the non empty filter is in progress and which will give us more detail about what’s happening
  • Calculated Non Empty End, which is raised when the engine finishes evaluating a non empty filter

I’ve also included the Query Begin/End events in my traces just to show when query execution starts and finishes. All the queries in this post will be run on a warm cache; although the values used for non empty filtering can be cached, SSAS can’t cache the output of a non empty filter (unless you do something like this) so for our purposes whether the cache is warm or cold is irrelevant.

image

NON EMPTY and NONEMPTY()

In the simplest cases SSAS treats the NON EMPTY statement and the NONEMPTY() function the same, so let’s look again at the following query from my previous post:

SELECT 
{[Measures].[Internet Order Quantity]} 
ON 0,
NON EMPTY
[Product].[Category].[Category].MEMBERS
ON 1
FROM
[Adventure Works]

image

Here’s what our Profiler trace shows us when this query is run:

image

There are several things to notice here:

  • The Duration column tells us how long, in ms, both the query took to run (on the Query End line) and the non empty filter took to evaluate (on the Calculate Non Empty End line).
  • The EventSubclass column, for the Calculate Non Empty Current events, shows the different stages of evaluation. The possible values here are:
    • 1 – cell values are evaluated for the filtering
    • 2 – calculated members are evaluated (this is no longer used from SSAS 2008 on)
    • 3 – the tuples are rendered in the resultset
  • The IntegerData column shows what type of non empty filtering is taking place. The possible values here are:
    • 1 – All non empty filtering using NON EMPTY and NONEMPTY() using a fast algorithm (as shown in the screenshot above)
    • 2 – EXISTING operations using a fast algorithm.  This is a bit misleading though, because due to an internal bug this event will not fire for most uses of the EXISTING statement but will fire for some operations related to regular autoexists.
    • 3 – Autoexists operations using a fast algorithm
    • 11 – All non empty filtering using NON EMPTY and NONEMPTY() using a slower algorithm, for when complex calculations need to be evaluated
    • 12 – as (2) but with the slower algorithm (theoretically possible but should never occur in practice)
    • 13 – as (3) but with the slower algorithm (again, theoretically possible but should never occur in practice)
  • The ProgressTotal column for the Calculate Non Empty Current (as with the Serialize Results Current event) and EventSubclass 1 should return the number of tuples evaluated for the non empty filter. If the number of tuples is greater than 1000 you’ll see multiple events, one for each 1000 tuples, with the last event showing the overall total. Unfortunately there’s a bug here that means it returns one less than the actual number of tuples evaluated. What’s more, in most cases, the non empty filter operations in the SSAS engine are difficult to match to whatever’s going on in the query, so it’s only really useful as a relative measure of how expensive the operation is.

Putting this all together, for this query we can see that there was one non empty filter operation, it was using the fast NON EMPTY algorithm, the ProgressTotal column shows 3 tuples were returned (ie the three Categories on rows) and it took 1ms.

Autoexists

Now let’s look at the autoexists query from the previous post in this series:

SELECT 
{[Measures].[Internet Order Quantity]} 
ON 0,
[Product].[Category].[Category].MEMBERS
*
[Product].[Color].[Color].MEMBERS
ON 1
FROM
[Adventure Works]

image

Here’s what Profiler shows:

image

Even with what looks like a single autoexists operation there are three sets of Non Empty events here, two of which have IntegerData 3 (for autoexists) and one with IntegerData 2 (for EXISTING which, as noted above, is to be expected here with autoexists). The ProgressTotal column returns 26 for a query that returns 24 rows which I guess is near enough correct to be useful.

EXISTS() And Existing

As you might expect, the EXISTS() function produces a result very similar to a straightforward autoexists. Taking the EXISTS() query from my previous post:

SELECT 
{[Measures].[Internet Order Quantity]} 
ON 0,
EXISTS(
[Product].[Category].[Category].MEMBERS
, {[Product].[Color].&[White]})
ON 1
FROM
[Adventure Works]

image

You can see there’s just one Non Empty Begin/End event pair generated, with IntegerData 3, and ProgressTotal returns 0 which means that one tuple is returned:

image

Whereas the use of EXISTING inside a calculated measure, like so:

WITH
MEMBER MEASURES.COLOURCOUNT AS
COUNT(EXISTING [Product].[Color].[Color].MEMBERS)
SELECT 
{MEASURES.COLOURCOUNT} 
ON 0,
[Product].[Category].[Category].MEMBERS
ON 1
FROM
[Adventure Works]

image

Causes multiple Non Empty Begin/End pairs, very probably one for each member on the hierarchy based on the number of events and the values in the ProgressTotal column (the ProgressTotal values for the second, third, fourth and fifth non empty filters tally with the four values returned by the calculated measure; I don’t know what the first non empty filter is doing) :

image

The WHERE Clause And Subselects

Weirdly enough, the presence of a WHERE clause or a subselect in a query also triggers Non Empty Begin/End events [Subselects are something that a lot of SSAS developers get confused by, especially when it comes to the differences between a subselect and the WHERE clause – I recommend watching this short video if you’re unsure of what they do].

The following query with a WHERE clause generates a single Non Empty Begin/End event pair:

SELECT 
{[Measures].[Internet Order Quantity]} 
ON 0,
[Product].[Color].[Color].MEMBERS
ON 1
FROM
[Adventure Works]
WHERE([Customer].[Total Children].&[5])

image

image

Probably the only value in knowing that this happens is that you can ignore it when you see it.

A query with a subselect instead of a WHERE clause, while it returns exactly the same results in this case, produces different activity in Profiler:

SELECT 
{[Measures].[Internet Order Quantity]} 
ON 0,
[Product].[Color].[Color].MEMBERS
ON 1
FROM
(SELECT {[Customer].[Total Children].&[5]} ON 0
FROM
[Adventure Works])

image

image

Again, if you have a subselect in your query (they are very common in MDX generated by SSRS and Excel) you should ignore this activity. In both cases you’ll note that the IntegerData column shows 3 for autoexists and ProgressTotal shows 0.

Summary

What’s clear from these examples is that trying to relate what’s going on in the query to what you see in Profiler is quite tricky even for seemingly simple queries; for most real-world queries it would be almost impossible to do so with total confidence. That said, when I’m tuning queries I usually comment out large parts of the code to try to isolate problems, thus creating much simpler queries, and I hope the value of this post will lie in you being able to spot similar patterns in Profiler to the ones I show here when you do the same thing. In part 3 of this series I’ll show you some practical examples of how all this information can help you tune your own queries.

Monitoring SSAS Multidimensional Non Empty Filtering Using Profiler, Part 1

Filtering out empty values is something that SSAS does a lot of during query execution, and it is also a common cause of performance problems. In this series of posts (similar to my series earlier this year on results serialisation) I’ll look at the different types of non empty filtering that can occur in an MDX query, how they can be monitored using Profiler and what you can do to improve their performance.

Some of this information has come from an old white paper, but I’ve found that some of what that paper says is now out of date and I’m extremely grateful to Akshai Mirchandani of Microsoft for answering my questions on this subject. Even with the long-suffering Akshai’s help a lot of the information here is based on my own research and therefore potentially incomplete/inaccurate, potentially different for different versions of SSAS (I’m using SSAS 2014 for this series) and could potentially change again in the future, so take due care!

The first question to ask is: what counts as non empty filtering? There are actually several different operations that the engine treats as a non empty filter, a few of which I was surprised by; here are the ones I know about.

1) The NON EMPTY statement

Most MDX queries generated by client tools include a NON EMPTY statement on the rows and columns axis. For example, take a look at the results returned by this MDX query on the Adventure Works cube:

SELECT 
{[Measures].[Internet Order Quantity]} 
ON 0,
[Product].[Category].[Category].MEMBERS 
ON 1
FROM
[Adventure Works]

image

As you can see, the Product Category Components has no data and returns a null. One way to remove the row for Components would be to add a NON EMPTY statement to the rows axis:

SELECT 
{[Measures].[Internet Order Quantity]} 
ON 0,
NON EMPTY
[Product].[Category].[Category].MEMBERS 
ON 1
FROM
[Adventure Works]

image

2) The NONEMPTY() function

Often confused with the NON EMPTY statement, but not the same thing: the NON EMPTY statement can only be used on an axis in a SELECT statement, whereas the NONEMPTY() function can be used in any MDX expression. Continuing our example, here’s how to use it to remove the Component category:

SELECT 
{[Measures].[Internet Order Quantity]} 
ON 0,
NONEMPTY(
 [Product].[Category].[Category].MEMBERS,
{[Measures].[Internet Order Quantity]})
ON 1
FROM
[Adventure Works]

The NONEMPTY() function is much more flexible than the NON EMPTY statement but essentially does the same thing – it isn’t any faster in what it does, but it does allow you to make certain assumptions about your data that can improve query performance (more of that later). One thing to remember is to always set the second parameter, because if you don’t you may get unexpected results.

There is also a NONEMPTYCROSSJOIN() function but it is deprecated and you should not be using it – everything that it does can be done more reliably with other functions.

3) Autoexists

Autoexists is not a feature of the MDX language but rather something that SSAS does automatically to remove tuples from a set that it knows must always be null. It’s described in great detail here, but it’s quite easy to illustrate. We already know from the queries above which categories have data; similarly the following query shows there is data for all colours except Grey and Silver/Black:

SELECT 
{[Measures].[Internet Order Quantity]} 
ON 0,
[Product].[Color].[Color].MEMBERS
ON 1
FROM
[Adventure Works]

image

However if you crossjoin every category and every colour on the rows axis, you don’t see every combination of category and colour returned:

SELECT 
{[Measures].[Internet Order Quantity]} 
ON 0,
[Product].[Category].[Category].MEMBERS
*
[Product].[Color].[Color].MEMBERS
ON 1
FROM
[Adventure Works]

image

There is no row for the category Bikes and the colour White, for example. This is because the SSAS engine knows from the data in the dimension that no product exists that is both the colour White and in the category Bikes, so it doesn’t return that combination from the crossjoin – that particular tuple could never contain any data so there’s no point returning it. Notice that there are combinations, such as Components/Black, that exist in the dimension and are present on the rows axis but still return null because there is no value for Internet Order Quantity.

It’s important to remember that autoexists only takes place when you are working with sets of members from different hierarchies on the same dimension, never with sets of members from different dimensions.

4) The EXISTS() function and the EXISTING keyword

The EXISTS() function and the EXISTING keyword allow you to take advantage of autoexists for filtering inside your own expressions without having to actually do a crossjoin (there’s another variant of EXISTS() with a third parameter that behaves more like NONEMPTY() but it’s very rarely used so I’m going to ignore it).

For example, here’s how you can use the EXISTS() function to return all the categories that have a product that is White:

SELECT 
{[Measures].[Internet Order Quantity]} 
ON 0,
EXISTS(
[Product].[Category].[Category].MEMBERS
, {[Product].[Color].&[White]})
ON 1
FROM
[Adventure Works]

image

The EXISTING keyword is used within calculations to apply autoexists filtering to a set based on the other hierarchies from the same dimension. The following query contains a calculated measure that counts the number of members on the Color level of the Color hierarchy, and unsurprisingly returns the same value each time it’s called:

WITH
MEMBER MEASURES.COLOURCOUNT AS
COUNT([Product].[Color].[Color].MEMBERS)
SELECT 
{MEASURES.COLOURCOUNT} 
ON 0,
[Product].[Category].[Category].MEMBERS
ON 1
FROM
[Adventure Works]

image

However, if you add the EXISTING keyword just before the set in the calculated measure definition, like so:

WITH
MEMBER MEASURES.COLOURCOUNT AS
COUNT(EXISTING [Product].[Color].[Color].MEMBERS)
SELECT 
{MEASURES.COLOURCOUNT} 
ON 0,
[Product].[Category].[Category].MEMBERS
ON 1
FROM
[Adventure Works]

Then you’ll see that the calculation now returns the number of members on the Color level of the Color hierarchy after autoexists filtering has been applied; so for example the first line shows there are five distinct colours associated with the Category Bikes:

image

Summary

OK, after that somewhat lengthy introduction, in part 2 I’ll show you how to use Profiler to monitor what’s going on inside SSAS when you do all of these different types of non empty filtering.

Disabling Excel PivotTable Grouping And Session Cubes In SSAS Multidimensional

The Excel PivotTable grouping functionality that is available when you are connected to an SSAS Multidimensional cube (but not a Tabular model) is a Very Bad Thing indeed. In my experience it is a major cause of query performance problems – not just for the person running the query, but because it is so resource intensive for everyone else trying to query the cube too. This post from a few years ago gives some more details about why custom grouping, and the CREATE SESSION CUBE MDX statements that it uses behind the scenes, are so expensive.

Up until now it was impossible to prevent users from using custom grouping but in a recent cumulative update this changed. First of all, I encourage you to read the details of the fix because you will probably want to install the relevant CU for security reasons anyway:

https://support.microsoft.com/en-us/kb/3080856

[And if you are wondering whether you should be installing CUs I strongly suggest you read this post by Aaron Bertrand]

Once the CU has been installed the following new property is added to the msmdrv.ini file:

\OLAP\Query\SessionCubesMode

This will be set to 1. Setting this property to 0 will prevent users from creating session cubes and therefore prevent them from using Excel’s custom grouping functionality.

My opinion is that it’s a good idea to disable session cubes and custom grouping even if you don’t know whether your users are using these features. Yes, your users will lose some functionality and some reports might even break, but you will also save yourself and your users a lot of problems. If your users need to do custom grouping in a report that is usually an indication that you have missed something in your dimension design, and that an extra attribute hierarchy or two is necessary.

[Thanks to Akshai for this information]

Monitoring SSAS Multidimensional MDX Query Results Serialisation, Part 2

In part 1 of this series I looked at the basics of monitoring SSAS Multidimensional query resultset serialisation in Profiler. In this post, I’ll be taking a look at what happens for queries that return large amounts of data

Consider the following query on the Adventure Works DW database, which, when I run it in SQL Server Management Studio returns a cellset with 60391 rows:

SELECT
{[Measures].[Internet Order Quantity], 
[Measures].[Internet Sales Count]}
ON 0,
NON EMPTY
[Date].[Date].[Date].MEMBERS
*
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Product].[Product].MEMBERS
ON 1
FROM
[Adventure Works]

image

There are a couple of interesting things to note about this query. First, SQL Server Management Studio on my laptop says that it takes nine seconds to run, even on a warm cache; the Duration column for the Query End event in Profiler, however, shows a value of around six seconds. The three second difference must be the time it takes for SSAS to return the cellset to SQL Server Management Studio, and for SQL Server Management Studio to render the results (my guess is that it’s the second operation that takes the majority of this time – other client tools may be more efficient at rendering large resultsets).

Secondly, in Profiler, you’ll see a much larger number of Serialize Results Current events. In situations where an axis contains more than a thousand tuples, or a cellset contains more than a thousand cells, you’ll see one Serialize Results Current event for each thousand tuples or cells. The ProgressTotal column will show values incrementing by one thousand up to the total number of tuples or cells. So, here’s some of what Profiler shows for the serialisation of the Rows axis:

image

…and here’s the end of the trace, showing the end of the serialisation of the cells (60391 rows * 2 columns = 120782 cells):

image

The third thing to notice is that there is only one Storage Engine operation – shown by the Query Subcube Verbose event in the first Profiler screenshot above – and that hits the Storage Engine cache and is so quick the Duration column shows 0 ms. Getting the raw data isn’t the problem here, and there aren’t any MDX calculations either – which means that it’s the Non Empty filter and construction of the cellset that is taking all the time. Since neither of these operations can be cached (although you can play tricks like this), this explains why the query always takes six seconds to run, even on a warm cache. Further investigation reveals that the Non Empty filter in fact only takes about a quarter of a second, so it’s the construction of a large cellset that’s the real problem here. This is why I say you should always avoid queries that return large amounts of data! SSAS is not very good at returning large resultsets.

Incidentally, don’t fall into the trap of thinking that the values shown in the Duration column for the Serialize Results End event only represents the amount of time taken to construct the cellset. It shows the amount of time since the Serialize Results Begin event, and in between the Begin and End events all kinds of other things necessary for the query to return (such as the evaluation of MDX calculations) could be going on. In a lot of cases the Serialize Results End event shows a duration that is almost the same as the duration for the whole query, but that only means that serialisation was able to start soon after the query began. In order to find the overhead of serialisation you need to work out how long all these other things take and subtract that from the duration shown for Serialize Results End, and that’s easier said than done.

Finally, what can you do to improve performance? Well, in the first post in this series I showed there was a tabular alternative to a cellset, and this is certainly a lot more efficient at returning large amounts of data (although you probably won’t have a choice in this unless you are building your own client tool, and, SSRS uses the tabular format anyway). For this query a tabular resultset is almost two seconds faster to return than a cellset, at just over four seconds:

image

There’s another important technique you can use, once that I have already mentioned in a blog post a couple of years ago but which is worth mentioning again: each cell returned by this query returns a large number of properties that you may not need, and these extra properties have a significant effect on the size of the resultset. Adding a CELL PROPERTIES clause to the query so that you only return the value property, like so:

SELECT
{[Measures].[Internet Order Quantity], 
[Measures].[Internet Sales Count]}
ON 0,
NON EMPTY
[Date].[Date].[Date].MEMBERS
*
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Product].[Product].MEMBERS
ON 1
FROM
[Adventure Works]
CELL PROPERTIES VALUE

…takes another two seconds off the duration of the query, whether you use a tabular resultset or a cellset:

image