Disabling Excel PivotTable Grouping And Session Cubes In SSAS Multidimensional

The Excel PivotTable grouping functionality that is available when you are connected to an SSAS Multidimensional cube (but not a Tabular model) is a Very Bad Thing indeed. In my experience it is a major cause of query performance problems – not just for the person running the query, but because it is so resource intensive for everyone else trying to query the cube too. This post from a few years ago gives some more details about why custom grouping, and the CREATE SESSION CUBE MDX statements that it uses behind the scenes, are so expensive.
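
To give you an idea of what's happening behind the scenes, here's a heavily simplified sketch of the kind of statement Excel generates when you use custom grouping (the real statements are much longer and more complex, and the session cube name, measure and hierarchies shown here are made up for illustration):

CREATE SESSION CUBE [Adventure Works_XL_GROUPING0]
FROM [Adventure Works]
(
    MEASURE [Adventure Works].[Internet Sales Amount],
    DIMENSION [Adventure Works].[Date].[Calendar Year],
    DIMENSION [Adventure Works].[Product].[Category]
)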

Up until now it has been impossible to prevent users from using custom grouping, but in a recent cumulative update this changed. First of all, I encourage you to read the details of the fix, because you will probably want to install the relevant CU for security reasons anyway:

https://support.microsoft.com/en-us/kb/3080856

[And if you are wondering whether you should be installing CUs I strongly suggest you read this post by Aaron Bertrand]

Once the CU has been installed the following new property is added to the msmdsrv.ini file:

\OLAP\Query\SessionCubesMode

This will be set to 1. Setting this property to 0 will prevent users from creating session cubes and therefore prevent them from using Excel’s custom grouping functionality.

My opinion is that it’s a good idea to disable session cubes and custom grouping even if you don’t know whether your users are using these features. Yes, your users will lose some functionality and some reports might even break, but you will also save yourself and your users a lot of problems. If your users need to do custom grouping in a report that is usually an indication that you have missed something in your dimension design, and that an extra attribute hierarchy or two is necessary.

[Thanks to Akshai for this information]

Monitoring SSAS Multidimensional MDX Query Results Serialisation, Part 2

In part 1 of this series I looked at the basics of monitoring SSAS Multidimensional query resultset serialisation in Profiler. In this post, I’ll be taking a look at what happens for queries that return large amounts of data.

Consider the following query on the Adventure Works DW database which, when I run it in SQL Server Management Studio, returns a cellset with 60391 rows:

SELECT
{[Measures].[Internet Order Quantity], 
[Measures].[Internet Sales Count]}
ON 0,
NON EMPTY
[Date].[Date].[Date].MEMBERS
*
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Product].[Product].MEMBERS
ON 1
FROM
[Adventure Works]

[image: the query results in SQL Server Management Studio]

There are a couple of interesting things to note about this query. First, SQL Server Management Studio on my laptop says that it takes nine seconds to run, even on a warm cache; the Duration column for the Query End event in Profiler, however, shows a value of around six seconds. The three-second difference must be the time it takes for SSAS to return the cellset to SQL Server Management Studio, and for SQL Server Management Studio to render the results (my guess is that it’s the second operation that takes the majority of this time – other client tools may be more efficient at rendering large resultsets).

Secondly, in Profiler, you’ll see a much larger number of Serialize Results Current events. In situations where an axis contains more than a thousand tuples, or a cellset contains more than a thousand cells, you’ll see one Serialize Results Current event for each thousand tuples or cells. The ProgressTotal column will show values incrementing by one thousand up to the total number of tuples or cells. So, here’s some of what Profiler shows for the serialisation of the Rows axis:

[image: Profiler trace showing the Serialize Results Current events for the Rows axis]

…and here’s the end of the trace, showing the end of the serialisation of the cells (60391 rows * 2 columns = 120782 cells):

[image: Profiler trace showing the end of the serialisation of the cells]

The third thing to notice is that there is only one Storage Engine operation – shown by the Query Subcube Verbose event in the first Profiler screenshot above – and that it hits the Storage Engine cache and is so quick the Duration column shows 0 ms. Getting the raw data isn’t the problem here, and there aren’t any MDX calculations either – which means that it’s the Non Empty filter and the construction of the cellset that are taking all the time. Since neither of these operations can be cached (although you can play tricks like this), the query always takes six seconds to run, even on a warm cache. Further investigation reveals that the Non Empty filter in fact only takes about a quarter of a second, so it’s the construction of a large cellset that’s the real problem here. This is why I say you should always avoid queries that return large amounts of data! SSAS is not very good at returning large resultsets.

Incidentally, don’t fall into the trap of thinking that the value shown in the Duration column for the Serialize Results End event represents only the amount of time taken to construct the cellset. It shows the amount of time elapsed since the Serialize Results Begin event, and in between the Begin and End events all kinds of other things necessary for the query to return (such as the evaluation of MDX calculations) could be going on. In a lot of cases the Serialize Results End event shows a duration that is almost the same as the duration of the whole query, but that only means that serialisation was able to start soon after the query began. In order to find the overhead of serialisation you need to work out how long all these other things take and subtract that from the duration shown for Serialize Results End, and that’s easier said than done.
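
For example, using rough numbers from the query above: if the Serialize Results End event shows a duration of around 6000ms, the Non Empty filter accounts for around 250ms and the Storage Engine operation for 0ms, then the serialisation overhead is approximately 6000 - 250 - 0 = 5750ms.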

Finally, what can you do to improve performance? Well, in the first post in this series I showed there was a tabular alternative to a cellset, and this is certainly a lot more efficient at returning large amounts of data (although you probably won’t have a choice here unless you are building your own client tool, and SSRS uses the tabular format anyway). For this query a tabular resultset is almost two seconds faster to return than a cellset, at just over four seconds:

[image: Profiler trace for the tabular resultset]

There’s another important technique you can use, one that I have already mentioned in a blog post a couple of years ago but which is worth mentioning again: each cell returned by this query carries a number of properties that you may not need (by default the VALUE, FORMATTED_VALUE and CELL_ORDINAL properties are returned), and these extra properties have a significant effect on the size of the resultset. Adding a CELL PROPERTIES clause to the query so that you only return the value property, like so:

SELECT
{[Measures].[Internet Order Quantity], 
[Measures].[Internet Sales Count]}
ON 0,
NON EMPTY
[Date].[Date].[Date].MEMBERS
*
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Product].[Product].MEMBERS
ON 1
FROM
[Adventure Works]
CELL PROPERTIES VALUE

…takes another two seconds off the duration of the query, whether you use a tabular resultset or a cellset:

[image: Profiler trace for the query with the CELL PROPERTIES clause]

Monitoring SSAS Multidimensional MDX Query Results Serialisation, Part 1

Profiler (or indeed XEvents) can tell you a lot of interesting things about what happens when SSAS Multidimensional serialises the resultset returned by an MDX query. To be honest, this isn’t something I’ve looked at in detail before, but recently I decided to do some research in this area – it turns out that monitoring the Profiler events related to serialisation can be very useful when you’re trying to understand what a Profiler trace is telling you about query execution as a whole.

To start off, let’s look at some very simple examples. Consider the following MDX query on the Adventure Works DW database:

SELECT
{[Measures].[Internet Order Quantity],
[Measures].[Internet Sales Count]}
ON 0,
[Date].[Calendar Year].[Calendar Year].MEMBERS
ON 1
FROM
[Adventure Works]

If you run it in SQL Server Management Studio in an MDX query window, you’ll get the following cellset back:

[image: the cellset returned in SQL Server Management Studio]

Running a Profiler trace using, amongst others, the Serialize Results Begin, Serialize Results Current and Serialize Results End events shows how SSAS is constructing the cellset returned:

[image: Profiler trace showing the Serialize Results events]

The Serialize Results Begin event marks the point where SSAS starts to construct the cellset returned. The Serialize Results Current events that immediately follow it, with EventSubclass “1 – Serialize Axes”, show SSAS serialising the tuples that are present on the Columns axis (listed as Axis 0 in the TextData column), the Rows axis (Axis 1) and the Where clause (Slicer Axis). The numeric values in the ProgressTotal column for the Serialize Results Current events show the number of tuples on each axis: the two tuples on columns are the two measures, the six tuples on rows are the six years, and there’s one tuple on the slicer. After that SSAS gets the data for each of the cell values (as shown by the Query Subcube Verbose event – note that this query is running on a warm cache) and there is then a Serialize Results Current event with EventSubclass “2 – Serialize Cells”; the ProgressTotal column shows that twelve cells (2 columns * 6 rows) in total were returned. The Serialize Results End event shows that SSAS has finished constructing the cellset and, not surprisingly, it’s followed immediately by the Query Cube End and Query End events.

Sometimes SSAS needs to do a bit more work to find out what tuples are on an axis before serialisation can begin. The following query adds a NON EMPTY to the rows axis of the query above, so that only the rows that have values are returned:

SELECT
{[Measures].[Internet Order Quantity],
[Measures].[Internet Sales Count]}
ON 0,
NON EMPTY
[Date].[Calendar Year].[Calendar Year].MEMBERS
ON 1
FROM
[Adventure Works]

[image: the cellset returned with NON EMPTY on the Rows axis]

In order to know which years will appear on rows, SSAS has to query the cube to find out which ones have values for the measures on columns; as a result the Profiler trace now shows a Query Subcube Verbose event appearing before Serialize Results Begin:

[image: Profiler trace showing a Query Subcube Verbose event before Serialize Results Begin]

This trace deliberately doesn’t show any of the events related to the NON EMPTY operation (something I’ll be writing about in future blog posts), but it looks like SSAS can work out which years have values and get the values needed for the cellset back in a single operation, so there’s no need for another Query Subcube Verbose event before the Serialize Cells event. Note that now the empty rows have been excluded, the ProgressTotal column shows that only four tuples are returned on rows and eight cells are returned overall.

Cellsets are not the only type of resultset that SSAS can return though. Some client tools, most notably Reporting Services, return a tabular dataset that doesn’t have the concept of axes; running the previous query in SSRS gives the following in Profiler:

[image: Profiler trace for the query run through SSRS]

As you can see, there is now only one Serialize Results Current event and it has the EventSubclass “3 – Serialize SQL Rowset” (even though this is still an MDX query); the ProgressTotal column shows four, which is the number of rows returned.

In Part 2, I’ll look at what happens when a query returns a much larger amount of data.

The Use And Abuse Of The MDX Freeze Statement

The other day, while helping a customer with some particularly nasty MDX scoped assignments, I realised that there weren’t many good resources on the internet that explained how to use the MDX Freeze statement. It’s something I see used quite often, but usually because some MDX calculations aren’t giving the correct results and a developer has found that putting a Freeze statement in has fixed the problem – even if they don’t understand why it has fixed the problem. So, in this post I’ll explain what Freeze does, when you might want to use it, and when there are other, better alternatives.

First of all, the basics. Imagine you have a super-simple cube and that, apart from the Calculate statement, the only MDX you have on the Calculations tab in the cube editor is the following:

CREATE MEMBER CURRENTCUBE.MEASURES.M1 AS 1;

CREATE MEMBER CURRENTCUBE.MEASURES.M2 AS NULL;

SCOPE(MEASURES.M2);
    THIS = MEASURES.M1;
END SCOPE;

If you query the cube in Excel, you’ll see the following:

[image: PivotTable showing the values of M1 and M2]

No surprises here: we have created two calculated measures, M1 and M2, and then used a scoped assignment to set M2 to show the value of M1. It’s important to understand that the scope statement has not copied the value of M1 into M2, but acts more like a pointer, so that M2 will always display the same value as M1 even if M1 subsequently changes. This means that when we add a second scope statement to the code that alters the value of M1, as follows:

CREATE MEMBER CURRENTCUBE.MEASURES.M1 AS 1;

CREATE MEMBER CURRENTCUBE.MEASURES.M2 AS NULL;

SCOPE(MEASURES.M2);
    THIS = MEASURES.M1;
END SCOPE;

SCOPE(MEASURES.M1);
    THIS = 2;
END SCOPE;

You see the following in your PivotTable:

[image: PivotTable showing both M1 and M2 returning 2]

This behaviour is the source of a lot of confusion! An assignment to one measure has indirectly changed the value of another measure, and of course in a real-world cube it can be very difficult to spot situations where this has happened – and, if you do spot one, to work out which other MDX caused it.

Each statement in the MDX Script of a cube adds an extra layer of calculations to it, called a calculation pass; this is true for all the calculations in the examples above. As new calculations are added, and new passes are created, the previous passes still exist and are still accessible. In the second example above, in the outermost calculation pass, the measure M2 returns the value 2 but at the previous calculation pass (as seen in the first example) it returned the value 1. The Freeze statement allows you to freeze the values returned by a subcube of cells at a given calculation pass, so that no future calculations will change those values.

Therefore, by taking our code and adding a Freeze statement to the first scoped assignment we can prevent the second scoped assignment changing the value of M2:

CREATE MEMBER CURRENTCUBE.MEASURES.M1 AS 1;

CREATE MEMBER CURRENTCUBE.MEASURES.M2 AS NULL;

SCOPE(MEASURES.M2);
    THIS = MEASURES.M1;
    FREEZE(THIS);
END SCOPE;

SCOPE(MEASURES.M1);
    THIS = 2;
END SCOPE;

Here’s the output now:

[image: PivotTable showing M1 returning 2 and M2 still returning 1]

Another very common way that scoped assignments can affect the value of a cell is through the aggregation of the results of a calculation. This blog post (one of the most popular I’ve ever written) explains how this behaviour can be used to implement calculations like currency conversions and weighted averages. However, in other cases, this aggregation of a calculation is an unwanted and unexpected side effect of a scope statement and calculated values that you did want to be displayed instead get replaced with weird, meaningless values. The Freeze statement can be used to stop this happening but in actual fact it’s a much better idea to understand the cause of these problems and rewrite your calculations so that Freeze isn’t necessary.
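
As a rough illustration of the useful version of this behaviour, a currency conversion might be implemented with an assignment at a low granularity whose results then aggregate up (this is only a sketch: the Exchange Rate measure here is hypothetical, and the post linked to above shows real implementations):

SCOPE(LEAVES([Date]), [Measures].[Sales Amount]);
    //the result of this leaf-level calculation is aggregated up
    //through the rest of the Date dimension automatically
    THIS = [Measures].[Sales Amount] * [Measures].[Exchange Rate];
END SCOPE;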

Now, imagine that in your cube you have a regular (ie not calculated) measure called Sales Amount that has its AggregateFunction property set to Sum, and that you have a fairly standard Date dimension with a Year attribute hierarchy. A PivotTable with Sales Amount on columns and Year on rows looks like this in Excel:

[image: PivotTable showing Sales Amount by Year]

If you add the following assignment to the cube, to change the value of the All Member on Year, the value of the Grand Total in the PivotTable (which is the All Member, even if that name isn’t shown) will be changed:

SCOPE([Date].[Year].[All], [Measures].[Sales Amount]);
    THIS = 123;
END SCOPE;

[image: PivotTable showing the Grand Total changed to 123]

If, on the other hand, you remove that previous assignment and replace it with an assignment on the year 2001:

SCOPE([Date].[Year].&[2001], [Measures].[Sales Amount]);
    THIS = 456;
END SCOPE;

You’ll see that not only has the value for Sales Amount for the year 2001 changed, but that the value of the All Member has been changed too: the All Member represents the aggregated total of all the years, so if a year value has changed, the All Member value must change to reflect this:

[image: PivotTable showing the year 2001 and the Grand Total both changed]

What happens if we try to combine the two previous scope statements?

SCOPE([Date].[Year].[All], [Measures].[Sales Amount]);
    THIS = 123;
END SCOPE;

SCOPE([Date].[Year].&[2001], [Measures].[Sales Amount]);
    THIS = 456;
END SCOPE;

In this case, the output is exactly the same as with the previous example (although the measure formatting has also been lost):

[image: PivotTable output for the combined scope statements]

This is because even though the first Scope statement successfully changed the value of the All Member, the aggregation of values triggered by the second Scope overwrote this value. Although you can’t see this happening in Excel, where you only see the values returned at the final calculation pass of the cube, the MDX Script Debugger can be used to see the values returned for a query at all the different passes so you can work out what’s going on.

The Freeze statement can be used to stop the second Scope from overwriting the first, like so:

SCOPE([Date].[Year].[All], [Measures].[Sales Amount]);
    THIS = 123;
    FREEZE(THIS);
END SCOPE;

SCOPE([Date].[Year].&[2001], [Measures].[Sales Amount]);
    THIS = 456;
END SCOPE;

[image: PivotTable output with the Freeze statement added]

However, in my opinion it makes a lot more sense to change the order of the Scope statements so that the assignment to 2001 doesn’t overwrite the assignment to the All Member:

SCOPE([Date].[Year].&[2001], [Measures].[Sales Amount]);
    THIS = 456;
END SCOPE;

SCOPE([Date].[Year].[All], [Measures].[Sales Amount]);
    THIS = 123;
END SCOPE;

The end result is the same:

[image: PivotTable output with the scope statements reordered]

Why do I prefer this approach to the use of Freeze? Two reasons:

  1. It works with natural MDX behaviour rather than trying to fight against it. In this case it’s just one line of code less, but in the real world it could result in a much greater reduction. It’s true that you have to put a lot of thought into the ordering of your calculations, but I don’t think you can get away from that. Using Freeze to make your calculations work properly without understanding why it’s needed results in much more complex code, often with duplicated calculations because Freeze still doesn’t give the desired results, and is frankly a bit of a hack.
  2. There are, or at least were, performance implications with the use of Freeze. In Analysis Services 2005 I saw a few cases where the use of Freeze contributed to poor query performance, and where reordering scope statements so that it was no longer necessary made performance better. I’m not sure whether this is still the case with SSAS 2014 but it may well be.

I see Freeze abused most often in financial cubes, when scope statements are used to define calculations on a chart of accounts hierarchy. Sometimes I have even seen the same calculation code appear in several places in the same MDX Script, just to make sure that the calculations always return the right result – all because the calculations on the chart of accounts dimension are aggregating up and overwriting each other. In this case the simple rule you have to remember is to always scope the calculations on the lowest level of the hierarchy first, then scope the calculations on the second-lowest level, and so on working your way up to the top of the hierarchy. This way you can be sure that your scope will never aggregate up and overwrite the result of another calculation.
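
For example, on a hypothetical chart of accounts with a leaf level called Base Account and a level above it called Account Group (all of the dimension, level and measure names here are made up), the scopes should be ordered like this:

//scope on the lowest level of the hierarchy first
SCOPE([Account].[Accounts].[Base Account].MEMBERS, [Measures].[Amount]);
    THIS = [Measures].[Amount] * [Measures].[Exchange Rate];
END SCOPE;

//...and only then on the level above, so that the aggregated results
//of the leaf-level assignment cannot overwrite this one
SCOPE([Account].[Accounts].[Account Group].MEMBERS, [Measures].[Amount]);
    THIS = [Measures].[Amount] - [Measures].[Eliminations];
END SCOPE;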

Apart from that, I also see Freeze used when a cube contains a Date Tool dimension that uses regular members instead of calculated members, in the way described here. Now there are a lot of good reasons to use regular members on a Date Tool dimension (it will work with all versions of SSAS and Excel, for instance) but I have also seen a lot of cases where the fact that you are scoping calculations on regular members, which may then get aggregated up accidentally, has caused a lot of problems – not only resulting in incorrect values appearing, but also making query performance worse. For that reason, nowadays I prefer to use calculated members on my Date Tool dimension rather than regular members.
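
As a sketch of what this looks like, here’s how a Previous Period calculation could be defined as a calculated member, assuming a hypothetical Date Tool dimension with a hierarchy called Calculation whose single real member is called Actual Value:

//unlike an assignment to a regular member, a calculated member
//can never be aggregated up accidentally
CREATE MEMBER CURRENTCUBE.[Date Tool].[Calculation].[Previous Period] AS
(
    [Date Tool].[Calculation].[Actual Value],
    [Date].[Calendar].CURRENTMEMBER.PREVMEMBER
);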

Analysing SSAS Extended Event Data With Power Query: Part 2, Storage Engine Activity

In part 1 of this series I showed how to use Power Query to extract Extended Event data generated by SSAS. Having done that, I now want to show the first of what I hope will be many examples of how this data can be used for performance tuning: analysing activity in the Storage Engine, the part of SSAS that reads data from disk and aggregates it up.

I won’t go into the technical details of how I’ve used Power Query to crunch this data; you can download the sample workbook here and see for yourself. There’s nothing particularly complex going on. In brief, what I’ve done is the following:

  • Called the function shown in part 1 to get the raw Extended Event data
  • Filtered that data so that only the Query End, Query Subcube Verbose and Progress Report End events are left
  • Calculated the start time of each event relative to the start time of the earliest recorded event, to make plotting these events on a waterfall chart possible
  • Built an Excel report, including various Power Pivot measures, some normal slicers to make it easy to filter the data, some disconnected slicers for filtering so you only see events that started within a given time range, and a PivotChart showing the waterfall chart (since Excel doesn’t support this type of chart natively, I’ve used this technique to reproduce a waterfall chart with a stacked bar chart)

Here’s an example screenshot of the result, showing Storage Engine activity for a single query:

[image: waterfall chart showing Storage Engine activity for a single query]

Though it’s hard to see the details at this resolution, the yellow line is the Query End event associated with the query, the grey lines are the Query Subcube Verbose events associated with the query, and the brown lines are the Progress Report End events associated with each Query Subcube Verbose event.

What could this be used for? Here are some ideas:

  • Looking for times when there are a lot of queries running simultaneously – and which, as a result, may be performing poorly.
  • Looking for long-running Query Subcube Verbose and Progress Report End events which could be optimised by the creation of aggregations.
  • Visualising the amount of parallelism inside the Storage Engine, in particular the number of Progress Report End events that are running in parallel. This would be very interesting for queries using distinct count measures when you are testing different ways of partitioning your measure group.
  • Highlighting situations where calculations are being evaluated in cell-by-cell mode. When this happens you typically see a very large number of Query Subcube Verbose events being fired off within a query.

I’d like to stress once again that the object of this exercise is not to show off a ‘finished’ tool, but to show how Power Query, Power Pivot and Excel can be used for self-service analysis of this data. This workbook is just a starting point: if you wanted to use this on your own data it’s extremely likely you’d need to change the Power Query queries, the Power Pivot model and the report itself. Hopefully, though, this workbook will save you a lot of time if you do need to understand what’s going on in the Storage Engine when you run an MDX query.

SSAS Multidimensional Cube Design Video Training

I’ve been teaching my SSAS Cube Design training course for several years now (there are still a few places free for the London course next month if you’re interested) and I have now recorded a video training version of it for Project Botticelli.

The main page for the course is here:

https://projectbotticelli.com/cubes?pk_campaign=tt2015cwb

There’s also a free, short video on using the SSAS Deployment Wizard that you can see here:

https://projectbotticelli.com/knowledge/using-deployment-wizard-ssas-cube-design-video-tutorial?pk_campaign=tt2015cwb

If you register before the end of March using the code TECHNITRAIN2015MARCH you’ll get a 15% discount.

Optimising SSAS Many-To-Many Relationships By Adding Redundant Dimensions

The most elegant way of modelling your SSAS cube doesn’t always give you the best query performance. Here’s a trick I used recently to improve the performance of a many-to-many relationship going through a large fact dimension and large intermediate measure group…

Consider the following cube, built from the Adventure Works DW database and showing a many-to-many relationship:

[image: cube structure showing the many-to-many relationship]

The Fact Internet Sales measure group contains sales data; the Product, Date and Customer dimensions are what you would expect; Sales Order is a fact dimension with one member for each sales transaction and therefore one member for each row in the fact table that Fact Internet Sales is built from. Each Sales Order can be associated with zero to many Sales Reasons, and the Sales Reason dimension has a many-to-many relationship with the Fact Internet Sales measure group through the Fact Internet Sales Reason measure group. Only the Sales Order dimension connects directly to both the Fact Internet Sales Reason and Fact Internet Sales measure groups.

There’s nothing obviously wrong with the way this is modelled – it works and returns the correct figures – and the following query shows how the presence of the many-to-many relationship means you can see the Sales Amount measure (from the Fact Internet Sales measure group) broken down by Sales Reason:

select
{[Measures].[Sales Amount]} on 0,
non empty
[Sales Reason].[Sales Reason].[Sales Reason].members
on 1
from m2m1
where([Date].[Calendar Year].&[2003], 
[Product].[Product Category].&[3],
[Customer].[Country].&[United Kingdom])

[image: query results showing Sales Amount by Sales Reason]

However, to understand how we can improve the performance of a many-to-many relationship you have to understand how SSAS resolves the query internally. At a very basic level, in this query, SSAS starts with all of the Sales Reasons and then, for each one, finds the list of Sales Orders associated with it by querying the Fact Internet Sales Reason measure group. Once it has the list of Sales Orders for each Sales Reason, it queries the Fact Internet Sales measure group (which is also filtered by the Year 2003, the Product Category Clothing and the Customer Country UK) and sums up the value of Sales Amount for those Sales Orders, getting a single value for each Sales Reason. A Profiler trace shows this very clearly:

[image: Profiler trace showing how the many-to-many relationship is resolved]

The Resource Usage event gives the following statistics for this query:

READS, 7
READ_KB, 411
WRITES, 0
WRITE_KB, 0
CPU_TIME_MS, 15
ROWS_SCANNED, 87299
ROWS_RETURNED, 129466

Given that the Sales Order dimension is a large one (in this case around 60000 members – and large fact dimensions are quite common with many-to-many relationships) it’s likely that one Sales Reason will be associated with thousands of Sales Orders, and therefore SSAS will have to do a lot of work to resolve the relationship.
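
To get a feel for the size of this intermediate list of Sales Orders, you can query the intermediate measure group directly with something like the following (the count measure name and the Sales Reason key used here are hypothetical):

select
{[Measures].[Fact Internet Sales Reason Count]} on 0,
non empty
[Sales Order].[Sales Order].[Sales Order].members
on 1
from m2m1
where([Sales Reason].[Sales Reason].&[5])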

The optimisation comes with the realisation that in this case we can add the other dimensions present in the cube to the Fact Internet Sales Reason measure group, to try to reduce the number of Sales Orders that each Sales Reason resolves to. Since Sales Order is a fact dimension with one member for each sales transaction, and each sales transaction also has a Date, a Product and a Customer associated with it, we can add the keys for these dimensions to the fact table on which Fact Internet Sales Reason is built and join these dimensions to it directly:

[image: cube structure with the extra regular relationships added to Fact Internet Sales Reason]

This is not an assumption you can make for all many-to-many relationships, for sure, but it’s certainly true for a significant proportion.

The Product, Date and Customer dimensions don’t need to be present for the many-to-many relationship to work, but adding a Regular relationship between them and Fact Internet Sales Reason helps SSAS speed up the resolution of the many-to-many relationship when they are used in a query. This is because in the original design, in the test query the selection of a single member on Sales Reason becomes a selection on all of the Sales Orders that have ever been associated with that Sales Reason; with the new design, the selection of a single member on Sales Reason becomes a selection on a combination of Dates, Customers, Products and Sales Orders – and since the query itself is also applying a slice on Date, Customer and Product, this is a much smaller selection than before. For the query shown above, with the new design, the Resource Usage event now shows:

READS, 11
READ_KB, 394
WRITES, 0
WRITE_KB, 0
CPU_TIME_MS, 0
ROWS_SCANNED, 47872
ROWS_RETURNED, 1418

The much lower numbers for ROWS_SCANNED and ROWS_RETURNED show that the Storage Engine is doing a lot less work. For the amount of data in Adventure Works the difference in query performance is negligible, but in the real world I’ve seen this optimisation make a massive difference to performance, resulting in queries running up to 15 times faster.

Don’t forget that there are many other ways of optimising many-to-many relationships, such as those described in this white paper. Also, if you have a large fact dimension that does not need to be visible to the end user and is only needed to make the many-to-many relationship work, you can reduce the overhead of processing it by breaking it up into multiple smaller dimensions as described here.