Monitoring SSAS Multidimensional MDX Query Results Serialisation, Part 2

In part 1 of this series I looked at the basics of monitoring SSAS Multidimensional query resultset serialisation in Profiler. In this post, I’ll be taking a look at what happens for queries that return large amounts of data.

Consider the following query on the Adventure Works DW database which, when I run it in SQL Server Management Studio, returns a cellset with 60,391 rows:

SELECT
{[Measures].[Internet Order Quantity], 
[Measures].[Internet Sales Count]}
ON 0,
NON EMPTY
[Date].[Date].[Date].MEMBERS
*
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Product].[Product].MEMBERS
ON 1
FROM
[Adventure Works]

[Screenshot: the query results in SQL Server Management Studio]

There are a couple of interesting things to note about this query. First, SQL Server Management Studio on my laptop says that it takes nine seconds to run, even on a warm cache; the Duration column for the Query End event in Profiler, however, shows a value of around six seconds. The three-second difference must be the time it takes for SSAS to return the cellset to SQL Server Management Studio, and for SQL Server Management Studio to render the results (my guess is that it’s the second operation that takes the majority of this time – other client tools may be more efficient at rendering large resultsets).

Secondly, in Profiler, you’ll see a much larger number of Serialize Results Current events. In situations where an axis contains more than a thousand tuples, or a cellset contains more than a thousand cells, you’ll see one Serialize Results Current event for each thousand tuples or cells. The ProgressTotal column will show values incrementing by one thousand up to the total number of tuples or cells. So, here’s some of what Profiler shows for the serialisation of the Rows axis:

[Screenshot: Profiler trace showing Serialize Results Current events for the serialisation of the Rows axis]

…and here’s the end of the trace, showing the end of the serialisation of the cells (60,391 rows * 2 columns = 120,782 cells):

[Screenshot: the end of the Profiler trace, showing the serialisation of the cells]

The third thing to notice is that there is only one Storage Engine operation – shown by the Query Subcube Verbose event in the first Profiler screenshot above – and that it hits the Storage Engine cache and is so quick that the Duration column shows 0ms. Getting the raw data isn’t the problem here, and there aren’t any MDX calculations either – which means that it’s the Non Empty filter and the construction of the cellset that are taking all the time. Since neither of these operations can be cached (although you can play tricks like this), this explains why the query always takes six seconds to run, even on a warm cache. Further investigation reveals that the Non Empty filter in fact only takes about a quarter of a second, so it’s the construction of a large cellset that’s the real problem here. This is why I say you should always avoid queries that return large amounts of data! SSAS is not very good at returning large resultsets.

Incidentally, don’t fall into the trap of thinking that the value shown in the Duration column for the Serialize Results End event represents only the amount of time taken to construct the cellset. It shows the amount of time elapsed since the Serialize Results Begin event, and in between the Begin and End events all kinds of other things necessary for the query to return (such as the evaluation of MDX calculations) could be going on. In a lot of cases the Serialize Results End event shows a duration that is almost the same as the duration for the whole query, but that only means that serialisation was able to start soon after the query began. In order to find the overhead of serialisation you need to work out how long all these other things take and subtract that from the duration shown for Serialize Results End, and that’s easier said than done. In this example, though, the arithmetic is straightforward: subtracting the quarter of a second taken by the Non Empty filter (and the negligible Storage Engine time) from the roughly six seconds shown for Serialize Results End suggests that building the cellset accounts for almost all of the query’s duration.

Finally, what can you do to improve performance? Well, in the first post in this series I showed there was a tabular alternative to a cellset, and this is certainly a lot more efficient at returning large amounts of data (although you probably won’t have a choice here unless you are building your own client tool – and SSRS uses the tabular format anyway). For this query a tabular resultset is almost two seconds faster to return than a cellset, at just over four seconds:

[Screenshot: Profiler trace for the same query returning a tabular resultset]

There’s another important technique you can use, one that I have already mentioned in a blog post a couple of years ago but which is worth mentioning again: each cell returned by this query comes with a number of properties that you may not need (if you don’t specify otherwise you get VALUE, FORMATTED_VALUE and CELL_ORDINAL for every cell), and these extra properties have a significant effect on the size of the resultset. Adding a CELL PROPERTIES clause to the query so that you only return the VALUE property, like so:

SELECT
{[Measures].[Internet Order Quantity], 
[Measures].[Internet Sales Count]}
ON 0,
NON EMPTY
[Date].[Date].[Date].MEMBERS
*
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Product].[Product].MEMBERS
ON 1
FROM
[Adventure Works]
CELL PROPERTIES VALUE

…takes another two seconds off the duration of the query, whether you use a tabular resultset or a cellset:

[Screenshot: Profiler trace for the query with the CELL PROPERTIES clause added]

Monitoring SSAS Multidimensional MDX Query Results Serialisation, Part 1

Profiler (or indeed XEvents) can tell you a lot of interesting things about what happens when SSAS Multidimensional serialises the resultset returned by an MDX query. To be honest, this isn’t something I’ve looked at in detail before but recently I decided to do some research in this area – it turns out that monitoring the Profiler events related to serialisation can be very useful when you’re trying to understand what a Profiler trace is telling you about query execution as a whole.

To start off, let’s look at some very simple examples. Consider the following MDX query on the Adventure Works DW database:

SELECT
{[Measures].[Internet Order Quantity],
[Measures].[Internet Sales Count]}
ON 0,
[Date].[Calendar Year].[Calendar Year].MEMBERS
ON 1
FROM
[Adventure Works]

If you run it in SQL Server Management Studio in an MDX query window, you’ll get the following cellset back:

[Screenshot: the cellset returned in SQL Server Management Studio]

Running a Profiler trace using, amongst others, the Serialize Results Begin, Serialize Results Current and Serialize Results End events shows how SSAS is constructing the cellset returned:

[Screenshot: Profiler trace showing the Serialize Results Begin, Current and End events]

The Serialize Results Begin event marks the point where SSAS starts to construct the cellset returned. The Serialize Results Current events that immediately follow it, with EventSubclass “1 – Serialize Axes”, show SSAS serialising the tuples that are present on the Columns axis (listed as Axis 0 in the TextData column), the Rows axis (Axis 1) and the Where clause (Slicer Axis). The numeric values in the ProgressTotal column for the Serialize Results Current events show the number of tuples on each axis: the two tuples on columns are the two measures, the six tuples on rows are the six years, and there’s one tuple on the slicer. After that SSAS gets the data for each of the cell values (as shown by the Query Subcube Verbose event – note that this query is running on a warm cache) and there is then a Serialize Results Current event with EventSubclass “2 – Serialize Cells”; the ProgressTotal column shows that twelve cells (2 columns * 6 rows) in total were returned. The Serialize Results End event shows that SSAS has finished constructing the cellset, and not surprisingly it’s followed immediately by the Query Cube End and Query End events.

Sometimes SSAS needs to do a bit more work to find out which tuples are on an axis before serialisation can begin. The following query adds NON EMPTY to the Rows axis of the query above, so that only the rows that have values are returned:

SELECT
{[Measures].[Internet Order Quantity],
[Measures].[Internet Sales Count]}
ON 0,
NON EMPTY
[Date].[Calendar Year].[Calendar Year].MEMBERS
ON 1
FROM
[Adventure Works]

[Screenshot: the cellset returned by the NON EMPTY query]

In order to know which years will appear on rows, SSAS has to query the cube to find out which ones have values for the measures on columns; as a result the Profiler trace now shows a Query Subcube Verbose event appearing before Serialize Results Begin:

[Screenshot: Profiler trace showing a Query Subcube Verbose event before Serialize Results Begin]

This trace deliberately doesn’t show any of the events related to the NON EMPTY operation (something I’ll be writing about in future blog posts), but it looks like SSAS can work out which years have values and get the values needed for the cellset back in a single operation, so there’s no need for another Query Subcube Verbose event before the Serialize Cells event. Note that, now the empty rows have been excluded, the ProgressTotal column shows that only four tuples are returned on rows and eight cells are returned overall.

Cellsets are not the only type of resultset that SSAS can return, though. Some client tools, most notably Reporting Services, return a tabular dataset that doesn’t have the concept of axes; running the previous query in SSRS gives the following in Profiler:

[Screenshot: Profiler trace for the previous query run from SSRS]

As you can see, there is now only one Serialize Results Current event and it has the EventSubclass “3 – Serialize SQL Rowset” (even though this is still an MDX query); the ProgressTotal column shows four, which is the number of rows returned.

In Part 2, I’ll look at what happens when a query returns a much larger amount of data.

The Table.Schema() Function In Power BI/M

Yet another new M function for you this week: the Table.Schema() function, which returns information about the columns in a table. There’s some fairly detailed documentation about what it returns here; a simple demo is always a lot more helpful though, I think.

If you connect to the Adventure Works DW database in SQL Server and import the DimDate table, you’ll get an M query that returns the contents of that table (along with some extra columns that describe the relationships between that table and the others in the database):

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data]
in
    dbo_DimDate

[Screenshot: the contents of the DimDate table]

If you add an extra step to this query that calls the Table.Schema() function on the table returned by the dbo_DimDate step, like so:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    GetSchema = Table.Schema(dbo_DimDate)
in
    GetSchema 

…you get a table with one row for each column in the table returned by dbo_DimDate, and a lot of columns that give you information on each column such as its position in the table, its M data type, its data type in the original data source and so on:

[Screenshot: the table returned by Table.Schema(), with one row per column of DimDate]
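
Since the output of Table.Schema() is itself just an ordinary table, you can manipulate it like any other table in M. For example, here’s a minimal sketch (reusing the same connection as above) that uses Table.SelectColumns() to keep just each column’s name, position and M type name:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    GetSchema = Table.Schema(dbo_DimDate),
    //keep just the Name, Position and TypeName columns from the schema table
    NameAndType = Table.SelectColumns(GetSchema, {"Name", "Position", "TypeName"})
in
    NameAndType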

I can think of a lot of uses for this. Documenting a database is an obvious one; it would also serve as a much richer source of data when checking for changes in the structure of a data source, as I described here. Also, given how easy it is to tell whether two tables contain the same data in M, you could use this function to compare whether two tables have the same columns, like so:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    SomeOtherTable = 
     Source{[Schema="dbo",Item="SomeOtherTable"]}[Data],
    TablesEquivalent = 
     (Table.Schema(dbo_DimDate) = Table.Schema(SomeOtherTable))
in
    TablesEquivalent

If you want more detail when doing comparisons you can do that with a little bit more M code, but that’s probably a topic for another post.
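
As a quick taster, though, here’s one possible approach: a minimal sketch (again using the hypothetical SomeOtherTable from the query above) that uses List.Difference() to list the column names that appear in one table but not the other:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    SomeOtherTable = 
     Source{[Schema="dbo",Item="SomeOtherTable"]}[Data],
    DimDateColumns = Table.Schema(dbo_DimDate)[Name],
    OtherColumns = Table.Schema(SomeOtherTable)[Name],
    //column names present in one table but missing from the other
    Comparison = [
     MissingFromSomeOtherTable = List.Difference(DimDateColumns, OtherColumns),
     MissingFromDimDate = List.Difference(OtherColumns, DimDateColumns)
    ]
in
    Comparison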

Descriptive Statistics In Power BI/M With Table.Profile()

As Buck Woody notes here, when you are exploring a new data set it can be useful to calculate some basic descriptive statistics. One new M function that appeared in Power BI recently can help you to do this: Table.Profile(). This function takes a value of type table and returns a table that displays, for each column in the original table, the minimum, maximum, average, standard deviation, count of values, count of null values and count of distinct values (but no mode or median?). So, given the following table:

…the Table.Profile() function returns the following table:
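
If you want to try this out without connecting to a database, here’s a minimal, self-contained sketch that builds a small demo table in M (the column names and values here are made up purely for illustration) and then profiles it:

let
    //a small demo table with a text column and a numeric column (including a null)
    Source = #table(
     type table [Month = text, Sales = number],
     {{"January", 100}, {"February", 150}, {"March", null}}
    ),
    Profile = Table.Profile(Source)
in
    Profile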

Of course you could create something similar yourself fairly easily (as I have done for a customer in the past), and it’s not as sophisticated as the Quick Insights feature, but it’s handy to have a single function that does all this.
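
For example, here’s a minimal sketch of the kind of thing you could write yourself using the standard List functions – including the median and mode that Table.Profile() doesn’t return – again on a made-up single-column table:

let
    Source = #table(type table [Sales = number], {{100}, {150}, {null}}),
    SalesValues = Source[Sales],
    Stats = [
     Min = List.Min(SalesValues),
     Max = List.Max(SalesValues),
     Average = List.Average(SalesValues),
     StandardDeviation = List.StandardDeviation(SalesValues),
     Count = List.Count(SalesValues),
     //List.NonNullCount() ignores nulls, so the difference is the null count
     NullCount = List.Count(SalesValues) - List.NonNullCount(SalesValues),
     DistinctCount = List.Count(List.Distinct(SalesValues)),
     //Table.Profile() doesn't return these two
     Median = List.Median(SalesValues),
     Mode = List.Mode(SalesValues)
    ]
in
    Stats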

You could even use it on all of the tables in a SQL Server database. Since the Sql.Database() function returns a table containing all of the tables and views in a database, like so:

[Screenshot: the table of tables and views returned by Sql.Database()]

All you need to do to use Table.Profile() on all these tables is to add a new custom column that calls this function for every value in the Data column:

[Screenshot: adding a custom column that calls Table.Profile() on the Data column]

Then finally expand the new custom column and you’ll see the stats for every column in every table:

[Screenshot: the expanded Profile column, showing stats for every column in every table]

Here’s the code:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    //add a column containing the profile of each table in the database
    #"Added Custom" = Table.AddColumn(Source, "Profile", 
      each Table.Profile([Data])),
    //expand the nested profile tables back into the main table
    #"Expanded Profile" = Table.ExpandTableColumn(#"Added Custom", 
      "Profile", 
      {"Column", "Min", "Max", "Average", "StandardDeviation", "Count", "NullCount", "DistinctCount"}, 
      {"Column", "Min", "Max", "Average", "StandardDeviation", "Count", "NullCount", "DistinctCount"})
in
    #"Expanded Profile"