Finding Out (Approximately) How Long A Calculation Contributes To The Duration Of An MDX Query

In my last two blog posts (see here and here) I showed how to use the Calculation Evaluation and Calculation Evaluation Detailed Information trace events to work out which MDX calculations are evaluated when a query runs in Analysis Services Multidimensional. That’s very useful, but wouldn’t it be great if you could work out how long any single calculation contributes to the overall duration of a query? If you could, it would make performance tuning MDX calculations much easier.

While you can’t get an exact amount of time taken for each calculation, the good news is that it is possible to get an approximate duration, accurate to about a second, if your calculation is evaluated in bulk mode.

Take a look at the following query:

WITH

MEMBER MEASURES.DAYRANK AS
RANK(
[Date].[Date].CURRENTMEMBER, 
[Date].[Date].[Date].MEMBERS)-1

MEMBER MEASURES.HADSALE AS
IIF(
[Measures].[Internet Sales Amount]=0,
NULL,
MEASURES.DAYRANK)

MEMBER MEASURES.LASTSALERANK AS
MAX(
NULL:[Date].[Date].CURRENTMEMBER, 
MEASURES.HADSALE)

MEMBER MEASURES.LASTSALE AS
([Measures].[Internet Sales Amount], 
[Date].[Date].[Date].MEMBERS.ITEM(MEASURES.LASTSALERANK))

MEMBER MEASURES.SIMPLECALC AS
[Measures].[Internet Sales Amount] * 2

SELECT 
HEAD([Customer].[Customer].[Customer].MEMBERS, 200)
*
{MEASURES.SIMPLECALC, MEASURES.LASTSALE}
ON 0,
[Date].[Date].[Date].MEMBERS
ON 1
FROM
[Adventure Works]

This query contains five calculated measures: the first four in the WITH clause (DAYRANK, HADSALE, LASTSALERANK and LASTSALE) are based on my approach for finding the last ever non-empty value for a measure across time; the final measure, SIMPLECALC, is, as the name suggests, a very simple calculation. On my laptop this query takes around 13 seconds to run on a warm Storage Engine cache. Why does it take so long? It’s clearly the calculations that are the problem, but which one(s)?

Luckily all of the calculations in this query are evaluated in bulk mode so, as I discussed in my last two posts, there is an event raised with:

Event Class = Calculation Evaluation Detailed Information

Event Subclass = 107 – RunEvalNode Finished Calculating Item

…for each of the calculations when they are evaluated. Unfortunately the Duration column for this event always shows 0, but there is a way to see approximately how long the calculation took by comparing the Start Time and Current Time columns in the trace.

The 107 – RunEvalNode event for the measure SIMPLECALC shows the same time for the Start Time and Current Time columns:

[screenshot: the 107 – RunEvalNode event for SIMPLECALC, with identical Start Time and Current Time values]

This indicates that the SIMPLECALC calculation is evaluated in under a second.

However, the sequence of 107 – RunEvalNode events for the LASTSALE calculation shows something different:

[screenshot: the 107 – RunEvalNode events for LASTSALE, showing a 7 second gap between Start Time and Current Time]

There’s a gap of 7 seconds between the Start Time and the Current Time, which indicates that the calculation took roughly 7 seconds to evaluate. It’s a bit frustrating that there isn’t a way to get a more accurate duration here, but it’s still very clear which calculation is taking all the time. Even though the time for calculating LASTSALE includes the time taken for calculating LASTSALERANK, HADSALE and DAYRANK (all of which need to be calculated in order to calculate LASTSALE), the equivalent rows in the trace for these other calculations show they took under a second each. It’s only the logic inside LASTSALE itself that is slow – so that’s where any tuning needs to take place. Indeed, modifying the query to return LASTSALERANK instead of LASTSALE makes the query around 6 seconds faster, supporting this conclusion.
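For reference, that test only involves swapping one measure on the columns axis of the query above; the WITH clause stays exactly the same:

SELECT 
HEAD([Customer].[Customer].[Customer].MEMBERS, 200)
*
//LASTSALERANK swapped in for LASTSALE
{MEASURES.SIMPLECALC, MEASURES.LASTSALERANK}
ON 0,
[Date].[Date].[Date].MEMBERS
ON 1
FROM
[Adventure Works]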

If you’re curious about what the other 6 seconds of the query execution time is taken up by, it seems to be the serialisation of the results – something I blogged about here. The query returns a cellset with 400*1190=476,000 cells in it (200 customers by two measures on columns, and 1190 dates on rows), and SSAS doesn’t cope well with queries that return a large amount of data.

Finding Out Which MDX Calculations Are Being Evaluated By Your Query In Analysis Services Multidimensional, Part 2

In part 1 of this series I showed how you can use Profiler to find out which MDX calculations are being evaluated when a query runs on SSAS Multidimensional. In this post I’ll show a practical example of why this is so useful: a situation where SSAS evaluates a calculation that isn’t needed by a query.

Do you have a Date Tool dimension (also known as a Shell dimension or Time Utility dimension) in your cube? A lot of enterprise-level SSAS cubes use this technique to allow you to write a calculation once and have it apply to multiple measures. There are two main approaches to implementing Date Tool dimensions:

  • You can create a dimension with one hierarchy and one real member and then use calculated members for your calculations, or
  • You can create a dimension with one hierarchy and as many real members as you have calculations, and then use SCOPE statements on these members for your calculations

The second approach, described in detail in this article, is very popular but over the years I have seen several cases where customers of mine who use it have suffered from unexplained query performance problems, problems that have been solved by using the calculated member approach instead. It turns out that the Calculation Evaluation and Calculation Evaluation Detailed Information Profiler events can shed some light on the causes of these problems.

Here’s a simple test cube with a Date Tool dimension that has three real members on it:

[screenshot: the [Date Calc] dimension with its three real members: 1 Value, 2 PPG and 3 YTD]

Here’s the contents of the MDX Script, copied from the Calculations tab in the Cube Editor in SSDT:

CALCULATE;

SCOPE([Date Calc].[Date Calc].&[2 PPG]);
    THIS = ([Date Calc].[Date Calc].&[1 Value], 
            [Date].[Calendar].CURRENTMEMBER.PREVMEMBER);
END SCOPE;

SCOPE([Date Calc].[Date Calc].&[3 YTD]);
    THIS = AGGREGATE(
            YTD([Date].[Calendar].CURRENTMEMBER), 
            [Date Calc].[Date Calc].&[1 Value]);
END SCOPE;

As you can see, two of the members on the [Date Calc] dimension are overwritten by scoped assignments: [2 PPG] with a previous period growth calculation and [3 YTD] with a year-to-date calculation.

Here’s a query that includes a calculated measure defined in the WITH clause and returns two out of three of the members on the [Date Calc] dimension – but does not return the [3 YTD] calculation:

WITH
MEMBER [Measures].QueryCalc AS  
[Measures].[Sales Amount] + 1

SELECT
{
[Measures].[Sales Amount],
[Measures].QueryCalc
}
*
{
[Date Calc].[Date Calc].&[1 Value], 
[Date Calc].[Date Calc].&[2 PPG]
}
ON 0,
[Date].[Calendar].[Month].MEMBERS 
ON 1
FROM
TEST

[screenshot: the query results]

Running a Profiler trace as described in my previous post reveals that when this query is run, not only are the QueryCalc and [2 PPG] calculations evaluated, but [3 YTD] is evaluated too:

[screenshot: Profiler trace showing the [3 YTD] calculation being evaluated]

It’s worth pointing out that this query was constructed deliberately to show a scenario where SSAS does decide to evaluate the [3 YTD] calculation, but in other cases it may decide otherwise. The reason it decides to do so here is due to a number of factors, including prefetching – see Jeffrey’s blog post here and the section on “Unexpected partition scans” here for some background information on this topic. Remember that in most cases prefetching is a good thing and is beneficial for performance, so if you see something like this happening in your cube you need to be sure that it’s actually causing you a performance problem before you try to prevent it.

If this is a problem for you there are a few things you can do. Rewriting your query to use subselects (if you have control over the MDX query that is being used) is one option:

WITH
MEMBER [Measures].QueryCalc AS  
[Measures].[Sales Amount] + 1

SELECT
{
[Measures].[Sales Amount],
[Measures].QueryCalc
}
*
{
[Date Calc].[Date Calc].MEMBERS
}
ON 0,
[Date].[Calendar].[Month].MEMBERS 
ON 1
FROM
(SELECT 
{
[Date Calc].[Date Calc].&[1 Value], 
[Date Calc].[Date Calc].&[2 PPG]
}
ON 0
FROM
TEST)
CELL PROPERTIES VALUE

Using the following connection string properties also works, because they turn off prefetching:

disable prefetch facts=true; cache ratio=1

…but as I said, this might hurt query performance in other ways.
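For the record, these properties just get appended to your connection string along with everything else – something like this, where the server and database names are placeholders:

Data Source=localhost;Initial Catalog=MyDatabase;disable prefetch facts=true;cache ratio=1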

Finally, as I said, using calculated members on your Date Tool dimension instead of the real members/scope statements approach will also work. In my opinion this is the best solution: the issues with calculated member selection in Excel that caused problems for the calculated member Date Tool approach in the past were fixed a long time ago, and it will work even if you can’t change how your MDX queries are generated.
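For illustration only, here’s a rough sketch of what the PPG calculation from the example cube might look like with this approach – the [Date Calc] dimension would have just one real member, [1 Value], with the calculations defined as calculated members in the MDX Script (the exact parent member depends on how the hierarchy is structured):

CREATE MEMBER CURRENTCUBE.[Date Calc].[Date Calc].[All].[2 PPG] AS
    //same logic as the scoped assignment shown earlier
    ([Date Calc].[Date Calc].&[1 Value], 
     [Date].[Calendar].CURRENTMEMBER.PREVMEMBER);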

Finding Out Which MDX Calculations Are Being Evaluated By Your Query In Analysis Services Multidimensional, Part 1

Since Analysis Services 2012 there have been two trace events that provide a lot of information about what’s going on in the Formula Engine when you run a query in Analysis Services Multidimensional: Calculation Evaluation and Calculation Evaluation Detailed Information. The problem is that they are not properly documented anywhere and they provide so much information that it’s difficult to interpret what they are telling you. This post on Thomas Ivarsson’s blog (which I strongly advise you to read before you carry on), with information provided by Akshai Mirchandani of the dev team, is the only place that has any details about them and unfortunately it’s by no means comprehensive.

I don’t have the knowledge to provide a full description of these two trace events, so instead in this series of posts I want to do something less ambitious but hopefully still useful: show how you can use them to find out which MDX calculations are being evaluated when you run a query, which is of course going to be useful if you are trying to tune that query. It’s not always as easy as you might think to work out which calculations are referenced by a query: for example financial cubes often have hundreds of calculated members and/or scoped assignments, many of which are dependent on other calculations.

Here’s a super-simple example to start off with. Imagine you have a cube with just one regular measure, Sales Amount, and just one calculated measure with the following definition:

CREATE MEMBER 
CURRENTCUBE.MEASURES.[Sales Forecast] AS
[Measures].[Sales Amount] * 2;

Now, consider the following query:

WITH
MEMBER MEASURES.X as 123
SELECT
{[Measures].[Sales Forecast]}
ON 0
,
[Date].[Date].[Date].MEMBERS
ON 1
FROM
[test]
CELL PROPERTIES VALUE

[screenshot: the query results]

The query returns the Sales Forecast calculated measure on columns and every member on the Date level of the Date hierarchy on rows – so not all that interesting. However there are two things to point out:

  • The WITH clause has a calculated measure that isn’t used in the query. The reason I’ve put it in the query is to stop the Formula Engine from caching the results of any MDX calculations for longer than the lifetime of the query (see here for more details); it doesn’t affect the Storage Engine cache, however. This means that every time the query is run you know all the calculations will be evaluated and you’ll be able to see any related activity in Profiler, and that you can run the query on a warm Storage Engine cache without seeing many Storage Engine-related events.
  • The CELL PROPERTIES clause only returns the VALUE property and not the FORMAT_STRING property which is normally returned as well. This reduces the number of Calculation Evaluation events that are raised in Profiler when the query runs and makes it easier to see the important information.

With a Profiler trace that includes the Calculation Evaluation and Calculation Evaluation Detailed Information events, when you run the query above you’ll see this:

[screenshot: Profiler trace with the Calculation Evaluation Detailed Information event highlighted]

There are a lot of events generated in the trace even for this simple query, but the important thing to look for is the line highlighted in the screenshot above: a Calculation Evaluation Detailed Information event with the following event subclass:

107 – RunEvalNode Finished Calculating Item

Any time you see this event you know that a calculation has been evaluated in bulk mode for a subcube (ie an area of cells) in your cube. You may see more than one RunEvalNode event for the same calculation in the same query if it was evaluated for more than one subcube.

The contents of the TextData column (which is displayed in the lower half of the screen in Profiler) for the RunEvalNode event in the trace shown above are as follows:

[screenshot: the TextData column for the RunEvalNode event, showing the MDX expression that was evaluated]

As you can see, it tells you the MDX expression that has been evaluated for the subcube. It also tells you the name of the calculated measure, but it’s the MDX expression that’s important here because scoped assignments that overlap with a single calculated measure could mean that many different MDX expressions must be evaluated for that calculated measure.

Now for the bad news: you won’t see a RunEvalNode event for any calculations that are evaluated in cell-by-cell mode. You probably know that inefficient or badly-written calculations are often evaluated in cell-by-cell mode, which is usually slower than bulk mode, but there are cases where the Formula Engine evaluates a perfectly good calculation in cell-by-cell mode because it’s the right thing to do. For example, take a look at the following query:

WITH
MEMBER MEASURES.X as 123
SELECT
{[Measures].[Sales Forecast]}
ON 0
FROM
[test]
CELL PROPERTIES VALUE

[screenshot: the query results – a single cell]

It’s basically the same query as the one above but with the Rows axis removed, so it only returns a single cell. In Profiler you won’t see a RunEvalNode event because in this case the Sales Forecast calculation is evaluated in cell-by-cell mode.

That said, you will see other events relating to the evaluation node for the Sales Forecast calculation, such as the Calculation Evaluation event shown here, the last for this node (NodeIndex=0, the same value that is shown in the IntegerData column) in the trace:

[screenshot: the last Calculation Evaluation event for the Sales Forecast evaluation node]

Notice also that the LazyEvaluation tag is 1, which indicates the calculation was evaluated in cell-by-cell mode.

So, to sum up, there are two ways to see which calculations are referenced by your query in a Profiler trace that includes the Calculation Evaluation and Calculation Evaluation Detailed Information events:

  1. If your calculation is evaluated in bulk mode you will see a Calculation Evaluation Detailed Information event with the Event Subclass 107 – RunEvalNode Finished Calculating Item.
  2. If your calculation is evaluated in cell-by-cell mode you will see Calculation Evaluation events for the Init-Build-Prepare stages of the evaluation node.

In the next post in this series I’ll look at a more complex scenario that shows some unexpected behaviour by SSAS.

[I am extremely grateful to Akshai Mirchandani for answering a lot of questions relating to this topic. If you want to learn more about the internals of the Formula Engine there are two other useful resources: this post by Jeffrey Wang, also of the dev team, and chapter 29 of the book “Microsoft SQL Server 2008 Analysis Services Unleashed”]

More Detail On Detail Rows Expressions In SSAS Tabular V.Next

My second-favourite feature in SSAS Tabular v.next after Power Query integration is the Detail Rows expression property for measures – it not only brings drillthrough on measures to Tabular, it means that we can define meaningful drillthrough on any measure, no matter how it is calculated. There’s a basic description of the functionality in this blog post but I thought it would be useful to walk through a simple example showing how it can be used.

Consider a simple SSAS Tabular model with two tables in it. First, a table containing sales data called Sales:

[screenshot: the Sales table]

Second, a date table called Date:

[screenshot: the Date table]

[Note: dates are in DD/MM/YYYY format]

There are two measures with the following definitions:

Total Sales:=SUM(Sales[Sales])

Total YTD Sales:=TOTALYTD([Total Sales], 'Date'[Date])

The measure Total YTD Sales gives the running total of sales from the beginning of the current year. I know it doesn’t follow Marco and Alberto’s best-practice pattern but I wanted to keep things simple on the DAX front…
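(For reference, their recommended pattern spells out the CALCULATE()/DATESYTD() combination that TOTALYTD() hides – assuming the same table and measure names as above, it would look something like this:)

Total YTD Sales:=
CALCULATE(
	[Total Sales],          //the measure to accumulate
	DATESYTD('Date'[Date])  //all dates from the start of the year to the current date
)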

Browsing the model in an Excel PivotTable gives the following result:

[screenshot: PivotTable showing Total Sales and Total YTD Sales by date]

At this point if you double-click on cell C6 in the PivotTable you get the following, not very useful result, on a new worksheet:

[screenshot: the default drillthrough results]

What drillthrough does in SSAS Multidimensional, and what the new Detail Rows Expression property in SSAS Tabular v.next does, is allow an end user to see the detail-level data (usually the rows in the fact table) that was aggregated to give the value the user clicked on in the original PivotTable.

For the Total Sales measure, this property can be set with a DAX expression that returns a table something like this:

SELECTCOLUMNS(
	'Sales', 
	"Date", 'Sales'[Date], 
	"Sales Value", [Total Sales]
)

[For more details on the SelectColumns() function, see here]

[screenshot: the Detail Rows Expression property for the Total Sales measure]

Now when you click on cell C6 in the PivotTable you get the result of the table expression above filtered by the context of the cell you’ve clicked on – in this case, the date 4/1/2017. What appears in the new worksheet is data from the row from the Sales table for 4/1/2017:

[screenshot: drillthrough results showing the Sales table row for 4/1/2017]

This is already better than SSAS Multidimensional drillthrough because as a developer you have control over the column headers displayed in this table (in Multidimensional drillthrough the column names come out in a ridiculously user-unfriendly format) and the order that they are displayed in (which is equally painful to control in Multidimensional).

Now, consider cell D6 in the PivotTable, the cell that shows the year-to-date sales amount for 4/1/2017. If a user double-clicked on this cell they would expect to see all of the rows from the Sales table from 1/1/2017 to 4/1/2017, the rows whose sales have been aggregated to give the YTD total.

This can be achieved using the following expression in the Detail Rows Expression for the Total YTD Sales measure:

CALCULATETABLE(
	SELECTCOLUMNS(
		'Sales', 
		"Date", 'Sales'[Date], 
		"Sales Value", [Total Sales]
	), 
	DATESYTD('Date'[Date])
)

[screenshot: the Detail Rows Expression property for the Total YTD Sales measure]

Now, double-clicking on cell D6 in the PivotTable gives the following table:

image

It may not look all that impressive, but there are few words that can describe how happy this makes me feel. This is exactly what is not possible with drillthrough in SSAS Multidimensional, and why drillthrough in Multidimensional has always been so frustrating to use. It’s great to see the feature properly implemented in Tabular.

The Show Hidden Cubes SSAS Connection String Property

If you need to write queries in SQL Server Management Studio against an SSAS cube that has its Visible property set to false, you have a problem because when you connect to your database you can’t see the cube or its metadata! Luckily you can override this and make the cube visible by setting the following connection string property when you connect:

Show Hidden Cubes=true

[screenshot: the Additional Connection Parameters tab in the SSMS connection dialog]

Connection string properties can be set in the Additional Connection Parameters tab of the connection dialog that appears when you open a new MDX query window.

Unfortunately this doesn’t make any hidden objects inside the cube, like measures or dimensions, visible again – it just makes the cube itself visible. However, if you’re working on the Calculations tab of the Cube Editor in SSDT it is possible to make all hidden objects visible, as I show here.

First Thoughts On The Integration Of Power Query/M Into Analysis Services Tabular

Last Friday’s big news was the release of the first CTP for Analysis Services v.next. Among several major new pieces of functionality (Ragged hierarchies! Drillthrough that works properly, even for calculations! Table-level security!) probably the biggest is the integration of Power Query/M into Analysis Services. As you can probably guess, I’m incredibly pleased that my two favourite technologies have got together. The technical details are given in this blog post, which I suggest you read if you haven’t done so already, but what I think is missing is an explanation of why this is so important and what kind of opportunities it opens up – hence this post. Of course this is just my take on the subject and not what Microsoft may actually be thinking; it’s also very early days, so as the functionality develops and I have more chance to think about it, my opinions may change. If you have any ideas on this subject I would be interested to hear them so please leave a comment!

Why this had to happen: Power BI

There is an obvious reason why Microsoft decided to integrate Power Query/M into SSAS, and that is because it needs to support the conversion of Power BI models into Analysis Services Tabular models. There are two scenarios where this will be necessary.

The first is the ability to convert a Power BI model into an Azure Analysis Services Tabular model (listed as ‘planned’ here), something that will be a key selling point for Azure Analysis Services when it releases. The engine behind Power BI is essentially the same as the one used in Analysis Services so migrating the data model should be straightforward, but since Power BI uses Power Query/M to load data then a migrated Azure Analysis Services model will also have to use Power Query/M.

The second scenario is similar to the first. We now know that on-premises Power BI will be delivered through Reporting Services v.next, and it’s reasonable to assume Reporting Services will need a database engine to store the data for published Power BI reports. That engine will have to be an Analysis Services instance of some kind (either standalone or running in-process inside Reporting Services) and again for that to work Analysis Services will have to support the same data access mechanisms as Power BI.

Better support for a larger number of data sources

I’ve just argued why Microsoft was obliged to include this functionality in SSAS v.next but in fact there are many positive reasons for doing this too. The most obvious one is to do with support for more data sources. At the moment SSAS Tabular supports a pretty good range of data sources, but the world of BI is getting more and more diverse and in order to stay relevant SSAS needs to support far more than it does today. By using Power Query/M as its data access mechanism, SSAS v.next will immediately support a much larger number of data sources and this number is going to keep on growing: any investment that Microsoft or third parties make for Power BI in this area will also benefit SSAS. Also, because Power Query/M can query and fold to more than just relational databases, I suspect that in the future this will allow for DirectQuery connections to many of these non-relational data sources too.

Different data sources for partitions in the same table

Another benefit of this change is that we’ll have a lot more flexibility with partitioning tables in an SSAS Tabular model. As the blog post says:

As long as a partition’s M query adheres to the column mappings of the table, you are free to perform any transformations and pull in data from any data source defined in the model.

In SSAS 2016 the partitions in a table all have to get data from the same data source, whereas in v.next we’ll be able to get data from different data sources in different partitions, and this opens up some interesting new possibilities. For example, I can imagine a simple budgeting application where the partitions in a table get data from different Excel workbooks stored in OneDrive for Business, and where each partition gets processed automatically when changes are saved to one of these workbooks.
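Purely as an illustrative sketch – the URL, workbook structure and sheet name here are invented – the M query for one such partition might look something like this, with each of the other partitions pointing at a different workbook:

let
    //read one budget workbook from OneDrive for Business
    Source = Excel.Workbook(
        Web.Contents("https://contoso-my.sharepoint.com/budgets/Budget2017Q1.xlsx"),
        null, true),
    //pick out the sheet containing the budget data
    BudgetSheet = Source{[Item="Budget",Kind="Sheet"]}[Data],
    //use the first row as the column headers
    PromotedHeaders = Table.PromoteHeaders(BudgetSheet)
in
    PromotedHeaders

As long as each partition’s query returns the same columns as the table it belongs to, this kind of mixing and matching is allowed.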

Does this replace SSIS and my data warehouse? 

The short answer is no. Power Query/M is not a full-featured ETL tool and I don’t think it ever will be; it certainly does not have the kind of functionality needed to perform enterprise-level ETL right now. My view is that Microsoft have built Power Query/M into SSAS for the reasons above and not to encourage enterprise SSAS users to do their own quick-and-dirty ETL when loading data (although there is a risk that that will happen anyway). That said, I think the dividing line between corporate and self-service BI will become increasingly blurred over the next few years as the Microsoft BI stack develops, and we’ll see Analysis Services being used in self-service scenarios as well as the more traditional corporate ones.

Centralised data source objects

One last thing to point out is that the way SSAS v.next makes a distinction between data sources and other queries is very interesting. In Power BI and Power Query it’s easy to end up with data source connection information duplicated across multiple queries unless you know what you’re doing, and this can cause no end of problems later on in a project. As far as I can see, in SSAS v.next a “data source object” is an M query that only contains the connection to external data, while all other queries have to reference a data source to be able to access external data. This means, as the blog post says:

Referring to data source objects helps to centralize data source settings for multiple queries and simplifies deployments and maintenance if data source definitions must be updated later on. When updating a data source definition, all M queries that refer to it automatically use the new settings.
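To make that concrete, here’s a minimal hypothetical sketch – the server and database names are placeholders, and the exact syntax for referencing a data source in the CTP may change – of a table’s M query that refers to a shared data source object rather than containing its own connection details:

let
    //reference to the model-level data source object
    Source = #"SQL/localhost;AdventureWorksDW",
    //this query just picks one table from that source
    FactInternetSales = Source{[Schema="dbo",Item="FactInternetSales"]}[Data]
in
    FactInternetSales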

I wonder whether this concept is coming to Power BI and Power Query at some point? I hope so – it makes a lot of sense.

SSAS Documentation With Squarehouse Commentary

A few months ago I found out about a new tool for managing and viewing Analysis Services Multidimensional documentation called Squarehouse Commentary. I’ve now had the chance to play around with it and talk to the company that makes it, and I have to say I’m impressed: in my opinion it’s by far the best tool of its type that I’ve seen. Rather than generating a large pile of documentation that no-one will read and which will be left forgotten on a fileshare or a SharePoint site, it allows you to synch the descriptions created in its own web application with the description properties in your SSAS cubes and dimensions. It also automatically generates actions so that end users can easily view rich web-based documentation for hierarchies and measures direct from whatever client tool they are using, in a few clicks.

This video does a good job of showing off the main features:

Overall, definitely something to check out. It doesn’t support Tabular models yet, but I’m told that this is in the planning stages.