Running Your Own MDX And DAX Queries In Power BI Desktop

Every time there’s a new release of Power Query or Power BI Desktop, I always check to see if there are any interesting new M functions that have been added (I used #shared to do this, as detailed here). For the RTM version of Power BI Desktop I spotted two new functions:

image

As well as ODBC connections, we can now use OLEDB and ADO.NET data sources – although they aren’t shown in the UI yet. And you know what this means… with an OLEDB connection we can now run our own MDX and DAX queries against SSAS data sources! I assume this will be coming in Power Query in Excel soon too.

Here’s an example query showing how to use OleDB.Query() to run an MDX query against the Adventure Works DW cube in SSAS Multidimesional:

let
    Source = OleDb.Query(
              "Provider=MSOLAP.5;Data Source=localhost;
               Initial Catalog=Adventure Works DW 2008", 
              "select {measures.[internet sales amount]} on 0, 
               [date].[calendar].[calendar year].members on 1 
               from [adventure works]"
             )
in
    Source

As you can see, it’s pretty straightforward: you just need to supply a connection string and a query. You will need to tell Power BI Desktop which credentials to use when running the query the first time you connect to SSAS, and that’s probably going to be Windows:

image

You will also see a prompt the first time you run the query, asking for permission to run a Native Database Query:

image

This prompt will appear each time a different MDX query is run; you can turn off this prompt in the Options dialog on the Security tab by unchecking the Require user approval for new native database queries box:

image

Here’s the output of the MDX query from the example code:

image

The Use And Abuse Of The MDX Freeze Statement

The other day, while helping a customer with some particularly nasty MDX scoped assignments, I realised that there weren’t many good resources on the internet that explained how to use the MDX Freeze statement. It’s something I see used quite often, but usually because some MDX calculations aren’t giving the correct results and a developer has found that putting a Freeze statement in has fixed the problem – even if they don’t understand why it has fixed the problem. So, in this post I’ll explain what Freeze does, when you might want to use it, and when there are other other, better alternatives.

First of all, the basics. Imagine you have a super-simple cube and that, apart from the Calculate statement, the only MDX you have on the Calculations tab in the cube editor is the following:

CREATE MEMBER CURRENTCUBE.MEASURES.M1 AS 1;

CREATE MEMBER CURRENTCUBE.MEASURES.M2 AS NULL;

SCOPE(MEASURES.M2);
    THIS = MEASURES.M1;
END SCOPE;

If you query the cube in Excel, you’ll see the following:

image

No surprises here: we have created two calculated measures, M1 and M2, and then used a scoped assignment to set M2 to show the value of M1. It’s important to understand that the scope statement has not copied the value of M1 into M2, but acts more like a pointer so that M1 will always display the same value as M2 even if M1 subsequently changes. This means that when we add a second scope statement to the code that alters the value of M1, as follows:

CREATE MEMBER CURRENTCUBE.MEASURES.M1 AS 1;

CREATE MEMBER CURRENTCUBE.MEASURES.M2 AS NULL;

SCOPE(MEASURES.M2);
    THIS = MEASURES.M1;
END SCOPE;

SCOPE(MEASURES.M1);
    THIS = 2;
END SCOPE;

You see the following in your PivotTable:

image

This behaviour is the source of a lot of confusion! An assignment to one measure has indirectly changed the value of another measure, and of course in a real-world cube it can be very difficult to spot situations where this has happened and if you do, what other MDX has caused this to happen.

Each statement in the MDX Script of a cube adds an extra layer of calculations to it, called a calculation pass; this is true for all the calculations in the examples above. As new calculations are added, and new passes are created, the previous passes still exist and are still accessible. In the second example above, in the outermost calculation pass, the measure M2 returns the value 2 but at the previous calculation pass (as seen in the first example) it returned the value 1. The Freeze statement allows you to freeze the values returned by a subcube of cells at a given calculation pass, so that no future calculations will change those values.

Therefore, by taking our code and adding a Freeze statement to the first scoped assignment we can prevent the second scoped assignment changing the value of M2:

CREATE MEMBER CURRENTCUBE.MEASURES.M1 AS 1;

CREATE MEMBER CURRENTCUBE.MEASURES.M2 AS NULL;

SCOPE(MEASURES.M2);
    THIS = MEASURES.M1;
    FREEZE(THIS);
END SCOPE;

SCOPE(MEASURES.M1);
    THIS = 2;
END SCOPE;

Here’s the output now:

image

Another very common way that scoped assignments can affect the value of a cell is through the aggregation of the results of a calculation. This blog post (one of the most popular I’ve ever written) explains how this behaviour can be used to implement calculations like currency conversions and weighted averages. However, in other cases, this aggregation of a calculation is an unwanted and unexpected side effect of a scope statement and calculated values that you did want to be displayed instead get replaced with weird, meaningless values. The Freeze statement can be used to stop this happening but in actual fact it’s a much better idea to understand the cause of these problems and rewrite your calculations so that Freeze isn’t necessary.

Now, imagine that in your cube you have a regular (ie not calculated) measure called Sales Amount that has its AggregateFunction property set to Sum, and that you have a fairly standard Date dimension with a Year attribute hierarchy. A PivotTable with Sales Amount on columns and Year on rows looks like this in Excel:

image

If you add the following assignment to the cube, to change the value of the All Member on Year, the value of the Grand Total in the PivotTable (which is the All Member, even if that name isn’t shown) will be changed:

SCOPE([Date].[Year].[All], [Measures].[Sales Amount]);
    THIS = 123;
END SCOPE;

image

If, on the other hand, you remove that previous assignment and replace it with an assignment on the year 2001:

SCOPE([Date].[Year].&[2001], [Measures].[Sales Amount]);
    THIS = 456;
END SCOPE;

You’ll see that not only has the value for Sales Amount for the year 2001 changed, but that the value of the All Member has been changed too: the All Member represents the aggregated total of all the years, so therefore if a year value has changed, the All Member value must change the reflect this:

image

What happens if we try to combine the two previous scope statements?

SCOPE([Date].[Year].[All], [Measures].[Sales Amount]);
    THIS = 123;
END SCOPE;

SCOPE([Date].[Year].&[2001], [Measures].[Sales Amount]);
    THIS = 456;
END SCOPE;

In this case, the output is exactly the same as with the previous example (although the measure formatting has also been lost):

image

This is because even though the first Scope statement successfully changed the value of the All Member, the aggregation of values triggered by the second Scope overwrote this value. Although you can’t see this happening in Excel, where you only see the values returned at the final calculation pass of the cube, the MDX Script Debugger can be used to see the values returned for a query at all the different passes so you can work out what’s going on.

The Freeze statement can be used to stop the second Scope from overwriting the first, like so:

SCOPE([Date].[Year].[All], [Measures].[Sales Amount]);
    THIS = 123;
    FREEZE(THIS);
END SCOPE;

SCOPE([Date].[Year].&[2001], [Measures].[Sales Amount]);
    THIS = 456;
END SCOPE;

image

However, in my opinion it makes a lot more sense to change the order of the Scope statements so that the assignment to 2001 doesn’t overwrite the assignment to the All Member:

SCOPE([Date].[Year].&[2001], [Measures].[Sales Amount]);
    THIS = 456;
END SCOPE;

SCOPE([Date].[Year].[All], [Measures].[Sales Amount]);
    THIS = 123;
END SCOPE;

The end result is the same:

image

Why do I prefer this approach to the use of Freeze? Two reasons:

  1. It works with natural MDX behaviour rather than trying to fight against it. In this case it’s just one line of code less, but in the real world it could result in a much greater reduction. It’s true that you have to put a lot of thought into the ordering of your calculations, but I don’t think you can get away from that. Using Freeze to make your calculations work properly without understanding why it’s needed results in much more complex code, often with duplicated calculations because Freeze still doesn’t give the desired results, and is frankly a bit of a hack.
  2. There are, or at least were, performance implications with the use of Freeze. In Analysis Services 2005 I saw a few cases where the use of Freeze contributed to poor query performance, and where reordering scope statements so that it was no longer necessary made performance better. I’m not sure whether this is still the case with SSAS 2014 but it may well be.

I see Freeze abused most often in financial cubes, when scope statements are used to define calculations on a chart of accounts hierarchy. Sometimes I have even seen the same calculation code appear in several places in the same MDX Script, just to make sure that the calculations always return the right result – all because the calculations on the chart of accounts dimension are aggregating up and overwriting each other. In this case the simple rule you have to remember is to always scope the calculations on the lowest level of the hierarchy first, then scope the calculations on the second-lowest level, and so on working your way up to the top of the hierarchy. This way you can be sure that your scope will never aggregate up and overwrite the result of another calculation.

Apart from that, I also see Freeze used when a cube contains a Date Tool dimension that uses regular members instead of calculated members, in the way described here. Now there are a lot of good reasons to use regular members on a Date Tool dimension (it will work with all versions of SSAS and Excel for instance) but I have also seen a lot of cases where the fact that you are scoping calculations on regular measures, which may then get aggregated up accidentally, has caused a lot of problems – not only resulting in incorrect values appearing, but also making query performance worse. For that reason, nowadays I prefer to use calculated members on my Date Tool dimension rather than regular members.

A Closer Look At Power Query/SSAS Integration

In the November release of Power Query the most exciting new feature was the ability to connect to SSAS. I blogged about it at the time, but having used it for a month or so now I thought it was worth writing a more technical post showing how it works in more detail (since some things are not immediately obvious) as well as to see what the MDX it generates looks like.

This post was written using Power Query version 2.18.3874.242, released January 2015; some of the bugs and issues mentioned here will probably be fixed in later versions.

Connecting to SSAS

Power Query officially supports connecting to all versions of SSAS from 2008 onwards, although I’ve heard from a lot of people they have had problems getting the connection working. Certainly when I installed the version of Power Query with SSAS support in on my laptop, which has a full install of SQL Server 2014, it insisted I install the 2012 version of ADOMD.Net before it would work (I also needed to reboot). My guess is that if you’re having problems connecting you should try doing that too; ADOMD.Net 2012 is available to download in the SQL Server 2012 Feature Pack.

After clicking From Database/From SQL Server Analysis Services the following dialog appears, asking you to enter the name of the server you want to connect to.

image

If this is the first time you’re connecting to SSAS the following dialog will appear, asking you to confirm that you want to use Windows credentials to connect.

image

Unfortunately, if you’re connecting via http and need to enter a username and password you won’t be able to proceed any further. I expect this problem will be fixed soon.

Initial Selection

Once you’ve connected the Navigator pane appears on the right-hand side of the screen. Here you see all of the databases on the server you’ve connected to; expand a database and you see the cubes, and within each cube you see all of the measure groups, measures, dimensions and hierarchies.

image

The previous build of Power Query does not display any calculated measures that aren’t associated with measure groups (using the associated_measure_group property); this has been fixed in version 2.18.3874.242.

When you start to select measures and hierarchies the name of the cubes you have chosen items from will appear in the Selected items box. If you hover over the name of the cube the peek pane will appear and you’ll see a preview of the results of the query.

image

At this point you can either click the Load button to load the data either to the worksheet or the Excel Data Model, or click the Edit button to edit the query further.

You cannot specify your own MDX query to use for the query as yet.

The Query Editor

Once the Power Query Query Editor opens you’ll see the output of the query as it stands, and also on the Cube tab in the ribbon two new buttons: Add Items and Collapse Columns.

image

Here’s the MDX (captured from Profiler) showing the MDX generated for the query in the screenshot above:

select
{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]}
on 0,
subset(
nonempty(
[Date].[Calendar Year].[Calendar Year].allmembers
,{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]})
,0,50)
properties member_caption,member_unique_name 
on 1 
from [Adventure Works]

 

The MDX Subset() function is used here to ensure that the query doesn’t return more than 50 rows.

Adding Items

Clicking on the Add Items button allows you to add extra hierarchies and measures to the query. When you click the button the following dialog appears where you can choose what you want to add:

image

In this case I’ve added the Day Name hierarchy to the query, and this hierarchy appears as a new column on the right-hand edge of the query after the measures:

image

You can easily drag the column to wherever you want it though.

Here’s the MDX again:

select
{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]}
on 0,
subset(
nonempty(
crossjoin(
[Date].[Calendar Year].[Calendar Year].allmembers,
[Date].[Day Name].[Day Name].allmembers)
,{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]})
,0,50)
properties member_caption,member_unique_name 
on 1 
from [Adventure Works]

 

Collapsing Columns

Selecting the Day Name column and then clicking the Collapse Columns button simply rolls back to the previous state of the query. However, there’s more to this button than meets the eye. If you filter the Day Name column (for example, by selecting Saturday and Sunday as in the screenshot below) and then click Collapse and Remove, the filter will still be applied to the query even though the Day Name column is no longer visible.

image

Here’s what the Query Editor shows after the filter and after the Day Name column has been collapsed:

image

Compare the measure values with those shown in the original query – it’s now showing values only for Saturdays and Sundays, although that’s not really clear from the UI. Here’s the MDX generated to prove it – note the use of the subselect to do the filtering:

select
{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]}
on 0,
subset(
nonempty(
[Date].[Calendar Year].[Calendar Year].allmembers
,{[Measures].[Internet Sales Amount],[Measures].[Internet Order Quantity]})
,0,1000)
properties member_caption,member_unique_name 
on 1 
from(
select
({[Date].[Day Name].&[7],[Date].[Day Name].&[1]})
on 0 
from 
[Adventure Works])

 

From studying the MDX generated I can tell that certain other operations such as sorting and filtering the top n rows are folded back to SSAS.

It’s also important to realise that using the Remove option to remove a column from the query does not have the same effect as collapsing the column:

image

Using Remove just hides the column; the number of rows returned by the query remains the same.

image

User Hierarchies

In the examples above I’ve only used attribute hierarchies. User hierarchies aren’t much different – you can select either an individual level or the entire hierarchy (which is the same as selecting all of the levels of the hierarchy).

image

image

Parent-Child Hierarchies

Parent-child hierarchies work very much like user hierarchies, except that you will see some null values in columns to accommodate leaf members at different levels:

image

M Functions

There are a lot of M functions relating to cube functionality, although the documentation in the Library Specification document is fairly basic and all mention of them disappeared from the online help a month or so ago for some reason. Here’s the code for the query in the Collapsing Columns section above:

let
    Source = AnalysisServices.Databases("localhost"),
    #"Adventure Works DW 2008" = Source{[Name="Adventure Works DW 2008"]}[Data],
    #"Adventure Works1" = #"Adventure Works DW 2008"{[Id="Adventure Works"]}[Data],
    #"Adventure Works2" = #"Adventure Works1"{[Id="Adventure Works"]}[Data],
    #"Added Items" = Cube.Transform(#"Adventure Works2", {
             {Cube.AddAndExpandDimensionColumn, 
             "[Date]", {"[Date].[Calendar Year].[Calendar Year]"}, {"Date.Calendar Year"}}, 
             {Cube.AddMeasureColumn, "Internet Sales Amount", 
             "[Measures].[Internet Sales Amount]"}, 
             {Cube.AddMeasureColumn, "Internet Order Quantity", 
             "[Measures].[Internet Order Quantity]"}}),
    #"Added Items1" = Cube.Transform(#"Added Items", {
              {Cube.AddAndExpandDimensionColumn, "[Date]", 
             {"[Date].[Day Name].[Day Name]"}, {"Date.Day Name"}}}),
    #"Filtered Rows" = Table.SelectRows(#"Added Items1", each (
             Cube.AttributeMemberId([Date.Day Name]) = "[Date].[Day Name].&[7]" 
             meta [DisplayName = "Saturday"] 
             or 
             Cube.AttributeMemberId([Date.Day Name]) = "[Date].[Day Name].&[1]" 
             meta [DisplayName = "Sunday"])),
    #"Collapsed and Removed Columns" = Cube.CollapseAndRemoveColumns(
             #"Filtered Rows", 
             {"Date.Day Name"})
in
    #"Collapsed and Removed Columns"

It’s comprehensible but not exactly simple – yet another example of how difficult it is to shoe-horn multidimensional concepts into a tool that expects to work with relational data (see also SSRS). I doubt I’ll be writing any M code that uses these functions manually.

MDX Solve Order, SCOPE_ISOLATION and the Aggregate() function

Solve order in MDX is a mess. Back in the good old days of Analysis Services 2000 it was a difficult concept but at least comprehensible; unfortunately when Analysis Services 2005 was released a well-intentioned attempt at making it easier to work with in fact ended up making things much, much worse. In this post I’m going to summarise everything I know about solve order in MDX to try to make this complicated topic a little bit easier to understand.

If you’re an experienced MDXer, at this point you’ll probably lose interest because you think you know everything there is to know about solve order already. Up until two weeks ago that’s what I though too, so even if you know everything I say in the first half of this post keep reading – there’s some new stuff at the end I’ve only just found out about.

Let’s start with a super-simple cube built from a single table, with two measures (Sales Amount and Cost Amount) and a Product dimension containing a single attribute hierarchy with two members (Apples and Oranges). Everything is built from the following table:

image

Solve Order and calculated members in the WITH clause

To understand what solve order is and how it can be manipulated, let’s start off looking at an example that uses only calculated members in the WITH clause of a query. Consider the following:

WITH

 

MEMBER [Measures].[Cost %] AS

DIVIDE([Measures].[Cost Amount],[Measures].[Sales Amount]),

FORMAT_STRING='0.0%'

 

MEMBER [Product].[Product].[Total Fruit] AS

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]})

 

SELECT

{[Measures].[Sales Amount],

[Measures].[Cost Amount],

MEASURES.[Cost %]}

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[Total Fruit]}

ON ROWS

FROM SALES

There are two calculated members here:

  • Cost % divides Cost Amount by Sales Amount to find the percentage that costs make up of the sales amount
  • Total Fruit sums up the values for Apples and Oranges

The output of the query is as follows:

image

Solve order controls the order that MDX calculations are evaluated when two or more of them overlap in the same cell. In this case Cost % and Total Fruit are both evaluated in the bottom right-hand cell; Total Fruit is calculated first, giving the values of 30 for Sales Amount and 21 for Cost Amount, and Cost % is calculated after that. The bottom right-hand cell is the only cell where these two calculations overlap and the only cell where solve order is relevant in this query.

In this case, 70% is the value you would expect to get. You, however, can control solve order for calculations in the WITH clause by setting the SOLVE_ORDER property for each calculated member, like so:

WITH

 

MEMBER [Measures].[Cost %] AS

DIVIDE([Measures].[Cost Amount],[Measures].[Sales Amount]),

FORMAT_STRING='0.0%',

SOLVE_ORDER=1

 

MEMBER [Product].[Product].[Total Fruit] AS

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]}),

SOLVE_ORDER=2

 

SELECT

{[Measures].[Sales Amount],

[Measures].[Cost Amount],

MEASURES.[Cost %]}

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[Total Fruit]}

ON ROWS

FROM SALES

image

Now the value in the bottom right-hand corner is 135% instead of 70%: Cost % is calculated first, then Total Fruit second so 60%+75%=135%. The SOLVE_ORDER property of a calculated member is an integer value, and the lower the SOLVE_ORDER value the earlier the calculation will be evaluated, so with Cost % having a solve order of 1 and Total Fruit having a solve order of 2, this forces Cost % to be calculated first now even though in this case it gives what is clearly an ‘incorrect’ result.

Solve Order and calculated members defined on the cube

Things now get a bit more complicated. There’s a different way of controlling solve order if your calculations are defined on the cube itself: in this case, solve order is determined by the order that the calculations appear on the Calculations tab. So if the calculations tab of the Cube Editor contains the calculations in this order:

CREATE MEMBER CURRENTCUBE.[Measures].[Cost %] AS

DIVIDE([Measures].[Cost Amount],[Measures].[Sales Amount]),

FORMAT_STRING='0.0%';

 

CREATE MEMBER CURRENTCUBE.[Product].[Product].[Total Fruit] AS

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]});

image

…and you run the following query:

SELECT

{[Measures].[Sales Amount],

[Measures].[Cost Amount],

MEASURES.[Cost %]}

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[Total Fruit]}

ON ROWS

FROM SALES

You get the incorrect result again:

image

…but if you change the order of the calculations so that Total Fruit comes first:…and rerun the same query, you get the correct results:

image

image

The SOLVE_ORDER property can also be used with calculations defined on the cube to override the effect of the order of calculations. So defining the following calculations on the cube:

CREATE MEMBER CURRENTCUBE.MEASURES.[Cost %] AS

DIVIDE([Measures].[Cost Amount], [Measures].[Sales Amount]),

FORMAT_STRING='PERCENT', SOLVE_ORDER=2;

 

CREATE MEMBER CURRENTCUBE.[Product].[Product].[Total Fruit] AS

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]}), SOLVE_ORDER=1;

image

…means that, even though Total Fruit comes after Cost % on the Calculations tab, because it has a lower solve order set using the SOLVE_ORDER property it is evaluated before Cost % and the query still returns the correct value:

image

Solve order and calculations defined in the WITH clause and on the cube

What happens if some calculations are defined on the cube, and some are defined in the WITH clause of a query? By default, calculations defined on the cube always have a lower solve order than calculations defined in the WITH clause of a query; the SOLVE_ORDER property has no effect here. So if Total Fruit is defined in the WITH clause and Cost % on the cube, you get the incorrect result:

image

WITH

 

MEMBER [Product].[Product].[Total Fruit] AS

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]})

 

SELECT

{[Measures].[Sales Amount],

[Measures].[Cost Amount],

MEASURES.[Cost %]}

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[Total Fruit]}

ON ROWS

FROM SALES

 

image

Of course, if Total Fruit is defined on the cube and Cost % is defined in the WITH clause you will get the correct answer. However, usually measures like Cost % are defined on the cube and it’s calculations like Total Fruit, which define custom groupings, that are defined on an ad hoc basis in the WITH clause. This is a problem.

The SCOPE_ISOLATION property

This default behaviour of calculations defined on the cube always having a lower solve order than calculations in the WITH clause can be overridden using the SCOPE_ISOLATION property. Setting SCOPE_ISOLATION=CUBE for a calculated member defined in the WITH clause will give that calculated member a lower solve order than any calculations defined on the cube. So, with Cost % still defined on the cube the following query now gives the correct results:

WITH

 

MEMBER [Product].[Product].[Total Fruit] AS

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]}),

SCOPE_ISOLATION=CUBE

 

SELECT

{[Measures].[Sales Amount],

[Measures].[Cost Amount],

MEASURES.[Cost %]}

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[Total Fruit]}

ON ROWS

FROM SALES

image

The Aggregate() function

Using the MDX Aggregate() function (and in fact also the VisualTotals() function – but you probably won’t ever want to use it) inside a calculation has a similar effect to the SCOPE_ISOLATION property in that it forces a calculation to be evaluated at a lower solve order than anything else. Therefore, in the previous example, instead of using the SCOPE_ISOLATION property you can change the calculation to use the Aggregate() function instead of Sum() and get the correct results:

WITH

 

MEMBER [Product].[Product].[Total Fruit] AS

AGGREGATE({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]})

 

SELECT

{[Measures].[Sales Amount],

[Measures].[Cost Amount],

MEASURES.[Cost %]}

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[Total Fruit]}

ON ROWS

FROM SALES

image

The general rule is, therefore, whenever you are creating custom-grouping type calculated members like Total Fruit in the WITH clause of a query, to use the Aggregate() function rather than Sum(). The fact that Aggregate() takes into account the AggregateFunction property of each measure on the cube (so that distinct count, min and max measures are dealt with correctly) is another good reason to use it.

Using the Aggregate() function in calculations defined on the cube has the same effect. Even when the Total Fruit calculated member is defined after Cost % on the Calculations tab, as here:

image

…so long as Total Fruit uses the Aggregate() function, running the test query gives the correct result:

SELECT

{[Measures].[Sales Amount],

[Measures].[Cost Amount],

MEASURES.[Cost %]}

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[Total Fruit]}

ON ROWS

FROM SALES

image

 

There are some very interesting details about the way Aggregate() changes solve order though.

First of all, using the Aggregate() function in a calculated member doesn’t change the solve order of the whole calculation, just the part of the calculation that uses the Aggregate() function. With the following calculations defined on the cube:

CREATE MEMBER CURRENTCUBE.[Measures].[Cost %] AS

DIVIDE([Measures].[Cost Amount],[Measures].[Sales Amount]),

FORMAT_STRING='0.0%';

 

CREATE MEMBER CURRENTCUBE.[Product].[Product].[One Aggregate] AS

AGGREGATE({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]});

 

CREATE MEMBER CURRENTCUBE.[Product].[Product].[One Sum] AS

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]});

 

CREATE MEMBER CURRENTCUBE.[Product].[Product].[Two Aggregates] AS

AGGREGATE({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]})

+

AGGREGATE({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]});

 

CREATE MEMBER CURRENTCUBE.[Product].[Product].[Two Sums] AS

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]})

+

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]});

 

CREATE MEMBER CURRENTCUBE.[Product].[Product].[One Aggregate One Sum] AS

AGGREGATE({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]})

+

SUM({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]});

…running the following query:

SELECT

{[Measures].[Sales Amount],

[Measures].[Cost Amount],

MEASURES.[Cost %]}

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[One Aggregate],

[Product].[Product].[One Sum],

[Product].[Product].[Two Aggregates],

[Product].[Product].[Two Sums],

[Product].[Product].[One Aggregate One Sum]}

ON ROWS

FROM SALES

…gives these results:

image

The value returned for the calculation [One Aggregate One Sum], which contains an Aggregate() and a Sum(), shows that the value returned by the Aggregate() is evaluated at a different solve order than the value returned by Sum(), even if they are inside the same calculated member.

Furthermore, in some very obscure cases the contents of the set passed to the Aggregate() function determine whether its special solve order behaviour happens or not. I don’t know for sure what all those cases are but I have seen this happen with time utility (aka date tool aka shell) dimensions. Here’s an example.

The demo cube I’ve been using in this post has been changed to add a new dimension, called Data Type, which has just one hierarchy with one member on it called Actuals; Data Type is a fairly standard time utility dimension. The Cost % calculation has also been changed so that it’s now a calculated member on the Data Type dimension, although it is still defined on the cube. Here’s its new definition:

CREATE MEMBER CURRENTCUBE.[Data Type].[Data Type].[Cost %] AS

DIVIDE(

([Measures].[Cost Amount],[Data Type].[Data Type].&[Actuals]),

([Measures].[Sales Amount],[Data Type].[Data Type].&[Actuals])),

FORMAT_STRING='0.0%';

Now if I run the following query:

WITH

 

MEMBER [Product].[Product].[Simple Set] AS

AGGREGATE({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]})

 

MEMBER [Product].[Product].[Nextmember Function Used] AS

AGGREGATE({[Product].[Product].&[Apples],

[Product].[Product].&[Apples].NEXTMEMBER})

 

MEMBER [Product].[Product].[Descendants Function Used] AS

AGGREGATE(DESCENDANTS({[Product].[Product].&[Apples],

[Product].[Product].&[Oranges]}))

 

MEMBER [Product].[Product].[Descendants Function Used Twice] AS

AGGREGATE({

DESCENDANTS([Product].[Product].&[Apples]),

DESCENDANTS([Product].[Product].&[Oranges])

})

 

MEMBER [Product].[Product].[Descendants Function Used Twice With Union] AS

AGGREGATE(

UNION(

DESCENDANTS([Product].[Product].&[Apples]),

DESCENDANTS([Product].[Product].&[Oranges])

))

 

SELECT

{[Measures].[Sales Amount]}

*

[Data Type].[Data Type].ALLMEMBERS

ON COLUMNS,

{[Product].[Product].&[Apples],

[Product].[Product].&[Oranges],

[Product].[Product].[Simple Set],

[Product].[Product].[Nextmember Function Used],

[Product].[Product].[Descendants Function Used],

[Product].[Product].[Descendants Function Used Twice],

[Product].[Product].[Descendants Function Used Twice With Union]}

ON ROWS

FROM [Sales With Data Type]

I get these results:

image

Note that for some of the calculations, the Aggregate() function results in a lower solve order in the way we’ve already seen, but not for all of them. Using the NextMember() function, or having two Descendants() functions without wrapping them in a Union() function, seems to stop SSAS assigning the calculation a lower solve order. Ugh. Luckily, though, I have only been able to replicate this with calculated members from two non-measures dimensions; if Cost % is a calculated measure Aggregate() always gives the lower solve order. Apparently this is something that SSAS does on purpose to try to recognise ‘visual total’-like calculated members and make them work the way you want automatically. This is definitely something to beware of if you are using time utility dimensions and calculations on other dimensions though, as it may result in incorrect values being displayed or performance problems if you’re not careful.

[Thanks to Gabi Münster for showing me how Aggregate() works with different sets and Marius Dumitru for confirming that this is intended behaviour]

Counting Customers Who Have Bought All Selected Products

Market basket analysis is a fairly well understood problem for SSAS – it’s best handled using many-to-many relationships, as shown in the excellent The Many-to-Many Revolution  white paper. However the other week I was asked a question that was an interesting variation on this problem, and which shows up one of the limitations of the many-to-many approach.

Let me show you an example. The following PivotTable on the Adventure Works cube shows the number of distinct customers who bought products in the product category Bikes:

image

The Customer Count measure is a distinct count measure, the Country hierarchy from the Customer dimension is shown on rows, and the slicer here shows the Category hierarchy from the Product dimension. If you select Bikes and Clothing in the slicer, like so:

image

… the measure values are larger because what you’re seeing now is the number of distinct customers who bought products in either the category Bikes or the category Clothing.

But what if you want to see the number of customers who bought Bikes AND Clothing? Well, setting up a many-to-many relationship will give you this (and a lot more useful stuff too) but there are some limitations with this approach:

  • Adding the m2m relationship and the extra measure groups it needs increases processing time
  • You have to select your two product categories from two different (but identical) hierarchies
  • You can only select two product categories – if you want to add a third to your AND filter you need to add a third m2m relationship, and so on. It’s not very flexible.

It is possible to get around these issues with a pure MDX approach, and apply an AND filter that works with however many product categories that the user selects.

The key to understanding how this works is to see how you would write an MDX calculated member that does an AND filter on two product categories:

WITH

MEMBER MEASURES.[Bikes And Clothing] AS

COUNT(

EXISTING 

NONEMPTY(

NONEMPTY(

[Customer].[Customer].[Customer].MEMBERS,

([Measures].[Internet Sales Amount],[Product].[Category].&[3])),

([Measures].[Internet Sales Amount],[Product].[Category].&[1]))

)

SELECT

{MEASURES.[Bikes And Clothing]}

ON 0,

NON EMPTY

[Customer].[Country].[Country].MEMBERS

ON 1

FROM

[Adventure Works]

image

Here I’ve used two nested NonEmpty() functions to filter the set of all customers, first to get the ones that bought Clothing, then to filter these customers again to get the ones that bought Bikes; there’s also an EXISTING there to get only the ones in the current country.

Once again, to add more product categories to the AND filter you need to add more nested NonEmpty() functions… which means you need a way to dynamically generate the code, which of course you can do using StrToSet(). Now normally I avoid using StrToSet() inside MDX calculations because it can cause serious performance problems but in this case it’s the best choice.

The following named set and calculated measure show how to solve the problem for Adventure Works:

CREATE DYNAMIC SET [Selected Product Categories] as  

EXISTING [Product].[Category].[Category].MEMBERS;

 

CREATE MEMBER CURRENTCUBE.MEASURES.[Distinct Customers with all selected Product Categories] AS

IIF(

[Measures].[Internet Sales Amount]=0,

NULL,

COUNT(

EXISTING

STRTOSET(

GENERATE(

[Selected Product Categories],

"NONEMPTY(", "")

+ "[Customer].[Customer].[Customer].MEMBERS" +

GENERATE(

[Selected Product Categories],

", ([Measures].[Internet Sales Amount], " +

[Selected Product Categories].CURRENT.UNIQUENAME

+ "))", "")

)

)

);

The dynamic named set is there to help work out what is selected in the slicer in my Excel worksheet, as shown in Mosha’s old post here. The calculated measure is where all the interesting stuff happens: it uses the ability of the Generate() function to iterate over a set (in this case the dynamic named set), evaluate a string expression and concatenate these strings. The output of this is a series of nested NonEmpty()s, which then goes to StrToSet() to be evaluated, and then the contents of that set are counted.

image

This technique does not replace using a m2m relationship, because it will not allow you to show a query with product category on rows and columns and the number of customers who bought each combination shown. However, if all you want to do is show the number of customers who have bought one specific combination, this has a lot of advantages.

For all of your MDX fans out there, here’s another approach I came up with which doesn’t perform quite as well but is so much fun I had to share it:

CREATE MEMBER CURRENTCUBE.MEASURES.[Distinct Customers with all selected Product Categories V2] AS

IIF(

[Measures].[Internet Sales Amount]=0,

NULL,

COUNT(

UNION(

INTERSECT(EXISTING [Customer].[Customer].[Customer].MEMBERS AS MYCUSTOMERS, {}),

GENERATE(

[Selected Product Categories],

IIF(

[Selected Product Categories].CURRENT.ITEM(0) IS TAIL([Selected Product Categories]).ITEM(0),

NONEMPTY(MYCUSTOMERS, [Measures].[Internet Sales Amount]),

INTERSECT(NONEMPTY(MYCUSTOMERS, [Measures].[Internet Sales Amount]) AS MYCUSTOMERS, {})

)

)

)

)

);

I’m using Generate() here again, but this time I’m using it with inline named sets (which are even worse for performance than StrToSet) in a way similar to this classic post. The theory is the same though: it’s a way of calling NonEmpty() multiple times over a set with an arbitrary number of items in.

Here’s what the calculation does:

  • Inside the first Intersect() I’m declaring an inline named set called MYCUSTOMERS, then doing the Intersect() with an empty set to return… and empty set. But I’ve managed to declare my inline named set, which is the point here.
  • I’m then iterating over the same dynamic named set shown in my earlier example using Generate(), and:
    • For all but the last product category in that set, I’m doing the NonEmpty() on the contents of MYCUSTOMERS on the current product category and then overwriting the contents of MYCUSTOMERS with the output. However, for these iterations of Generate() I’m returning an empty set.
    • For the last product category in the set I’m actually returning the output of NonEmpty() over MYCUSTOMERS for the current product category. The reason I’m only returning something other than an empty set on the last iteration of Generate() is that Generate() returns the union of all the sets returned by each iteration, and that would give me the wrong results! I only want to output the set from the final iteration.

Calculating The Value Of Overdue Invoices Using Many-To-Many Relationships in SSAS Multidimensional

Recently I had to solve the same, rather complex, problem for three different customers within the space of two weeks, a variation of the ‘events-in-progress’ problem I’ve blogged about a lot in the past. It’s this one: how can you calculate the value of your overdue invoices on any given date? It’s important to stress that we are not talking about the invoices that are overdue today – we want to be able to pick any date in the past and find out the value of invoices that were overdue at that point in time.

Let’s imagine you have a fact table containing payments against invoices: each row has an invoice number, the date of the payment, the payment amount, and the date that the invoice is due. A positive value in the Amount column indicates that this is the opening amount of the invoice; negative values in the Amount column are payments against the invoice. Multiple payments can be made against an invoice before it is fully closed. Here’s some example data:

image

Looking at the rows highlighted for invoice 5, you can see in the upper box that there is a positive amount of £35 shown on January 1st 2014 – this is the date that the invoice was opened, and £35 is the full value of the invoice. The invoice is due on January 10th 2014. In the lower box you can see there were four separate payments of £5, £5, £10 and £15 before the invoice was fully paid off on January 12th 2014.

Now, if you were to do a total-to-date in MDX (similar to what I describe here) it would be pretty easy to calculate the outstanding amount on all invoices on any given date. However the requirement here is not only to do that, but to break the value down so that you can see what the total value for overdue invoices and non-overdue (which I’m going to call backlog in this post) invoices is on any given date. This is tricky to do because we have to take the Due Date of each invoice into account as well as the Payment Date. The desired output for invoice 5 is this:

image

As you can see in this PivotTable, for invoice 5 £25 was still outstanding on January 9th 2014, but this is shown as backlog because this is before the due date of January 10th. On January 11th, one day after the due date, the remaining outstanding amount of £15 is shown as overdue. Of course, we also want to be able to calculate the correct values for all invoices:

image

One way of solving this problem would be to calculate the overdue and backlog values for each date that each invoice is open in your ETL, and store these values in a snapshot fact table. This works, and will give you the best possible query performance, but it has two major drawbacks: first, it makes your ETL much slower, and second it makes your fact table much larger. This post describes how you can calculate the overdue and non-overdue values on any given date using many-to-many relationships instead, without having to blow out the fact table.

To go along with the fact table (called FactInvoice) shown above, I have a date dimension table called DimDate (I have deliberately reduced the number of rows here to the dates I have data for, for reasons that will become clear soon):

image

I have a dimension table containing all of the days that an invoice can be overdue or not overdue for (which has one less than double the number of rows as the date dimension table), plus a second column classifying each row as ‘Backlog’ or ‘Overdue’:

image

… and an invoice dimension table that just contains the distinct invoice numbers called DimInvoice.

Now, let me explain how to build the SSAS cube.

Step 1

  • Build dimensions from all of the tables shown above, so you have dimensions called Date, Overdue Days and Invoice.
  • Build a cube with one measure group, based on FactInvoice, and create one measure with AggregateFunction Sum based on the Amount column in that table.
  • Add the Invoice dimension to the cube with a regular relationship. Add the Date dimension to the cube twice, as Payment Date and Due Date, with regular relationships on the PaymentDate and DueDate columns. The Dimension Usage tab should look like this:

image

Step 2

  • Create the following view in SQL Server:
  • CREATE VIEW [dbo].[FactDateToPaymentDate]
    AS
    SELECT        a.DateKey, b.DateKey AS PaymentDateKey
    FROM            dbo.DimDate AS a INNER JOIN
                             dbo.DimDate AS b ON a.DateKey >= b.DateKey

    This view returns all of the combinations of a given date and all dates up to and including the date.

  • Add this view to the DSV and create a new measure group from it; you’ll need to create a measure here, but it can just be a Count measure.
  • Add the Date dimension to the cube once again, this time leaving the name as Date (you will now have three role-playing copies of the Date dimension in the cube).
  • Set up regular relationships between the Date and Payment Date dimensions and the new measure group, then a many-to-many relationship between Date and the Invoice measure group. This means that when you query the cube by the Date dimension, the many-to-many relationship will mean you see the sum of all Amounts whose payment date is up to and including the date selected. The Dimension Usage tab will look like this:

image

Step 4

  • Create another view in SQL Server with this definition:
  • CREATE VIEW [dbo].[FactDueDateToAsOfDate]
    AS
    SELECT        a.DateKey AS DueDateKey, b.DateKey AS AsOfDateKey, DATEDIFF(dd, a.FullDate, b.FullDate) AS OverDueDays
    FROM            dbo.DimDate AS a CROSS JOIN
                             dbo.DimDate AS b

    Yes, I am cross joining the DimDate table with itself and yes, this could return a lot of rows. However you should find that the view is very fast to execute.

  • Add this view to the DSV and build another measure group from it, again with a single Count measure on it.
  • Set up a regular relationship between this new measure group and the Due Date dimension.
  • Add the Overdue Days dimension to the cube, give it a regular relationship with the new measure group and a many-to-many relationship with the Invoice measure group through the new measure group.
  • Add the Date dimension to the cube yet again, creating a new role-playing dimension called As Of Date, give it a regular relationship with the new measure group and a many-to-many relationship with the Invoice measure group through the new measure group. The Dimension Usage tab will now look like this:

image

The purpose of this measure group is this: if you select a date on the As Of Date dimension, you will be able to select ‘Overdue’ on the Overdue Days dimension and this will give you all of the dates on Due Date that were overdue on that date.

Step 5

  • You only want to have to select one date in your PivotTable, so create the following MDX calculated measures that take your selection on the Date dimension and applies it to the As Of Date dimension too:

CREATE MEMBER CURRENTCUBE.MEASURES.[Open Amount Hidden] AS
([Measures].[Amount],
LINKMEMBER([Date].[Date].CURRENTMEMBER, [As Of Date].[Date])
), VISIBLE=FALSE;

CREATE MEMBER CURRENTCUBE.MEASURES.[Open Amount] AS
IIF(MEASURES.[Open Amount Hidden]=0, NULL, MEASURES.[Open Amount Hidden]);

  • Hide all measures apart from the calculated measures you’ve just created
  • Hide the Payment Date, Due Date and As Of Date dimensions

Conclusion

This is a very complex pattern, I know, and this is after I’ve simplified it a lot (if you need currency conversion as well then things get even worse) but I also know it’s extremely useful from a business point of view. Query performance is also reasonably good, at least in the places where I have implemented this.

You can download my sample SQL Server database and VS 2012 project here.

Optimising MDX Calculations With The Unorder() Function

The Unorder() function is probably one of the least used functions in the whole of MDX. It exists only as a query performance hint and, since I had never up to now found a scenario where it did improve the performance of a calculation I had pretty much forgotten about it (as Books Online says, the optimisation it performs is applied automatically in many cases). However I was playing around with some calculations last week and found out that it does have its uses…

What does the Unorder() function do? It’s a function that takes a set and returns a set, and what it does is remove any implicit ordering from that set. By default all sets in MDX are ordered, but for some types of operation that ordering is unimportant and ignoring it can result in faster query performance.

Take, for example, the following query on the Adventure Works cube which shows the number of customers who have bought something up to the current date:

WITH

MEMBER MEASURES.CUSTOMERSTODATE AS

COUNT(

    NONEMPTY(

        [Customer].[Customer].[Customer].MEMBERS

    , {[Measures].[Internet Sales Amount]}

      *

      {NULL : [Date].[Calendar].CURRENTMEMBER})

)

 

SELECT

{MEASURES.CUSTOMERSTODATE}

ON COLUMNS,

[Date].[Calendar].[Date].MEMBERS

ON ROWS

FROM

[Adventure Works]

 

On my laptop it executes in 35 seconds on a cold cache. We can optimise the calculation here simply by wrapping the set of all members on the Customer level of the Customer hierarchy with the Unorder() function, so:

UNORDER([Customer].[Customer].[Customer].MEMBERS)

The following query now executes in 27 seconds on a cold cache:

WITH

MEMBER MEASURES.CUSTOMERSTODATE AS

COUNT(

    NONEMPTY(

        UNORDER([Customer].[Customer].[Customer].MEMBERS)

    , {[Measures].[Internet Sales Amount]}

      *

      {NULL : [Date].[Calendar].CURRENTMEMBER})

)

 

SELECT

{MEASURES.CUSTOMERSTODATE}

ON COLUMNS,

[Date].[Calendar].[Date].MEMBERS

ON ROWS

FROM

[Adventure Works]

 

As far as I can tell, Unorder() only makes a difference on calculations when used in combination with NonEmpty(), and when it is used over a large set (here the set of customers has around 18000 members). If you have calculations like this I would recommend testing to see if Unorder() makes a difference – if it does, please leave a comment and let me know what you find!