Using Dynamically-Generated Sets in the MDX Script

Even more fun: I’ve just found that you can use the technique for dynamically generating sets within a cube’s MDX Script. For example, add the following set declarations to the end of the Adventure Works cube’s MDX Script:

create hidden set myyears as [Date].[Calendar].[Calendar Year].members;

create hidden set mytest as
generate(myyears,
strtoset("{topcount(descendants([Date].[Calendar].currentmember,[Date].[Calendar].[Date]), 10,[Measures].[Internet Sales Amount])} as [Top days for "
+ myyears.current.item(0).name + "]"));

Then connect to the cube with your favourite client tool and, bingo! you see all the sets available to use in your queries:


The only thing to watch out for is that, for some reason, you need to declare the set you pass in to the first parameter of the Generate function separately as I have above, rather than inline.
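
For what it’s worth, the inline version that fails would presumably look something like this (a hypothetical sketch only: the alias inlineyears and the set name mytest2 are mine, purely for illustration):

// hypothetical sketch: aliasing the iterated set inline instead of
// declaring it separately first – this is the form that errors
create hidden set mytest2 as
generate([Date].[Calendar].[Calendar Year].members as inlineyears,
strtoset("{topcount(descendants([Date].[Calendar].currentmember,[Date].[Calendar].[Date]), 10,[Measures].[Internet Sales Amount])} as [Top days for "
+ inlineyears.current.item(0).name + "]"));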

I can see this being quite useful: if you need to create a large number of very similar sets on your cube, it’s much more manageable than declaring each set individually, since you only need to write the set expression once.

Cumulative Update Package 2 for SP2

For those of you who like to live life on the bleeding edge, the second cumulative update package for SP2 (build 3175) is now available for download here:
Some relationally-minded friends of mine have commented that these hotfix rollups always seem to contain more than their fair share of AS fixes, and I have to admit that they are probably right. I also heard about one welcome change that I believe is now in there: the AS dev team have got round to making the query-cancellation functionality a lot more responsive, so no more hours wasted waiting for the query to end after you clicked cancel. Hurray!

Advanced Ranking and Dynamically-Generated Named Sets in MDX

Calculating ranks is a classic MDX problem and, as is usually the case, Mosha has an excellent blog entry on the subject which I would encourage you to read if you haven’t done so already:

His advice can be summarised fairly simply: where possible, declare a named set that is already ordered and use that in the Rank function, rather than trying to do the ordering every time you want to calculate a rank, because it’ll give you much better performance. This is all you need to know for 99% of the rank calculations you’ll ever write; however, I’ve been doing a bit of thinking around the remaining 1% of scenarios and here’s what I’ve come up with.
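To illustrate the pattern, here’s a minimal sketch along the lines Mosha describes, using Adventure Works names: the ordered set is built once, and Rank then only has to find each member’s position in it rather than re-sorting per cell.

WITH
SET OrderedCustomers AS
Order([Customer].[Customer].[Customer].MEMBERS
, [Measures].[Internet Sales Amount], BDESC)
MEMBER [Measures].[Customer Rank] AS
Rank([Customer].[Customer].CurrentMember, OrderedCustomers)
SELECT {[Measures].[Internet Sales Amount], [Measures].[Customer Rank]} ON 0
, [Customer].[Customer].[Customer].MEMBERS ON 1
FROM [Adventure Works]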

First of all it’s worth pointing out that the Order function isn’t always the most efficient way of ordering a set: as Mosha points out in this blog entry, there’s a performance bug in the Order function which makes it very slow. Mosha’s own example query:

with member measures.y as
count(Order([Customer].[Customer].[Customer].MEMBERS
, Measures.[Internet Sales Amount], BDESC))
select measures.y on 0 from [Adventure Works]

…runs in 40 seconds on my laptop, and while he shows how a sproc can improve performance, he doesn’t mention that if you rewrite it to use the TopCount function you get even better performance. The following query runs in 4 seconds on a cold cache on my machine:

with member measures.y as count(topcount(
{[Customer].[Customer].[Customer].MEMBERS} as myset
, count(myset), Measures.[Internet Sales Amount]))
select measures.y on 0
from [Adventure Works]

The TopCount function uses a different algorithm to the Order function and it’s optimised to return the top n members of an ordered set; for a relatively small set like the one in this example it performs better than the existing buggy implementation of Order but that may not always be the case. And of course it’s highly likely that the Order function will be fixed in a future service pack so at some point it will start performing better than TopCount. As a result, use this approach at your own risk and test thoroughly!

But let’s get back onto the subject of ranking proper. The obvious problem that Mosha doesn’t deal with in his article is what happens when you have to calculate a rank on more than one criterion? To take Mosha’s example, what about calculating the rank of Employees by Reseller Sales Amount for several Months, where those Months are to appear on Columns? If you know what those Months are going to be in advance it’s fairly straightforward because you can create multiple named sets to use; the worst problem you’re going to have is that your query is going to be pretty long. But what if you don’t know what those Months are going to be or how many of them there are? For example, you might be filtering Months on another criterion or using the TopPercent function. There’s no way you can create the named sets you need to get good performance if you don’t know how many sets you’re going to need, is there? If you had complete control over the code of your client tool then you could dynamically generate your MDX query string to give you all the named sets you needed, but that would be a pain even if it were possible at all (and it wouldn’t be with many off-the-shelf tools like ProClarity Desktop). Well, one solution to this problem simply uses one named set for all your Months:

WITH
SET MyMonths as TopPercent([Date].[Calendar].[Month].Members, 20, [Measures].[Reseller Sales Amount])
SET MyEmployees as [Employee].[Employee].[Employee].MEMBERS
SET OrderedEmployees AS
Generate(MyMonths,
Order(MyEmployees
, ([Measures].[Reseller Sales Amount],[Date].[Calendar].CurrentMember), BDESC)
, ALL)
MEMBER [Measures].[Employee Rank] AS
Rank([Employee].[Employee].CurrentMember,
Subset(OrderedEmployees
,(Rank([Date].[Calendar].CurrentMember, MyMonths)-1) * Count(MyEmployees)
, Count(MyEmployees)))

SELECT MyMonths *
[Measures].[Employee Rank]
ON 0
,MyEmployees ON 1
from [Adventure Works]

The set MyMonths contains the Months that I’m interested in and, as I said, because it uses the TopPercent function I don’t know in advance how many Months it will contain. However, I do know that there’s a static number of Employees that I want to rank, so in my OrderedEmployees set I use the Generate function to create a concatenated list of ordered Employees for each Month (note the use of the ALL flag here to make sure Generate doesn’t do a distinct on the set it returns). In my Employee Rank calculation I can then use the Subset function to pick out the section of this set which contains the ordered list of Employees for the current Month: it’s the subset that starts at index (Rank([Date].[Calendar].CurrentMember, MyMonths)-1) * Count(MyEmployees) and is Count(MyEmployees) members long.
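To make the indexing concrete, here’s a hypothetical example (the employee count is made up): if MyEmployees contained 296 members and the current Month was ranked third in MyMonths, the calculation would evaluate

Subset(OrderedEmployees, (3-1) * 296, 296)

…that is, the 296 tuples starting at zero-based position 592 in OrderedEmployees, which are exactly the Employees as ordered for that Month.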

BUT… of course this only works because we know there are the same number of Employees each Month. What happens if we change the calculation and ask whether the current Employee is in the top 75% of Employees by Reseller Sales Amount for each Month? In this case, 8 Employees make up the top 75% for August 2003 but there are 10 for September 2003, and so on, so this approach isn’t any use.

The solution to this problem came to me while I was driving home down the motorway on the way back from my in-laws’ house on Saturday afternoon, and when it did my wife asked me why I had suddenly started smiling so broadly – this is something that will get all you MDX fetishists out there (all three of you) equally excited. Basically it’s a way of dynamically generating named sets within a query. Let’s take a look at the whole solution first:

WITH
SET MyMonths as TopPercent([Date].[Calendar].[Month].Members, 20, [Measures].[Reseller Sales Amount])
SET MyEmployees as [Employee].[Employee].[Employee].MEMBERS
SET MyMonthsWithEmployeeSets as
Generate(
MyMonths
, Union(
{[Date].[Calendar].CurrentMember}
, StrToSet("Intersect({},
{TopPercent(MyEmployees, 75, ([Measures].[Reseller Sales Amount],[Date].[Calendar].CurrentMember))
as EmployeeSet" + Cstr(MyMonths.CurrentOrdinal) + "})")
))

MEMBER [Measures].[TopEmployee] AS
IIF(Rank([Employee].[Employee].CurrentMember,
StrToSet("EmployeeSet" + Cstr(Rank([Date].[Calendar].CurrentMember, MyMonths)))
)<>0, "Yes", "No")

SELECT MyMonthsWithEmployeeSets *
[Measures].[TopEmployee]
ON 0
,MyEmployees ON 1
from [Adventure Works]

This executes in 2 seconds on a cold cache on my laptop, compared to 52 seconds for the equivalent query which evaluates the TopPercent for every single cell, so it’s definitely a big improvement. What I’m doing in the set declaration for MyMonthsWithEmployeeSets is using the Generate function to iterate over the set of Months I’m going to display on columns and return exactly the same set, but while doing so finding the set of the top 75% of Employees for each Month and storing it in a named set declared inline. The way I do this is to return a set containing the current Month from the iteration and union it with an expression which returns an empty set; the top 75% set is evaluated and stored inside the expression which returns the empty set. The fun bit is that the expression which returns the empty set is inside a string which is passed into StrToSet, and as a result I can dynamically generate the names of my named sets using a static string plus the result of the CurrentOrdinal function. In the example above I end up creating five named sets called EmployeeSet1 to EmployeeSet5, one for each Month. Thanks to the fact that I can reference these sets in another calculated member so long as it’s evaluated further down the execution tree, assuming I construct my SELECT statement appropriately I can then get at the contents of these sets within my TopEmployee calculated member using another call to StrToSet and some string manipulation to determine the name of the set that I’m after. How cool is that?
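If you want to play with the trick in isolation, here’s a stripped-down sketch of the same idea (all of the set, measure and alias names here are just illustrative): evaluating the Intersect inside StrToSet registers the inline alias, and because the calculated member is evaluated after the axis sets, it can retrieve the aliased set by name.

WITH
SET CategoriesWithAlias AS
Union(
{[Product].[Category].[Category].MEMBERS}
, StrToSet("Intersect({}, {TopCount([Product].[Subcategory].[Subcategory].MEMBERS, 5, [Measures].[Internet Sales Amount])} as MyTop5Subcategories)")
)
MEMBER [Measures].[Top Subcategory Count] AS
Count(StrToSet("MyTop5Subcategories"))
SELECT CategoriesWithAlias * [Measures].[Top Subcategory Count] ON 0
FROM [Adventure Works]

If everything works as described above, every cell returns 5, showing that MyTop5Subcategories exists by the time the calculated member is evaluated.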

New release of the Analysis Services Stored Procedure Project

After months of procrastination, there’s finally a new release of the Analysis Services Stored Procedure Project available here:
Thanks to Darren Gosbell for doing the build and tidying up the wiki. Several of the new features have already been blogged about here and on Darren’s blog so I won’t go into too many details, but there’s a lot of cool new stuff there to enjoy. My favourite is Greg Galloway’s CreatePartitions function that allows you to automatically create partitions with slices based on the members of an MDX set – great for creating proof of concept cubes; his functions to dump AS memory and disk usage information are also impressive. Darren’s XMLADiscover class contains a load of useful stuff, particularly the ClearCache function which clears the cache on a database or a particular cube. There’s also some stuff that hasn’t even been included in the project yet but hopefully will be soon, so stay tuned…
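As an aside, if you register the assembly with the name ASSP (which is, I believe, what the project’s own examples assume) then clearing the cache for a database becomes a one-liner from an MDX query window. A sketch, with the assembly and database names as assumptions:

call ASSP.ClearCache("Adventure Works")

Check the project’s wiki for the exact signatures and the other variants that clear the cache for a single cube.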

Creating local cubes with AMO

Interesting nugget from the MSDN Forum, in a post by Adrian Dumitrascu: you can create local cubes in AMO. Here’s the text of the post:
AMO (Analysis Management Objects, the object model for administering AS2005, the successor for DSO from AS2000) also works with local cube files (.cub). You can use it to create/alter objects or to refresh the data by re-processing every night, as you mentioned.
Sample code to connect to a local cube file:

using Microsoft.AnalysisServices; // the .dll is in "%ProgramFiles%\Microsoft SQL Server\90\SDK\Assemblies", also in GAC

Server server = new Server();
server.Connect(@"Data Source=c:\MyLocalCube.cub"); // this will create an empty local cube if not there already
// … do something, for example process cubes

Makes sense that you can do this, when you think about it, given that everything’s XMLA behind the scenes.

Scaling Out SSAS

Yet another new white paper, this time on scaling out Analysis Services:
Interesting, and good that they don’t shy away from the things that don’t work so well. I’ve heard a few people complain that the built-in synchronisation doesn’t work well with complex (ie lots of dimensions/attributes/partitions, as opposed to simply large) databases, and although I’ve not had any practical experience myself that reflects this, the paper does refer to the synchronise functionality as being "quite robust" just before it describes how to do the same thing in an SSIS package using Robocopy to copy the AS data folder.

BI Survey

It’s BI Survey time again! Wasn’t it the OLAP Survey last year? Anyway, it always makes for a fascinating read when it comes out. Here’s the blurb:
We would very much welcome your participation in The BI Survey. This is the largest independent survey of OLAP users worldwide. The Survey will obtain input from a large number of users to better understand their buying decisions, the implementation cycle and the business success achieved. Both business and technical respondents are welcome.

 The BI Survey is strictly independent. While vendors assist by inviting users to participate in the Survey, the vendors do not sponsor the survey, nor influence the questionnaire design or survey results. As a participant, you will not only have the opportunity to ensure your experiences are included in the analyses, but you will also receive a summary of the results from the full survey. You will also have a chance of winning one of ten $50 Amazon vouchers. Click here to complete the survey on-line:

Top Ten Best Practices for Processing and Querying AS

Two excellent new papers published today, on best practices for AS processing and query performance:
There’s a lot of useful information in both which isn’t present in the AS Performance Guide or in other sources.

Reporting Services, MDX and Aggregated Values

Undoubtedly the best time to find out about an undocumented change in functionality in a product is halfway through a demo of said functionality to a large group of people. This happened to me last week: I was teaching my MDX course (written on the last CTP of SP2) and had just finished my speech on all the rubbish aspects of Reporting Services/Analysis Services integration, and was trying to show how Reporting Services automatically filtered out all but the lowest level of granularity of data from an MDX query (see Teo Lachev’s post here: and Reed Jacobsen’s posts here: and here: for details), when I found that it wasn’t doing it any more. Later on I emailed Teo to ask if he knew anything about this change; he didn’t, but he asked the RS dev team and they gave him some details (he then blogged about it here:). So, a small victory for the people: Reporting Services now no longer tries to force its own aggregation functionality on you and you always see the full results of your query, unless you’re already using the RS Aggregate function in your reports. This is apparently in response to customer demand. Just be sure to check any existing RS/AS reports you’ve got in production to make sure they’re not displaying extra rows now!

I’m now hopeful that the message will get through about the other stupid restrictions that RS places on AS data sources, such as only being able to put the measures dimension on columns. I’ve not kept up with the changes in RS2008 as much as I should have (Teo again has a good overview of what’s coming here:), but I’ve not heard that these restrictions will be lifted. The problem is of course that RS expects to do all the aggregation of data itself, but the RS dev team don’t seem to understand that if I’m using AS then I will have designed all my aggregation business logic into my cubes and dimensions, and I couldn’t care less about what RS can do in this area (for an example of their mindset, see the somewhat patronising comments on this posting on Connect from last year:).

While we’re talking about Katmai I might as well mention why I’ve not blogged about the new AS features in it: it’s because there aren’t all that many. You’ve probably already read Mosha’s post about the attribute relationship designer; Vidas Matelis has covered the other cube design wizard changes here: More features will come in later CTPs, but overall it’s really going to be all about performance improvements and manageability, and while I know a bit about some of the planned features it’s too early to blog about them because it’s too early to say what’s going to actually get included. Suffice to say that for the serious AS developer there are going to be some very welcome improvements, but there won’t be anything that excites the marketing people.
