Creating local cubes in AS2005

I spotted the following thread on the AS2005 beta newsgroup, concerning the new way of creating local cubes in AS2005:

(Apologies for the link, but the web-based UI didn’t give me a better way of linking to the whole thread – hopefully you can read it backwards)

Anyway, in summary, it means you create local cubes with the same DDL as you use to create server-based cubes – a very welcome improvement in my opinion.

Finding the currentmember on all dimensions, take#2

Mosha’s teasing me now, I think. Anyway, here is a solution to the problem that is still AS2K-compatible, doesn’t hard-code any dimension names and doesn’t rely on having levels with set numbers of members:
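In outline – and this skeleton is purely illustrative, so treat the member name and the details as guesses rather than the real thing – the trick is to recurse through calculation passes, appending one dimension’s currentmember per pass by indexing into the DIMENSIONS() function with CALCULATIONPASSVALUE(), with the pass depth hard-coded at AS2K’s maximum of 128 dimensions:

WITH MEMBER MEASURES.CURRENTMEMBERLIST AS
'IIF(CALCULATIONCURRENTPASS() = 0
, ""
, IIF(CALCULATIONCURRENTPASS() > DIMENSIONS.COUNT
// all dimensions done: carry the string forward from the previous pass
, CALCULATIONPASSVALUE(MEASURES.CURRENTMEMBERLIST, -1, RELATIVE)
// otherwise append the currentmember of this pass's dimension
, CALCULATIONPASSVALUE(MEASURES.CURRENTMEMBERLIST, -1, RELATIVE) + ", "
+ DIMENSIONS(CALCULATIONCURRENTPASS() - 1).CURRENTMEMBER.UNIQUENAME))'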



Since you can only have 128 dimensions in a cube in AS2K you’re safe on the hard-coded pass-depth, but it’s still not a truly satisfying solution for me. Unfortunately, until my Yukon test server is back working again (hurry up with the reinstallation, Colin!) I won’t be able to test any solutions that use new MDX functionality.

Jon Axon and I have just been discussing what the Yukon solution might be: as Jon pointed out, since Yukon automatically resolves infinite recursion by taking the value from the previous calculation pass, without using CalculationPassValue(), then the same approach as above would look a lot nicer when expressed in Yukon MDX. But Mosha did say that his solution used GENERATE() too, so I reckon he has a completely different way of doing it. I’m sure it will be blindingly obvious when he finally tells me what it is…


Finding the currentmember on all dimensions

In yesterday’s entry I mentioned it would be nice if you could iterate over all the dimensions in your cube and find the currentmember on each of them. Mosha then hinted that this was possible, and there’s nothing I like more than a challenge. Here’s the solution in a Foodmart 2000 query – and it uses all-AS2K functionality too!

WITH MEMBER MEASURES.CURRENTMEMBERLIST AS
'GENERATE(
[Education Level].[Education Level].MEMBERS
, IIF(
RANK([Education Level].CURRENTMEMBER, [Education Level].[Education Level].MEMBERS) > DIMENSIONS.COUNT
, ""
, DIMENSIONS(RANK([Education Level].CURRENTMEMBER
, [Education Level].[Education Level].MEMBERS) - 1).CURRENTMEMBER.UNIQUENAME
), ", ")'
SELECT {MEASURES.CURRENTMEMBERLIST} ON 0,
[Education Level].[Education Level].MEMBERS ON 1
FROM SALES
WHERE([Customers].[All Customers].[Canada].[BC].[Burnaby], [Gender].[All Gender].[F])

All you need is a level with more members than you have dimensions in the cube. OK, it’s not elegant, but it works…

New AS2005 MDX white papers

I see that Richard Tkachuk has put up two more MDX Scripting white papers on his site in addition to the ‘Loading Aggregate Data’ paper that Mosha pointed to in his blog at the end of last year. All three go into a lot of detail and show off some of the cool new features available in AS2005 MDX such as the ability to return a set from an IIF statement.

Looking at the gigantic statement used to retrieve a comment for a cell in the middle of the ‘Cell Annotations’ paper reminded me of a feature I’ve thought would be useful on several occasions: the ability to create sets containing dimensions instead of members. In Richard’s MDX he’s forced to hard-code the names of all the dimensions in his cube in order to get the keys of the currentmember of each of them; if you could simply get a set of all dimensions in the cube, you could then iterate over that set with a GENERATE statement and get the keys using something like a CURRENTDIMENSION function. This would also make the code somewhat more robust, as it would no longer break if you added or removed dimensions from the cube. I don’t know whether this kind of functionality is feasible or not, but I thought I’d mention it…
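To make that concrete, the kind of MDX this would enable might look like the following – bear in mind that neither a set of dimensions nor a CURRENTDIMENSION function actually exists, so this is entirely hypothetical syntax:

GENERATE(
DIMENSIONS
, CURRENTDIMENSION.CURRENTMEMBER.PROPERTIES("KEY")
, ", ")

Adding or removing a dimension from the cube would then require no changes to the expression at all.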

Grouping in MDX – response to Mosha

I’m honoured by my mention in Mosha’s blog! But I think my entry on Grouping in MDX, and Mosha’s comments on it, need some further clarification and discussion.

First of all, CREATE SESSION CUBE. I agree it is probably the best way to implement grouping at the moment and that it works well in Excel, but it’s not ideal:

  • Surely it’s a client-only solution? You’d want the option of creating these groupings on the server for all users, just as you would any other calculated member, and CREATE SESSION CUBE doesn’t offer you that, does it?
  • I don’t know what the overhead of creating a session cube on the client is (or, in Yukon, on the server) – but is it as small as creating a calculated member? What is the impact on caching, especially given the fact that you have to drop and create the session cube when you change or add a new group?
  • A lot of the time, for example in Reporting Services, you are only working with one-off queries rather than sessions and you really want to be able to create a group with query rather than session scope.
  • The MDX needed to create the session cube and then manage it within the session seems overly complex for such a simple requirement.

Secondly, to address Mosha’s point on why you would need the VisualTotals and Aggregate functions in the same query: it’s because you’d want to use your new group member in any scenario where you could use a normal member, and that includes a query which used VisualTotals. Imagine you had a measure which showed the distinct count of customers across all your stores, and you wanted a report which had a) a single group member containing your top 5 stores, b) several other individual stores and c) showed the visualtotal of all the distinct customers in both the group and the individual stores displayed. I think that would be a reasonable requirement and one which wouldn’t be possible unless AS ‘knew’ what members went into the group.
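As a sketch, the query for that report might look something like this – the distinct count measure and the store members are made up for illustration, and of course the visual total would only be correct if AS could expand the group into its underlying stores:

WITH MEMBER [Store].[All Stores].[Top 5 Stores] AS
'AGGREGATE(TOPCOUNT([Store].[Store Name].MEMBERS, 5, [Measures].[Customer Count]))'
SELECT {[Measures].[Customer Count]} ON 0,
VISUALTOTALS(
{[Store].[All Stores]
, [Store].[All Stores].[Top 5 Stores]
, [Store].[Store Name].[Store 6]
, [Store].[Store Name].[Store 7]}
, "* Total") ON 1
FROM SALES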

Thirdly, sets in the WHERE clause (and also subcubes in the FROM clause, which do the same thing) in Yukon. Unfortunately, this only works when you’re slicing by the group and not when the group is on a visible axis, so it doesn’t fit the scenario I was describing.

Overall, then, CREATE SESSION CUBE is almost the functionality that I want but it doesn’t allow groups to be defined on the server. So we’re close…!

New version of the BI Portal

Duncan Lamb points to the new version of the BI Portal (some of you might remember the previous version), which is available here. I first saw this demoed about 18 months ago when I still worked at Microsoft and I’m glad it’s got an ‘official’ release at last. The only disappointment is that this is a sample app, which I guess means it’s not going to be supported – not that that will stop people putting it into production, I’m sure.

Building Aggregations Manually

I’ll start with two disclaimers:

  • This entry is based on my own understanding of how the AS engine works, based on my own experience and the occasional question I’ve asked the dev team. It might not be totally accurate on the details but hopefully the general principles are sound – I’ve used them successfully on a number of occasions.
  • 99% of the time the Storage Design Wizard and Usage Based Optimisation Wizard will design the right aggregations for you, and you won’t need to build aggregations manually.


So what about the 1% of times when you will need these techniques? They usually occur when you have several dimensions which have large numbers of members at their lower levels, and queries which retrieve data at these levels. Cubes with large parent-child dimensions and dimensions containing entities like customers are the most common real-life manifestations of this scenario. The 1/3 rule, which dictates which aggregations can be built by the Storage Design Wizard and the Usage Based Optimisation Wizard (see the Analysis Services Performance Guide for more details), may have prevented any aggregations that would be useful for your queries from being built, and query performance suffers as a result. Now the 1/3 rule is there for a good reason – to stop large aggregations being built which increase processing time and the size of the cube, but don’t have much of an impact on query performance – but it is not infallible; equally, when faced with query performance problems, very often you are glad of any improvement in response times, however small!


The first step to take when designing aggregations manually is to understand what data the queries you want to run are actually requesting. The best way to do this is to take a representative sample of these queries, clear the Query Log, set it to log every query and then run the queries one by one. You’ll notice that one MDX query doesn’t necessarily equate to one query in the log – in fact, one MDX query can generate multiple subqueries, even thousands, depending on its complexity; you’ll also notice that what is stored in the query log is somewhat cryptic. Rather than explain it here, I suggest you read the following white paper which explains the contents in some detail. The contents of the DataSet column represent the slice of the cube that each query in the log asked for data from; the same values can be viewed in Performance Monitor using the DSN Requested counter. The other useful value to monitor is the slice of the cube that the AS engine queried to get data to service each of these requests, and unfortunately this is not present in the Query Log – you can only see it in PerfMon using the DSN Used counter.


This is probably a good place to stop and give some examples. Imagine you have a cube with 4 dimensions apart from Measures, and these dimensions have respectively 5, 6, 7 and 8 levels each. If you ran a query requesting values at the top level of each of these dimensions, DSN Requested would show it as 1111 (top, top, top, top); similarly, if you ran a query requesting data from the leaf levels of each dimension, DSN Requested would show it as 5678 (leaf, leaf, leaf, leaf). Now if you had no aggregations in your cube and you ran the first of these queries, 1111, in order to get the value returned AS would have to aggregate values from the leaf member of every single dimension and DSN Used would show 5678; the fact that all this aggregation had to happen at runtime would mean that the query might not run very quickly. If, however, you had an aggregation built at the third level of each dimension and you ran the same query, DSN Used would instead show 3333, and because AS only had to aggregate the members on the third level of every dimension the query would run much faster. Following on from this, imagine that you wanted to run a query that took values from the top levels of the first two dimensions and the leaf levels of the last two dimensions, so that DSN Requested would be 1178. Since it is unlikely that any aggregations would have been built at the bottom level of two such deep dimensions (especially if they had large numbers of members on, for example if one was a Customer dimension), then DSN Used would have to be 5678 and AS would still have to aggregate lots of values at runtime.


Looking back at your Query Log and PerfMon, the next step is to decide whether you need to build aggregations manually. If you’ve been through the Storage Design Wizard and the Usage Based Optimisation Wizard and you’ve set the Aggregation Usage property on all your dimensions appropriately (again, see the AS Performance Guide for more information on this), and you still see that your queries are not hitting aggregations (so that there are big differences between the DSN Requested and the DSN Used values), then you probably need to. On the other hand, if you see DSN Requested and DSN Used values which are the same or almost the same, building more aggregations might not do any good and you might need to look at other ways of improving performance, such as partitioning.


The tool you’ll need in order to build aggregations manually is ‘Partition Manager’, which is available in the SQL 2K Resource Kit and also in an updated form in the BI Accelerator, a free download from the Microsoft website. The UI can be a bit fiddly when you have more than a few dimensions but it beats writing the code in DSO.


The final problem is knowing which aggregations you need to build. Going back to the example above, if you saw the pattern 1178 in DSN Requested you could build an aggregation which matched this exactly, ie at the top levels of the first two dimensions and the leaf levels of the last two. This would do the trick, but if your users wanted to drill down on any of the first two dimensions then AS could no longer use it. Therefore it’s a much better idea to build aggregations at the lowest level your users are likely to drill down to, perhaps 3378, and sacrifice a little performance on some higher level queries for much better performance on queries which go slightly deeper. Overall, though, it’s only through trial and error that you’ll work out which aggregations you need.

Jon Axon’s blog

OK, this is nothing whatsoever to do with BI, but my colleague Jon Axon has started his own blog here. If you’re interested in concurrent programming then check it out – and make sure you hit the site several times to make him feel like someone is actually reading his blog! It will make him feel so much better…

Grouping members together

One of the weaknesses of Analysis Services, in my opinion, is its support for creating custom groupings of members. I reckon that 90% of all calculated members on non-measures dimensions must be doing just this, ie doing an AGGREGATE or SUM over a set of members. Yes, such calculated members return the right values, but my complaint is something else: you then have no idea what members were aggregated together inside the calculated member, and functions like VISUALTOTALS, NONEMPTYCROSSJOIN etc that you would like to be ‘group aware’ of course aren’t. Some examples are needed, I think…

Consider the following query on Foodmart 2000:

WITH MEMBER [Customers].[All Customers].[USA].DEMO AS 'AGGREGATE({[Customers].[All Customers].[USA].[CA], [Customers].[All Customers].[USA].[OR]})'
SELECT {[Measures].[Unit Sales]} ON 0,
{[Customers].[All Customers].[USA].CHILDREN, [Customers].[All Customers].[USA].DEMO} ON 1
FROM SALES

Wouldn’t it be nice, then, if VISUALTOTALS ‘knew’ what was in the set and this query

WITH MEMBER [Customers].[All Customers].[USA].DEMO AS 'AGGREGATE({[Customers].[All Customers].[USA].[CA], [Customers].[All Customers].[USA].[OR]})'
SELECT {[Measures].[Unit Sales]} ON 0,
VISUALTOTALS({[Customers].[All Customers].[USA], {[Customers].[All Customers].[USA].[CA], [Customers].[All Customers].[USA].[OR]}}, "* DEMO")
ON 1
FROM SALES

returned the same results as this query?

WITH MEMBER [Customers].[All Customers].[USA].DEMO AS 'AGGREGATE({[Customers].[All Customers].[USA].[CA], [Customers].[All Customers].[USA].[OR]})'
SELECT {[Measures].[Unit Sales]} ON 0,
VISUALTOTALS({[Customers].[All Customers].[USA], [Customers].[All Customers].[USA].DEMO}, "* DEMO")
ON 1
FROM SALES

And that when you did a NONEMPTYCROSSJOIN against your calculated member, it would return the same results as when you did a NONEMPTYCROSSJOIN against the set that was aggregated in the calculated member? And perhaps also that you could drill down from the calculated member to see the members inside it?

Of course this isn’t possible at the moment, because a calculated member could contain any sort of calculation, so AS simply can’t make any assumptions. But if there was a special kind of group calculated member, which simply took a set of members as its definition and which always returned an AGGREGATE of that set, surely AS could make these assumptions?
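For example, the syntax for defining such a member might look something like this – completely made up, of course:

CREATE GROUP MEMBER [Customers].[All Customers].[USA].DEMO AS
'{[Customers].[All Customers].[USA].[CA], [Customers].[All Customers].[USA].[OR]}'

Because the definition would be restricted to a plain set, AS could always expand the member into its constituent members whenever VISUALTOTALS, NONEMPTYCROSSJOIN or a drilldown needed it.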

Just a thought…

Aggregating values across a date range

Very often you’ll come across a requirement for users to be able to aggregate values across arbitrary date ranges. For example, they might want to see the sum of Unit Sales between any two months, say February and April 1997. While this is fairly easy to accomplish with a calculated member when you have control of the MDX, when your users want to use an off-the-shelf client it becomes more difficult – how do you allow people to pick a start date and an end date on your single time dimension?

The way I usually solve this is by creating two time dimensions, one for the start date and one for the end date. However, if you hang both off the time key column in your fact table you’ll see that initially no data is returned unless you select the same members on both dimensions – a transaction can’t be in two months simultaneously. You can get round this in a calculated member and sum up the values between the two months, but it’s not very elegant. What you really want is for your Start Date dimension to act like a normal dimension and for the End Date dimension to be present in the cube but have no effect on it; then, in your calculated member, you can simply aggregate all the values between the currentmember on Start Date and the member on Start Date which equates to the currentmember on End Date.

This is actually very easy to do in AS2005 as you can add a dimension to a measure group without it having any relationship to the fact table. In AS2K though you need to go through the following steps:

  • Create your cube with one time dimension as usual, and call it Start Date
  • Copy this dimension and paste it to the same database, and when asked to rename it call it End Date
  • Create a virtual cube containing everything from your original cube.
  • To add the End Date dimension to this virtual cube you can’t, unfortunately, use Analysis Manager – you have to use some DSO code. Here’s a bit of VBScript as an example:

Dim myserver
Dim db
Dim vcube
Dim fromdim
' Connect to the Analysis Server
Set myserver = CreateObject("DSO.Server")
myserver.Connect "localhost"
' Find the virtual cube in the database
Set db = myserver.MDStores("MyDatabase")
Set vcube = db.MDStores("MyVirtualCube")
' Add the End Date dimension, then save the change to the virtual cube
Set fromdim = vcube.Dimensions.AddNew("End Date")
vcube.Update

  • Now you have a virtual cube with two time dimensions, but the End Date dimension does nothing. To automatically sum up the values of a measure between the selected start and end dates, you would create a calculated measure something like:
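Something along these lines, for example – this is only a sketch, which assumes Unit Sales is the measure being summed and that members of the two time dimensions have identical unique names apart from the dimension (and All member) name, so that the VBA MID() and LEN() functions can map the End Date currentmember onto the equivalent Start Date member:

WITH MEMBER MEASURES.[Unit Sales In Range] AS
'SUM(
{[Start Date].CURRENTMEMBER :
STRTOMEMBER("[Start Date]" + MID([End Date].CURRENTMEMBER.UNIQUENAME, LEN("[End Date]") + 1))}
, [Measures].[Unit Sales])'

On the server you’d use CREATE MEMBER with the same definition rather than a WITH clause.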


One last thing to notice: since we’re now summing up a range of months rather than returning a single month, query performance might suffer, in which case it might be worth taking a look at my recent entries on tuning YTD-style queries…