Rant: Reporting Services and Analysis Services

I’m just finishing off my presentation for PASS, in the course of which I’ve come up with a whole bunch of topics to blog about in the future, but now that I’ve got some more spare time I’ve started to look at Reporting Services 2005 in more detail. Unfortunately this has also reminded me of something that has irritated me for a long time and which I need to get off my chest…

Now, before I get all steamed up I should say that I have done quite a bit of work with Reporting Services and like it a lot – its extensibility and programmability mean you can do a lot of cool stuff with it. However in my opinion it has one big design flaw, a flaw that Yukon doesn’t really seem to be doing much to correct, and which makes me roll my eyes and tear my hair whenever I think about it: Reporting Services just doesn’t support Analysis Services properly.

Before the comments start, yes, I have seen all the new features in RS2005 to do with AS support, but to my mind the problem is a fundamental one: RS2005 still expects to receive a flattened rowset when you’re querying AS. Why is this a problem? Well, MDX gives you all this rich functionality to decide how you want to lay out your resultset, crossjoining as many dimensions as you want on rows and columns, slicing by whatever you want, and RS then expects you to ignore all this and make MDX behave like SQL. You have to put measures on columns (what happens if you want to put measures in the WHERE clause then?) and all the rest of your dimensions on rows or in the WHERE clause, and then use the matrix control to recreate what you used to be able to do with a simple crossjoin. In RS2000 it was only recommended that you write your queries this way, and you could ignore this advice if you liked; in RS2005 this rule is actually enforced! What about backwards compatibility? Argh! It’s clumsy, it’s restrictive, and it’s a waste of time.
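To give a concrete (and purely illustrative – the cube, dimension and measure names below are made up) example, a query like this is perfectly legal MDX but doesn’t fit the shape RS expects, because two non-measure dimensions are crossjoined on rows and the measure sits in the WHERE clause rather than on columns:

SELECT
[Time].[Year].Members ON COLUMNS,
CrossJoin([Product].[Category].Members, [Customer].[Country].Members) ON ROWS
FROM [Sales]
WHERE ([Measures].[Sales Amount])

To make RS happy you’d have to move the measure onto columns, push everything else onto rows, and rebuild the layout in a matrix control.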

All I want is to be able to enter any MDX statement I want, assign it to a matrix-like control that understands multidimensional resultsets, and be able to display the results in a report. Not difficult, surely? I know it might break the purity of the design, but why should every data provider be forced to dumb itself down to the level of SQL? After all, isn’t one of the unique selling points of AS that MDX allows you to get round exactly this kind of inherent limitation in SQL? To me it’s unbelievable that the two Microsoft flagship BI products don’t understand each other on this basic level.

OK, rant over. Is anyone listening?

 

Finding the currentmember on all dimensions, take #3

Following on from Mosha’s comment revealing the new .CurrentOrdinal function, I thought I’d mention how this solves the problem even though, for me, it’s still not quite resolved…

After a bit of grovelling, Mosha sent me the following example of how .CurrentOrdinal works:

with member measures.x as
generate(strtoset("{" + rept("measures.defaultmember,", dimensions.count - 1) + "measures.defaultmember}") as a,
dimensions(a.currentordinal - 1).name + ",")
select x on 0 from [Sales]

The reason .CurrentOrdinal is the key is that it allows you to create a set containing n instances of the same member and still know what iteration you were on inside a GENERATE statement. This frees you from the restriction present in my first attempt at this problem, which relied on a level being present in your cube which had at least as many members on it as there were dimensions. Remember that without this new function, you could only use the RANK function to try to find the iteration number and this only worked if each of the members in the set were unique.
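To see the difference, here’s a rough sketch of what the Rank-based version has to look like: a set of distinct members is borrowed purely as an iteration counter, so it only works if your cube happens to contain a level (the [Customer].[City] level below is illustrative) with at least Dimensions.Count members on it:

with member measures.x as
generate(head([Customer].[City].Members, dimensions.count) as a,
dimensions(rank(a.current, a) - 1).name + ",")
select x on 0 from [Sales]

.CurrentOrdinal gives you the iteration number directly, so the set being iterated over no longer needs to contain distinct members.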

The problem I had was that the VBA Rept function doesn’t work on my Yukon install, and I couldn’t find another way of generating a set containing n instances of the same member. However, it seems to work for Mosha, so I can only assume it’s a bug of some sort in the February CTP, and that in later builds we’ll all be able to use this technique.

 

More on querying dimensions

Apart from Mosha putting a dampener on my excitement (should I really be getting excited by MDX anyway? Is it healthy?) about being able to query a dimension directly, by saying that after the next CTP only database administrators will be able to do it, there was another interesting comment on yesterday’s entry, asking why being able to do this is any more efficient than doing the following:

SELECT MYDIMENSION.MEMBERS ON 0, {} ON 1 FROM MYCUBE

Hmm, well, good question. To be honest I’d not thought of doing this – I’d always done

SELECT MYDIMENSION.MEMBERS ON 0 FROM MYCUBE

which of course does bring back unwanted data. You learn something new every day! In the future I suppose the first of these queries is going to be the best option available to non-database-admins, but I was curious to see whether querying a dimension was any more efficient than querying a cube. Now I’m sure a certain someone out there with intimate knowledge of how the AS engine handles each of these queries will be able to give a definitive answer; in the meantime I fired up Profiler to see if I could spot any difference between the two approaches. Unfortunately I don’t have much experience with the data Profiler returns, and I have to say that what it did return didn’t seem that relevant; but looking at query durations it seemed that querying a dimension directly was slightly quicker, although the differences were too small to be conclusive.

Querying a dimension not a cube

A small but significant bit of new functionality in AS2005, pointed out by Ohjoo Kwon on the beta newsgroup: you can now run MDX queries against dimensions as well as cubes. So, for example, you can do the following:

SELECT [MYDIMENSION].MEMBERS ON 0 FROM [$MYDIMENSION]

Why is this useful? Anyone who has written any kind of client interface for AS2K and used the ADOMD catalog object will know how slow it is, and that it’s easier to write queries instead; and similarly in Reporting Services very often you have to write queries to populate lists of valid values for parameters. The problem is that when you run a query on a cube you always have to bring back at least one measure’s worth of data, which is a bit of a waste and potentially time- and resource-consuming. Now you don’t have to bring back any measure data because you can query the dimension instead!
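So for a Reporting Services parameter list, for example, something like the following sketch (the dimension and level names are illustrative) brings back one level’s worth of members without touching any cube data at all:

SELECT [MYDIMENSION].[MYLEVEL].MEMBERS ON 0
FROM [$MYDIMENSION]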

XMLA Compression

Some interesting comments from Mark Hill on his blog concerning the problems he’s had with querying over a poor network. I know lots of people have had similar problems, but I think he’s the only person I’ve heard of who has persevered and got a workable solution.

Comparison of Oracle and Microsoft OLAP

A typically intelligent and even-handed comparison of Oracle and Microsoft’s forthcoming OLAP offerings is available here from Mark Rittman (whose blog is also well worth a read). From a technical perspective I agree with everything he says, particularly about query language being a key differentiating feature. A little-known fact about Analysis Services is that you can query it using SQL, but it only supports a very basic dialect; it would have been interesting to see how users would have reacted if AS’s SQL support had been even slightly better than it is. Would it have affected the uptake of MDX? Possibly, since it might have discouraged users from investing the time and effort in learning the language, which in turn would mean they never discovered the benefits it offers over SQL. On the other hand, it would have been useful to have SQL-centric BI tools like Business Objects able to query an AS cube natively; in the case of Business Objects, although their support for MDX and Analysis Services has actually been pretty strong, I’ve always felt it’s somewhat peripheral, never properly integrated into their product line.

Maestro

Interesting nugget from Barbara Darrow here, regarding the next release of the scorecard tools that are currently available as Solution Accelerators. I’m prepared to accept that the features it will contain will be ‘useful and startling’ (when was the last time that a software vendor told you anything else about a forthcoming version?) but to me the bigger news is the fact that it might ship bundled with SQL Server itself.

I already knew from reports such as this that MS had decided that the previous ‘free download’ approach to marketing the Solution Accelerators was a failure. That didn’t surprise me: MS relies heavily on partners to sell and implement BI solutions, and in my experience they were very often lukewarm in their response to the Solution Accelerators. Why? Partly because these companies had existing relationships with vendors such as Proclarity, who had their own tools with similar, and often more sophisticated, functionality; partly because they were sceptical about the long-term future of a free tool (and let’s face it, the second report above vindicates this scepticism); and partly because the accelerators weren’t promoted much, so no-one knew about them. Bundling these tools with the database itself would automatically generate more visibility for them among partners and, more importantly, customers (why do I need to buy Proclarity? I just want what this free tool on the SQL Server CD gives me…) and would help give the impression that MS is offering a much more rounded overall BI solution incorporating client as well as server components.

So where does this leave the partners? Well, there will always be a market for more sophisticated client tools than the ones that MS offers, and this approach by MS of offering more and more basic functionality spurs them on to improve their own offerings and make more of an effort selling them. Despite the fact that customers would prefer to deal with only one vendor when buying a BI solution (Marco Russo’s comments here are pretty typical, I think) the partner/third party client tool model is unlikely to go away. MS just likes to keep them on their toes, that’s all…

MDX ‘Automated Unit Testing’

My colleague Jon Axon sent round a very interesting mail at work this morning, detailing a new method he’s using for testing during cube development. Rather than try to paraphrase it, here’s what he sent:

When I’m modelling cubes, I generally tend to create a basic model which produces correct figures, and then tweak it for performance, e.g. refining calculations or playing around with dimension settings. This is similar to code refactoring, and an important part of it is constant repetition of unit tests to ensure you don’t alter the behaviour of the cube as you improve performance, i.e. that the numbers stay the same. You could always hack some ADOMD code to run the same query on your new cube iteration and on an established baseline cube and compare cellsets, but this is a pain. However, thanks to a couple of MDX tricks you can actually go a surprisingly long way towards producing a generic framework for this purely within an MDX query, as shown below.

The gist of it is that you produce a fixed baseline cube with correct behaviour, and then copy-and-paste this within the same database to create a version of the cube for refining; only the latter is subsequently altered. The two cubes don’t have to have identical structure (e.g. identical dimensionality), but you can only compare values of cells whose context is defined by identically named members in both cubes (e.g. you can’t compare measure M in one cube to measure N in another). The template is as follows; only the sections marked as USER DEFINED need to be changed, i.e. a couple of set/member definitions at the top and the cube name at the bottom of the query (unfortunately MDX syntax doesn’t allow us to group these items into a single location). Think of the user-defined section as the parameters of a procedure; these are:

TestDomain: a set expression defining the subcube (minus measures) whose values you want to compare across your two target cubes.

TestMeasures: the set of measures to be compared within the context of the subcube defined above.

TargetCube: a string specifying the name of the cube (within the same database as the cube in the FROM clause) against which the comparison is to be made.

DisplayDetails: a string "Y" or "N" indicating whether executing the query should display a detailed report, i.e. the results of the comparison on a cell-by-cell basis, or an "executive summary", i.e. a single cell indicating complete success or at least one failure (the first such failure will be indicated).

Here is an example:

 

WITH

/* USER DEFINED SECTION */

SET TestDomain AS 'NonEmptyCrossjoin(Except(DimA.Members, {DimA.DefaultMember}),
Except(DimB.Members, {DimB.DefaultMember}),
Except(DimC.Members, {DimC.DefaultMember}))'

SET TestMeasures AS '{BadCalc}'

MEMBER Measures.TargetCube AS '"TestCubeDeltaA"'

MEMBER Measures.DisplayDetails AS '"Y"'

/* END USER DEFINED SECTION */

SET FinalMeasures AS 'Except(TestMeasures, {TargetCube, CurrentTuple, Indicator, Result, DisplayDetails})'

MEMBER Measures.CurrentTuple AS '")"'

CELL CALCULATION Calc FOR '({Measures.CurrentTuple})' AS
'Iif(CalculationCurrentPass() = 129, "(", "") +
Iif(Dimensions(129 - CalculationCurrentPass()) IS Measures, "",
Dimensions(129 - CalculationCurrentPass()).CurrentMember.UniqueName +
Iif(CalculationCurrentPass() = (130 - Dimensions.Count), "", ",")) +
CStr(CalculationPassValue(Measures.CurrentTuple, -1, RELATIVE))',
CALCULATION_PASS_NUMBER = 129, CALCULATION_PASS_DEPTH = 128,
CONDITION = 'CalculationCurrentPass() > (129 - Dimensions.Count)'

MEMBER Measures.Indicator AS
'Iif(Len(Generate(FinalMeasures,
Iif(FinalMeasures.Current = LookupCube(CStr(TargetCube), "{{" + CStr(CurrentTuple) + "} * {" +
TupleToStr(FinalMeasures.Current) + "}}.Item(0)"), "", "x"))) > 0,
"MISMATCH!", "OK")'

MEMBER Measures.Result AS
'Iif(Count(Filter(TestDomain, CStr(Indicator) <> "OK") AS failures) = 0, "All OK",
"First failure: " + TupleToStr(failures.Item(0)))'

SELECT

StrToSet(Iif(UCase(CStr(DisplayDetails)) = "Y", "Union(FinalMeasures, {Indicator, Result})", "{Result}")) ON 0,
StrToSet(Iif(UCase(CStr(DisplayDetails)) = "Y", "TestDomain", "{TestDomain.Item(0).Item(0)}")) ON 1

FROM

/* USER DEFINED SECTION */
TestCube
/* END USER DEFINED SECTION */

One interesting "geeky" point about this aside from any useful purpose it may have is that it demonstrates a bona fide situation in which Chris’ generic "currentmember from every dimension" cell calculation (see his blog) is necessary to work around MDX limitations; the semantics of LookupCube() are such that to my knowledge there is no other way of accomplishing this generically.

Unfortunately, due to both the cell calculation and the LookupCube(), performance can decrease spectacularly as the subcube size increases (particularly when DisplayDetails is set to "Y"), so try to restrict its use to relatively small cellsets (as one would expect of unit tests) rather than entire cubes at once. I’ve made no particular effort to tune this template, so you may well be able to improve it dramatically.

You can of course also accomplish a lot of this with less performance overhead by merging two such target cubes into a single virtual cube and comparing measures directly, but this imposes further restrictions e.g. only one custom rollup formula can be in operation at a time on a specific dimension in a virtual cube, whereas by using LookupCube() across two base cubes you can employ separate rollups on each (in fact this is exactly the situation for which I originally came up with this).

NB: if you get cells reporting COM errors, this is typically due to requesting something in the LookupCube() that can’t be found; check your TestDomain and TestMeasures definitions to ensure they are common to both cubes.

MDX Webcast

I’ve just come across a new webcast available on Technet on ‘Common MDX Mistakes and Solutions in AS2K’, available here (for some reason the sound on the webcast wasn’t working when I viewed it, but you can download the transcript separately). It has got some really good content in it – I certainly learned a few things. Highlights include more details on how the mysterious Cache Policy connection string property works, the good news that an MDX Debugger will be available for download soon for AS2K (I’ve seen an early version and it is pretty useful), and the fact that some VBA functions are now supported natively in MDX so any queries which use them will perform better.