There will be a SQL2005 SP3!

A triumph for people power! Microsoft have announced that there will be an SP3 for SQL2005 and that it will be delivered this calendar year. More details here:
There’s also been yet another cumulative update released, CU7:
UPDATE: there’s a blog entry from the guys in PSS on how CU7 changes what happens when backups and processing overlap, here:

Announcing Intelligencia Query for Reporting Services

Can I be utterly shameless and use my blog to promote products I have a commercial interest in? Of course I can! But first, let me tell you a story…

Last December I spent a lot of time thinking about the problems of building Reporting Services reports on top of Analysis Services. To be honest, this issue has been a constant irritation for me ever since I first saw Reporting Services about five or six years ago; this blog entry, one of the most popular I’ve ever posted, sums up my thoughts on the matter:
and I’ve blogged about specific workarounds and performance issues on a regular basis ever since. Anyway, as I was saying: last December I was working with a customer new to the Microsoft BI stack who had announced their intention of using SSRS and SSAS together, on the naive assumption that because these tools were produced by the same company they would work well together, and for the n-hundredth time I found myself having to manage their expectations and teach them the various tips and tricks necessary to get their reports working. I also wrote an article on this topic for SQL Server Magazine that came out in December, and I expanded the same material into a presentation for the last SQLBits (you can download the slide deck here); it was also around that time that I realised that nothing was going to change in Reporting Services 2008 (see this thread on the MSDN Forum for other people’s reactions). All this got me thinking.

The way you’d want SSRS to work is fairly clear: you’d want a decent query builder, the option of using any MDX query you cared to write, and SSRS to render the resulting cellset in a matrix-like control automatically. It should work pretty much like every SSAS client tool on the market, only the output should be an SSRS report. The reason you can’t do this is also clear: SSRS needs to know the names and the number of the columns your query returns in advance, and an arbitrary MDX query can’t guarantee either. So how can you get around this limitation?
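To make that constraint concrete, here’s a rough sketch in Python (purely illustrative, and not how SSRS is actually implemented) of the kind of contract a report dataset works to: the field list is declared before any data is fetched, and every row must match it, which is exactly what a crosstab-shaped MDX result can’t promise.

```python
# Illustrative only: a reader must publish its schema before fetching any rows,
# much as an SSRS dataset's field list is fixed at design time.
class FixedSchemaReader:
    def __init__(self, field_names):
        # The schema is declared up front, before any data is seen.
        self.field_names = list(field_names)

    def read(self, rows):
        """Yield each row as a dict keyed by the declared fields."""
        for row in rows:
            if len(row) != len(self.field_names):
                # An MDX crosstab whose column count depends on the data
                # would fail this check.
                raise ValueError("row shape does not match declared schema")
            yield dict(zip(self.field_names, row))


reader = FixedSchemaReader(["Year", "Sales"])
rows = list(reader.read([("CY 2003", 1.0), ("CY 2004", 2.0)]))
```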

Then it hit me. Instead of having to unpivot your MDX query manually so that it’s in the format that SSRS likes, why couldn’t you have a custom data extension that did the hard work for you? So I wrote my first attempt in an afternoon and after a lot more polish it now works really well. Here’s an example of what it does:

Take the following query:

select {[Date].[Calendar Year].&[2003],[Date].[Calendar Year].&[2004]} on 0,
{[Product].[Category].&[4],[Product].[Category].&[1]} on 1
from [Adventure Works]
where([Measures].[Internet Sales Amount])

This returns the following SSRS-unfriendly results:


But when run through my custom data extension the data is returned as follows:

You’ll see that all of the columns have become rows, and that there are two columns containing the Value and Formatted_Value properties of the cells. The cool thing is that you can now put a matrix control in your report, map the ‘column’ fields onto the matrix column groups and the ‘row’ fields onto the matrix row groups, and map one of the cell properties onto the data area, like this:

And when you run your report, the matrix control recreates the original structure of your MDX query:

This example is pretty simplistic, but even much more complex queries with multiple levels of crossjoins on rows and columns are handled just as easily; the approach gets around all the limitations of the built-in Analysis Services data source/query builder as well as those of the OLEDB data source. Some of the key advantages are:

  • You can use any MDX query and display the results (see here for some of the problems associated with the built-in Analysis Services data source needing measures on columns)
  • You can parameterise your reports by measures (see here)
  • There’s no messing around with ‘aggregate rows’; all rows are treated the same
  • Because the field names (optionally) do not refer to the hierarchy names used in the query, you can write one report that displays the results of very different queries – the only requirement is that you have the same number of rows and columns in your query. For example you can create one report that displays either Years, Quarters or Months on rows and control which one gets displayed using a parameter. It also means that the report design is much more tolerant of changes to the query during development.
  • There’s full support for parameterisation, something you don’t have with the OLEDB data source.
  • It (optionally) indents member captions based on their level depths, so the captions of members on lower levels appear further to the right of captions of members on higher levels.
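The unpivoting step itself is conceptually very simple. Here’s a minimal Python sketch of the idea (this is not the actual extension code, and the captions and cell values below are placeholders): every cell in the cellset becomes one row, carrying the captions of the column tuple and row tuple it belongs to, plus the cell’s Value and Formatted_Value.

```python
# A minimal sketch of the unpivot idea: flatten a 2-axis cellset into rows of
# (column captions..., row captions..., Value, Formatted_Value) so the
# consuming report always sees a fixed, predictable set of fields.
def flatten_cellset(column_tuples, row_tuples, cells):
    """column_tuples/row_tuples: lists of tuples of member captions on each
    axis; cells[r][c] is a (value, formatted_value) pair."""
    rows = []
    for r, row in enumerate(row_tuples):
        for c, col in enumerate(column_tuples):
            value, fmt = cells[r][c]
            rows.append(
                {**{f"column{i}": cap for i, cap in enumerate(col)},
                 **{f"row{i}": cap for i, cap in enumerate(row)},
                 "Value": value, "Formatted_Value": fmt})
    return rows


# A 2x2 cellset like the one from the query above (placeholder values)
# becomes four rows:
flat = flatten_cellset(
    [("CY 2003",), ("CY 2004",)],
    [("Accessories",), ("Bikes",)],
    [[(1.0, "$1.00"), (2.0, "$2.00")],
     [(3.0, "$3.00"), (4.0, "$4.00")]])
```

Because the output fields are named column0, row0 and so on rather than after the hierarchies in the query, the field list stays fixed no matter what MDX you run, which is exactly what SSRS needs.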

I was feeling quite pleased with myself at this point and was about to publish the code on Codeplex when it occurred to me that I could make some money out of this: after all, I do have a wife, two kids and a mortgage to feed. There was also one big missing feature that I knew I didn’t have the time to implement on my own, and that was an MDX query builder: while I’d be quite happy to write my own MDX queries, I know most people aren’t that comfortable doing so and need some kind of tool to do it for them. So I got in touch with Andrew Wiles of iT Workplace, who I knew had an excellent MDX query control, signed a partnership agreement with him, and over the last few months we (well, mostly him really) have worked to integrate his query builder and my custom data extension with BIDS, to come up with what we think is a product that anyone using Reporting Services and Analysis Services will want to use: Intelligencia Query for Reporting Services. Yes, it’s something you’re going to have to pay for, but in my opinion it’s very modestly priced; more importantly, commercialisation has saved my work from the fate of being a cool idea implemented by an occasional C# coder and stuck in perpetual beta on Codeplex, and turned it into a commercial-quality product that is safe to install on your servers and which will grow and improve over time.

The main selling point of the whole solution is that it gives you, for the first time with Analysis Services, the ability to build in a query builder exactly the resultset you want to see displayed in your Reporting Services report: it’s truly WYSIWYG. Certainly in a lot of cases you can achieve what you want with the native query builder, but as I said, with it you always have to imagine the end result you want and write a query that can be pivoted into that end result, and that’s not easy for the majority of people. Combine this with the fact that the Intelligencia query builder allows you to build much more sophisticated queries than the native query builder, and that, as mentioned above, the custom data extension solves many of the problems you run into with the built-in Analysis Services and OLEDB data sources, and I think you’ve got something that finally makes it easy to build Reporting Services reports using Analysis Services data.

I think Andrew’s query builder is the equal of anything else on the market in terms of the kinds of query it can create. It does all the stuff you’d expect, like hierarchical selections; it does things that few other tools do, such as allowing the creation of asymmetric sets on an axis; and it has at least one outstanding feature that no-one else has: the ability to let users create MDX calculations with an Excel-formula-like interface. It even supports SAP BW as a data source. Here’s a screenshot:


Anyway, enough of me going on. You can find out more, download an eval version, watch a video and buy some licences here:

One last thing: we want to hear your feedback. This is a version 1.0 and while we’ve got a lot of ideas on new features that could be added, we want to be sure they are the features that our users actually want. So even if you only download the eval please drop me a mail to let me know what you like and what you think is missing. It’s likely we’ll be able to add new features pretty quickly.

illuminate, FAST and yet more completely unfounded (and probably ill-informed) speculation on my part

I was just reading this article on TDWI:

It’s on illuminate and their ‘correlation DBMS’ – not relational, not OLAP, not COP, but a ‘value-based’ system. All very fascinating indeed, although their web site is yet another one of those irritating ones that leaves you with the feeling that some important technical detail and information on where they’re positioning themselves is missing. This entry on their blog:
is a case in point – what is it exactly that illuminate can do that these other platforms can’t? I guess it’s the kind of query that they talk about here:
…which is certainly not the kind of thing that OLAP is good at, or even meant to be good at.

I might have completely the wrong end of the stick, but didn’t Microsoft get its hands on something similar when it acquired FAST last year (blogged here)? Is this it:
? I’m sure I remember reading about how FAST could speed up DW-style queries by loading all the data into an index; I’d be interested to hear from anyone who can set me straight on this. If I’m right, then perhaps some of the comments I made when talking about COP databases last year about how they would fit into the Microsoft BI stack would also be relevant here.

Google App Engine

I see Google have announced their own web application platform:
which includes BigTable as part of it (remember this post?). There’s speculation over whether Microsoft has something similar up its sleeve:

This and the Panorama and Good Data stuff I blogged about over the last few weeks make me quite excited. What will the next generation of OLAP/BI tools be like? Surely it’s a mistake to think of them as hosted versions of what we’ve got today. As I’ve said before, the attribute-based approach of databases like BigTable reminds me of Analysis Services dimensions; wouldn’t it be cool to be able to grab data from a number of these stores and use them as dimensions and fact tables? Maybe through a front-end that was something like Lotus Improv (thanks to Andrew Wiles for directing me to this in a recent conversation) on the web? But with an XMLA interface too? And since in the cloud hardware scale-out will presumably be just a matter of paying a bit more cash, you’d want an engine that could handle that transparently, in the way I understand something like Teradata does. Ahh, if only I had a couple of million US dollars of VC funding to waste, I’d hire Mosha and set him to work on this…

Can your sum be a subtraction? Or can you avoid it altogether?

Quite often you’ll find yourself writing calculations that need to sum up large sets; in fact, they might be summing up all of the members on a level apart from one or two. In that situation it’s worth using the structure of your dimension to your advantage. Take the following query, which sums up the Internet Sales of all customers apart from one and then does a TopCount based on this:

WITH MEMBER MEASURES.TEST AS
SUM(
EXCEPT([Customer].[Customer].[Customer].MEMBERS
, {[Customer].[Customer].&[20075]})
, [Measures].[Internet Sales Amount])
SELECT MEASURES.TEST ON 0,
//any reasonably large set to rank will do; Products is used here for illustration
TOPCOUNT([Product].[Product].[Product].MEMBERS, 10, MEASURES.TEST) ON 1
FROM [Adventure Works]

On my laptop using AS2005 the query runs in 2 minutes 2 seconds on a cold cache (Katmai does no better with this query, incidentally). Yuck. But if we change the calculation around, so that rather than summing the customers we do want, we take the sales for all customers and subtract the sales for the customer we don’t want (which is fine if the measure we’re looking at has an aggregate function of Sum), then we can do the following:

WITH MEMBER MEASURES.TEST AS
[Measures].[Internet Sales Amount] -
([Customer].[Customer].&[20075], [Measures].[Internet Sales Amount])
SELECT MEASURES.TEST ON 0,
//any reasonably large set to rank will do; Products is used here for illustration
TOPCOUNT([Product].[Product].[Product].MEMBERS, 10, MEASURES.TEST) ON 1
FROM [Adventure Works]

…which executes in 1 second on a cold cache. Taking this further, if you have a set like this that you’re frequently summing up in calculations, it might be a good idea to create a new attribute on your dimension to avoid having to do any work in MDX at all. In the Adventure Works example above, maybe Customer 20075 is the Sultan of Brunei and he ordered 5000 new bikes for all his friends, which means that including him in your calculations would skew the results. What you could do is create a new Boolean attribute on Customer called something like ‘Exclude from Calculations’, which would then mean you could rewrite the query like this:

SELECT [Measures].[Internet Sales Amount] ON 0,
//any reasonably large set to rank will do; Products is used here for illustration
TOPCOUNT([Product].[Product].[Product].MEMBERS
, 10, [Measures].[Internet Sales Amount]) ON 1
FROM [Adventure Works]
WHERE ([Customer].[ExcludeFromCalculations].&[False])

and probably get even better performance.
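The arithmetic behind the rewrite is worth spelling out. Here’s a toy Python illustration (with made-up numbers) of why the two calculations are equivalent for a Sum-aggregated measure, and why the second is so much cheaper: the grand total is already aggregated by the engine, so the subtraction needs just two cell reads instead of a scan over thousands of customers.

```python
# Toy illustration (placeholder numbers) of why "sum of everything except one
# member" can be rewritten as "grand total minus that member" for a measure
# with an aggregate function of Sum.
sales = {"Customer A": 120.0, "Customer B": 80.0, "Sultan of Brunei": 5000.0}

# Slow approach: iterate over every remaining member and add them up,
# analogous to SUM(EXCEPT(...)) forcing cell-by-cell evaluation.
slow = sum(v for k, v in sales.items() if k != "Sultan of Brunei")

# Fast approach: one pre-aggregated total minus one cell lookup,
# analogous to the All Customers total minus the excluded customer's tuple.
total = sum(sales.values())  # Analysis Services already stores this aggregate
fast = total - sales["Sultan of Brunei"]

assert slow == fast  # both give the sales of everyone except the Sultan
```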