Crossjoin function now takes more than two parameters

Crossjoin is a function which, for obvious reasons, is dear to my heart. It’s been at least a year since I last found a feature in AS2005 MDX that I wasn’t aware of, but the other day I found out that the Crossjoin function can now take more than two sets as parameters, much in the same way that NonEmptyCrossjoin can. Here’s an example:

select
measures.[internet sales amount] on 0,
crossjoin(
[Date].[Day Name].[Day Name].members,
[Date].[Day of Week].[Day of Week].members,
[Product].[Color].[Color].members
)
on 1
from [Adventure Works]

Not exactly revolutionary (and anyway I use the * operator instead nowadays) but fun to note.

Analysis Services 2005 Performance Guide

The AS2005 Performance Guide is available for download here:

http://download.microsoft.com/download/8/5/e/85eea4fa-b3bb-4426-97d0-7f7151b2011c/SSAS2005PerfGuide.doc

Thanks to everyone who mailed to let me know! Yet another important document to digest…

Parallelism and MDX Queries

A lot of my work involves tuning MDX queries, and over the last few months I’ve come across several situations where calculation-heavy queries take a long time to execute but for the most part only seem to use one CPU on a multi-processor server. When I first saw this I made some enquiries about what was going on and found that this was indeed a feature of the AS calculation engine: only one query can be run per session at any one time, and for that query the work of the formula engine (unlike the storage engine) can’t be parallelised. Hohum.

However, then I realised that there are some scenarios where you can inject parallelism into a query. Consider the following query in Adventure Works:

select [Measures].[Amount] on 0,
bottomcount(
{[Scenario].[Scenario].&[1], [Scenario].[Scenario].&[2]}
*
[Account].[Account].[Account].members
*
[Date].[Date].[Date].members
*
[Department].[Departments].[Department Level 02].members
, 10,[Measures].[Amount])
on 1
from
[Adventure Works]

On my laptop, on a warm cache, it consistently executes in just under 40 seconds; my laptop is dual core and for the vast majority of the query time I see just 50% CPU usage. This is consistent with what I found out about the formula engine since the Account dimension has unary operators . However, if I open two different connections in SQL MS and run both of the following queries simultaneously:

select [Measures].[Amount] on 0,
bottomcount(
{[Scenario].[Scenario].&[1]}
*
[Account].[Account].[Account].members
*
[Date].[Date].[Date].members
*
[Department].[Departments].[Department Level 02].members
, 10,[Measures].[Amount])
on 1
from
[Adventure Works]

select [Measures].[Amount] on 0,
bottomcount(
{[Scenario].[Scenario].&[2]}
*
[Account].[Account].[Account].members
*
[Date].[Date].[Date].members
*
[Department].[Departments].[Department Level 02].members
, 10,[Measures].[Amount])
on 1
from
[Adventure Works]

…both queries finish in 20 seconds and I see 100% CPU usage.

What I then did was create a stored procedure with a function called ParallelUnion which, when passed to MDX set operations like the Bottomcounts in the queries above, opened two separate connections, executed the operations in parallel then Unioned the results and returned a set. Here’s an example of how it worked:

select measures.amount
on 0,
bottomcount(
ASSP.ParallelUnion(
"bottomcount(
{[Scenario].[Scenario].&[1]}
*
[Account].[Account].[Account].members
*
[Date].[Date].[Date].members
*
[Department].[Departments].[Department Level 02].members
, 10,[Measures].[Amount])"
,
"bottomcount(
{[Scenario].[Scenario].&[2]}
*
[Account].[Account].[Account].members
*
[Date].[Date].[Date].members
*
[Department].[Departments].[Department Level 02].members
, 10,[Measures].[Amount])"
), 10, [Measures].[Amount])
on 1
from
[Adventure Works]

To be clear, in this query the sproc executes the two Bottomcount set expressions passed in as strings on two different connections and sessions, and returns a unioned set containing the bottom 10 tuples for both [Scenario].[Scenario].&[1] and [Scenario].[Scenario].&[2], 20 tuples in all. Given that the overall bottom 10 tuples must be present in the 20 tuples the sproc returns I still need to do another Bottomcount but it’s relatively inexpensive. The query above executed in 17 seconds, less than half the time of the original query, and shows 100% CPU usage on my laptop.

I also created a ParallelGenerate function which worked like the existing Generate function in MDX and which would allow for more flexibility in splitting up queries in order to make use of more than two processors. Here’s an example:

select measures.amount
on 0,
bottomcount(
ASSP.ParallelGenerate(
{[Scenario].[Scenario].&[1],[Scenario].[Scenario].&[2] },
"bottomcount(
{[Scenario].[Scenario].currentmember}
*
[Account].[Account].[Account].members
*
[Date].[Date].[Date].members
*
[Department].[Departments].[Department Level 02].members
, 10,[Measures].[Amount])"
), 10, [Measures].[Amount])
on 1
from
[Adventure Works]

My next problem was that although I’d proved what was possible, my code wasn’t exactly robust. Luckily at this point the guys from the Analysis Services Stored Procedure Project helped me out: both Jon Axon and Mosha provided some excellent feedback and Greg Galloway took the time to rewrite it all. I’m extremely grateful to all three for their help, esppecially Greg for all the effort he put in. You can download the code (alond with lots of other useful sprocs) as part of the Analysis Services Stored Procedure Project here:
http://www.codeplex.com/Wiki/View.aspx?ProjectName=ASStoredProcedures
(Note you need to download the latest source code and compile it; the last compiled dll release doesn’t contain these functions)

Now to mention the drawbacks to all this (thanks again are due to Mosha for pointing several of these out):

It’s only useful when the performance gained through parallelism is more than the overhead of calling the sproc, opening the two connections and running the queries. I very much doubt, for example, it would ever be useful to use these sprocs inside a calculated member; they’re only really going to be useful when you’re writing queries for something like Reporting Services and you can use them in set definitions which are only going to be evaluated once.
If you need the sproc to open its connections with the same security as the account running the main query, you need to use ImpersonateCurrentUser mode rather than the default ImpersonateServiceAccount. This, though, carries an extra performance hit. There might be a way around this using the Username function and the EffectiveUserName connection string property, but it’s not been tested and isn’t in the code.
Since the sproc opens up new sessions to run its queries, all session state (things such as VisualTotals, session calculated members and session sets) are lost and this can lead to unexpected and incorrect results.
Similarly, any query context – things like sets in the Where clause, query-scoped calculations etc – is lost in the call to the sproc, again leading to incorrect results, although to a certain extent you can work around this.

That said, I still think these functions can make a massive difference to query performance in some circumstances. I’d be very interested to hear from anyone who does use them in their queries to know what the impact is.

Five Things You Probably Didn’t Know About Me

I thought I’d avoided this meme, but since Darren Gosbell tagged me and since Mosha (who was also tagged) has already joined in, here are my five things:

Like a lot of Brits of my generation my first contact with computers was a BBC Model B that my father bought in the early 80s. Once he’d got bored of typing in programs from BEEBUG magazine that did things like play the Star Wars theme tune I got my hands on it, and in between long sessions playing ‘Elite‘ decided it would be cool if I too could write games. Unfortunately my coding skills have not progressed much since those days.
I won the Computer Studies prize at school (I had moved onto Turbo Pascal 5.5 by then, on Sanyo machines which had <gasp> a floppy disk drive and <double gasp> 256Kb of RAM) when I was 16 but only because the two other guys in the class who were better than me told the teacher that they were too cool to accept it. I, however, had no such scruples and spent the £20 book token on a French/English dictionary.
My first degree was in Modern History, although later on I picked up one of those two-year part-time conversion course Computer Science degrees. After winning the Computer Studies prize I didn’t go near a computer for several years; I was much better at arts subjects and so I concentrated on that side of things. However a year after leaving university, after having spent a few months for a guy who wanted to reform the UK’s National Lottery, I decided to find a proper job. I didn’t fancy law, accountancy or any of the other things that History graduates seemed to end up doing so I thought I’d try to get on an IT graduate training scheme instead. I got a job at IMS Health in London where one of my first projects was evaluating a beta version of the new OLAP server from Microsoft that had been recommended by Nigel Pendse…
While in that job I met my wife, Helen. She was headhunted to work for a large pharmaceuticals company in Switzerland and we decided to move out there together. I started looking for jobs out there and after one of my colleagues suggested sending my CV on spec to Microsoft I got a job in Microsoft Consulting in Zurich, where I stayed for three years. After the birth of our first daughter Natasha (joined last summer by Amelia), Helen and I moved back to the UK and I went back to IMS because they were doing some cool stuff with Yukon.
After the project I was working on got canned, I decided to leave and set up on my own as a consultant. But you knew that already… so what I shall I write for my fifth point? How about that I’m incredibly absent-minded about everything except (as my wife pointed out when she was telling me off recently because I’d taken out buildings insurance on our house twice) work. My worst experience was at Copenhagen airport when I went through security, sat down and started reading and didn’t notice that everyone else had got on the flight. By the time I did notice the plane was already on the runway about to take off, but had to turn back because my bags were on board. It couldn’t go back to the same gate though and had to go to a different one which was at the wrong height, so I had to climb halfway up the side of it to get on before walking the full length of the plane to my seat while the captain was very rude about me over the intercom.

Threading Issue

Great post from Wayne Robertson of PSS on an important threading issue on high-end AS2005 installations:
http://blogs.msdn.com/psssql/archive/2007/01/16/processing-appears-to-stall-or-become-sluggish-on-multi-processor-machines-running-analysis-services-2005.aspx

Although the way the issue manifests itself during processing is important to know about, more interesting to me are his comments that this could affect query performance too. I’ll have to test this out with some of my customers I think.

Microsoft/Teradata BI Announcement

It seems Microsoft and the newly-independent Teradata have announced they’re going to ‘optimise interoperability’ between their platforms, with Analysis Services integration arriving in Q1 this year. Here’s the full press release:
http://www.microsoft.com/presspass/press/2007/jan07/01-15MSTeradataPR.mspx

This has been in the pipeline for absolutely ages – I remember hearing something about an alliance like this about four years ago. In fact Teradata is already explicitly supported as a data source for AS, so I wonder what it is they’re actually going to announce….?

Microsoft BI Conference site is live

See

http://www.microsoftbiconference.com/default.aspx

…for more information the conference including how to submit abstracts for presentations and nominations for awards. Hmm, now I need to think of a presentation I could give. Perhaps something like ‘Performance tuning MDX Queries’; anyone got any suggestions?

Book Review: Microsoft SQL Server 2005 Analysis Services, by Edward Melomed, Irina Gorbach, Alexander Berger, Py Bateman

How can I review a book properly when I’ve only had it a day or so? Obviously I can’t, but I can tell you the most important thing you need to know and that is if you’re at all serious about Analysis Services you have to buy this book. Quite simply it contains so much useful information which is available nowhere else I don’t know where to start: on the overall architecture, on memory management, on query execution, on caching, and so on. The authors are all from the AS dev team so they know what they’re talking about and are able to go into great detail. There’s also a lot of information here which is available in other places, on general cube design, MDX and AS programming for instance, and it covers them very well too; in fact it’s very well-written in general with plenty of code snippets and illustrations. I’ve got a lot of reading and learning to do over the next few weeks…

You can buy it from Amazon UK here.

Reporting Services and Essbase White Paper

Via Brian Welcker, here’s a new paper on using Reporting Services with Hyperion Essbase:
http://download.microsoft.com/download/4/7/a/47a548b9-249e-484c-abd7-29f31282b04d/UsingSSRSandESSbase.doc

When I first heard that this was going to be possible I wondered whether there was any kind of hidden agenda here, but I made some enquiries and was assured that there wasn’t – apparently Hyperion customers had been asking for it. Pity Reporting Services is such a pain to use with multidimensional data sources…

Powershell script for backing up AS databases

Just seen on SQL Server Central, a useful-looking Powershell script that connects to an AS instance and back up all the databases on there:
http://www.sqlservercentral.com/scripts/viewscript.asp?scriptid=1850

Chris Webb's BI Blog

Crossjoin function now takes more than two parameters

Like this:

Analysis Services 2005 Performance Guide

Like this:

Parallelism and MDX Queries

Like this:

Five Things You Probably Didn’t Know About Me

Like this:

Threading Issue

Like this:

Microsoft/Teradata BI Announcement

Like this:

Microsoft BI Conference site is live

Like this:

Book Review: Microsoft SQL Server 2005 Analysis Services, by Edward Melomed, Irina Gorbach, Alexander Berger, Py Bateman

Like this:

Reporting Services and Essbase White Paper

Like this:

Powershell script for backing up AS databases

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this: