Various Interesting Blog Entries

First of all, via Jamie Thomson, there’s a new white paper out detailing all the server properties for AS2005:
There’s a lot of good information on configuration here.
Secondly, from Teo Lachev, news of an incredible U-turn in terms of best-practice on designing cubes:
So we’re no longer meant to be building one large cube with multiple measure groups, but to go back to a more AS2K-like multi-cube approach and then glue them together with linked measure groups? OK…
–See the last update at the bottom of this post… 
Thirdly, Mark Garner points out that there are some performance costs associated with using role-playing dimensions:
I hadn’t realised that no aggregations get built for them, although now that I’ve looked it is mentioned in the Project REAL documentation; it’s a big reason not to use them. To me, the main benefit of role-playing dimensions is better manageability – the reduced processing time is good too, but slightly less important, and I’d sacrifice some of it to have aggregations.
Finally, although I mentioned this in passing yesterday and the details aren’t out in the public domain yet, Mark Hill has some important news regarding the claim that you supposedly don’t need to set the data slice on your partitions any more:
UPDATE: Mark Hill has got all the details about the partitioning problem:
This tallies with what I was seeing: I had created hundreds of very small partitions because I knew the users were going to be running very finely sliced queries; unfortunately this must have worked against me: my partitions must have been too small and large numbers of them ended up being scanned when I ran a query.
UPDATE: Akshai Mirchandani has clarified the situation regarding role-playing dimensions and aggregations; see his comment on Mark’s blog posting here:
It was a bug in the Project REAL docs and you can build aggregations for role-playing dimensions.

Breaking up large dimensions

One clever trick I learned on the newsgroup a few years ago was from someone called dxd, who wrote it up in this post and others in the same thread:
It describes how to break up a single, large dimension that you need in the cube but which users don’t themselves want to view most of the time (typically this would be a degenerate/fact dimension). In the AS2K world this was useful for getting multi-select to work with distinct count calculations; in AS2005, of course, distinct counts already work with multi-select but I recently found a new application for this technique which I thought I’d share.
I was doing a PoC in a scenario similar to the following: imagine a data warehouse recording purchases in a supermarket, with two fact tables. The first fact table contains data on the whole transaction, with a transaction id as its primary key, dimensions like Customer and Store, and a measure recording the value of the whole transaction; the second contains each individual purchase within the transaction, and has all the same dimensions as the first fact table but also includes a Product dimension. The users wanted to run queries like ‘Show me the total value of all transactions which contain Product X’, so it was clearly a distinct sum problem: it needed a many-to-many relationship between the Product dimension and the first fact table, with the second fact table as the intermediate measure group.
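To make the ‘distinct sum’ requirement concrete, here’s a minimal Python sketch of the two fact tables and the query the users wanted; the data and function names are hypothetical, purely for illustration:

```python
# Transaction-level fact: one row per transaction, with its total value
transactions = {
    "T1": 100.0,
    "T2": 50.0,
    "T3": 75.0,
}

# Line-item fact: one row per product purchased within a transaction
line_items = [
    ("T1", "Product X"),
    ("T1", "Product Y"),
    ("T2", "Product Y"),
    ("T3", "Product X"),
    ("T3", "Product X"),  # bought twice in the same transaction
]

def value_of_transactions_containing(product):
    # The DISTINCT set of transaction ids is what matters: T3 contains
    # Product X twice, but its value must only be counted once
    ids = {tid for tid, p in line_items if p == product}
    return sum(transactions[tid] for tid in ids)

print(value_of_transactions_containing("Product X"))  # 175.0
```

The deduplication over transaction ids is exactly what the many-to-many relationship does for you on the AS side.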
Unfortunately, the only way to be sure of this working properly was to link the two fact tables together using the transaction id – but there were hundreds of millions of transactions, so building a MOLAP dimension was out of the question and I wasn’t sure that a ROLAP dimension would perform well enough. Then I remembered the approach in the newsgroup post above and realised that I could break up the transaction dimension into three identical dimensions of 1,000 members each. It’s quite easy to visualise how this works. Imagine you have a transaction with the following id: 123456789.
You could express this as three different dimensions with keys of 123, 456 and 789. And of course since each of these three dimensions was identical, I only needed to build it once and could use role-playing dimensions for the other two. I added them to the cube and made them invisible, added them to both the fact tables and bingo – I had the dimensions I needed to make the many-to-many relationship work.
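The id-splitting arithmetic can be sketched in a few lines of Python (the function names are my own, for illustration only):

```python
def split_id(transaction_id):
    """Break a nine-digit transaction id into three three-digit keys,
    one for each of the three small dimensions."""
    s = str(transaction_id).zfill(9)
    return int(s[0:3]), int(s[3:6]), int(s[6:9])

def combine_keys(k1, k2, k3):
    """Recover the original transaction id from the three keys."""
    return k1 * 1_000_000 + k2 * 1_000 + k3

keys = split_id(123456789)
print(keys)  # (123, 456, 789)
assert combine_keys(*keys) == 123456789
```

Because each key only ranges from 0 to 999, each dimension stays tiny no matter how many hundreds of millions of transactions there are, yet the three keys together still identify a transaction uniquely.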
Performance resolving the many-to-many relationship seemed very good when I looked at the queries I ran in Profiler. Unfortunately I ran into the problem that Mark Hill talks about here:
…and overall performance of the cube wasn’t great (I assumed I’d messed up my partition definitions), but if I had used a ROLAP transaction dimension instead I’m pretty sure that the cube would have been unusable.
Thinking some more about other applications, I wonder if this could be used to work around the problems that are becoming evident with drillthrough in AS2005? See
I think this deserves some further investigation… 

What Panorama Did Next (Part 72)

I’ve been quite interested to watch what Panorama have been up to since the Proclarity acquisition, as I’m sure all you Panorama customers out there have been. Two new press releases have caught my eye, firstly:
It’s not clear to me exactly what’s being announced here. Are they talking about being able to use the new features in Excel 2007 pivot tables etc for querying BW directly, or are they building an AS2005 cube somewhere in there in between? Or are they using their own Excel addin to query BW and not using the new Excel pivot tables at all?
Secondly there’s this:
Integration with Google spreadsheets? Hmm, might be useful if Google spreadsheets ever come out of beta. How long have Google Groups been out? A good few years and I see it’s still supposedly in beta. I can’t see anyone wanting to buy or use this functionality for a while, so why build and announce it? Maybe by flirting with Google they’re trying to send MS a message…

Last Night’s BI Event

I just wanted to say thanks to everyone who turned up to last night’s BI evening at Microsoft UK, and that I hope you all enjoyed it as much as I did. All the stars of the UK MS BI world were out – it was a veritable Royal Variety Show of BI – and I can see that Jamie Thomson has already managed to blog about it:
Thanks are due to Tony Rogerson for organising the whole thing, and my co-presenters Mark Hill and Simon Sabin.
I particularly enjoyed Mark’s talk about building multi-terabyte cubes and picked up some good performance tuning tips from him. The slides from all three presentations should be up soon, so rather than paraphrase what he had to say I’ll be able to point you to the source. Hopefully he’ll start blogging regularly now too.
With a bit of luck we’ll have a follow-up event before Xmas. As I said last night, if you’d like to present then please get in touch…

VSTS4DB and Analysis Services

I would imagine that most people who read this blog also read Jamie Thomson’s SSIS blog too, but just in case you don’t I thought I’d highlight his efforts to get some Analysis Services-related functionality into Visual Studio Team System for Databases:
Here’s the entry on Richard Waymire’s blog asking for feedback:
…the original msdn forums thread:
…and the place to submit feedback and vote on these ideas:

Optimising GENERATE() type operations

I need to get back to answering more questions on newsgroups – it’s the best way of learning, or at least remembering stuff you’ve learnt in the past and since forgotten. Take, for instance, the following thread I was involved with today:
It reminded me of some very similar queries I worked on a few years ago, and although the example in the thread above is on AS2K the techniques involved are still relevant on AS2005. Take the following Adventure Works query, which is an approximation of the one in the thread:




WITH SET LASTSALES AS
GENERATE(
NONEMPTY([Customer].[Customer Geography].[Full Name].MEMBERS, [Measures].[Internet Sales Amount])
, {[Customer].[Customer Geography].CURRENTMEMBER}
*
TAIL(
NONEMPTY([Customer].[Customer Geography].CURRENTMEMBER * [Date].[Date].[Date].MEMBERS, [Measures].[Internet Sales Amount])
, 1)
)
SELECT
[Measures].[Internet Sales Amount] ON 0,
LASTSALES ON 1
FROM
[Adventure Works]


What we’re doing here is finding the last date that each customer bought something. Using the TAIL function within a GENERATE might be the obvious thing to do here, but in fact it isn’t the most efficient way of solving the problem: on my machine, with a warm cache, it runs in 16 seconds whereas the query below which does the same thing only takes 6 seconds:





WITH SET MYROWS AS
NONEMPTY(
[Customer].[Customer Geography].[Full Name].MEMBERS
* [Date].[Date].[Date].MEMBERS
, [Measures].[Internet Sales Amount])
SET MYLASTROWS AS
FILTER(MYROWS AS MYSET,
NOT(MYSET.CURRENT.ITEM(0) IS MYSET.ITEM(MYSET.CURRENTORDINAL).ITEM(0)))
SELECT [Measures].[Internet Sales Amount] ON 0,
MYLASTROWS ON 1
FROM
[Adventure Works]

What I’m doing differently here is that, rather than iterating through each Customer, finding the set of Dates when that Customer bought something and then taking the last one, I’m asking for a single set of tuples containing every Customer and the Dates they bought something on. I then use a FILTER to find the last Date for each Customer by checking whether the Customer in the current tuple is the same as the Customer in the next tuple in the set – if it isn’t, then the current tuple holds the last Date that Customer bought something. Clearly, operations like this inside a GENERATE are something to avoid if you can.
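The compare-with-the-next-tuple trick is easier to see in procedural form. Here’s a small Python sketch of it, with hypothetical data; a (customer, date) tuple survives the filter only when the next tuple belongs to a different customer, i.e. it holds that customer’s last purchase date:

```python
# Tuples ordered by customer, then date - mirroring the ordering of
# the crossjoined set in the MDX query above
tuples = [
    ("Aaron", "2003-01-05"),
    ("Aaron", "2003-07-21"),
    ("Beth",  "2002-11-02"),
    ("Beth",  "2004-03-14"),
    ("Carol", "2003-09-30"),
]

def last_date_per_customer(ordered_tuples):
    result = []
    for i, (customer, date) in enumerate(ordered_tuples):
        # Keep this tuple if it is the final one, or if the next
        # tuple's customer differs from this one's
        is_last = (i == len(ordered_tuples) - 1
                   or ordered_tuples[i + 1][0] != customer)
        if is_last:
            result.append((customer, date))
    return result

print(last_date_per_customer(tuples))
# [('Aaron', '2003-07-21'), ('Beth', '2004-03-14'), ('Carol', '2003-09-30')]
```

One linear pass over the crossjoined set replaces a per-customer search, which is essentially why the FILTER version beats the GENERATE/TAIL version.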

Can I run SQL against an Analysis Services cube?

I take what is probably an unhealthy interest in the statistics that MSN Spaces generates about visits to this blog, and in particular the searches that people run on Google which land them here. Over the last few months I’ve noticed that someone has been searching on the phrase "Can I run SQL against an AS cube?" on a fairly regular basis; since I mentioned the fact that you can in a post a while ago, I assume that’s why they always come here, but I thought it would be good to answer the question in more depth. Never let it be said that I’m not responsive to the needs of my audience – although I’ll draw the line at pandering to the many people who seem to be looking for pictures of "hot bi action"…
In fact, you’ve always been able to run SQL against Analysis Services right back to the days of OLAP Services. Traditionally this was only useful when you wanted to create local cubes from server cubes and the subset of SQL supported by AS was so limited you wouldn’t want to use it for anything else; the fact that this stuff isn’t documented anywhere, possibly intentionally, didn’t help. However, when I started to do some research into AS2005 SQL I was pleasantly surprised at how much I could do. My starting point was to look at some of the SQL queries that are generated when AS2005 creates local cubes. Take the following MDX statement which uses the CREATE GLOBAL CUBE syntax to create a local cube from Adventure Works:
CREATE GLOBAL CUBE MyCube
STORAGE 'c:\MyCube.cub'
FROM [Adventure Works]
(MEASURE [Adventure Works].[Internet Sales Amount], DIMENSION [Adventure Works].[Geography], DIMENSION [Adventure Works].[Product])
If you run a profiler trace while this executes, you’ll see a whole bunch of SQL statements are generated in the background to get the data to populate the local cube. Here are two representative examples:
SELECT  KEY ( [Adventure Works].[$Product].[End Date],0 )
  AS [oduct0_0], NAME ( [Adventure Works].[$Product].[End Date] )
  AS [oduct0_1], MemberValue ( [Adventure Works].[$Product].[End Date] )
  AS [oduct0_2]
   FROM [Adventure Works].[$Product]

SELECT  AGGREGATE ( [Adventure Works].[Internet Sales].[Internet Sales Amount] )
  AS [Sales0_0], KEY ( [Adventure Works].[$Product].[Product],0 )
  AS [oduct1_0]
   FROM [Adventure Works].[Internet Sales]
   NATURAL JOIN [Adventure Works].[$Product]
  GROUP BY [oduct1_0]
The first thing to notice is that the ‘tables’ we’re querying are either dimensions or measure groups. A dimension ‘table’ has the naming convention [CubeName].[$DimensionName] (note the dollar sign) and a measure group ‘table’ has the naming convention [CubeName].[MeasureGroupName]. We can obviously do joins between them using the NATURAL JOIN syntax; we can also do GROUP BYs and use functions like AGGREGATE (which I assume aggregates the measure value by its cube aggregation function), KEY, NAME and MEMBERVALUE (which as far as I can see allow you to retrieve the key, name and membervalue properties associated with a dimension attribute). My memory might not be entirely accurate on this but I’m fairly sure that none of the above could be done with AS2000 SQL. You can also do WHERE clause filtering too, but it looks like you can only AND conditions and not OR them, so

SELECT AGGREGATE ( [Adventure Works].[Internet Sales].[Internet Sales Amount] ) AS [Sales0_0], KEY ( [Adventure Works].[$Product].[Product],0 ) AS [oduct1_0], [Adventure Works].[$Product].[Product],
[Adventure Works].[$Date].[Calendar Year]
FROM [Adventure Works].[Internet Sales]
NATURAL JOIN [Adventure Works].[$Product]
NATURAL JOIN [Adventure Works].[$Date]
WHERE [Adventure Works].[$Product].[Product] = 'Mountain-100 Black, 48'
AND [Adventure Works].[$Date].[Calendar Year] = 'CY 2002'
GROUP BY [oduct1_0], [Adventure Works].[$Date].[Calendar Year]


runs, whereas
SELECT AGGREGATE ( [Adventure Works].[Internet Sales].[Internet Sales Amount] ) AS [Sales0_0], KEY ( [Adventure Works].[$Product].[Product],0 ) AS [oduct1_0], [Adventure Works].[$Product].[Product],
[Adventure Works].[$Date].[Calendar Year]
FROM [Adventure Works].[Internet Sales]
NATURAL JOIN [Adventure Works].[$Product]
NATURAL JOIN [Adventure Works].[$Date]
WHERE [Adventure Works].[$Product].[Product] = 'Mountain-100 Black, 48'
OR [Adventure Works].[$Date].[Calendar Year] = 'CY 2002'
GROUP BY [oduct1_0], [Adventure Works].[$Date].[Calendar Year]
produces an error, which limits its usefulness. Nor can I get any query which uses the COUNT function to work, for example:

SELECT COUNT ( [Adventure Works].[Internet Sales].[Internet Sales Amount] ) AS [Sales0_0]
FROM [Adventure Works].[Internet Sales]
NATURAL JOIN [Adventure Works].[$Product]
NATURAL JOIN [Adventure Works].[$Date]
WHERE [Adventure Works].[$Product].[Product] = 'Mountain-100 Black, 48'
AND [Adventure Works].[$Date].[Calendar Year] = 'CY 2002'

gives an error. So while we’ve got the potential to do some useful things here, it isn’t exactly the most useful implementation of SQL I’ve ever seen. There are some pointers to other functions supported by the Analysis Services 2005 cartridge, found in
C:\Program Files\Microsoft SQL Server\MSSQL.5\OLAP\bin\Cartridges\as90.xsl
on my machine. This is the file containing the instructions that tell AS how to generate the SQL used for processing cubes, and there are several other functions mentioned in it that could be worth looking at.
But do we want to use SQL to query Analysis Services anyway? I’ve talked about this before, here:
I don’t think so; it seems to me that although there are some things that are easier to express in SQL than in MDX, the actual subset of SQL implemented here is crippled in some important respects. MDX is much better suited to BI queries, and although I know a lot of people struggle with it at first, it’s definitely worth the pain in the end.

Project REAL Code and Docs

Finally, the complete Project REAL is available for download here:
From the download page, the contents are:
The kit contains:

1. A set of instructions for setting up the environment
2. Guidance on how to explore the implementation
3. A sample relational data warehouse database (a subset of the Project REAL data warehouse)
4. A sample source database (from which we pull incremental updates)
5. SSIS packages that implement the ETL operations
6. An SSAS cube definition and scripts for processing the cube from the sample warehouse
7. Sample SSRS reports
8. Sample data mining models for predicting out-of-stock conditions in stores
9. Sample client views in briefing books for the Proclarity and Panorama BI front-end tools


Panorama Industry-Specific Solutions

A lot of people, me included, have been wondering what the future holds for Panorama after the Proclarity acquisition. Well, it looks like Microsoft have tried to make amends in the form of a partnership to develop industry-specific BI applications:
There’ll be solutions for retail, credit management, manufacturing and other areas. Quite apart from the fact that the new apps will be nice to have, I’m glad that Panorama hasn’t been left to sink – it has some good products and the diversity of the MS BI client tool ecosystem has always been one of its strengths in my opinion, so anything that Microsoft can do to maintain that diversity should be welcomed.