So, what is the UDM?

The other week I went to an evening event at Microsoft’s UK office in Reading, given by Matt Stephen. It was a general introduction to BI in SQL2005 and as such was attended by people who didn’t know much at all about the new features in AS, RS, IS and so on. All the familiar PowerPoints were shown and much was made of the Unified Dimensional Model as being the best thing since sliced bread. I’m sure almost everyone reading this has seen these presentations, especially the slide where the relational reporting and OLAP reporting worlds ‘come together’ like two pieces of a jigsaw and the one where Analysis Services is described as a cache on top of your data warehouse. At the end of the session, though, the very first question asked was one which I think had been on a lot of people’s minds – "What exactly is the UDM?". This reminded me of the first time I saw any presentations on Yukon AS, at an airlift in Redmond two-and-a-half years ago: for a while afterwards I was confused about what exactly the UDM was too. And Myles Matheson, in a blog entry from a month or so back, feels obliged to answer exactly the same question, so I suspect this is a common reaction.
 
The answer is actually pretty simple: the UDM is just the cube in Analysis Services 2005. Because it can now model so many more features of a relational data warehouse (eg many-to-many dimensions, role-playing dimensions), the message is that there’s now no reason to run queries directly against your data warehouse at all, because you’ll get much better performance and query flexibility by building a cube and querying that instead. From a technical point of view I have no problem at all with the claims being made here – in my experience AS2005 lives up to its hype as much as any software product can – but I didn’t understand why all this talk of the UDM, and the resulting confusion, was necessary. Why not just talk about the new capabilities of cubes in AS2005?
 
Then I came up with the following theory. The UDM doesn’t really exist as a feature; it’s more of a marketing concept, and marketing concepts are meant to help sell a product. So for AS2005, who are the new customers that Microsoft is trying to target? Probably the same kind of new customers that AS2000 won over: people who hadn’t been customers of other BI companies but who had either been priced out of the market or had tried to hand-code their own BI solutions on a relational database and run into the usual problems. They’re going to be easier to sell to than someone who already has a big investment in Cognos, Essbase or Oracle. In my experience there’s a vast number of people out there who are still in this position, but in contrast to the people who picked up on AS2K they’re by nature a bit more cautious and unwilling to leave their relational comfort zone – they know about OLAP but they’re not sure they want to learn a new technology. This constituency is, in my opinion, who the whole UDM pitch is aimed at: let’s not talk about cubes, because that might frighten you, but let’s talk about the cube as a cache (which is less threatening) and the UDM as the successor to both relational reporting and OLAP reporting.
 
So this is why I think I was confused: I was meant to be confused. Quite a clever strategy to avoid knee-jerk anti-cube prejudice or fear, then, if it works. But does it work? Well, maybe, maybe not. The fact that I was confused doesn’t really matter because I’m cube-friendly anyway, but the confused relational guy’s first reaction to hearing about the UDM is to start asking questions to try to clarify the situation. And what I found interesting at Matt Stephen’s presentation was that the second question asked was exactly the same one I asked when I was trying to understand what the UDM was: since the UDM is a replacement for both OLAP and relational reporting, can you therefore run both SQL and MDX queries against the UDM? The answer is a qualified no: although AS2005, like AS2K, does support querying with a subset of SQL, that subset is so limited that it isn’t practically useful. You have to learn MDX to query your UDM, or buy a tool that will generate MDX for you. I suspect that this is the point where many relational guys turn off, having realised that the UDM is the cube and that they’ll still have to learn a completely new, non-SQL technology.
 
This then leads on nicely to the question of whether OLAP is better off shoe-horned into the relational world and queried with SQL, as I understand Oracle have done with what used to be Express, or whether it’s better off as a distinct technology with its own query language as Microsoft have done. I touched on this topic a few months ago here, and as you might have guessed I’m in favour of the Microsoft approach. I don’t blame Microsoft for trying to blur this distinction though, as anything that will get people to look at AS2005 is a good thing in my book. It’s just that I’m not sure that your average BI customer can be hoodwinked in this way for very long, that’s all…
 
 

AS2005 Calculation Engine Performance

I’ve been talking about measure expressions (and Mosha has been plugging the fact you can do the same thing in MDX Scripts) for a while now, but up until now I’ve not got round to running any tests which compare their relative performance. However, I’ve been doing a PoC over the last week or so which has given me just this opportunity and I thought I’d share my results…
 
The scenario I had to implement was as follows: I had a sales fact table which showed the number of Units for a particular SKU of a Product sold per Month (plus various other dimensions), and a prices fact table which showed the Price of each SKU per Month, and I wanted to be able to analyse sales values. Fairly straightforward so far: I just created a view which joined the two fact tables on SKU and Month and multiplied Units by Price to get the Value. However, I also had a requirement for the user to be able to pick any given Month and then look at Values for a range of Months calculated as if the Price for each SKU was frozen at the Month they had picked – this would allow them to remove the effect of price changes from any changes in sales Value. You can probably see that you might want to do something similar when using exchange rates.
 
Anyway, I originally tried to use measure expressions to implement this ‘fixed price’ requirement but after a lot of thought (and discussion with Jon) came to the conclusion that it wasn’t actually possible. I had one physical Time dimension but this was present in the cube twice as two role-playing dimensions, one linked to the Units measure group and one to the Price measure group, and the fact that these two dimensions weren’t common to both measure groups made the numbers aggregate up incorrectly. Even when I tried linking the Units Time dimension to the Prices measure group and the Prices Time dimension to the Units measure group using many-to-many dimensions, which I thought should work in theory, I still didn’t get the correct numbers. It obviously needs a regular relationship in order to work properly. So I went ahead and used an MDX Script calculation instead and was pleasantly surprised with the results.
 
In order to properly compare the performance of ordinary measures, measures which have measure expressions and measures whose values are calculated in the MDX Script, I created a test cube based on a subset of my data. I took six months of sales data, which resulted in a fact table of approx 7 million rows, the whole of my prices fact table (approx 6 million rows) and the whole of my Product dimension (170,000 SKUs), and built a cube with no aggregations or partitions and four measures: Units; Value, precalculated within the fact table as a regular measure; Value calculated using a measure expression of the form [Measures].[Units]*[Measures].[Price]; and a regular measure exactly the same as Units but whose value was overwritten in the MDX Script with a calculation (based on Mosha’s recommendations here) similar to:

CALCULATE;

([Measures].[VALUE - SCRIPT], LEAVES([Product]), LEAVES([Time])) =
    [Measures].[VALUE - SCRIPT] * [Measures].[PRICE];

 
I checked that all three Value measures returned the same result and then ran a very big query (26,000 rows and 4 columns) to see how they compared, stopping and starting the Analysis Services service before each query to try to reduce the effect of caching. On my fairly ordinary 2-way Windows 2003 box running the June CTP this mega-query ran for 13 seconds for both the ordinary Value measure and for the measure expression version, which I thought was pretty amazing. The MDX Script version ran for 18 seconds, which I thought was still very good. Subsequent runs of each query on a warm cache all ran in 9 seconds. Obviously the next step would be to install AS2K on the same box and see how the same cube performed when using a calculated member to do this calculation, but unfortunately I haven’t got the time to do this, and in any case I can’t imagine it would perform anywhere near as well.
 
So, what can we take away from this? It looks like the claims of big performance improvements in the AS2005 calculation engine might be justified. Where you can use measure expressions – and in most classic currency exchange rate scenarios you will be able to, even though I wasn’t – they seem to perform almost as well as if the value had been precalculated in the fact table. Of course I didn’t test what impact writeback has on query performance, and this is the other essential factor in most financial applications; but in AS2K you always had to design your cube to account for the possibility that values might have been written back to the cube even if they hadn’t been, and even then performance could be bad. And even if you can’t use measure expressions, it looks like a pure MDX Script approach can still perform really well.
 

Patents

I was having a look (for no good reason) at the Free Patents Online site at http://www.freepatentsonline.com/ and, while I was there, I remembered the photo on the OLAP Report website (at http://www.olapreport.com/products/MicrosoftOLAP.htm if you have a subscription) of the big pile of cube knick-knacks, each one representing a patent, to be found in Amir Netz’s office. So I thought I’d take a look at the content of these patents to see if they were interesting. Who better to start with than our man Mosha? Here are the results a search on his surname brings up. Hmm, if you can wade through the reams of legalese there are some insights into how AS2005 might work (assuming that it does work according to the patents), but a lot of it is statin’ the bleedin’ obvious.
 
I can’t mention OLAP patents without the infamous Timeline – read this thread for an idea of what they’re about. Memo to self: remember to patent the wheel at some point over the weekend.

Drillthrough in AS2005

I mentioned this thread from the microsoft.public.sqlserver.olap ng on ‘Limitations of AS2005’ the other day, and it looks like it has pretty much finished. It turned into an interesting discussion on drillthrough in AS2005. Something I hadn’t realised was that in AS2005 any values you wish to see returned by drillthrough must already be defined in the cube – which, when you consider how you use drillthrough in AS2K, seems a bit bizarre. However, if you read the thread you’ll see what Mosha and Deepak have to say on the subject, namely that for fields you only want to access via drillthrough you should create a hidden, ROLAP fact dimension. It should be hidden because it probably wouldn’t make any sense at all to your users if they saw it, and it should be ROLAP so that it has no impact on the performance of the rest of your cube until it is queried in the drillthrough. Certainly a different way of handling the issue compared to AS2K – maybe not as intuitive, and maybe it will take a little longer to set up – but to me quite an elegant way of solving the problem. It also fits in well with the thinking that the UDM should model everything in your data warehouse that you want to query on.
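To give an idea of what this looks like from the query side, here’s a sketch of an AS2005 drillthrough query using the new RETURN clause – note that all the object names here (the [Sales] cube, the hidden [Fact Sales] fact dimension and its [Order Number] attribute) are made up for the example:

DRILLTHROUGH MAXROWS 100
SELECT ([Measures].[Units], [Time].[Month].&[200506]) ON COLUMNS
FROM [Sales]
RETURN [Fact Sales].[Order Number], [Measures].[Units]

The RETURN clause is where the hidden fact dimension earns its keep: attributes you only ever want to see in drillthrough results can live there without cluttering up the cube for normal browsing.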

AS2005 Processing Architecture

Marco Russo points to this very interesting paper on "Analysis Services 2005 Processing Architecture" by TK Anand, which I hadn’t seen before; he also has a good discussion of how this all relates to slowly changing dimensions. Particularly fascinating for me was the section on push-mode processing – I can think of a whole bunch of interesting applications for this functionality when it comes to creating cubes from non-standard data sources and you don’t want to use SQLIS.
 
If I could make a criticism of such a good white paper it’s that it makes no mention of local cubes. As I noted a while ago here, in AS2005 you can create local cubes using this functionality – indeed, if you run a CREATE GLOBAL CUBE statement to create a local cube and run a Profiler trace, you can see it translated into the equivalent XMLA batch command which defines and then populates the local cube. I’m covering this topic in MDX Solutions 2nd Edition but I can tell there’s a bit more research to do to see exactly how much of what’s in the paper is relevant to local cubes.
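For reference, a minimal CREATE GLOBAL CUBE statement looks something like the following – the cube, measure and dimension names are all hypothetical:

CREATE GLOBAL CUBE [LocalSales]
STORAGE 'C:\LocalSales.cub'
FROM [Sales]
(
    MEASURE [Sales].[Units],
    DIMENSION [Sales].[Product],
    DIMENSION [Sales].[Time]
)

Run something like this with a Profiler trace going and you should be able to watch the XMLA batch that actually defines and populates the local cube.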

Editing an AS2005 database ‘live’

Here’s an interesting bit of functionality I didn’t know existed in the BI Development Studio (following on from Mosha’s reply to my post on this interesting ng thread). As well as making changes to an AS2005 database in VS in the default ‘project’ mode, where the changes you make are only propagated to the server when you click Deploy, you can also edit the database directly on the server by using ‘online’ mode, and get a very similar experience to what you had with Analysis Manager in AS2K. To do this, instead of opening the project file in VS, go to File->Open->Analysis Services Database, type in your server name and select the database you want to edit. Now when you make your changes they are implemented on the server when you click ‘Save’.

Partitioning in AS2K and AS2005

Great post in the microsoft.public.sqlserver.olap newsgroup from Dave Wickert which sums up how partitioning works in AS2K and how it’s even better in AS2005:
This new automatic setting of the data slice for partitions and improved flexibility of the slicing is really cool.
 

Measure Expressions

I know this blog has been rather light on useful content (as opposed to links) recently, and I’m sorry – it’s not that I’m not learning anything new, I am, but much of it is going into my chapters for MDX Solutions 2E rather than on here. I’ll try to get round to blogging about stuff like the HAVING clause in MDX and the MemberValue() function sometime soon, if I can do so without giving away all the juicy information that you’ll want to buy the book for, and if Mosha (or even BOL) doesn’t beat me to it…
 
But here’s a cool cube design feature that doesn’t cause any conflict of interest for me – measure expressions. The easiest way to explain what they are is to describe one scenario where they’re useful. In AS2K, if you’ve tried to model currency conversion, you’re probably aware that you need to multiply the measure values in your main fact table by the currency rate before any aggregation of the resulting values takes place. You then have two choices for your cube: either precalculate the values in the fact table itself or in a view, which leads to much faster queries but also much bigger cubes, and which means the rates can’t then be altered without reprocessing; or do the currency conversion at runtime using MDX, which generally leads to slower queries but which allows users to change the currency rates dynamically (for example using writeback). Neither of these options is exactly ideal, so in AS2005 measure expressions offer a kind of halfway house – they are calculated at query time and yet are much faster than the equivalent MDX, but the price you pay is that they are nowhere near as flexible as calculated members in terms of the calculations you can define.
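To make the comparison concrete, the runtime-MDX option in AS2K would typically mean a calculated member along these lines – the cube, measure and level names here are hypothetical, and I’m assuming the conversion has to happen at the Month leaf level before aggregation:

WITH MEMBER [Measures].[Sales Converted] AS
'SUM(
    DESCENDANTS([Time].CURRENTMEMBER, [Time].[Month]),
    [Measures].[Sales] * [Measures].[Currency Rate]
)'
SELECT {[Measures].[Sales Converted]} ON COLUMNS
FROM [Sales]

Forcing the SUM down to the leaves like this is exactly why the pure MDX approach can be slow on big cubes; a measure expression does the same leaf-level multiplication much more efficiently.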
 
So what does a measure expression look like and where do you set it? In Visual Studio, in the cube tab of the cube editor, you’ll see the MeasureExpression property in the properties pane when you highlight a measure. Click on this and you get a text box where you can enter your measure expression. This has to be of the form
M1 op M2
where M1 is a measure from the same measure group as the measure you’re currently editing, M2 is a measure from a different measure group which shares at least one dimension with the current measure group, and op is either the * or / operator. Here’s an example:
[Measures].[Sales]*[Measures].[Currency Rate]
In previous builds AS has been very picky about having square brackets around the measure names and things like that, so if you’re trying this yourself make sure you copy the above example very closely. There are several other restrictions too – for example, neither of the measures can be a distinct count measure – so you can see that your options here are quite limited. That said, for most scenarios they should offer all the flexibility you need to do the job.
 
How much faster are they than normal MDX, then? That’s the million dollar question, and I have to say that I don’t know yet. Even if I did, I’m sure there would be some clause in the beta licence that stopped me from publishing performance data, and anyway it’s not fair to test performance on a product that hasn’t been released yet. Hopefully it will make a big enough improvement to make the headaches that accompanied this kind of requirement in AS2K a thing of the past. If anyone else out there has used this functionality and would care to share their experiences, I’d be interested in hearing from them…
 
Measure expressions also allow you to do some really cool things with many-to-many dimensions too, but I’m going to save that for a future blog entry; this one’s already long enough.
 
UPDATE: No need to blog on measure expressions and m2m dimensions, actually, because Richard Tkachuk covers the subject in his white paper on m2m dimensions here.
 

Attributes and Relationships

I was just thinking today that I ought to cannibalise the presentation I gave at PASS Europe for a blog posting, and now I find that Steve McHugh has beaten me to it:
http://stevemchugh.blogspot.com/2005/05/sql-pass-2005-new-cube-design-features.html
http://stevemchugh.blogspot.com/2005/06/new-cube-design-features-part-2-more.html

Well, it saves me the effort! The original presentation is available to download for PASS members at
http://www.sqlpass.org/

But just to reiterate the most important point – build those relationships between your attribute hierarchies, because they’re the only way that AS can make use of aggregations at ‘lower’ levels. Not having them there could impact query performance in a big way. And one more thing – when Dave Wickert covered some of the same subject matter in one of his presentations he made a point I didn’t know about: that when the aggregation design wizard runs it favours long, complete chains of relationships. I guess this is because there’s more chance that a given aggregation can be reused by queries higher up the relationship chain. 

If it’s not too late it would be great if the dev team could insert a message in the dimension wizard reminding people to build these relationships, and maybe even make it a bit more obvious where to do this in the Edit Dimension tab (it’s a bit hidden away if you ask me). I’m sure that otherwise people will forget to do it and wonder why their queries are crawling.