Tableau v1.5 released

Version 1.5 of Tableau has just been released. In my opinion it’s probably the best-looking, easiest-to-use and most innovative AS client tool out there, although unfortunately it’s also rather expensive and fat-client only. You can see a list of all the new features here, chief of which is support for AS2005. If you’re looking for an AS client tool I strongly recommend you download a trial and take a look, even if you don’t think it can meet all your requirements – it really shows up how poor the other client tools out there are in user-interface terms.
 
I did a tiny bit of beta testing on this release and remained as impressed as I was when I first saw it. However, the discovery that you can’t use Time Utility dimensions with the tool was a bit of a disappointment: it’s a modelling technique that’s going to be very common with AS2005, since it’s what the Time Intelligence Wizard builds to hang all your time calculations, such as YTD and Previous Period Growth, off. I found the dev team very intelligent and responsive to feedback, though, and they’ve promised to look at this problem for the next release…
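For anyone who hasn’t come across the technique: a Time Utility (or ‘shell’) dimension is a dimension with a single ‘real’ member, something like Current Period, plus members such as Year To Date whose values get overwritten by assignments in the MDX Script. Here’s a minimal sketch of the idea – all the dimension, hierarchy and member names are made up for illustration, and it’s only roughly the shape of what the Time Intelligence Wizard generates:

// [Time Calculations] is a hypothetical shell dimension whose default
// member is [Current Period]; [Date].[Calendar] is a hypothetical
// Year-Quarter-Month user hierarchy
SCOPE([Time Calculations].[Time Calculations].[Year To Date]);
    THIS = AGGREGATE(
        // every Date member from the start of the year up to and
        // including the current one...
        PERIODSTODATE(
            [Date].[Calendar].[Calendar Year],
            [Date].[Calendar].CURRENTMEMBER)
        // ...crossjoined with the 'real' member, so the underlying
        // Current Period values are aggregated over that set
        * {[Time Calculations].[Time Calculations].[Current Period]});
END SCOPE;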

So, what is the UDM?

The other week I went to an evening event at Microsoft’s UK office in Reading, given by Matt Stephen. It was a general introduction to BI in SQL2005 and as such was attended by people who didn’t know much at all about the new features in AS, RS, IS and so on. All the familiar PowerPoints were shown and much was made of the Unified Dimensional Model as being the best thing since sliced bread. I’m sure almost everyone reading this has seen these presentations, especially the slide where the relational reporting and OLAP reporting worlds ‘come together’ like two pieces of a jigsaw and the one where Analysis Services is described as a cache on top of your data warehouse. At the end of the session, though, the very first question asked was one which I think had been on a lot of people’s minds – "What exactly is the UDM?". This reminded me of the first time I saw any presentations on Yukon AS, at an airlift in Redmond two-and-a-half years ago: for a while afterwards I was confused over what exactly the UDM was too. Myles Matheson, in a blog entry from a month or so back, felt obliged to answer exactly the same question, so I suspect this is a common reaction.
 
The answer is actually pretty simple: the UDM is just the cube in Analysis Services 2005. Because a cube can now model so many more features of a relational data warehouse (e.g. many-to-many dimensions, role-playing dimensions), the message is that there’s now no reason to run queries directly against your data warehouse at all, because you’ll get much better performance and query flexibility by building a cube and querying that instead. From a technical point of view I have no problems at all with the claims being made here – in my experience AS2005 lives up to its hype as much as any software product can – but I didn’t understand why all this talk of the UDM and the resulting confusion was necessary. Why not just talk about the new capabilities of cubes in AS2005?
 
Then I came up with the following theory: the UDM doesn’t really exist as a feature at all, but is more of a marketing concept. Marketing concepts are meant to help sell a product, so who are the new customers Microsoft is trying to target with AS2005? Probably the same kind of new customers that AS2000 won over: people who hadn’t been customers of other BI companies, but who had either been priced out of the market or had tried to hand-code their own BI solutions using a relational database and encountered the usual problems. They’re going to be easier to sell to than someone who already has a big investment in Cognos, Essbase or Oracle. In my experience there are a vast number of people out there who are still in this position, but in contrast to the people who picked up on AS2K they’re by nature a bit more cautious and unwilling to leave their relational comfort zone – they know about OLAP but they’re not sure they want to learn a new technology. This constituency is, in my opinion, who the whole UDM pitch is aimed at: let’s not talk about cubes, because that might frighten you, but let’s talk about the cube as a cache (which is less threatening) and the UDM as the successor to both relational reporting and OLAP reporting.
 
So this is why I think I was confused: I was meant to be confused. Quite a clever strategy to avoid knee-jerk anti-cube prejudice or fear, then, if it works. But does it work? Well, maybe, maybe not. The fact that I was confused doesn’t really matter because I’m cube-friendly anyway, but the confused relational guy’s first reaction to hearing about the UDM is to start asking questions to try to clarify the situation. And what I found interesting at Matt Stephen’s presentation was that the second question asked was exactly the same question I asked when I was trying to understand what the UDM was: since the UDM is a replacement for both OLAP and relational reporting, can you therefore run both SQL and MDX queries against the UDM? The answer is a qualified no: although AS2005, like AS2K, does support querying using a subset of SQL, it’s a very limited subset and isn’t practically useful. You have to learn MDX to query your UDM, or buy a tool that will generate MDX for you. I suspect that this is the point where many relational guys turn off, having realised that the UDM is the cube and that they’ll still have to learn a completely new, non-SQL technology.
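To give a flavour of what that means, here’s the sort of query you have to write to get a simple ‘measure grouped by year’ result set out of a cube – the kind of thing that would be a one-line SELECT…GROUP BY in SQL. It’s a trivial example (the measure and hierarchy names are as per the Adventure Works sample that ships with AS2005), but you can see straight away that it isn’t SQL:

-- Internet sales by calendar year
SELECT
    {[Measures].[Internet Sales Amount]} ON COLUMNS,
    [Date].[Calendar Year].[Calendar Year].MEMBERS ON ROWS
FROM [Adventure Works];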
 
This then leads on nicely to the question of whether OLAP is better off shoe-horned into the relational world and queried with SQL, as I understand Oracle have done with what used to be Express, or better off as a distinct technology with its own query language, as Microsoft have done. I touched on this topic a few months ago here, and as you might have guessed I’m in favour of the Microsoft approach. I don’t blame Microsoft for trying to blur the distinction, though, as anything that gets people to look at AS2005 is a good thing in my book. It’s just that I’m not sure your average BI customer can be hoodwinked in this way for very long, that’s all…
 
 

Analysis Services Book List (attempt #2)

Since I can’t get my links to Amazon working in an MSN Spaces list, I thought I’d just put my book list in a regular entry and then update it as necessary.
 
Microsoft SQL Server 2005 Analysis Services: Irina Gorbach, Alexander Berger, Py Bateman, Edward Melomed

 
Updates/News:
Teo Lachev has announced that ‘Applied Microsoft Analysis Services 2005’ has gone to the printers and will be available by the end of November. More details and resources can be found here.
 
Mosha has announced that the second edition of ‘Fast Track to MDX’ is on the verge of publication. He has more details and some comments on other books on this list here.
 
Nick Barclay has a review of ‘Data Mining with SQL2005’ on his blog here. I’ve also just bought a copy and will be reviewing it as soon as I’ve read it properly! First impressions are good though.
 
If anyone wants to send me a free copy of their book for review (cheeky idea for getting free books, I know, but it might just work!) then please drop me a line at the email address mentioned in my profile.
 
Thanks again to Nick Barclay for pointing out that ‘The Microsoft Data Warehouse Toolkit’ (what’s listed above as ‘Data Warehousing with SQL 2005’ – I’ll update the link when Amazon UK updates its page for the book) has its own web page with some content.
 
Nick Barclay has a positive review of ‘Applied Analysis Services 2005’ on his blog here. Mark Hill also reviews it favourably here.
 
I have a review of ‘Data Mining with Analysis Services 2005’ here.
 
Nick Barclay has a review of ‘MDX Solutions’ second edition here.

The AS Dev Team wants your feedback

Just spotted this post on the Analysis Services 2005 beta public newsgroup by Mosha (who I guess is a bit too busy with other work at the moment to put it in his blog), asking for feedback on MDX changes and performance. It’s good to see that the AS dev team are as interested in engaging with customers as they always have been, but if I do have one criticism it’s that beta testers would be better able to test out new functionality if they actually knew what it was. I don’t want to sound too negative here, but I know, for instance, that MDX Scripts have changed a lot over the last six months, and if the only information you had to work with was Richard Tkachuk’s white paper (which is now out of date in a few respects), Mosha’s blog and BOL, you’d probably be struggling to understand what’s going on, let alone implement any apps which really push MDX Scripts to the limit.
 
I’ve been lucky in the amount of access I’ve had to Redmond to get my questions answered – Matt Carroll and Marin Bezic, take a bow – but I know from talking to other people that they’ve been frustrated by the lack of information available. I suppose the onus is on people like me, who do have the knowledge, to spread it around by blogging etc. Unfortunately I don’t have as much time as I’d like to blog or answer questions via email (I also have to work), and in any case I’m under an obligation to my publishers and co-authors to save the really detailed explanations of new functionality for ‘MDX Solutions’. Similarly the dev team, although I know they make a really big effort, are obviously more focused on building the product than writing about it. Maybe the SQL Server team needs to recruit some full-time bloggers to pump information out to the community. Now that would be a cool job to have…

Dundas OLAP Services

I see that Dundas have entered the market for ADOMD and ADOMD.Net client components with Dundas OLAP Services. It’s available in Windows Forms and ASP.Net flavours, and although it doesn’t offer much in the way of functionality that isn’t already available, I’ll be taking a look because a) the web component looks prettier than most of the competition, which isn’t hard, and b) it’s from Dundas rather than a one-man-and-a-dog software company, so there’s less risk around future support.
 
UPDATE: you can see a live demo on Foodmart 2000 here. Having looked at it briefly, it’s as I thought – does nothing new, but those charts are nice to look at.

Yet more stuff on the Project REAL site

Greg Fodor mailed me to say that there’s even more new stuff on the Project REAL site. The two things that leap out are the long-promised AS2005 best-practices white paper (which I’ll be reading over the next few days) and the paper on using data mining to build out-of-stock predictive models.

AS2005 Calculation Engine Performance

I’ve been talking about measure expressions (and Mosha has been plugging the fact that you can do the same thing in MDX Scripts) for a while now, but until now I’ve not got round to running any tests comparing their relative performance. However, I’ve been doing a PoC over the last week or so which has given me just this opportunity, and I thought I’d share my results…
 
The scenario I had to implement was as follows. I had a sales fact table showing the number of Units of a particular SKU of a Product sold per Month (plus various other dimensions), and a prices fact table showing the Price of each SKU per Month, and I wanted to be able to analyse sales values. Fairly straightforward so far: I just created a view which joined the two fact tables on SKU and Month and multiplied Units by Price to get the Value. However, I also had a requirement for the user to be able to pick any given Month and then look at Values for a range of Months calculated as if the Price of each SKU was frozen at the Month they had picked; this would allow them to remove the effect of price changes from any changes in sales Value. You can probably see that you might want to do something similar when working with exchange rates.
 
Anyway, I originally tried to use measure expressions to implement this ‘fixed price’ requirement, but after a lot of thought (and discussion with Jon) came to the conclusion that it wasn’t actually possible. I had one physical Time dimension, but it was present in the cube twice as two role-playing dimensions, one linked to the Units measure group and one to the Prices measure group, and the fact that these two dimensions weren’t common to both measure groups made the numbers aggregate up incorrectly. Even when I tried linking the Units Time dimension to the Prices measure group and the Prices Time dimension to the Units measure group using many-to-many relationships, which I thought should work in theory, I still didn’t get the correct numbers; it obviously needs a regular relationship in order to work properly. So I went ahead and used an MDX Script calculation instead, and was pleasantly surprised with the results.
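To give an idea of what that calculation looks like, here’s a rough sketch of the kind of MDX Script assignment that implements the ‘fixed price’ logic. All the names here are made up, the frozen Month is hard-coded rather than driven by the user’s selection, and [Fixed Price Value] is assumed to be a real measure (a copy of Units) so that the assignment at the leaves aggregates up correctly:

// Overwrite a real measure at the leaves of Product and Time;
// [Time].[Month].&[200506] is a made-up member key standing in for
// whichever Month the user has picked as the 'fixed' price point
(
    [Measures].[Fixed Price Value],
    LEAVES([Product]),
    LEAVES([Time])
) = [Measures].[Units]
    * ([Measures].[Price], [Time].[Month].&[200506]);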
 
In order to compare the relative performance of ordinary measures, measures with measure expressions and measures whose values are calculated in the MDX Script, I created a test cube based on a subset of my data. I took six months of sales data, which resulted in a fact table of approximately 7 million rows, the whole of my prices fact table (approximately 6 million rows) and the whole of my Product dimension (170,000 SKUs), and built a cube with no aggregations or partitions and four measures: Units; Value, calculated within the fact table and stored as a regular measure; Value calculated using a measure expression of the form [Measures].[Units]*[Measures].[Price]; and a regular measure exactly the same as Units but whose value was overwritten in the MDX Script (based on Mosha’s recommendations here) with a calculation similar to:

CALCULATE;  // populate the cube using normal aggregation first

// Overwrite the measure at the leaf granularity of Product and Time;
// values at all higher granularities then aggregate up from these leaves
(
    [Measures].[VALUE - SCRIPT],
    LEAVES([Product]),
    LEAVES([Time])
) = [Measures].[VALUE - SCRIPT] * [Measures].[PRICE];

 
I checked that all three Value measures returned the same results and then ran a very big query (26,000 rows by 4 columns) to see how they compared, stopping and starting the Analysis Services service before each query to try to reduce the effect of caching. On my fairly ordinary 2-way Windows 2003 box running the June CTP this mega-query ran in 13 seconds for both the ordinary Value measure and the measure expression version, which I thought was pretty amazing; the MDX Script version ran in 18 seconds, which I thought was still very good. Subsequent runs of each query on a warm cache all ran in 9 seconds. Obviously the next step would be to install AS2K on the same box and see how the same cube performed when using a calculated member to do this calculation, but unfortunately I haven’t got the time to do this, and in any case I can’t imagine it would perform anywhere near as well.
 
So, what can we take away from this? It looks like the claims of big performance improvements in the AS2005 calculation engine might be justified. Where you can use measure expressions – and in most classic currency exchange rate scenarios you will be able to, even though I wasn’t – they seem to perform almost as well as if the value had been precalculated in the fact table. Of course I didn’t test what impact writeback has on query performance, which is the other essential factor in most financial applications, but in AS2K you always had to design your cube to account for the possibility that values might have been written back to the cube even if they hadn’t been, and even then performance could be bad. And even if you can’t use measure expressions, it looks like a pure MDX Script approach can still perform really well.
 

Office 12 BI

Marco Russo, lucky guy, was at the PDC and saw the session on Office 12 BI. I’ve had a Google and can’t find anyone else with more details – if any of you did see this presentation, leave a comment and tell me what you thought!
 
It sounds like the Office guys have finally turned Excel into a usable BI platform, and this will put an additional squeeze on all those small Excel add-in vendors out there. The integration with the new Excel server bits that Marco hints at would be the real killer – I have often had BI users wanting to take their worksheet full of OLAP data and publish it to the web as a live report.

PASS 2006 European Conference

I got an email today advertising the PASS 2006 European Conference in Barcelona next February. I went to (and spoke at) the last one in Munich and found it very useful – certainly the level of technical detail was way ahead of your average conference, especially as far as Analysis Services went, and I’d recommend it. I’m going to submit an abstract for a presentation too; I guess it will be on something like ‘Introduction to MDX Scripting’.

Patents

I was having a look (for no good reason) at the Free Patents Online site at http://www.freepatentsonline.com/ and, while I was there, I remembered the photo on the OLAP Report website (at http://www.olapreport.com/products/MicrosoftOLAP.htm if you have a subscription) of the big pile of cube-shaped knick-knacks, each one representing a patent, to be found in Amir Netz’s office. So I thought I’d take a look and see just what these patents contained, and whether they were interesting. Who better to start with than our man Mosha? Here are the results a search on his surname brings up. Hmm, if you can wade through the reams of legalese there are some insights into how AS2005 might work (assuming that it does work according to the patents), but a lot of it is statin’ the bleedin’ obvious.
 
I can’t mention OLAP patents without the infamous Timeline – read this thread for an idea of what they’re about. Memo to self: remember to patent the wheel at some point over the weekend.