Displaying totals at the bottom

For someone like me without a financial background, the default behaviour of AS when displaying a hierarchy seems natural: when you ask for a set like MyHierarchy.Members, you get all the parent members appearing in the list before their child members. But for financial reports you want the opposite: all the child members displayed before the parent members. The other day I was asked how to do this, and I in turn asked Andrew Wiles (who has a lot more financial OLAP experience than me); he pointed me to the simple answer – the HIERARCHIZE function.

Here’s an example from Adventure Works:

SELECT [Measures].[Amount] ON 0,
DESCENDANTS([Account].[Accounts].&[95])
ON 1
FROM [Adventure Works]

Run the query and you’ll see the following returned, with the parent member Statistical Accounts displayed above its children:

[Screenshot: the query results, with the parent member Statistical Accounts displayed above its children]

However, if you want to display it in the financial style, all you need to do is wrap your set with HIERARCHIZE( <<set>>, POST) as follows:

SELECT [Measures].[Amount] ON 0,
HIERARCHIZE(
DESCENDANTS([Account].[Accounts].&[95])
, POST)
ON 1
FROM [Adventure Works]

[Screenshot: the same query results, this time with the child members displayed above the parent member Statistical Accounts]

I have to admit, I’d always wondered what the point of Hierarchize() was – this is the first time I’ve actually needed to use it!

SQLBits III Summary

Phew, another SQLBits done and dusted. And I think it was a good one, especially looking at the comments so far:
http://sqlblogcasts.com/blogs/drjohn/archive/2008/09/14/busting-for-a-wii-at-sql-bits.aspx
http://andrewwiles.spaces.live.com/blog/cns!43141EE7B38A8A7A!368.entry
http://blogs.technet.com/andrew/archive/2008/09/14/sql-bits-cubed-thanks.aspx

Thanks to everyone who came, and especially to everyone who helped out! And if you weren’t there, the good news is that Microsoft brought along a camera crew and filmed the sessions (I think all of them; certainly all of the sessions in the room I was a monitor for), which should be made available online somewhere soon. Apologies to everyone who stopped me to say hello – I was running around like the proverbial blue-a*sed fly all day so I couldn’t stop to chat with anyone for more than a few minutes; I hope I didn’t seem too rude…

Code sample to create Analysis Services partitions in SSIS

Courtesy of Jamie Thomson, here’s some sample code showing how to create Analysis Services partitions from within an Integration Services package using AMO:

http://blogs.conchango.com/jamiethomson/archive/2008/09/15/ssis-create-analysis-services-partitions-from-a-ssis-package.aspx

I should stress, though, that if you’re using this in production you do want to set the Slice property on each partition, to ensure you get the best possible performance in all cases.
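As a rough illustration of what that means (just a sketch with invented object names, not Jamie’s code): in AMO the Slice is simply a string property on the Partition object containing an MDX expression, and in the XMLA definition of the partition it shows up as a Slice element, something like this:

<Partition xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ID>Internet_Sales_2004</ID>
  <Name>Internet_Sales_2004</Name>
  <StorageMode>Molap</StorageMode>
  <!-- The Slice is an MDX expression describing which members' data this partition holds,
       so that queries only have to touch the partitions that could contain the data they need -->
  <Slice>[Date].[Calendar Year].&amp;[2004]</Slice>
</Partition>

Setting it explicitly means Analysis Services doesn’t have to rely on the slice information it works out for itself during processing, which is where the ‘best possible performance in all cases’ point comes from.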

Google, Panorama and the BI of the Future

The blog entry I posted a month or so ago about XLCubed where I had a pop at Microsoft for their client tool strategy certainly seemed to strike a chord with a lot of people (see the comments, and also Marco’s blog entry here). It also made me think that it would be worth spending a few blog entries looking at some of the new third party client tools that are out there… I’ve already lined up a few reviews, but if you’ve got an interesting, innovative client tool for Analysis Services that I could blog about, why not drop me an email?

So anyway, the big news last week was of course Google’s announcement of Chrome. And as several of the more informed bloggers (eg Nick Carr, Tim McCoy) have pointed out, the point of Chrome is to be not so much a browser as a platform for online applications, leading to a world where there is no obvious distinction between online and offline applications. And naturally when I think about applications I think about BI applications; and of course, thinking about online BI applications and Google, I thought of Panorama – who incidentally this week released the latest version of their gadget for Google Docs:
http://www.panorama.com/newsletter/2008/sept/new-gadget.html

Now, I’ll be honest and say that I’ve had a play with it and it is very slow and there are a few bugs still around. But it’s a beta, and I’m told that it’s running on a test server and performance will be better once it is released, and anyway it’s only part of a wider client tool story (outlined and analysed nicely by Nigel Pendse here) which starts in the full Novaview client and involves the ability to publish views into Google Docs for a wider audience and for collaboration. I guess it’s a step towards the long-promised future where the desktop PC will have withered away into nothing more than a machine to run a browser on, and all our BI apps and all our data will be accessible over the web.

This all makes me wonder what BI will be like in the future… What will it be like? Time for some wild, half-formed speculation:

  • Starting at the back, the first objection raised to a purely ‘BI in the cloud’ architecture is that you’ve got to upload your data to it somehow. Do you fancy trying to push what you load into your data warehouse every day up to some kind of web service? I thought not. So I think a ‘BI in the cloud’ architecture is only going to be feasible when most of your source data lives in the cloud already, possibly in something like SQL Server Data Services or Amazon SimpleDB or Google BigTable; or possibly in a hosted app like Salesforce.com. This requirement puts us a long way into the future already, although for smaller data volumes and one-off analyses perhaps it’s not so much of an issue.
  • You also need your organisation to accept the idea of storing its most valuable data in someone else’s data centre. Now I’m not saying this as a kind of "why don’t those luddites hurry up and accept this cool new thing"-type comment, because there are some very valid objections to be made to the idea of cloud computing at the moment, like: can I guarantee good service levels? Will the vendor I choose go bust, or get bought, or otherwise disappear in a year or two? What are the legal implications of moving data to the cloud and possibly across borders? It will be a while before there are good answers to these questions, and even when there are, there’s going to be a lot of inertia that needs to be overcome.
    The analogy most commonly used to describe the brave new world of cloud computing is with the utility industry: you should be able to treat IT like electricity or water, a service you can plug into whenever you want and can assume will be there when you need it (see, for example, "The Big Switch"). As far as data goes, though, I think a better analogy is with the development of the banking industry. At the moment we treat data in the same way that a medieval lord treated his money: everyone has their own equivalent of a big strong wooden box in the castle where the gold is kept, in the form of their own data centre. Nowadays the advantages of keeping money in the bank are clear: why worry about thieves breaking in and stealing your gold in the night, why go to the effort of moving all those heavy bags of gold around yourself, when it’s much safer and easier to manage and move money about when it’s in the bank? We may never physically see the money we possess, but we know where it is and we can get at it when we need it. And I think the same attitude will be taken towards data in the long run, but it does need a leap of faith to get there (how many people still keep money hidden in a jam jar in a kitchen cupboard?).
  • Once your data’s in the cloud, you’re going to want to load it into a hosted data warehouse of some kind, and I don’t think that’s too much to imagine given the cloud databases already mentioned. But how to load and transform it? Not so much of an issue if you’re doing ELT, but for ETL you’d need a whole bunch of new hosted ETL services to do this. I see Informatica has one in Informatica On Demand; I’m sure there are others.
  • You’re also going to want some kind of analytical engine on top – Analysis Services in the cloud, anyone? Maybe not quite yet, but companies like Vertica (http://www.vertica.com/company/news_and_events/20080513) and Kognitio (http://www.kognitio.com/services/businessintelligence/daas.php) are pushing into this area already; the architecture of this new generation of shared-nothing MPP databases surely lends itself well to the cloud model: if you need better performance you just reach for your credit card and buy a new node.
  • You then want to expose it to applications which can consume this data, and in my opinion the best way of doing this is of course through an OLAP/XMLA layer. In the case of Vertica you can already put Mondrian on top of it (http://www.vertica.com/company/news_and_events/20080212) so you can already have this if you want it, but I suspect that you’d have to invest as much time and money to make the OLAP layer scale as you had invested to make the underlying database scale, otherwise it would end up being a bottleneck. What’s the use of having a high-performance database if your OLAP tool can’t turn an MDX query, especially one with lots of calculations, into an efficient set of SQL queries and perform the calculations as fast as possible? Think of all the work that has gone into AS2008 to improve the performance of MDX calculations – the performance improvements compared to AS2005 are massive in some cases, and the AS team haven’t even tackled the problem of parallelism in the formula engine at all yet (and I’m not sure if they even want to, or if it’s a good idea). Again, there’s been a lot of buzz recently about the implementation of MapReduce by Aster and Greenplum to perform parallel processing within the data warehouse, which, although it aims to solve a slightly different set of problems, nonetheless shows that the problem is being thought about.
  • Then it’s onto the client itself. Let’s not talk about great improvements in usability and functionality, because I’m sure badly designed software will be as common in the future as it is today. It’s going to be delivered over the web via whatever the browser has evolved into, and will certainly use whatever modish technologies are the equivalent of today’s Silverlight, Flash, AJAX etc. But will it be a stand-alone, specialised BI client tool, or will there just be BI features in online spreadsheets (or whatever online spreadsheets have evolved into)? Undoubtedly there will be good examples of both, but I think the latter will prevail. It’s true even today that users prefer to get their data into Excel, the place where they eventually want to work with it; the trend would move even faster if MS pulled their finger out and put some serious BI features in Excel…
    In the short term this raises an interesting question though: do you release a product which, like Panorama’s gadget, works with the current generation of clunky online apps in the hope that you can grow with them? Or do you, like Good Data and Birst (which I just heard about yesterday, and will be taking a closer look at soon), create your own complete, self-contained BI environment which gives a much better experience now but which could end up being an online dead-end? It all depends on how quickly the likes of Google and Microsoft (which is supposedly going to be revealing more about its online services platform soon) can deliver usable online apps; they have the deep pockets to be able to finance these apps for a few releases while they grow into something people want to use, but can smaller companies like Panorama survive long enough to reap the rewards? Panorama has a traditional BI business that could certainly keep it afloat, although one wonders whether they are angling to be acquired by Google.

So there we go, just a few thoughts I had. Anyone got any comments? I like a good discussion!

UPDATE: some more details on Panorama’s future direction can be found here:
http://www.panorama.com/blog/?p=118

In the months to come, Panorama plans to release more capabilities for its new Software as a Service (SaaS) offering and its solution for Google Apps.  Some of the new functionality will include RSS support, advanced exception and alerting, new visualization capabilities, support for data from Salesforce, SAP and Microsoft Dynamics, as well as new social capabilities.

SQLBits – now with free beer!

So, you were wondering whether you should give up your precious weekend to attend SQLBits. OK, so it has some great technical content (the agenda has now been published here: http://www.sqlbits.com/information/mainagenda.aspx) and it will be generally good fun, but you were planning on washing your hair/mowing the lawn/watching the entire Lord of the Rings trilogy on DVD… what could possibly tempt you away from that? How about FREE BEER? Yes, we can now announce that there will be an evening ‘do’ after the sessions have finished with drinks and pizza. So what are you waiting for? Register today at http://www.sqlbits.com/default.aspx !

OK, ok, I promise this is the last time I use my blog to promote SQLBits before it actually takes place. I need to get down to work on the content for the training day I’m running with Allan; incidentally, if you can’t make it to that, Allan will be presenting all the material himself here:
http://sqlknowhow.com/training/am20080929/

SSWUG BI Virtual Conference

The SSWUG are organising a virtual BI conference on September 24th-26th for a very reasonable $100:
 
They’ve got some good speakers and sessions, so it should be worth checking out.

Generating large numbers of partitions using Excel

Quite often when I’m doing proof-of-concept type work I find myself in the situation where I need to build a cube with near-production data volumes, and in order to make sure that performance is good I have to partition that cube. However, setting up tens or possibly even hundreds of partitions in BIDS manually is no-one’s idea of fun, so how can you automate this process easily? If you’re using a SQL Server datasource then you should try using the functionality built into the Analysis Services Stored Procedure Project:
http://www.codeplex.com/ASStoredProcedures/Wiki/View.aspx?title=Partition&referringTitle=Home

But what if you’re using an Oracle datasource, or you hit a bug with the ASSP code? Most consultants have a preferred method (such as their own custom code, or SSIS packages), but I thought I’d blog about the most commonly used approach, which is to use Excel. Here are the steps:

  1. Your starting point should be a cube with just one partition in each measure group. I would recommend not putting all your data in this partition, but creating it as a slice of the data: the first of the partitions you want to create. So you’ll probably want to make it query-bound rather than table-bound, and set the Slice property appropriately (see here for why this is important).
  2. Open up SQL Management Studio and expand the tree in the Object Explorer so you can see the partition for your first measure group then right click and Script Partition As -> CREATE To. This will open a new XMLA window and give you your template XMLA command to create a partition.
  3. Let’s say we are going to partition by month: we have three months, and they have surrogate keys 1 to 3. Our template partition is correctly configured for Month 1 and we want to be able to alter this template for months 2 and 3, so we’re going to ‘parameterise’ the following properties (there’s a sketch of what the finished template might look like after this list):
    1. ID – so, if your existing ID property is set to "Month_1", we need to replace the 1 with a string we can easily find with a search and replace like "@@@", making the new ID "Month_@@@"
    2. Name – which is usually the same as the ID
    3. Query Definition – you will have a Where clause in the SQL query behind the partition which is something like "Where Month_ID=1" and this should be changed to "Where Month_ID=@@@"
    4. Slice – the tuple will be something like "[Period].[Month].&[1]" which again should be changed to "[Period].[Month].&[@@@]"
  4. Copy this XMLA command text into a cell in a new Excel workbook, say cell A1. Make sure you paste the text into the formula bar and not directly onto the worksheet – you want it all in one cell.
  5. Underneath this cell we’re going to use Excel formulas to take this template and generate the XMLA needed for all the partitions we want. In cell A2 enter the value 1, in A3 enter 2, and so on for as many months as you need. Remember that in Excel, if you enter values like this that increment by 1, you can select that area and drag it downwards and Excel will automatically fill the new cells with incrementing values.
  6. In cell B2 we’re going to use an Excel formula to replace the string @@@ with the value in A2. So something like the following will work:
    =SUBSTITUTE($A$1,"@@@",A2)
  7. You can then copy and drag this formula downwards, and you’ll see all your new XMLA commands to create partitions appear in B3 and the cells underneath.
  8. Copy and paste the new XMLA into a new XMLA query window in SQL Management Studio
  9. You may find that some unwanted double-quotes have appeared now. You need to replace the double sets of double quotes ("") with (") and then delete the single sets of double quotes. So first do a find and replace on "" and change it to something like @@@, then do a find and replace to delete all instances of ", then do another find and replace to change @@@ back to ".
  10. You now need to wrap these XMLA commands in a batch statement so they can be run together. So paste the following text before the first Create:
    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
    and this at the very end:
    </Batch>
  11. Now delete your original partition and execute the new XMLA batch command you’ve just created and hopefully you’ll have your new partitions created. You can then process them.
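
To make step 3 more concrete, here is a sketch of what the ‘parameterised’ template might end up looking like. It’s heavily trimmed (a partition scripted out of Management Studio will have many more properties and namespace declarations), and the database, cube, measure group, data source and fact table names are all invented for the example:

<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ParentObject>
    <DatabaseID>MyDatabase</DatabaseID>
    <CubeID>MyCube</CubeID>
    <MeasureGroupID>MyMeasureGroup</MeasureGroupID>
  </ParentObject>
  <ObjectDefinition>
    <Partition xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <!-- The four @@@ placeholders below are what the Excel SUBSTITUTE formula fills in -->
      <ID>Month_@@@</ID>
      <Name>Month_@@@</Name>
      <Source xsi:type="QueryBinding">
        <DataSourceID>MyDataSource</DataSourceID>
        <QueryDefinition>SELECT * FROM FactSales WHERE Month_ID=@@@</QueryDefinition>
      </Source>
      <StorageMode>Molap</StorageMode>
      <!-- Note that the & in the member key reference has to be escaped as &amp; inside XMLA -->
      <Slice>[Period].[Month].&amp;[@@@]</Slice>
    </Partition>
  </ObjectDefinition>
</Create>

The SUBSTITUTE formula in step 6 then stamps out one of these Create commands per month, with 1, 2, 3 and so on in place of @@@, and step 10 wraps them all in a single Batch element so they can be executed together.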

Post-Holiday Roundup

Ahh, so I’m back from my holiday and feeling much better – even if it did manage to rain every single day while I was away (that’s the risk you take with holidays in England). Now all I have to do is get through the massive pile of emails waiting for me and steel myself in preparation for the next few months of hard work… roll on Xmas! Anyway, a few interesting things that happened/thoughts that occurred to me while I was away…

Of course the big thing that happened, the day after I left, was the RTM of SQL2008. Hopefully you’ve heard this news by now, but the big questions here are: is AS2008 any good? Do I want to migrate, and if so, when? Personally, I’ve been using it for a few months now on a project and my impressions of it are positive. As I’ve said before there aren’t any really amazing wow features that will make you want to upgrade, but the performance improvements can in some cases be quite significant, the new BIDS is easier to use, and there are a few obscure fixes/changes in behaviour which tie up some loose ends left over from AS2005. Since migration is very, very easy indeed, I would encourage you to install it on a test machine if you haven’t already and start thinking about moving up. Of course the mantra of ‘wait until SP1’ is so deeply ingrained in people’s minds that most people will want to do exactly that, and there’s a lot of sense to that approach, since the first bugs are being found already (see here). But equally there are a fair few known problems with AS2005 SP2, and given the problems that all of the CU releases have had (see here for example; I’ve heard the same story for every single CU, that they create as many new bugs as they fix), I wouldn’t recommend them. I suppose you could wait for 2005 SP3, but my feeling is that AS2008 is the better bet.

Meanwhile, in the cloud I see that Good Data have gone into beta, and there’s a new, mysterious MDX-queryable (Mondrian-based?) offering that has broken cover called BI Cloud. If I have time, I’ll try to check them out. Also on the net, http://www.learnmicrosoftbi.com/ seems to have a lot of good videos explaining the basics of AS. And there’s a new podcast featuring Richard Tkachuk from the SQLCat team where he talks about the performance improvements in AS2008 and seems to suggest that it’s now possible to use hints in MDX with a new function whose name I couldn’t make out – I’ll post if I get more details.

I’ve also been thinking some more about the DATAllegro deal. There seems to be some discussion about when something that works with SQL2008 can be released, and the folks at DATAllegro are keen to stress that their architecture allows them to plug in new RDBMSs easily, so the implication is that it will be sooner rather than later; clearly the investigation work has been going on for a while, and must somehow tie in with the MatrixDB stuff that got leaked a few months ago. All of this would be good for AS running in ROLAP/HOLAP mode on an MPP SQL Server, but I wonder whether the technology could also be made to work with AS in MOLAP mode? I think it could – surely the hooks are already there with the remote partitions/linked measure groups/dimensions stuff. Just conjecture though; I think we’ll find out more around the time of PASS and the BI Conference.

Lastly, I’ve booked my place for Mosha’s MDX Deep Dive pre-conf seminar at PASS this year. Who else is going?

SQLBits and my training day

And before I disappear off on my hols for a few weeks, can I remind you that SQLBits is happening on September 13th and that you really ought to be there? We’re just about at the stage of finalising the sessions (and we’ve got a great BI track lined up) so check http://www.sqlbits.com/default.aspx for more details!
 
As I mentioned before, Allan Mitchell and I will be doing a 1-day pre-conf seminar the day before (September 12th) on the Microsoft BI stack:
http://sqlknowhow.com/training/CWAM20080912/default.htm
It’s an introductory session, so if you’ve got colleagues who want to get a good overview of what BI is and what you can do with the MS tools in this area, then send them along.

Bill Baker leaving MS

Bill Baker, pretty much the top guy in BI at Microsoft since Microsoft first got interested in BI, is leaving the company. Not that I’m reading anything much into the move though – after ten years he’s probably looking for a new challenge or some way of spending all that money he’s earned.