Simple vs Searched CASE statements

I recently had a customer show me an MDX calculation using a CASE statement, where they had achieved a big improvement in performance when they moved from using a searched CASE to using a simple CASE. For the record, here’s an example of a simple CASE statement:

CASE [Measures].[Internet Sales Amount]
WHEN 0 THEN 0
WHEN 1 THEN 1
WHEN 2 THEN 2
WHEN 3 THEN 3
ELSE 4
END

And what the equivalent searched CASE looks like this:

CASE
WHEN [Measures].[Internet Sales Amount]=0 THEN 0
WHEN [Measures].[Internet Sales Amount]=1 THEN 1
WHEN [Measures].[Internet Sales Amount]=2 THEN 2
WHEN [Measures].[Internet Sales Amount]=3 THEN 3
ELSE 4
END

In my customer’s code they were doing lots of complex conditions involving calculated measures, but nonetheless I couldn’t work out why their searched CASE performed so much worse because the logic in both the simple and the searched CASE was the same. Therefore, in turn I asked my friends in Redmond and Marius Dumitru gave me some very useful insights that I thought I’d pass on. Basically, a searched CASE requires all previous branch conditions to have been evaluated before a given branch is chosen, which in turn makes it harder for the query optimiser to evaluate a searched CASE in bulk mode. As a result it’s always better to use a simple CASE where possible; in the worst scenarios (and I guess this is what was happening to my customer) the performance of a searched CASE can be several times worse than a simple CASE even when they seem to be doing the same thing.

Platforms For Building Richer BI Applications

One of the mysteries of the MS BI third-party ecosystem is how slow it has been to make use of technologies like WPF and Silverlight. Marco Russo has a plausible explanation of why this is here; it’s really only in the last six months that things have begun to change. A few products I’ve seen or heard of include Clearway GeoAnalyzer, Radarsoft’s RIA Grid and Intelligencia for Silverlight; there’s also increased interest in building your own specialised BI apps in Silverlight – for example I’ve seen Sascha Lorenz do presentations on this subject at various conferences, and of course Bart Czernicki’s book “Next Generation Business Intelligence Software with Silverlight 3” was released a few months back. I still don’t think we’re anyway near reaching the potential of the technology though.

I think one way to increase uptake would be to provide some kind of toolkit or additional layer to help developers or even power users build BI applications. Maybe something like a Microsoft version of SAP’s XCelsius would be a good idea? I know there would be a lot of overlap with what PerformancePoint is meant to do, but I think there is sometimes a need for highly visual presentation of data rather than plain old dashboards, beyond what’s possible with PerformancePoint, Excel or Visio even in Office 2010. I’m not advocating the abandonment of Stephen-Few-ish design principles for serious business dashboards in favour of fancy gauges and animations – but sometimes, for example in presentations or newspaper articles, a bit of ‘wow’ in the way the data is presented can be as important for the overall purpose as the meaning of the data; the kind of visualisations you can find on http://www.informationisbeautiful.net/, for example, are what I’ve got in mind here.

Here’s two examples of what could be done. When I saw Microsoft Semblio I thought something like it for BI developers for creating dashboards or presentations would be cool: it’s an SDK for creating rich, multimedia content for educational purposes. In a similar vein, I recently met up with an ex-customer of mine, Steven Peters, who is now the owner of a startup called Munglefish that develops a platform for developing closed-loop sales and marketing presentation applications. Munglefish’s EpicX platform is something like an interactive PowerPoint, and among other things each ‘slide’ can display BI data as an aid to the sales process (eg if you’re selling Widgets to an IT consultant in his mid-30s in SE England, you’d be able to display just how much money other IT consultants in their mid-30s in SE England had saved buying your brand of widgets) as well as capturing information about the flow of the sales process and sending it back to a data warehouse to be analysed; I think it is one of the best examples of BI being integrated in what is not primarily a BI application that I’ve ever seen, and its success is completely due to the kind of high-quality graphics that are possible with WPF and Silverlight. These platforms don’t remove the need for a developer but they do reduce the overall amount of development work needed. They are also targeted at scenarios where slick visualisations are very important for engaging the audience – we know it’s just as important to hold the CEO’s attention in a dull meeting where you’re presenting your financial data as it is to hold a 12-year-old’s attention in a science lesson.

Finally, last week I also saw the announcement of Vedea, a new, experimental data visualisation language from Microsoft Research. You can find full details of it on Martin Calsyn’s blog here:
http://blogs.msdn.com/martinca/archive/2009/12/03/introducing-the-microsoft-visualization-language.aspx
It’s basically a new .NET language for “creating interactive infographics, data visualizations and computational art” – pretty much what I’ve been asking for so far in this post, and although I still think it would be too technical for the average business user I can see it would have a lot of interesting uses for BI professionals. With a bit of luck, like F#, it will make the transition to being a full member of the .NET family one day and maybe then we’ll have a tool that will allow us to make the most of the power of Silverlight and WPF for BI with the minimum of effort.

DAX: Running Totals

Continuing my exploration of DAX, here’s a quick post on calculating running totals. I’ll be honest and say that I didn’t work out these expressions myself but got it from a newsgroup exchange between Colin Banfield and Karen Aleksanyan. In this case I want to calculate a running total from the beginning of time up to the current date (as opposed to from the beginning of the current year, as with a year-to-date). There are two possible approaches:

=’FactInternetSales'[Sum of SalesAmount](VALUES(‘DimDate'[FullDateAlternateKey])>=’DimDate'[FullDateAlternateKey], ALL(‘DimDate'[FullDateAlternateKey]))

=’FactInternetSales'[Sum of SalesAmount](DATESBETWEEN(‘DimDate'[FullDateAlternateKey], BLANK(), LASTDATE(‘DimDate'[FullDateAlternateKey])), ALL(‘DimDate'[FullDateAlternateKey]))

The first simply asks for a filter of all values from FullDateAlternateKey where the date is greater than or equal to the current row date. Don’t ask me why it needs a ‘greater than’ as opposed to ‘less than’, but that’s what it needs to work – possibly this is a bug. The second expression uses the DatesBetween, Blank and LastDate functions (interesting that the Blank function seems to work in the same way a null does with the colon operator in MDX in this case); it has one advantage over the first expression in that it doesn’t display errors when other columns from the DimDate table are used in the pivot table. Here’s what the output looks like with just dates on rows:

image

As with the original ytd calculation in my previous post on DAX, neither expression displays the expected values when looking at Years or Quarters, for example – you again just see the same value as Sum of SalesAmount. The really weird thing is that at the date level both calculations return different values when you have Years and Dates on rows to when you just have dates. The screenshot below shows the crossover between 2001 and 2002 with just dates on rows:

image

Now here’s what you see with Years present:

image

Notice how in the first screenshot the running total carried on across the year boundary, as you’d expect, but in the second case the running total is only calculated from the beginning of the current year. I’m not sure whether this is a bug (although I suspect it is) or expected behaviour; the same thing happens when you slice by Year too, which makes more sense:

image

As ever, I’ll find out what’s going on and post an update here…

UPDATE: Marius Dumitru has come to the rescue again. Here’s the working version that does everything I want:

=’FactInternetSales'[Sum of SalesAmount](DATESBETWEEN(‘DimDate'[FullDateAlternateKey], BLANK(),LASTDATE( ‘DimDate'[FullDateAlternateKey])), All(‘DimDate’))

Here it is working:

image

First steps in DAX: Year-To-Date

With the release of the latest CTP of PowerPivot, DAX has gained a lot of new functions (mostly to do with time calculations), so over the weekend I decided that it was time to sit down and start learning it properly. A lot of other people out there are doing just the same thing – Vidas, for example – and like them I thought I’d blog about my experiences while I was checking it out. So far I agree with Shaun Ryan that anyone with a background in MDX will have an advantage when it comes to understanding DAX, but there are as many differences as there are similarities and it’s still a bit tricky to understand what’s going on.

I would like to point out that the code here doesn’t necessarily represent best-practice for DAX (I’ve only been using it for a short time, remember!) and in some cases the code will have to change before RTM because DAX itself will change; hopefully, though, these examples will help anyone else out there who, like me, at the bottom of the learning curve. In all this blog post I’m using a very simple PowerPivot model based on the DimDate and FactInternetSales tables from the 2008 version of the Adventure Works DW database, with the two tables joining on the OrderDate key.

I thought I’d choose year-to-dates as the subject of this first blog post because it’s a common calculation and relatively easy to understand. There is, of course, a DAX function specifically for year-to-date calculations: TotalYTD. Here’s what my first attempt at a DAX year-to-date calculated measure looked like:

TotalYTD(Sum([SalesAmount]), ‘DimDate'[FullDateAlternateKey], All(‘DimDate'[FullDateAlternateKey]))

I would have put in some line breaks but of course DAX doesn’t allow them (yet? hopefully this will change before RTM). Step by step:

  1. TotalYTD( – the year-to-date function itself
  2. Sum([SalesAmount]) – the value we’re going to be summing up, ie the sum of the [SalesAmount] column
  3. , ‘DimDate'[FullDateAlternateKey] – the date we want to do the year-to-date sum up to, which can be found in the [FullDateAlternateKey] column of the DimDate table
  4. , All(‘DimDate'[FullDateAlternateKey]) – which is an extra filter condition that just needs to be there to make things work. If it ain’t there, the calculation won’t work, but it doesn’t really serve any purpose and it won’t need to be there at RTM. Howie Dickerman of Microsoft pointed this out in his session at PASS earlier this month and if I hadn’t picked up this tip I’m sure I would have got absolutely nowhere with DAX by now.

Here’s a screenshot of the calculated measure in action:

image

As you can see, it works well when we’re looking at dates. But when we start to introduce years and quarters you can see we don’t get the results we might expect:

image

Hmm, cue several hours of head-scratching on how to fix this. After a lot of trial and error, I found that by creating a second calculated measure like this:

‘FactInternetSales'[YTD Sales](LASTDATE(VALUES(‘DimDate'[FullDateAlternateKey])), all(‘DimDate’))

Worked. What I’m doing here is forcing the measure to display the value of the [YTD Sales] calculated measure for the last date in the current context. Here’s what it displays (note the value for Quarter 2):

image

Various other similar approaches didn’t work, though, and the obvious next step of using LASTDATE(VALUES(‘DimDate'[FullDateAlternateKey])) inside my original calculated measure didn’t work either.

Given that PowerPivot doesn’t allow you to hide calculated measures (another thing that, in my opinion, needs to be fixed before RTM – breaking up calculations into smaller steps is a useful thing to do for debugging and sharing code) I don’t particularly like this two step process. I’ll post an update here when/if I find a way to do this in a single calculation; if anyone else out there finds out how to do this, please leave a comment!

UPDATE: Marius Dumitru has given me the answer to my problem. Here’s a version of the YTD calc that works on all time selections:

=TotalYTD(Sum([SalesAmount]), ‘DimDate'[FullDateAlternateKey], All(‘DimDate’))

The difference here is the All function is taking the whole DimDate table rather than just the [FullDateAlternateKey] column. I’m currently trying to work out why this is important… In the meantime, here’s a screenshot of it working:

image

Free version of Microstrategy Reporting Suite for SSAS

Here’s a cheeky move by Microstrategy: they’ve made the free version of their Reporting Suite work for Analysis Services. More details and a download link here:
http://www.microstrategy.com/freereportingsoftware/learnmore/microsoft-analysis-services-ssas.asp

I’ve not tried it so I don’t know whether it’s any good or not, but it’s free and you can have up to 100 users, so it will be worth checking out. Of course this is Microstrategy trying to hurt Microsoft and its partners, but, well, it’s free…

SQLBits V Summary & Thanks

Another SQLBits – the fifth! – has come and gone, and I wanted to say thanks to everyone that helped to make it such a success. It was a three-day event for the first time this time, which meant that there was even more organisation work needed, but looking back I think it all went extremely smoothly and was by far the slickest conference we’ve put on so far. My colleagues on the organising committee, Simon Sabin, Martin Bell, Allan Mitchell, Darren Green and James Rowland-Jones, are a great bunch of people and it’s always a pleasure to work with them on SQLBits even if it does take a big chunk out of our spare time. Thanks are also due to the speakers and sponsors, without whom the event would not be possible, and I’d also like to highlight the people who volunteered to help out doing unglamorous things like stuffing the attendee bags and room monitoring – Rachel Clements, Jon Reade,  Gary Short, Rachel Hawley, Richard Douglas, Luke Hayler and many others.

If you came to SQLBits I hope you enjoyed it, and if you did enjoy it please let everyone know by blogging and twittering about it. Please also join the SQLBits groups on LinkedIn and Facebook, and if you’ve got any pictures of the event why not post them on the latter?

Anyway, it’s time to crack on with some real work and deal with the big pile of emails that has built up over the last few weeks. I’ve missed a whole bunch of big announcements I would otherwise have blogged about but I’m sure you’ve caught them elsewhere… I must get round to downloading and installing the latest Powerpivot CTP etc.

See you at the next SQLBits!

Pinpoint and Dallas

Interesting news from PDC: Microsoft has announced two new services – Pinpoint and Dallas.

You can find Pinpoint here: http://pinpoint.microsoft.com

Here’s the blurb from the site:

Pinpoint is the fast, easy way for business customers to find experts, applications, and professional services to meet their specific business needs—and build on the software they already have.

At the same time, Pinpoint helps developers and technology service providers quickly and easily get software applications and professional services to market—and engage customers who need what they offer.

Pinpoint is the largest directory of qualified IT companies and their software solutions built on Microsoft technologies.
  • More than 7,000 software application offerings.
  • More than 30,000 Microsoft-technology experts.
  • The largest, most diverse set of Microsoft business platform offerings in the industry in a central location.
  • Direct links between applications and the services that support them.

Whether you’re searching for expert help or offering it, Pinpoint helps you easily find and engage the right people and technologies to get the job done.

 

Much, much more interesting from a BI point of view is Dallas, which is part of Pinpoint: http://pinpoint.microsoft.com/en-US/Dallas

It’s Microsoft’s marketplace for data, all built on Azure. Again from the blurb:
Microsoft Codename “Dallas” is Microsoft’s Information Services business, enabling developers and information workers to instantly find, purchase, and manage Web services and datasets to power the next set of killer applications on any platform.

The Register has the best write-up of what this is here: http://www.theregister.co.uk/2009/11/17/microsoft_dallas_data_service/

From that article:
Dave Campbell, a Microsoft technical fellow, demonstrated Dallas at PDC. He showed a list of data provides from the partners such as infoUSA, subscriptions, the ability to store structured and unstructured data, and to explore the data without needing to parse it, to preview the data in ATOM, invoke the data as a Rest service and analyze the data using PowerPivot in Microsoft’s Excel spreadsheet program.

Note my emphasis on the last sentence! Here at last is the ability to buy that third party data that’s been a part of every Powerpivot demo. I’ve worked with a lot of companies that sell data in my career, and this looks like it could be a very significant development for them. I’d even heard vague rumours that MS were interested in buying commercial data providers at one point, several years ago – if they were prepared to go this extreme then it would certainly go a long way to making this strategy a success.

Now just think how cool it would be if SSAS or PowerPivot could be hosted on the cloud, so all you needed was Excel to analyse this data. Maybe one day…

SQLBits Agenda Published

At long last, the SQLBits agenda has been published:
http://www.sqlbits.com/information/newagenda.aspx

It’s not too late to register, even though SQLBits is only next week. It’s looking like it will be the largest event yet in terms of attendance… If you’re coming and you see me around, say hello!

UPDATE: car sharing is live now too – http://sqlbits.com/CarSharing.aspx

Ragged Hierarchies, HideMemberIf and MDX Compatibility

Here’s something that I suspect a few people out there will be surprised I didn’t know – but as far as I’m concerned, if I didn’t know it then it’s worth blogging about.

Anyway, it regards ragged hierarchies, the HideMemberIf property and the MDX Compatibility connection string property. Now you probably know that if you want to turn a user hierarchy into a ragged hierarchy (perhaps to avoid using a parent child hierarchy) you need to use the HideMemberIf property on the user hierarchy. For example, imagine you were using the following SQL query as the basis of your customer dimension:

SELECT        ‘UK’ AS Country, ‘Bucks’ AS State, ‘Amersham’ AS City, ‘Chris Webb’ AS Customer
UNION ALL
SELECT        ‘Italy’ AS Country, ‘BI’ AS State, ‘Biella’ AS City, ‘Alberto Ferrari’ AS customer
UNION ALL
SELECT        ‘Vatican’ AS Country, ‘Vatican’ AS State, ‘Vatican’ AS City, ‘The Pope’ AS customer

We could build a dimension off this with attributes for Country, State, City and Customer, and for two out of our three customers that would be fine. However the Pope lives in the Vatican, which is (at least for the purposes of this exercise) a Country with no concept of City or State; and in the case of customers who live in the Vatican, we just want to be able to drill down on the Country ‘Vatican’ and see all of the Customers who live there without drilling down through a meaningless State and a City.

So what we can do is build a user hierarchy on our dimension with levels Country, State, City and Customer, and on the lower three levels set the HideMemberIf property to OnlyChildWithParentName:

image

Then, with any sensible client tool, we can connect to the cube and browse the dimension as we want:

image

I saw ‘sensible’ client tool, because of course this only works if you set:
MDX Compatibility=2
…in the connection string. And of course Excel 2007 hard-codes MDX Compatibility=1 in the connection string and doesn’t allow you to change it, so you can’t use ragged hierarchies properly.

This much I knew.

However, what I didn’t realise until last week when I was moaning about this to TK Anand from the SSAS dev team at PASS, is that for some ragged hierarchies you don’t need to set the MDX Compatibility connection string property at all.

For example, if in our case we duplicate the Customer upwards rather than the Country downwards, like so:

SELECT        ‘UK’ AS Country, ‘Bucks’ AS State, ‘Amersham’ AS City, ‘Chris Webb’ AS Customer
UNION ALL
SELECT        ‘Italy’ AS Country, ‘BI’ AS State, ‘Biella’ AS City, ‘Alberto Ferrari’ AS customer
UNION ALL
SELECT        ‘Vatican’ AS Country, ‘The Pope’ AS State, ‘The Pope’ AS City, ‘The Pope’ AS customer

…and then build the dimension, setting HideMemberIf on our user hierarchy to OnlyChildWithParentName, we can get the result we want without setting the MDX Compatibility property. Here’s a screenshot of this new dimension in Excel just to prove it:

image

The difference here is that we’re hiding all members below the State level right down to the bottom of the hierarchy, rather than hiding members somewhere in the middle of the hierarchy. Truly, this is one of those “Doh, if only I’d known!” moments… this at least means that in some of the scenarios where you’d use ragged hierarchies you can get them to work with Excel, even if it means that we have to hack the data (‘The Pope’ is a Customer, not a State or a City). On the other hand there are plenty of scenarios where you do need to hide members in the middle of a hierarchy, and frankly I don’t see why Excel 2007 can’t set MDX Compatibility=2 in its connection string so they work properly.

PASS Summit Thoughts

The PASS Summit is over for another year and I’m just starting out on the long trip back home, so there’s plenty of time to get my thoughts together on what’s happened over the past week. In fact there’s not much to say about the event itself: it was, as ever, a lot of fun and totally worthwhile. Hey, within 30 minutes of arriving at the conference I learned I’d won an award for the best BI-related blog entry, for my post on implementing real SSAS drilldown in SSRS!

Attendance was up from last year although probably the recession still took its toll: remember that there was no BI Conference this year and I would have thought that a lot of people who would have gone to it would have gone to PASS instead. To be honest I think not having a BI Conference is a good thing, actually. I don’t like having to choose which conference to attend, and part of the benefit of a conference is to get as many members of a tech community together in one place. And this was certainly the largest gathering of Analysis Services people I’ve ever seen: all the usual crowd were there, I met a lot of people who I’d only met a few times before, and I finally got to meet Darren Gosbell in person after having known him by email for at least five years. One complaint I would make about the event was that the sessions weren’t scheduled particularly well. I know everyone always complains about this but in this case it did seem worse than usual: my session, for example, was up against two other SSAS-specific sessions, but in other cases there were time slots with no SSAS content at all.

The other benefit of PASS is that you get to talk at length about what’s going on in the world of SQL Server with other like-minded people. As a result you get to crystallise your thoughts on a lot of matters and – guess what – I’m going to share mine here.

First of all, the topic that was on everyone’s lips was PowerPivot. In fact everyone at the conference must have seen the standard demo at least five times and there were also a lot of advanced sessions on it too. Don’t get me wrong, I really think PowerPivot it cool from a technology point of view, I am going to take the time to learn it, and I also think from a make-money-by-getting-people-to-upgrade-to-Office-2010 point of view it is a very clever move for Microsoft. But my feelings about it remain ambiguous. Quite apart from the arguments about it discouraging ‘one version of the truth’ and encouraging spreadmarts that have already been discussed ad nauseam, I have another problem with it: I don’t honestly know whether I, as a consultant, will be able to make any money from it. The very nature of it, as a self-service tool, means no expensive outside consultancy is necessary. I don’t think it will take business away from me though; it will be widely used and it will be used instead of regular SSAS for more basic projects, but the more serious stuff will stay with SSAS I hope. I think the need for sophisticated security and more complex calculations will be the deciding factor when people choose between SSAS and PowerPivot; I’m not sure I see many people upselling from PowerPivot to SSAS either. We’ll see.

Something that worries me more about PowerPivot is the fact that it seems to have diverted the attention of the SSAS dev team. For SSAS 2008 we had few new features, although the performance improvements were very welcome. For 2008 R2 I can only think of one new feature in SSAS, and that’s the ability to use calculated members in subselects that will allow Excel 2010 to use time utility dimensions properly (I’ll blog about that at some point). Even though work on good old server-side SSAS will resume for the next major release of SQL Server I worry that PowerPivot will take priority in the future. If this happened it would be bad for me and other BI partners from a business point of view, and seems crazy given that SSAS has been such a successful product in the enterprise sector; it’s not like there aren’t a lot of new features and fixes that could be done. Shades of IE6 and Microsoft getting complacent once it’s cornered a market, I think.

Last of all on PowerPivot, I suspect that there is something new relating to it in the roadmap that hasn’t been announced yet. David DeWitt devoted his keynote on Thursday to it, the specifics of column-store databases and the Vertipaq engine (which is the new in-memory storage engine that PowerPivot uses), and at the end hinted at this saying that although he couldn’t make any announcements, those people who had been paying attention might have some ideas on what the future held for it. Of course I hadn’t been paying attention properly, but the obvious thing would be to integrate it with the relational database somehow. Given that PowerPivot is now being hosted inside Sharepoint, why not host it in SQL Server too? It’s already very table and join friendly, and I could imagine a scenario where it was used inside SQL, pointed at a schema, some kind of proactive caching kept the data in SQL in synch with the data in the Vertipaq store, difficult BI calculations could be expressed in DAX, but the whole thing was transparent to TSQL. Imagine integrating that with Madison too!

Moving on, the other thing that has become clear to me is that I really have to sit down and learn Sharepoint (or at least the relevant bits of it) properly. It’s at the heart of Microsoft’s BI strategy and there’s no avoiding it. I have to admit to some mixed feelings about this move though, and I know other people I talked to at the conference share them. Partly it’s because, in the past, there were BI specialists and there were Sharepoint specialists and we didn’t necessarily have much to do with each other; now,  though, the two worlds are colliding and I’m outside my comfort zone. You might say that Sharepoint has been part of the MS BI strategy for ages now, what with PerformancePoint etc, but I see an awful lot of MS BI customers in my work and I very rarely seem to see any Sharepoint, although it could be because I’m not looking out for it. A more valid objection is that the need for Sharepoint Enterprise Edition CALs adds a lot of extra cost to a project; and from a technical standpoint Sharepoint itself carries a very big overhead – its installation and maintenance may put a lot of customers off if they don’t already have a company-wide Sharepoint strategy, and if they do have one they may not be willing to go to 2010 for some time. Sharepoint might be just too big for some customers to swallow, and be a difficult sell for BI partners.

I’d like to stress though, once again, that I see the considerable technical benefits for using Sharepoint for BI, and even if the reception of the latest wave of PerformancePoint has been somewhat muted (eg the realisation that the decomposition tree has been tacked on at the last minute and isn’t properly integrated) I am impressed with what’s coming with Excel 2010 and Excel Services too; for example I think the Excel Services REST API is very cool indeed, and as a SSAS client Excel 2010 is a big improvement on 2007 (which wasn’t all that bad either). I’ve decided I also need to learn Excel properly now as well – get to know all those advanced Excel functions, use Solver and all that. Once again two worlds are colliding: the Excel guys and the SSAS guys are going to have to learn a lot more about each others’ technologies for truly effective BI applications to get built.

Anyway, I think this post has gone on quite long enough now. As always, your comments on everything I’ve written here would be much appreciated.