BIDSHelper beta released

Via Darren Gosbell, news that the beta of BIDSHelper has been released:
You can download BIDSHelper from Codeplex here:
BIDSHelper contains so many cool features that make your life as an AS developer easier, it’s untrue. The MDX Script updating feature (bypassing the 30-minute waits I’ve sometimes seen) is worth the download alone. Kudos to Darren and Greg Galloway for their work on this.

Companion for MS Analysis Server

Via Teo Lachev, here’s news of a new tool for doing basic monitoring and admin for Analysis Services called ‘Companion for MS Analysis Server’ from SQLMinds:
I’ve long thought that there’s a big gap in the market for a tool like this, and I’ve signed up to the beta so hopefully I’ll be taking a look at it soon.

The Rumour Mill

Mark Whitehorn has an interesting non-story over at the Register:
What was going on here? Perhaps something was going to be announced that couldn’t be announced at the last minute? Well, I did hear some rumours about… hold on, someone’s knocking at my hotel room door. Who could it be at this time of night? Be back in a minute…

An MDX Challenge: Debtor Days

As I said in my previous post, last night’s event at the Experience Music Project was good fun. There was live music, free booze and of course – and this is probably evidence that I need to be taken away by the men in white coats – conversation naturally turned to the topic of tricky MDX problems. Richard Halliday (I hope I’ve got your name right) came up with an interesting calculation for me which he referred to as ‘debtor days’: if I understood correctly, he had a cube with a measure containing the values of individual debts incurred by customers, and was interested in finding out, for any given customer and any given day, the minimum number of days you had to go back in time from the current date for the total of that customer’s debts to reach a given value. After digesting it overnight I had a go at implementing it this morning and found it a really fascinating problem – although getting the right value is fairly tricky and worth discussing, there are also some great opportunities for optimisation which I wanted to blog about.
First let’s translate the problem into Adventure Works: we’ll use the Date dimension and the Customer dimension, and find out how many days you need to go back from the current date for the current member on the [Customer Geography] user hierarchy for the cumulative total of Internet Sales Amount to exceed 10000. Here’s my sample query with my calculated member:

with member measures.daysto10000 as
iif(count(nonempty(null:{{[Date].[Date].currentmember} as mycurrentdate}.item(0)
, [Measures].[Internet Sales Amount]) as mynonemptydates)=0, null,
iif(isempty([Measures].[Internet Sales Amount]) and (not isempty((measures.daysto10000, [Date].[Date].currentmember.prevmember)))
, (measures.daysto10000, [Date].[Date].currentmember.prevmember)+1,
iif(count({{} as myresults,
generate(mynonemptydates
, iif(count(myresults)=0,
iif(sum(subset(mynonemptydates, count(mynonemptydates)-mynonemptydates.currentordinal), [Measures].[Internet Sales Amount]) > 10000
, {mynonemptydates.item(count(mynonemptydates)-mynonemptydates.currentordinal)} as myresults
, myresults)
, myresults))
})=0, null,
count(myresults.item(0): mycurrentdate.item(0))-1)))

select
descendants([Date].[Calendar].[Calendar Year].&[2004]
, [Date].[Calendar].[Date]) on 0,
non empty
descendants([Customer].[Customer Geography].[State-Province].&[HH]&[DE]
, [Customer].[Customer Geography].[Postal Code])
on 1
from [Adventure Works]
where(measures.daysto10000)

On a cold cache this executes in a touch under 15 seconds on my laptop. The select statement puts all of the days in the year 2004 on columns, all of the postal codes in Hamburg on rows, and slices on the calculated measure defined in the with clause. Here are some things to notice about the calculated measure:

  • The outermost iif simply says that if the set of dates from the start of the Date level to the current date contains no values at all for Internet Sales Amount, then return null. If there are values then the set of dates with values is stored in the named set mynonemptydates, declared inline.
  • The next level of iif represents a recursive calculation, and I found that this was one of the extra touches that made a big difference to performance. It says that if the current date has no value for Internet Sales Amount but the value of the calculated measure is not null for the previous day, then simply add one to the value of the calculated measure from the previous day – this avoids a lot of extra work later on.
  • The next level of iif is where I do the main part of the calculation, and this is going to need a lot of explaining… Put simply, from step 1 I’ve already got a set of members representing the dates from the start of time to the current date which have values, and what I want to do is loop through that set backwards from the end doing a cumulative sum, stopping when that sum reaches 10000, then taking the date I’ve stopped at and finding the number of days from that date to the current date. Originally I attempted the problem like this:

    iif(count(tail(filter(mynonemptydates
    , sum(subset(mynonemptydates, mynonemptydates.currentordinal-1), [Measures].[Internet Sales Amount]) > 10000)
    , 1) as myresults)=0, null,
    count(myresults.item(0): mycurrentdate.item(0))-1)

    Here I’m filtering the entire set to get the set of all dates where the sum from the current date to the end of the set is greater than 10000, then getting the last item in that set. This seemed inelegant though – with a large set we’d potentially be doing the expensive sum many more times than necessary. It seemed better to loop through the set backwards and somehow stop the loop when I’d reached the first member that fulfilled my filter criteria. But how was this going to be possible in MDX? I didn’t manage it completely, but I did work out a way of stopping the expensive calculation as soon as I’d found the member I was looking for. Let’s take a look at the relevant section from the main query above:

    iif(count({{} as myresults,
    generate(mynonemptydates
    , iif(count(myresults)=0,
    iif(sum(subset(mynonemptydates, count(mynonemptydates)-mynonemptydates.currentordinal), [Measures].[Internet Sales Amount]) > 10000
    , {mynonemptydates.item(count(mynonemptydates)-mynonemptydates.currentordinal)} as myresults
    , myresults)
    , myresults))
    })=0, null,
    count(myresults.item(0): mycurrentdate.item(0))-1)

    What I do first is declare an empty set inline called myresults. I then use the generate function to loop through the set mynonemptydates. The first thing you’ll see inside the generate is an iif checking whether the count of myresults is 0; the first time we run this check it will be, so we need to do our cumulative sum. Because generate loops from the start of a set to the end, and I want to go in the other direction, I get the current ordinal of the iteration and then find the cumulative sum from the item that is that number of members away from the end of the set up to the end of the set. Once I’ve got the cumulative sum I can check whether it is greater than 10000; if it is, then I return a set from the iif statement and at the same time overwrite the declaration of myresults with a set of the same name which now contains that one member. As a result, at all subsequent iterations the test count(myresults) returns 1 and I don’t try to do the cumulative sum again. I was quite pleased at finding I could do this – I hadn’t realised it was possible. It only makes about 0.5 seconds’ difference to the overall query performance though.

  • Finally, on the last line of the calculated measure I can take the member I’ve got in the set myresults and using the range operator find the number of days between it and the current date, which I’ve also stored in a named set called mycurrentdate.
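
If it helps to see the logic outside MDX, here’s a rough Python sketch of what the calculation is doing – the function name and the data are entirely made up, and it skips the recursive empty-day optimisation described in the second bullet:

```python
from datetime import date

def debtor_days(daily_amounts, current, threshold=10000):
    """Minimum number of days to go back from `current` for the cumulative
    total of debts to exceed `threshold`; None if it never does.
    `daily_amounts` maps dates to amounts (days with no debt are absent)."""
    # The equivalent of the mynonemptydates named set: dates with values,
    # from the start of time up to and including the current date
    nonempty = sorted(d for d in daily_amounts if d <= current)
    running = 0
    # Loop backwards from the most recent date with a value,
    # accumulating until the threshold is exceeded
    for d in reversed(nonempty):
        running += daily_amounts[d]
        if running > threshold:
            return (current - d).days
    return None  # the total never exceeds the threshold

# Hypothetical example: debts of 4000, 5000 and 3000
amounts = {date(2004, 1, 5): 4000,
           date(2004, 1, 8): 5000,
           date(2004, 1, 10): 3000}
print(debtor_days(amounts, date(2004, 1, 12)))  # 7 - the total passes 10000 on 5 Jan
```

Unlike the MDX, this just loops backwards and stops as soon as the threshold is reached, which is exactly the early-exit behaviour the generate/named-set trick is simulating.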

Pretty fun, eh? No, please don’t answer that question. But if you can think of an alternative, better-performing way of solving this problem I would love to hear it…

UPDATE: it turns out that Richard Tkachuk not only had a go at the same problem (although his interpretation of it is slightly different) but got just as excited about it as I did, and wrote up his findings here:

Microsoft BI Conference Thoughts

The first Microsoft BI Conference has just finished, so how did it go? Hmm, well, let’s say it was a typical Microsoft version 1.0: a lot of good features, several things which could have been much, much better, but overall everyone was very enthusiastic about the future. I certainly enjoyed myself a lot and will be coming back next year.
First of all, let’s talk about the positives. There were 2800 people there, so this was a fairly big conference, much bigger than I (and, I suspect, the organisers) had been expecting. Almost everyone who is anyone in the world of Microsoft BI was there, which made it a great opportunity for networking and gossiping. I finally had the chance to meet a whole bunch of people who I’ve only ever had contact with via email, plus many others who read this blog and just wanted to say hello. The only downside to this was that I spent so much time hanging out and talking to people that I didn’t get a chance to look round all the exhibition stands, and didn’t, for example, get to see many of the third-party tools (e.g. Panorama) in the kind of detail I’d been wanting to. The facilities were very good and last night’s party at the Experience Music Project was lots of fun too; there’ll be at least one picture from that event I’ll be posting up next week. In terms of content there were a couple of excellent sessions of the kind you could only have at a dedicated BI conference, and I learned a lot from them.
The negatives? I got the distinct impression that the organisers had initially expected to run a business-focused conference and then found that the majority of people attending were technical. The agenda was published way too late, and there were way too many breakout sessions that were simply an excuse for customers and consultancy firms to blather on about how successful their projects had been. Many of the technical breakout sessions were uninspiring walk-throughs, there seemed to be a lot of repeated content (for example on PerformancePoint), and there were a few sessions that had already been given at other conferences over the last year or so, so the content was familiar. I heard a lot of people complaining that they wanted a much greater level of technical detail, and at the same time I came across a lot of people in technical sessions who were out of their depth; perhaps the tracks should have been more clearly defined and the content of each session thought through better? For me, though, the chalk talks made it all worthwhile, although it was a pity that the chalk talk theatres didn’t have enough space and were so noisy. I found most of the keynotes pretty dull, with Steve Ballmer shaming all the other keynote speakers with a masterclass in how to hold an audience’s attention.
Other bloggers who covered the conference in much greater depth than I have done are Patrice Truong and Mike Diehl.
(UPDATE: Patrick Husting and Vidas Matelis also have blogged about the conference)
(UPDATE 2: More thoughts from Marco Russo and Thomas Ivarsson)
(UPDATE 3: Here’s what Jesse Orosz and Peter Koller thought)

Calculating Seasonally-Adjusted Values in MDX

My chalk talk yesterday at the BI Conference went… OK. Unfortunately, despite having asked for a projector so I could show my demo code, it turned out the screen I got was pretty small, so only the people in the front few rows could see anything. So I told everyone that I would put the code up on my blog so they could study it in more detail, and this is the first such post.

One of the points I wanted to make when discussing how to calculate trend values for KPIs in MDX is that we really need to think about the formulas we’re implementing to make sure they actually provide useful results. Unfortunately the business people we talk to often have even less idea than we do about what kind of calculations actually make sense: there’s a real need for input from someone with a statistical background, and I only wish I knew more about this subject. For example if your sales vary a lot by season (you might be selling ice cream, so sales are always going to be much higher in summer than in winter) there’s no point looking at previous period growth to determine whether your sales are growing or not because the strong seasonal component will mask the long term trend.

As I said, for this presentation I looked at a book I’ve had hanging around on my shelf for years which is basically a stats textbook for managers and MBA students: "Quantitative Methods for Decision Makers" by Mik Wisniewski. It explains some simple algorithms and techniques for analysing data that are very relevant for BI practitioners and I took the method for calculating a seasonally-adjusted total and translated it into an MDX query on Adventure Works as an illustration. Here it is:

with
--Calculate a 12-month moving average covering the period from 6 months before
--the current member up to 5 months after it
member measures.movingaverage as
avg([Date].[Calendar].currentmember.lag(6):[Date].[Calendar].currentmember.lead(5)
, [Measures].[Internet Sales Amount])

--Centre the underlying trend by taking the average of the moving average
--for the current member and the next member
member measures.trend as
avg({[Date].[Calendar].currentmember, [Date].[Calendar].currentmember.nextmember}
, measures.movingaverage)

--Find the deviation by dividing Internet Sales by Trend
member measures.deviation as
[Measures].[Internet Sales Amount] / measures.trend

--Find the average deviation for any given month of the year
member measures.averagemonthlydeviation as
avg(exists([Date].[Calendar].[Month].members
, {[Date].[Month of Year].currentmember})
, measures.deviation)

--Find the sum of these average deviations for each month,
--then divide by 12 to find out by how much each month’s average deviation
--needs to be adjusted
member measures.adjustedcomponent as
(sum(head([Date].[Calendar].[Month].members,12), measures.averagemonthlydeviation)
/ 12)

--Adjust the monthly deviation by the value we’ve just found
member measures.adjusteddeviation as
measures.averagemonthlydeviation / measures.adjustedcomponent

--Finally find the seasonally adjusted Internet Sales Amount
member measures.seasonallyadjustedsales as
[Measures].[Internet Sales Amount] / measures.adjusteddeviation

--Run the query
select {[Measures].[Internet Sales Amount], measures.trend, measures.deviation
, measures.averagemonthlydeviation, measures.adjustedcomponent
, measures.adjusteddeviation, measures.seasonallyadjustedsales}
on 0,
[Date].[Calendar].[Month].members
on 1
from [Adventure Works]

This works on the following theory. What we want to do is take the Internet Sales Amount measure and remove the seasonal variations so we can see the underlying trend. If we assume there are three factors at play in our sales – the long-term trend, seasonal variations and random factors (for example, a health scare in July could kill our ice cream sales that month) – then we can say that Sales = Trend * Seasonal Variations * Random Factors. We can’t do anything about the random factors, but rearranging this formula gives us Sales / Seasonal Variations = Trend * Random Factors, which is the seasonally-adjusted value. First we estimate the trend by taking a twelve-month moving average, which hopefully smooths out the seasonal variations; we then find each month’s deviation from that average, and then the average deviation for each calendar month (so I might find, say, that January sales are on average 10% lower than usual and July sales on average 25% higher). Over a whole year we’d like these twelve average deviations to cancel each other out and add up to 12; of course they don’t quite, so we find what they actually add up to, divide that by 12, and divide each month’s average deviation by the result. Finally we divide Internet Sales Amount by this adjusted deviation to get the seasonally-adjusted value: the combination of the trend and the random factors.
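
For anyone who wants to check the arithmetic outside MDX, here’s an illustrative Python version of the same steps. The data is made up: the trend is flat at 100 with a fixed seasonal pattern, so every seasonally-adjusted value should come out at (almost exactly) 100.

```python
def seasonally_adjust(sales):
    """sales: a list of monthly values, starting in January, at least two years long."""
    n = len(sales)

    # Step 1: 12-month moving average over months i-6 .. i+5
    def ma(i):
        if i - 6 < 0 or i + 5 >= n:
            return None
        return sum(sales[i - 6:i + 6]) / 12

    # Step 2: centre the trend by averaging adjacent moving averages
    trend = []
    for i in range(n):
        a, b = ma(i), ma(i + 1)
        trend.append((a + b) / 2 if a is not None and b is not None else None)

    # Step 3: deviation of each month's sales from its trend
    deviation = [s / t if t else None for s, t in zip(sales, trend)]

    # Step 4: average deviation for each calendar month (January = 0, etc.)
    avgdev = []
    for m in range(12):
        ds = [d for i, d in enumerate(deviation) if i % 12 == m and d is not None]
        avgdev.append(sum(ds) / len(ds))

    # Step 5: the twelve averages should sum to 12; rescale so they do
    scale = sum(avgdev) / 12
    seasonal_index = [d / scale for d in avgdev]

    # Step 6: divide each month's sales by its seasonal index
    return [s / seasonal_index[i % 12] for i, s in enumerate(sales)]

# Made-up data: flat trend of 100, low season for half the year, high for the rest
season = [0.8] * 6 + [1.2] * 6
sales = [100 * season[i % 12] for i in range(36)]
adjusted = seasonally_adjust(sales)
```

With this synthetic series the moving average smooths the seasonal pattern away completely, so the adjusted values all come back to the underlying trend of 100; with real data the random factors would of course remain.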

Hopefully I haven’t screwed up in my interpretation of how it works; it certainly isn’t the most robust or optimal bit of MDX I’ve written either but it shows what you could do if you wanted.

Officewriter *could* be licensed for inclusion in Katmai

OK, finally a chance to do some proper blogging now that my session is over and I’ve got hold of a wireless connection. The conference is going well – I can’t believe they’ve got 2600 people to come to a Microsoft BI conference! It just goes to show that the avalanche is starting.
And now, some news. Yesterday in his keynote speech Jeff Raikes announced that Microsoft was acquiring Softartisans, which of course got everyone excited until they realised that what he meant to say was that Microsoft hadn’t bought the company, just licensed the technology for possible inclusion in Katmai Reporting Services. I blogged about Softartisans’ product, Officewriter, a while ago and have actually just written a white paper for them on using their components inside SSIS to create a batch reporting solution (similar to what you can do with SSRS and data-driven subscriptions) which should be available on their site any day now. Here’s the press release on it all:

DS Panel

It’s Day 0 of the BI Conference, I haven’t even registered and I’m already busy… This morning I went to a presentation by DS Panel (the guys who brought you the Santa Dashboard a few months back). They’ve been around in the Microsoft BI world a long time but for some reason I’d never seen their stuff so I was curious to check it out.
Now, the question I’ve been asking all third-party tool vendors, and which I’m sure all prospective customers are asking too, is: why should I buy from you when PerformancePoint is just around the corner? For DS Panel and others, about 80% of what they do is what all AS client tools do and certainly what PerformancePoint is going to do, so it’s the details, the unique features and the quality of execution that are important. DS Panel have a new release of their core dashboarding product, DSP Performance Canvas, and it certainly looks pretty and seems easy to use; their use of AJAX means it’s extremely responsive for a web app too. What caught my eye was their integration with various enterprise search engines, something I blogged about a while ago, although I’m a bit more positive about the idea now than I was then. They also have a cool solution for dashboarding on mobile devices, which I’ve not seen anyone else do, where their use of sparklines is really effective; and a few other nice features such as the ability to add annotations to specific slices of data. Overall, definitely worth checking out if you’re in the market for a dashboard application, need easy SharePoint integration, and can’t wait for PerformancePoint to be released or reach a usable level of maturity.
Hmm, now time for me to go over to the conference centre and register. There will be a lot more blogging to come over the next few days: more product reviews, reports of presentations and maybe even another podcast!