PROOF

Seen on Claudia Imhoff’s blog today, PROOF seems to be an interesting new tool from Symmetry Corporation:
it sounds like it does roughly what the old SSABI (SQL Server Accelerator for BI) did. I’ll have to see if I can get an eval copy…

Local Cubes and .NET 2.0

Here’s a little nugget of information I thought I’d post here for future reference: if you want to create local cubes direct from relational sources in AS2005 you need to have .NET 2.0 installed on your machine, although you don’t need to if you’re creating your local cubes from server cubes. I’m sure someone will find this useful… thanks to Siva for letting me know about this.

Redundant Attribute Relationship Health Warning!

My ex-colleague Jon might spend far too much time pseudonymously posting silly comments on blogs, but he deserves an extra helping of chocolate mousse for finding this problem to do with redundant attribute relationships – this is something everyone designing AS2005 cubes needs to be aware of!
 
Imagine you had a Geography dimension with three attributes: City (the key attribute), State and Country, where each Country has many States, and each State has many Cities. Now you probably already know that if you have built your dimension with the wizard, the chances are that your dimension will have two relationships pre-built – between City and State and City and Country – and that you need to change this so that there are relationships between City and State and State and Country in order to get optimal performance and to help aggregation design. AS2005 understands transitive relationships between attrbutes, and so even though there are just two relationships built in your dimension it knows that because City is related to State and State is related to Country, there is a transitive relationship between City and Country.
 
BUT, did you know that bad things can happen if you actually build the relationship between City and Country in your dimension? Before Jon found out about this, I had understood that the only effect was that having this redundant relationship might increase the size of the dimension and increase processing times. However, it turns out there are two other potentially serious side-effects:
1) MDX Script Scopes may include unexpected extra cells. So, for example, if you’re assigning values to a subcube in your MDX Script (and most of the calculations created by wizards in BI Dev Studio do this) then you may see incorrect results with the redundant attribute relationships in place.
2) Querying fact data may ignore some slices or return Nulls on certain related attributes when the measure group has granularity on an attribute different from the key attribute.
 
Apparently what’s happening is that, using our example, the redundant relationship between City to Country would cause AS to ignore the legitimate relationship between State and Country. Even more confusingly, if you were to create a user hierachy from Country to State to City then the incorrect behaviour would go away.
 
So the moral of all this is to be extra-careful when designing your dimensions and not to let any redundant attribute relations remain in there. This problem won’t be fixed in SP1 but there will be a warning in the dimension editor UI when redundant relationships are detected, though, which should make more people aware of it.
 
Apart from Jon, thanks also to Matt Carroll and Marius Dumitru on the AS dev team for taking the time to explain what’s going on here to me.
 
 

SQL2005 SP1 CTP Released

The CTP of SP1 for SQL2005 is now out:
There’s also an updated version of BOL. No idea what fixes exactly have been included though because the link to the KB article in the readme file is dead! However, for Analysis Services, I’ve heard about one potentially interesting change with solve order – see the following thread on MSDN forums for the full story:
I wonder if the ability to cache the results of MDX Script calculations is also going to be there?
 
UPDATE: The KB article is now up and there are lots of AS fixes in there. Here’s the link:

Configuring and Connecting Analysis Services 2005 Webcast

SQL 2005 Service Manager

Courtesy of Mike Gunderloy, I’ve just come across the download for SQL2005 Service Manager. It does exactly what you think, ie it’s a tool that does for SQL2005 (and AS2005 and all the rest) what SQL Server Service Manager did for SQL2000. You can get it here:
 
Now why didn’t Microsoft provide this?

Where to get the OLE DB for OLAP v9.0 Provider

The other day I was looking for the OLE DB for OLAP v9.0 Provider to install on a client machine but couldn’t find it on the AS2005 install CD (which is where it was in the June CTP, and where you got PTS for AS2000); I then found a newsgroup post stating that it was only going to be available as a separate download from now on. Here’s where you can now get it and other useful stuff like ADOMD.Net, the latest PTS for AS2000, various other OLE DB providers etc:
It’s called the ‘Feature Pack for Microsoft SQL Server 2005’. One to bookmark, I think.

Handling Time Zones

As a general rule, whenever several different people ask me about the same problem I begin to think that it’s time to blog about it. In this particular case the problem is how to display data by different time zones; I’ve been asked about this a few times recently, possibly because AS2005’s ability to handle multiple languages and currency conversion means that people now feel like they can build ‘international’ cubes and this requirement arises as a result. For example imagine you were collecting usage stats for your web application and were interested in seeing how many visitors you had at different times of the day, but users in your UK and US offices wanted to analyse this data by their local time. In your cube you’d need to have a Time Zone dimension and/or attribute which, when you chose the GMT member, would mean that the selection 3:00 on 1st November 2005 on your Time dimension would show the same values as if you’d chosen the EST Time Zone and selected 22:00 on 31st October 2005. Note that this isn’t the same problem that Ralph Kimball talks about here, which is doesn’t allow you to analyse data in anything other than the local time zone of the event or the standardised time zone.
The obvious first step towards solving this problem is to hold the data in your fact table at one common time zone, probably GMT. However doing the time zone translation isn’t as simple as finding how many hours + or – each time zone is with regard to GMT, of course. The big complicating factor is Daylight Saving Time, and as Wikipedia shows it’s a pretty big complicating factor – in my opinion you wouldn’t want to write code to model these rules, you’d be much better off having a table which for every hour of every day in every time zone you were interested in would give the equivalent hour and day in GMT. So if you were interested in 20 time zones and had 5 years of data in your fact table, you would have approximately 5*365*24*20=876000 rows in this table. Populating this table would be a pain though… I wonder if there’s a web service out there which can do this conversion for you, operated by people who keep on top of all the changes surrounding Daylight Saving Time such as the recent ones Wikipedia mentions in Indiana?
How you can then use this table in your AS2005 cube design is the really interesting question. My first thought was that even though this isn’t a many-to-many relationship (unless you wanted the duplicated hours that occur when DST moves backwards to be modelled as one member rather than two, which doesn’t seem sensible) you could model it as such – the table descibed in the previous paragraph would become your intermediate measure group, which joins to the fact table via a GMT Time dimension; this time dimension could then be reused (as a role-playing dimension) as your Time in Time Zone dimension, joined to just the intermediate measure group along with a separate Time Zone dimension. You could then hide your GMT Time dimension and your users would just have to select which ever time zone they wanted to analyse by on the Time Zone dimension and the Time in Time Zone dimension would behave exactly as described in my first paragraph.
However there are some drawbacks to this approach. First and foremost is that you couldn’t make use of any aggregations above the hour attribute, because the many-to-many relationship is resolved at this level. This would mean that if you wanted to run queries at the Month or Year level they might not perform as fast as you’d like. Also, the larger the table you’ve built your intermediate measure group from, the slower it will be to resolve the many-to-many relationship, although partitioning this measure group by Time Zone and possibly Year to will mitigate this effect.
The alternative is to model all this as a single regular dimension. You’d have a GMT Hour attribute, an Hour in Time Zone attribute (with a one-to-many relationship between the two) and a Time Zone attribute; the key thing to note here is that every hour in a time zone becomes a single member, so the overall dimension size is much larger than in the many-to-many approach. Since AS2005 can (so I’m told) handle dimensions of many million members even in 32-bit the size issue wouldn’t be the showstopper it would have been in AS2000, although it would presumably have some impact on overall performance. The fact that you could now use aggregations above the Hour level would, on the other hand, be a big factor in improving performance which I would guess would more than make up for the negative impact of dimension size. Judicious hiding of attributes and the use of user hierarchies (for example Time Zone->Year->Month->Day->Hour) would mean that the complexity of the dimension could be hidden from the user; the only real drawback I can think of is that because every Hour, Month, Year etc would be a separate member on the dimension, it would be difficult for users to create reports in an ad-hoc tool which selected a couple of months on a visible axis and then allowed them to slice by time zone – something that wouldn’t be an issue with the many-to-many approach.
 
One other approach I thought about was using a stored procedure called from an MDX Script assignment to perform the time zone translation – but as I said earlier, trying to handle all the wrinkles of DST changes in code would be a real nightmare, and calling a stored procedure multiple times to get values for each Hour probably wouldn’t perform well either.
I’d be interested to hear other people’s thoughts on how to solve this problem; even better, it would be good to hear from anyone who’s had to actually implement a solution. So post some comments! It would be useful to know whether the regular dimension approach does in fact perform better than the many-to-many approach, for instance…
UPDATE (March 2015)  – almost ten years after writing this post, I’ve been asked to revise it. With rather more experience of SSAS than I had back then, I can say that my proposed solution for a single dimension is just wrong and wouldn’t work, and using a stored proc wouldn’t be feasible either. The only workable solution would be the many-to-many approach.

SQL2005 RTM

I’m a few days late onto this news (I dunno, you take a day off on holiday and this happens) but in case you didn’t know, SQL2005 was released to manufacturing last Thursday. It’s available to download from MSDN if you’re a subscriber:
 
 

BI Certification from Microsoft

Euan Garden notes that the new SQL 2005 certifications have been announced, one of which is the MCITP: Business Intelligence Developer. See here for more details:
 
Looks like I’ll have to get one of these.