Breaking up large dimensions

One clever trick I learned on the newsgroup a few years ago was from someone called dxd, who wrote it up in this post and others in the same thread:
It describes how to break up a single, large dimension that you need in the cube but which users don’t themselves want to view most of the time (typically this would be a degenerate/fact dimension). In the AS2K world this was useful for getting multi-select to work with distinct count calculations; in AS2005, of course, distinct counts already work with multi-select but I recently found a new application for this technique which I thought I’d share.
 
I was doing a PoC in a scenario which was similar to the following: imagine a data warehouse which contains data recording purchases in a supermarket with two fact tables. The first fact table contains data on the whole transaction, with a transaction id as the primary key and other dimensions like Customer and Store and a measure recording the value of the whole transaction; the second contains each purchase in the transaction, has all the same dimensions as the first fact table but also includes a Product dimension. The users wanted to run queries like ‘Show me the total value of all transactions which contain Product X’, so it was clearly a distinct sum problem and needed a many-to-many relationship between the Product dimension and the first fact table with the second fact table as the intermediate measure group.
 
Unfortunately, the only way to be sure of this working properly was to link the two fact tables together using the transaction id – but there were hundreds of millions of transactions, so building a MOLAP dimension was out of the question and I wasn’t sure that a ROLAP dimension would perform well enough. Then I remembered the approach in the newsgroup post above and realised that I could break up the transaction dimension into three identical dimensions of 999 members each. It’s quite easy to visualise how this works. Imagine you have a transaction with the following id:
123456789
You could express this as three different dimensions with keys of 123, 456 and 789. And of course since each of these three dimensions was identical, I only needed to build it once and could use role-playing dimensions for the other two. I added them to the cube and made them invisible, added them to both the fact tables and bingo – I had the dimensions I needed to make the many-to-many relationship work.
 
Performance resolving the many-to-many relationship seemed very good when I looked at the queries I ran in Profiler. Unfortunately I ran into the problem that Mark Hill talks about here:
…and overall performance of the cube wasn’t great (I assumed I’d messed up my partition definitions), but if I had used a ROLAP transaction dimension instead I’m pretty sure that the cube would have been unusable.
 
Thinking some more about other applications, I wonder if this could be used to work around the problems that are becoming evident with drillthrough in AS2005? See
and
I think this deserves some further investigation… 

BI Documenter

Just come across this new tool for documenting SQL Server and Analysis Services 2005 databases called BI Documenter:
A touch pricey perhaps, but looks quite slick and has some good features.

VSTS4DB and Analysis Services

I would imagine that most people who read this blog also read Jamie Thomson’s SSIS blog too, but just in case you don’t I thought I’d highlight his efforts to get some Analysis Services-related functionality into Visual Studio Team System for Databases:
Here’s the blog entry on Richard Waymire’s blog asking for feedback:
…the original msdn forums thread:
…and the place to submit feedback and vote on these ideas:

Can I run SQL against an Analysis Services cube?

I take what is probably an unhealthy interest in the statistics that MSN Spaces generates about visits to this blog, and in particular the searches that people run on Google which land them here. Over the last few months I’ve noticed that the someone has been searching on the phrase "Can I run SQL against an AS cube?" on a fairly regular basis; since I mentioned the fact that you can in a post a while ago, I assume that’s why they always come here, but I thought it would be good to answer the question in more depth. Never let it be said that I’m not responsive to the needs of my audience – although I’ll draw the line at pandering to the many people who seem to be looking for pictures of "hot bi action"…
 
In fact, you’ve always been able to run SQL against Analysis Services right back to the days of OLAP Services. Traditionally this was only useful when you wanted to create local cubes from server cubes and the subset of SQL supported by AS was so limited you wouldn’t want to use it for anything else; the fact that this stuff isn’t documented anywhere, possibly intentionally, didn’t help. However, when I started to do some research into AS2005 SQL I was pleasantly surprised at how much I could do. My starting point was to look at some of the SQL queries that are generated when AS2005 creates local cubes. Take the following MDX statement which uses the CREATE GLOBAL CUBE syntax to create a local cube from Adventure Works:
 
CREATE GLOBAL CUBE [Test]
STORAGE ‘c:\MyCube.cub’
FROM [Adventure Works]
(MEASURE [Adventure Works].[Internet Sales Amount],DIMENSION [Adventure Works].[Geography],DIMENSION [Adventure Works].[Product])
 
If you run a profiler trace while this executes, you’ll see a whole bunch of SQL statements are generated in the background to get the data to populate the local cube. Here are two representative examples:
 
SELECT
  DISTINCT
  KEY ( [Adventure Works].[$Product].[End Date],0 )
  AS [oduct0_0], NAME ( [Adventure Works].[$Product].[End Date] )
  AS [oduct0_1], MemberValue ( [Adventure Works].[$Product].[End Date] )
  AS [oduct0_2]
   FROM [Adventure Works].[$Product]
 
SELECT  AGGREGATE ( [Adventure Works].[Internet Sales].[Internet Sales Amount] )
  AS [Sales0_0], KEY ( [Adventure Works].[$Product].[Product],0 )
  AS [oduct1_0]
   FROM [Adventure Works].[Internet Sales]
    NATURAL JOIN
   [Adventure Works].[$Product]
  GROUP BY [oduct1_0]
 
The first thing to notice is that the ‘tables’ we’re querying are either dimensions or measure groups. A dimension ‘table’ has the naming convention [CubeName].[$DimensionName] (note the dollar sign) and a measure group ‘table’ has the naming convention [CubeName].[MeasureGroupName]. We can obviously do joins between them using the NATURAL JOIN syntax; we can also do GROUP BYs and use functions like AGGREGATE (which I assume aggregates the measure value by its cube aggregation function), KEY, NAME and MEMBERVALUE (which as far as I can see allow you to retrieve the key, name and membervalue properties associated with a dimension attribute). My memory might not be entirely accurate on this but I’m fairly sure that none of the above could be done with AS2000 SQL. You can also do WHERE clause filtering too, but it looks like you can only AND conditions and not OR them, so

SELECT AGGREGATE ( [Adventure Works].[Internet Sales].[Internet Sales Amount] )AS [Sales0_0], KEY ( [Adventure Works].[$Product].[Product],0 ) AS [oduct1_0], [Adventure Works].[$Product].[Product],
[Adventure Works].[$Date].[Calendar Year]
FROM [Adventure Works].[Internet Sales]
NATURAL JOIN
[Adventure Works].[$Product]
NATURAL JOIN
[Adventure Works].[$Date]
WHERE [Adventure Works].[$Product].[Product] = ‘Mountain-100 Black, 48’
AND [Adventure Works].[$Date].[Calendar Year]=’CY 2002′
GROUP BY [oduct1_0], [Adventure Works].[$Date].[Calendar Year]

 

runs, whereas
 
SELECT AGGREGATE ( [Adventure Works].[Internet Sales].[Internet Sales Amount] )AS [Sales0_0], KEY ( [Adventure Works].[$Product].[Product],0 ) AS [oduct1_0], [Adventure Works].[$Product].[Product],
[Adventure Works].[$Date].[Calendar Year]
FROM [Adventure Works].[Internet Sales]
NATURAL JOIN
[Adventure Works].[$Product]
NATURAL JOIN
[Adventure Works].[$Date]
WHERE [Adventure Works].[$Product].[Product] = ‘Mountain-100 Black, 48’
OR [Adventure Works].[$Date].[Calendar Year]=’CY 2002′
GROUP BY [oduct1_0], [Adventure Works].[$Date].[Calendar Year]
 
produces an error, which limits its usefulness. Nor can I get any query which uses the COUNT function to work, for example:

SELECT COUNT(*)
FROM [Adventure Works].[Internet Sales]
WHERE [Adventure Works].[$Product].[Product] = ‘Mountain-100 Black, 48’
AND [Adventure Works].[$Date].[Calendar Year]=’CY 2002′

 
gives an error. So while we’ve got the potential to do some useful things here, it isn’t exactly the most useful implementation of SQL I’ve ever seen. There are some other pointers to other functions that are supported in the Analysis Services 2005 cartridge, found in
C:\Program Files\Microsoft SQL Server\MSSQL.5\OLAP\bin\Cartridges\as90.xsl
on my machine. This is the file that contains the instructions for AS on how to generate the SQL used for processing cubes, and there are what looks like several other functions mentioned in here that could be worth looking at.
 
But do we want to use SQL to query Analysis Services anyway? I’ve talked about this before, here:
I don’t think so, and it seems to me that although there are some things that are easier to express in SQL than MDX the acutal subset of SQL that is implemented in here is crippled in some important respects. MDX is so much better suited for BI queries and although I know a lot of people struggle with it at first, it’s definitely worth the pain in the end. 
 
 

Multiple Time Utility Dimensions

Time Utility Dimensions (or, strictly speaking with AS2005 attribute hierarchies) are useful things. I first learnt about them from George Spofford through various newsgroup postings and the first version of ‘MDX Solutions’ and they have become one of the most commonly used design techniques in Analysis Services cubes; indeed, they have been built into the product in the form of the ‘Define Time Intelligence’ functionality of the ‘Add Business Intelligence’ wizard.

 

So what is there that’s new to say about them? Well, sometimes it’s useful to have more than one. Take for example the following bit of MDX Script created by the wizard that creates two calculated members (Year-To-Date and Year-Over-Year Growth) on the [Year-Month Period Calculations] attribute on a simple time dimension:

 

/*

  Begin Time Intelligence script for the [Period].[Year-Month] hierarchy.

*/

 

Create Member

  CurrentCube.[Period].[Year-Month Period Calculations].[Year to Date]

  As "NA" ;

 

Create Member

  CurrentCube.[Period].[Year-Month Period Calculations].[Year Over Year Growth]

  As "NA" ;

 

 

Scope(

       {

         [Measures].[Sales]

       }

) ;

 

// Year to Date

  (

    [Period].[Year-Month Period Calculations].[Year to Date],

    [Period].[Year].[Year].Members,

     [Period].[Month].Members

  ) =

 

  Aggregate(

             { [Period].[Year-Month Period Calculations].DefaultMember } *

             PeriodsToDate(

                            [Period].[Year-Month].[Year],

                            [Period].[Year-Month].CurrentMember

             )

  ) ;

 

 

// Year Over Year Growth 

  (

     [Period].[Year-Month Period Calculations].[Year Over Year Growth],

    [Period].[Year].[Year].Members ( 1 ) : Null,

     [Period].[Month].Members

  ) =     

 

  ( [Period].[Year-Month Period Calculations].DefaultMember ) –

  ( [Period].[Year-Month Period Calculations].DefaultMember,

    ParallelPeriod(

                    [Period].[Year-Month].[Year],

                    1,

                    [Period].[Year-Month].CurrentMember

    )

  ) ;

 

  (

    [Period].[Year-Month Period Calculations].[Year Over Year Growth],

    [Period].[Year].[Year].Members ( 0 ),

    [Period].[Month].Members

  ) = Null ;

 

 

End Scope ;

 

/*

  End Time Intelligence script for the [Period].[Year-Month] hierarchy.

*/

 

This works fine, but what happens when your users ask to be able to look at Year-Over-Year Growth for the Year-To-Date values? Of course you could create a third calculated member on the hierarchy which implemented this, but you can probably guess where I’m going with this: the more combinations of calculations that you need, the greater the number of calculated members you need to create and maintain.

 

Here’s where the value of multiple time utility attribute hierarchies comes in because they can help you control this explosion of calculated members – you can make them work together instead. To do this, first you need to create a new attribute hierarchy on your time dimension by dragging and dropping the same column as was used to create your original time utility attribute hierarchy; you need to make sure to set its IsAggregatable property to false so that it doesn’t have an All Member. Then you need to decide which calculations should go on which hierarchy, which of course dictates which combinations of calculations you have available. Here’s the MDX Script example from earlier rewritten so that the two calculated members are now on two different attribute hierarchies called [Period].[Year-Month Period Calculations] and [Period].[Year-Month Period Calculations2]:

/*

  Begin Time Intelligence script for the [Period].[Year-Month] hierarchy.

*/

 

Create Member

  CurrentCube.[Period].[Year-Month Period Calculations].[Year to Date]

  As "NA" ;

 

Create Member

  CurrentCube.[Period].[Year-Month Period Calculations 2].[Year Over Year Growth]

  As "NA" ;

 

 

 

Scope(

       {

         [Measures].[Sales]

       }

) ;

 

// Year to Date

  (

    [Period].[Year-Month Period Calculations].[Year to Date],

    [Period].[Year].[Year].Members,

     [Period].[Month].Members

  ) =

 

  Aggregate(

             { [Period].[Year-Month Period Calculations].DefaultMember } *

             PeriodsToDate(

                            [Period].[Year-Month].[Year],

                            [Period].[Year-Month].CurrentMember

             )

  ) ;

 

 

// Year Over Year Growth 

  (

     [Period].[Year-Month Period Calculations 2].[Year Over Year Growth],

    [Period].[Year].[Year].Members ( 1 ) : Null,

     [Period].[Month].Members

  ) =     

 

  ( [Period].[Year-Month Period Calculations 2].DefaultMember ) –

  ( [Period].[Year-Month Period Calculations 2].DefaultMember,

    ParallelPeriod(

                    [Period].[Year-Month].[Year],

                    1,

                    [Period].[Year-Month].CurrentMember

    )

  ) ;

 

  (

    [Period].[Year-Month Period Calculations 2].[Year Over Year Growth],

    [Period].[Year].[Year].Members ( 0 ),

    [Period].[Month].Members

  ) = Null ;

 

 

End Scope ;

 

/*

  End Time Intelligence script for the [Period].[Year-Month] hierarchy.

*/

 

Now the user can crossjoin the two calculated members in their queries to get the Year-Over-Year Growth for the Year-To-Date, as well as use them separately.

 

In practice it’s actually quite tricky to determine which calculated members should go on each hierarchy and deal with the associated problem of solve order: for example if we added a 12 Month Moving Average calculation to this script, would we want to see the average of the Year-Over-Year growth or the growth of the average, or both or neither? There’s also another downside to this approach which is that if users found it hard to understand the concept of one time utility attribute hierarchy then they are guaranteed to be confused when they have more than one, though this should be balanced against the usability and maintainability problems inherent in having more than about ten combination-style calculated members on one attribute hierarchy. Overall, I’ve used this technique three or four times now (once even using three separate hierarchies) and it’s proved to be very successful when either the users have been very sophisticated or when the cube is only queried via Reporting Services.

Post SP1 cumulative hotfix package now available

See
A lot of fixes relevant to AS in there, including I notice some of the changes to solve order that I’ve blogged about before.
 
UPDATE: Euan Garden has a good explanation of what this release is and what it means here:

Microsoft, BI, Search and Advertising

A few months ago I posted the following entry on Microsoft, BI and Search:
Most of it was guesswork. However, while checking out the following SEO blog:
I did a double-take at the following entry:
Now, surely that is the infamous Proclarity decomposition tree control? Take a look at it in action here:
In fact, a lot of the MS Ad Center links in the blog above seem to suggest that AS2005 is being used quite heavily…

More thoughts on stored procedures

I know I promised more content on stored procedures a few months ago, but, well, you know… Once I found out that you couldn’t actually run an MDX query from them (unless you used the ADOMD.Net client library and opened a connection from within the sproc, which seems a pretty silly thing to do) and can’t do stuff like dynamically create calculated members or named sets with them, then I realised I couldn’t implement any of my cool ideas.
 
Anyway, I have been thinking about them again quite a lot recently. For example, I had some contact with Mark Mrachek about this post on his blog about drillthough:
I had already been contacted by someone having the same problem, and thinking some more about we came up with a possible solution using an action which calls an AS sproc similar to the one I posted a while ago to find the currentmember on every dimension (see http://spaces.msn.com/cwebbbi/blog/cns!7B84B0F2C239489A!586.entry), and which in turn passes the keys of each member through to a SQL sproc to do the drillthrough. Mark has promised to blog about the full solution when he’s had time to implement it.
 
Similarly, today on Charlie Maitland’s blog he talks about how to filter dimension members using wildcards (see http://charliem.wordpress.com/2006/04/26/wild-card-mdx-searching/). It seems to me that this would be a prime candidate for a sproc – there is a lot of string functionality in .NET that could be very useful in MDX; another example would be the way that you couldn’t use the VBA REPLACE function in MDX either.
 
Finally, there are some things which are very complex in MDX which could be simplified no end if they were put into a sproc. Two examples would be the discussion on this blog last year about tuning YTD-style calculations (see http://spaces.msn.com/cwebbbi/Blog/cns!1pi7ETChsJ1un_2s41jm9Iyg!107.entry and http://spaces.msn.com/cwebbbi/Blog/cns!1pi7ETChsJ1un_2s41jm9Iyg!111.entry) and problems such as finding the count of members on an attribute which have the same first letter in their name as the currentmember on that attribute (see http://groups.google.co.uk/group/comp.databases.olap/msg/75fb29730b96f23b for how to do it).
 
So, to get to the point, I was thinking that solving individual problems and posting code up on this blog was not the best way to go. I’m not the world’s greatest .NET coder by any means, and rather than just being able to see the code it would be much better if there was one dll which people could download to get at all this useful stuff. Surely it would be much better if everyone who was interested could collaborate on producing this dll, perhaps using something like a gotdotnet community (http://www.gotdotnet.com/workspaces/docs/about.aspx), so it would be much easier to add functionality and fix bugs. What does everyone think about this? Is there someone out there with a solid coding background who would be willing to help?

SQL Server 2005 SP1 out now

Handling Relative Time Periods

I recently had an interesting exchange with Michael Barrett Jensen and Thomas Pagel on the MSDN Forum about the best way to handle the problem of relative time periods, ie letting the user select a member called something like ‘current month’, which always returns the most recent month’s data (see http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=346119&SiteID=1 and please ignore the posts in the middle of the thread where I talk rubbish about when sets are evaluated). Now for some scenarios it’s a good idea to use named sets to hold the ‘current month’ or the last five months or whatever you want – if you have a client tool which handles sets well, then the advantage of this is that it works as a kind of dynamic selection and you get set of members you want each time. However not all front ends (eg Excel) support sets, they can be a difficult concept for some users to understand, and they are a bit inflexible. So as an alternative here’s the outline of a method for building a relative time dimension to add to your cube – it’s more of a sketch than a robust solution, so I’d be interested in hearing feedback from anyone who tries to implement it.
 
Anyway, the first thing I did was create a new table in Adventure Works DW from which I built my Relative Time dimension. For the purposes of this exercise, I only created two columns/attributes: Relative Year and Relative Quarter. Relative Year had two members, Current Year and Current Year-1, and Relative Quarter had eight members, Current Quarter to Current Quarter-7. I then added this to the Adventure Works cube, but did not make any changes on the Dimension Usage tab, so it had no relationship with any measure group. I then added the following code onto the end of the MDX Script:

//Relative Time

([Relative Time].[Relative Quarter].[Relative Quarter].Members, [Measures].[Internet Sales Amount]) =

([Measures].[Internet Sales Amount],

tail

([Date].[Calendar].[Calendar Quarter].members,

rank

([Relative Time].[Relative Quarter].currentmember, [Relative Time].[Relative Quarter].[Relative Quarter].members)

).

item(0)

);

([Relative Time].[Relative Year].[Relative Year].Members, [Measures].[Internet Sales Amount]) =

([Measures].[Internet Sales Amount],

tail([Date].[Calendar].[Calendar Year].members,

rank

([Relative Time].[Relative Year].currentmember, [Relative Time].[Relative Year].[Relative Year].members)

).

item(0)

);

What this does is overwrite the values displayed in the cube for the measure [Internet Sales Amount] and everything below the All Members of the two attributes on Relative Time – Current Year shows the value of the last member on the [Date].[Calendar].[Calendar Year] level, and so on. You now have a Relative Time dimension in your cube that users can use in their reports, but which has no impact on aggregations or processing. You can change the scope of the assignments to cover all the measures in your cube fairly easily.
 
The advantages of physically building the dimension, rather than using calculated members to do the same thing, are that a) you get drill down from Relative Year to Relative Quarter and b) your MDX Script doesn’t get cluttered with (potentially) hundreds of calculated members.
 
As I said, this is really just a proof of concept. I’m not 100% satisfied with the way this works (for example I’d like to see it return something nicer than repeated values when you crossjoin Relative Year and Calendar Year, and the Root on Relative Time doesn’t show sensible values unless you have as many relative periods as actual periods), but I thought it was worth posting in case anyone was interested.