Can I run SQL against an Analysis Services cube?

I take what is probably an unhealthy interest in the statistics that MSN Spaces generates about visits to this blog, and in particular in the searches people run on Google which land them here. Over the last few months I’ve noticed that someone has been searching on the phrase "Can I run SQL against an AS cube?" on a fairly regular basis; since I mentioned in a post a while ago that you can, I assume that’s why they keep arriving here, but I thought it would be good to answer the question in more depth. Never let it be said that I’m not responsive to the needs of my audience – although I’ll draw the line at pandering to the many people who seem to be looking for pictures of "hot bi action"…
 
In fact, you’ve always been able to run SQL against Analysis Services, right back to the days of OLAP Services. Traditionally this was only useful when you wanted to create local cubes from server cubes: the subset of SQL supported by AS was so limited you wouldn’t want to use it for anything else, and the fact that none of this is documented anywhere (possibly intentionally) didn’t help. However, when I started to do some research into AS2005 SQL I was pleasantly surprised at how much I could do. My starting point was to look at some of the SQL queries that are generated when AS2005 creates local cubes. Take the following MDX statement, which uses the CREATE GLOBAL CUBE syntax to create a local cube from Adventure Works:
 
CREATE GLOBAL CUBE [Test]
STORAGE 'c:\MyCube.cub'
FROM [Adventure Works]
(MEASURE [Adventure Works].[Internet Sales Amount], DIMENSION [Adventure Works].[Geography], DIMENSION [Adventure Works].[Product])
 
If you run a Profiler trace while this executes, you’ll see that a whole bunch of SQL statements are generated in the background to get the data needed to populate the local cube. Here are two representative examples:
 
SELECT
  DISTINCT
  KEY ( [Adventure Works].[$Product].[End Date],0 )
  AS [oduct0_0], NAME ( [Adventure Works].[$Product].[End Date] )
  AS [oduct0_1], MemberValue ( [Adventure Works].[$Product].[End Date] )
  AS [oduct0_2]
   FROM [Adventure Works].[$Product]
 
SELECT  AGGREGATE ( [Adventure Works].[Internet Sales].[Internet Sales Amount] )
  AS [Sales0_0], KEY ( [Adventure Works].[$Product].[Product],0 )
  AS [oduct1_0]
   FROM [Adventure Works].[Internet Sales]
    NATURAL JOIN
   [Adventure Works].[$Product]
  GROUP BY [oduct1_0]
 
The first thing to notice is that the ‘tables’ we’re querying are either dimensions or measure groups. A dimension ‘table’ follows the naming convention [CubeName].[$DimensionName] (note the dollar sign) and a measure group ‘table’ follows the convention [CubeName].[MeasureGroupName]. We can join them using the NATURAL JOIN syntax; we can also use GROUP BY and functions like AGGREGATE (which I assume aggregates the measure value by its cube aggregation function), and KEY, NAME and MEMBERVALUE (which, as far as I can see, retrieve the key, name and member value properties associated with a dimension attribute). My memory might not be entirely accurate on this, but I’m fairly sure that none of the above could be done with AS2000 SQL. You can also filter with a WHERE clause, but it looks like you can only AND conditions together, not OR them, so

SELECT AGGREGATE ( [Adventure Works].[Internet Sales].[Internet Sales Amount] ) AS [Sales0_0],
KEY ( [Adventure Works].[$Product].[Product],0 ) AS [oduct1_0],
[Adventure Works].[$Product].[Product],
[Adventure Works].[$Date].[Calendar Year]
FROM [Adventure Works].[Internet Sales]
NATURAL JOIN [Adventure Works].[$Product]
NATURAL JOIN [Adventure Works].[$Date]
WHERE [Adventure Works].[$Product].[Product] = 'Mountain-100 Black, 48'
AND [Adventure Works].[$Date].[Calendar Year] = 'CY 2002'
GROUP BY [oduct1_0], [Adventure Works].[$Date].[Calendar Year]

 

runs, whereas
 
SELECT AGGREGATE ( [Adventure Works].[Internet Sales].[Internet Sales Amount] ) AS [Sales0_0],
KEY ( [Adventure Works].[$Product].[Product],0 ) AS [oduct1_0],
[Adventure Works].[$Product].[Product],
[Adventure Works].[$Date].[Calendar Year]
FROM [Adventure Works].[Internet Sales]
NATURAL JOIN [Adventure Works].[$Product]
NATURAL JOIN [Adventure Works].[$Date]
WHERE [Adventure Works].[$Product].[Product] = 'Mountain-100 Black, 48'
OR [Adventure Works].[$Date].[Calendar Year] = 'CY 2002'
GROUP BY [oduct1_0], [Adventure Works].[$Date].[Calendar Year]
 
produces an error, which limits its usefulness. Nor can I get any query that uses the COUNT function to work; for example:

SELECT COUNT(*)
FROM [Adventure Works].[Internet Sales]
WHERE [Adventure Works].[$Product].[Product] = 'Mountain-100 Black, 48'
AND [Adventure Works].[$Date].[Calendar Year] = 'CY 2002'

 
gives an error. So while we’ve got the potential to do some useful things here, it isn’t exactly the most complete implementation of SQL I’ve ever seen. There are pointers to other supported functions in the Analysis Services 2005 cartridge, found at
C:\Program Files\Microsoft SQL Server\MSSQL.5\OLAP\bin\Cartridges\as90.xsl
on my machine. This is the file containing the instructions that tell AS how to generate the SQL used when processing cubes, and there look to be several other functions mentioned in it that could be worth investigating.
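Putting all this together, you can write this sort of SQL by hand too. Here’s a quick sketch in the same style as the traced queries, listing product names and member values for one category; bear in mind that the [Category] attribute and the column aliases are my own guesses based on the standard Adventure Works Product dimension, and I haven’t tested this exact statement:

```sql
SELECT
  NAME ( [Adventure Works].[$Product].[Product] ) AS [ProductName],
  MEMBERVALUE ( [Adventure Works].[$Product].[Product] ) AS [ProductValue]
FROM [Adventure Works].[$Product]
WHERE [Adventure Works].[$Product].[Category] = 'Bikes'
```

If the WHERE clause isn’t accepted on a pure dimension-table query, dropping it and filtering on the client is the fallback; and as noted above, only ANDed conditions seem to be supported anyway.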
 
But do we want to use SQL to query Analysis Services anyway? I’ve talked about this before, here:
I don’t think so; it seems to me that although there are some things that are easier to express in SQL than in MDX, the actual subset of SQL implemented here is crippled in some important respects. MDX is much better suited to BI queries and, although I know a lot of people struggle with it at first, it’s definitely worth the pain in the end.
 
 

Project REAL Code and Docs

Finally, the complete Project REAL is available for download here:
 
From the download page, the kit contains:

1. A set of instructions for setting up the environment
2. Guidance on how to explore the implementation
3. A sample relational data warehouse database (a subset of the Project REAL data warehouse)
4. A sample source database (from which we pull incremental updates)
5. SSIS packages that implement the ETL operations
6. An SSAS cube definition and scripts for processing the cube from the sample warehouse
7. Sample SSRS reports
8. Sample data mining models for predicting out-of-stock conditions in stores
9. Sample client views in briefing books for the Proclarity and Panorama BI front-end tools

 
 

Panorama Industry-Specific Solutions

A lot of people, me included, have been wondering what the future holds for Panorama after the Proclarity acquisition. Well, it looks like Microsoft have tried to make amends in the form of a partnership to develop industry-specific BI applications:
There’ll be solutions for retail, credit management, manufacturing and other areas. Quite apart from the fact that the new apps will be nice to have, I’m glad that Panorama hasn’t been left to sink: it has some good products, and the diversity of the MS BI client tool ecosystem has always been one of the platform’s strengths in my opinion, so anything Microsoft can do to maintain that diversity should be welcomed.

Office Business Application Services

More new stuff from the Office Business Applications team? I’m not really sure what it is, and the comment "Think of OBA as the platform support for “business mash-ups” in Office" is a bit vague; but apparently it will be announced later this week. See
 
UPDATE: here’s the press release announcing this and Line-Of-Business Interoperability:
 

Mondrian MDX

Nick Goodman, director of BI solutions at Pentaho, has posted a short tutorial explaining how to run MDX queries in Mondrian (the open source OLAP engine) here:
I’m going to spend some time looking at Mondrian’s implementation of MDX in the near future, which I’m looking forward to; it’ll be interesting to see how it differs from AS MDX (if at all).
 

PASS Germany BI Bootcamp

Markus Fischer has asked me to mention that PASS Germany are running a (German language) BI Bootcamp this September. More details here:

Office PerformancePoint Server 2007

So Biz# gets a proper name at last…
 
There’s a conf call (which I don’t have an invite to, alas) later to announce all this stuff properly, but here are some links to press coverage:
 
UPDATE: I missed the conf call (even though I found I didn’t need an invite – prior commitments I’m afraid) but there are already some good blog entries out there on what was announced:
and here’s the official site:
 
 

Office Ultimate

I’m sure you all know by now that I’m prone to the odd bout of wild speculation, and here comes another one… A new flavour of Office has been announced, Office Ultimate. See
for more details. As this article says, it’s pricey and MS will need to offer new functionality to justify that price tag. Now, we know that it’s the Office group that has bought Proclarity, and what I’m thinking is that the Proclarity fat client tool would fit nicely into Office Ultimate as exactly that kind of justification: $679 per licence is cheap when you think of what companies have been paying for BI client tools like Proclarity up to now (and you get Word, Excel, Powerpoint etc thrown in!), plus its inclusion would drive sales of SQL Server and the other Office BI apps like BSM and (perhaps) the forthcoming Biz#. Hmm… let’s wait and see…

Improving Performance of Analysis Services-Sourced Reporting Services Reports

This is something I picked up the other week when I was tuning some Reporting Services reports running off Analysis Services (there were other, more interesting findings, but I’ll leave those for a later date). When you create MDX queries using the RS query builder, the queries look something like this:

SELECT NON EMPTY { [Measures].[Internet Sales Amount] } ON COLUMNS,
NON EMPTY { ([Date].[Calendar Year].[Calendar Year].ALLMEMBERS ) }
DIMENSION PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON ROWS
FROM [Adventure Works]
CELL PROPERTIES VALUE, BACK_COLOR, FORE_COLOR, FORMATTED_VALUE, FORMAT_STRING, FONT_NAME, FONT_SIZE, FONT_FLAGS

 
My customer was creating some very large reports that were up to several hundred pages long, and in some cases they took over a minute to render. What I realised was that the above query contains a number of cell properties that aren’t actually needed, such as BACK_COLOR, FORE_COLOR, FORMAT_STRING, FONT_NAME, FONT_SIZE and FONT_FLAGS. They can safely be removed from the query, as follows:

SELECT NON EMPTY { [Measures].[Internet Sales Amount] } ON COLUMNS,
NON EMPTY { ([Date].[Calendar Year].[Calendar Year].ALLMEMBERS ) }
DIMENSION PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON ROWS
FROM [Adventure Works]
CELL PROPERTIES VALUE, FORMATTED_VALUE

 

There’s no change to the results returned or to how most reports behave (assuming you’re not somehow referencing these properties somewhere), and in my case it made a noticeable difference to rendering times: reports that previously took 60 seconds to run now took 50 seconds. I would guess this is because, for very large queries, there’s a significant overhead involved in including all these unnecessary property values in the resultset.

Some Time Intelligence Wizard Calculations *still* don’t work

You may remember a post from late last year where I talked about how some calculations generated by the Time Intelligence Wizard in BIDS didn’t work:
Well, the other week I checked to see whether this had been fixed in SP1 and indeed it had, using the solution that Mosha outlines in his comment on that post.
 
BUT, to my horror, I found that some of the other calculations now don’t work (and indeed don’t even deploy) in some circumstances. Everything seems to work ok on Adventure Works, but on one of my cubes I got the following MDX for a 12 Month Moving Average:

Create Member
  CurrentCube.[Period].[Year-Month Period Calculations].[Twelve Month Moving Average]
  As "NA" ;

Scope(
  {
    [Measures].[Sales]
  }
) ;

// Twelve Month Moving Average
  (
    [Period].[Year-Month Period Calculations].[Twelve Month Moving Average],
    [Period].[Month].[Month].Members,
    [Period].[Month].Members // this second reference to the Month attribute is the problem
  ) =
  Avg(
    {
      ParallelPeriod(
        [Period].[Year-Month].[Month],
        11,
        [Period].[Year-Month].CurrentMember
      ) : [Period].[Year-Month].CurrentMember
    },
    [Period].[Year-Month Period Calculations].DefaultMember
  ) ;

End Scope ;

The problem is that the [Period].[Month] attribute is referenced twice in the subcube definition; apparently this happens when the target level of the calculation is also the key attribute of the dimension. The fix is fairly simple, though: remove the second reference to the attribute, i.e. in this case the reference to [Period].[Month].Members. This has been confirmed as a bug and will, I’m promised, be fixed in SP2.
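For completeness, the corrected scoped assignment with the duplicate attribute reference removed would look like this (sketched against my cube’s [Period] dimension, so substitute your own dimension and attribute names):

```mdx
Scope(
  {
    [Measures].[Sales]
  }
) ;

// Twelve Month Moving Average
  (
    [Period].[Year-Month Period Calculations].[Twelve Month Moving Average],
    [Period].[Month].[Month].Members
  ) =
  Avg(
    {
      ParallelPeriod(
        [Period].[Year-Month].[Month],
        11,
        [Period].[Year-Month].CurrentMember
      ) : [Period].[Year-Month].CurrentMember
    },
    [Period].[Year-Month Period Calculations].DefaultMember
  ) ;

End Scope ;
```

With only one reference to the key attribute, the calculation deploys and behaves as the wizard intended.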