Joining the results of two MDX queries together

One question I get asked occasionally is whether it’s possible to join the results of two MDX queries together. Although I seem to remember this kind of functionality being mentioned in the OLEDB for OLAP spec, it certainly isn’t supported in Analysis Services MDX and I don’t expect it ever will be. Therefore, as all good consultants know, when you’re faced with a request for functionality that doesn’t exist, what you have to do is look closely at the requirement and see if there’s a different way of solving the problem that gets the customer the result they want…

What people usually want to do when they think about joining MDX queries is this: they want to create a query that shows members from two different hierarchies side-by-side on the same axis. For example, in Adventure Works you might want to see a query with Calendar Years on Rows and Countries followed by Product Categories on Columns, something like this:

          Australia       Canada        Bikes          Clothing
CY 2002   $2,154,284.88   $621,602.38   $6,530,343.53  (null)
CY 2003   $3,033,784.21   $535,784.46   $9,359,102.62  $138,247.97

It’s clear we can get the results we need by running two different queries, as follows:

SELECT {[Customer].[Country].&[Australia],[Customer].[Country].&[Canada]}  ON 0,
{[Date].[Calendar Year].&[2002],[Date].[Calendar Year].&[2003]} ON 1
FROM [Adventure Works]
WHERE([Measures].[Internet Sales Amount])

SELECT {[Product].[Category].&[1],[Product].[Category].&[3] }  ON 0,
{[Date].[Calendar Year].&[2002],[Date].[Calendar Year].&[2003]} ON 1
FROM [Adventure Works]
WHERE([Measures].[Internet Sales Amount])

Depending on the tool we’re using, we could try to put the two result sets next to each other to make them easier to compare. What we can’t do, of course, is run something like the following query:

SELECT {[Customer].[Country].&[Australia],[Customer].[Country].&[Canada],[Product].[Category].&[1],[Product].[Category].&[3]}  ON 0,
{[Date].[Calendar Year].&[2002],[Date].[Calendar Year].&[2003]} ON 1
FROM [Adventure Works]
WHERE([Measures].[Internet Sales Amount])

This will result in the following error message:
Members belong to different hierarchies in the  function.
…for the very good reason that we have violated one of the fundamental rules of MDX – a set has to contain members of the same dimensionality, and here we have a set containing Countries and Product Categories.

What can we do to make the query work? Well, there is a simple MDX solution: create a set of tuples containing Countries and Product Categories:

SELECT
{
CROSSJOIN({[Customer].[Country].&[Australia],[Customer].[Country].&[Canada]},{[Product].[Category].[All Products]}),
CROSSJOIN({[Customer].[Country].[All Customers]},{[Product].[Category].&[1],[Product].[Category].&[3]}) 
}
ON 0,
{[Date].[Calendar Year].&[2002],[Date].[Calendar Year].&[2003]} ON 1
FROM [Adventure Works]
WHERE([Measures].[Internet Sales Amount])

What I’ve done here is create a set using two Crossjoins. The first returns the set of Countries we want crossjoined with the All Member on Product Categories; the second returns the All Member on Countries crossjoined with the Product Categories we’re interested in. We can then union them together and use them on the same axis because the tuples in both sets have the same dimensionality, ie (Country, Product Category). Here’s what you get back:

[image: query results, with the All Members shown alongside the Countries and Product Categories]

It’s not quite what we wanted, but it’s all the data we need in a single query and we can probably get the user to ignore the All Members, or possibly hide them in the client tool somehow. The only problem with this approach is that it becomes unwieldy the greater the number of different hierarchies we want to display on columns.
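To see how quickly the pattern grows, here’s a sketch of the same approach extended to a third hierarchy, Sales Territory Group. The member and All Member names below are from memory rather than checked against the cube, so treat them as placeholders: the point is that every Crossjoin now needs three sets, and every hierarchy except one must sit at its All Member.

```mdx
SELECT
{
CROSSJOIN(
 {[Customer].[Country].&[Australia],[Customer].[Country].&[Canada]},
 {[Product].[Category].[All Products]},
 {[Sales Territory].[Sales Territory Group].[All Sales Territories]}),
CROSSJOIN(
 {[Customer].[Country].[All Customers]},
 {[Product].[Category].&[1],[Product].[Category].&[3]},
 {[Sales Territory].[Sales Territory Group].[All Sales Territories]}),
CROSSJOIN(
 {[Customer].[Country].[All Customers]},
 {[Product].[Category].[All Products]},
 {[Sales Territory].[Sales Territory Group].[Europe],
  [Sales Territory].[Sales Territory Group].[North America]})
}
ON 0,
{[Date].[Calendar Year].&[2002],[Date].[Calendar Year].&[2003]} ON 1
FROM [Adventure Works]
WHERE([Measures].[Internet Sales Amount])
```

With n hierarchies you need n Crossjoins of n sets each, which is exactly why the approach becomes unwieldy.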

If we’re using SSRS 2008 to display the results of our query, there’s another possible approach: we can use the new Tablix control to create the style of layout we’re after quite easily. Start in the query designer and paste in a version of the query above, with Years, Countries and Product Categories on Rows and Internet Sales Amount on Columns:

SELECT
[Measures].[Internet Sales Amount] ON 0,
{
CROSSJOIN({[Customer].[Country].&[Australia],[Customer].[Country].&[Canada]},{[Product].[Category].[All Products]}),
CROSSJOIN({[Customer].[Country].[All Customers]},{[Product].[Category].&[1],[Product].[Category].&[3]}) 
}
*
{[Date].[Calendar Year].&[2002],[Date].[Calendar Year].&[2003]} ON 1
FROM [Adventure Works]

You then create a new matrix, drop Calendar Year onto the row group, Internet Sales Amount into the data area, and Country onto the column group. Next, right-click on the rightmost column and select Add Group->Column Group->Adjacent Right to create a new column group, set it to group by Product Category, and again drop Internet Sales Amount into the data area:

[image: the matrix in the report designer, showing the two adjacent column groups]

Then, for each column group, you need to make sure that you don’t see aggregated values for the All Members (which in SSRS are returned not with the All Member’s name, but with blank names). You do this by setting a filter on each group’s properties, using an expression like:
=Fields!Country.Value Is Nothing
Here Country is the name of the Country field in the report’s dataset; if this expression returns False we have a Country name and we’re therefore not looking at the All Member.
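In other words, assuming the dataset fields are called Country and Category (names will depend on how the query designer flattened the MDX), the two group filters would look something like this sketch, not the exact dialog wording:

```
Country column group filter
  (keep only rows that have a real Country name):
  Expression: =Fields!Country.Value Is Nothing
  Operator:   =
  Value:      =False

Product Category column group filter
  (keep only rows that have a real Category name):
  Expression: =Fields!Category.Value Is Nothing
  Operator:   =
  Value:      =False
```

Each group filters on its own field, so the Country columns hide the All Customers tuples and the Category columns hide the All Products tuples.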

Anyway, you then get an output like this, which is what we wanted:

[image: the finished report, with Countries and Product Categories side by side on columns]

Here’s one last impractical but fun way to solve the problem. While playing around with DMX recently it occurred to me that the SHAPE statement could also be useful here, and, with a lot of help on the syntax from my friend and DMX (as well as SSIS) guru Mr Allan Mitchell, I came up with the following:

SELECT FLATTENED t.*
FROM
[Sequence Clustering] -- arbitrary, it just has to be a mining model
NATURAL PREDICTION JOIN
SHAPE
{
SELECT {[Customer].[Country].&[Australia],[Customer].[Country].&[Canada]} ON 0,
{[Date].[Calendar Year].&[2002],[Date].[Calendar Year].&[2003]} ON 1
FROM [Adventure Works]
WHERE([Measures].[Internet Sales Amount])
}
APPEND
(
{
SELECT {[Product].[Category].&[1],[Product].[Category].&[3]} ON 0,
{[Date].[Calendar Year].&[2002],[Date].[Calendar Year].&[2003]} ON 1
FROM [Adventure Works]
WHERE([Measures].[Internet Sales Amount])
}
RELATE [[Date]].[Calendar Year]].[Calendar Year]].[MEMBER_CAPTION]]]
TO [[Date]].[Calendar Year]].[Calendar Year]].[MEMBER_CAPTION]]]
) AS MyNestedTable AS t

[image: the flattened results of the DMX query]

To get this to work you just need to have a mining model in your SSAS database so you can put it in the FROM clause; it doesn’t matter what it is because it’s going to be ignored. I was able to join the queries on the MEMBER_CAPTION field from [Date].[Calendar Year], which contained the names of the Years on rows, although it was a struggle to work out how and where to add all the extra opening and closing square brackets that are needed in the RELATE clause! Notice, though, that we can just paste the MDX queries we need in there – usually SHAPE is used with OPENQUERY, but of course even though this is a DMX query we’re staying within the same SSAS database to get the data so that’s not necessary. Not the most elegant solution, of course, but interesting nonetheless.

Google Squared

Today Google announced an interesting new product: Google Squared. Here are some links:
http://www.theregister.co.uk/2009/05/13/google_squared/
http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9132972
Basically it will return data from search results in spreadsheet format. And of course, when you’ve got data in Google spreadsheet format you can do all kinds of cool stuff with it, like stick Panorama’s pivot table gadget on top of it.

This, plus moves towards support of RDFa also announced today:
http://www.techcrunch.com/2009/05/12/google-takes-steps-towards-a-more-structured-web/
means there are going to be some really interesting possibilities for doing BI directly from data sourced from the web.

Oh, and let’s not forget about Wolfram Alpha, also coming soon and equally exciting from a web/data/BI point of view. Imagine, instead of it being able to tell you things like the distance between the Earth and the Moon right now, having your business modelled in it and then letting end users query this model using a search-engine interface.

Metadata? Complex Event Processing?

In part one of today’s ‘Interesting BI-related stuff I saw on the web today’ posts…

After MDM finally reared its head, it seems like Microsoft is working on some kind of metadata tool as well:
http://www.tdwi.org/News/display.aspx?ID=9434
More news later this year apparently. Interesting comment:
“He did disclose that Microsoft’s still-percolating metadata management effort will encompass both its MDM and search assets.”

AND it seems like Microsoft is entering the Complex Event Processing market:
http://www.biztalkgurus.com/blogs/biztalksyn/archive/2009/05/11/microsoft-announces-entry-into-complex-event-processing-cep-market.aspx
Since other CEP vendors support some kind of OLAP on top of their data (eg SQLStream/Mondrian, Aleri) I wonder if Microsoft have a story for SSAS and CEP?

UPDATE: more details on MS CEP here:
http://www.dbms2.com/2009/05/13/microsoft-announced-cep-this-week-too/

SQL2008 R2 Site Live

So the announcements are starting to flow at TechEd – for instance, Microsoft’s long-awaited master data management solution, now called Master Data Services, will be available as part of the SQL2008 R2 (what was known as Kilimanjaro) release. More details on this and other BI-related features can be found here:
http://www.microsoft.com/sqlserver/2008/en/us/R2.aspx

Looks like SSRS will be getting some new stuff – certainly the collaboration features brought in by the 90 Degree Software acquisition look like they’re going to be added to Report Builder. Perhaps we’ll finally see the Officewriter/SSRS functionality too?

UPDATE: one other thing, mentioned by Teo here:
http://prologika.com/CS/blogs/blog/archive/2009/05/11/transmissions-for-teched-usa-2009-day-1.aspx

Gemini will be able to source data from SSRS reports, and SSRS will be able to expose data as ‘data feeds’ (ie have a new RSS/ATOM rendering extension?).

UPDATE #2: Rob Kerr has a very good write-up and analysis of what was shown of Gemini here:
http://www.robkerr.com/post/2009/05/Microsoft-BI—Gemini.aspx

It looks like there’s been the official announcement of a feature I’ve heard rumours about, namely that Gemini will have its own language for defining multidimensional calculations called DAX. As Rob says, it’ll be interesting to see whether it suffers the same fate as that other attempt to simplify MDX, PEL…


New Gemini Demos Dissected

On the BI Blog on Monday a new set of Gemini demos were posted; they’re also available on YouTube. They look like the same demos I saw at PASS Europe a few weeks ago and while they don’t show much in the way of different functionality compared to what was shown late last year, I think there are a few interesting points to note. Unfortunately the quality of the picture is so poor you can’t make out much detail on the screen, so I can only really comment on what Donald Farmer (who’s presenting) specifically points out.

Let’s step through each demo and I’ll give you a running commentary on them…

  • 0:11 Note that Gemini isn’t built into Excel, it’s an Excel addin. I’m not sure whether it will work in Excel 2007 or only Excel 2010 (or whatever it’s called) but this is significant for another reason: it means that Gemini release cycles are not tied to Office release cycles, so potentially new releases of Gemini can appear reasonably regularly.
  • 0:35 We start off with sourcing data from a data warehouse – probably intentionally, to forestall some of the hostility that was seen when Gemini was first announced, when Gemini was seen as being yet another “you don’t need a data warehouse” type tool.
  • 1:05 Arggh, why can’t I see the buttons on the Data-Cleaning ribbon? It looks like there’s a lot of stuff there, although it might not be all working properly yet.
  • 1:45 The obligatory boast about how much data you can work with – in this case 20 million rows – on a regular desktop machine. In my experience that’s about the average number of rows I see in a fact table underneath SSAS (though of course it can handle way more than that), so the number was probably chosen deliberately for that reason, as well as, of course, to get the Excel users out there salivating. Suddenly posts like this seem less funny, more like a chilling prediction of things to come…
  • 2:48 You can copy data into Gemini from the clipboard. Note that you don’t seem to be able to link to the data directly in Excel, at least not yet. Donald also mentions that ‘other data sources’ will be supported – it’ll be interesting to see which ones.
  • 3:18 Creating a pivot table. We seem to be back in regular Excel here and out of the Gemini addin, although Donald says that ‘in Gemini we have some cool pivot tables we can handle’. Perhaps what we’re seeing here are Excel 2010 pivot tables.
  • 3:50 Pointing out the inferred relationship between tables. I suspect this relationship was inferred well before this point; we already know you are going to be able to set these relationships up manually.
  • 4:26 Showing data as a percentage of total. There seems to be a big button to do this; are there going to be any other easy calculations available? Where are the calculations taking place, and how are they expressed – in Excel or the underlying Gemini cube?
  • 4:50 New slicer bars – mentioned as a ‘new feature in Excel’, specifically for Gemini but also available for other Excel users. So this must be new generic Excel pivot table, rather than Gemini functionality. This looks really good; I like the way they are aware of each other too, and aware of what data is available, though I wonder how exactly they know whether data is available and how this would work with cubes containing calculations etc.
  • 0:20 Set theme – whoa, so you can apply a theme to an Excel spreadsheet? Hmm, turns out you can already do this. But it is a powerful feature when you want to create a report.
  • 0:30 Publishing to Sharepoint, but notice how Donald mentions that publishing a model containing 20 million rows would take a bit of time. How long exactly? Minutes? Hours?
  • 0:40 The Sharepoint report centre. OK, so we can rate reports with stars, yeah that’s going to be useful… but other ‘social tools for collaboration’ might be interesting.
  • 1:03 Setting a refresh rate. Basically how often the local cube underneath Gemini gets processed, I suppose. How long will a refresh take though?
  • 1:21 Seeing the report in a thin client. This is Excel Services, I think…? This will only make it harder to choose between Excel/Excel Services/Gemini on one hand and SSRS on the other. It would be nice if there was some kind of story linking the two.
  • 2:57 The operations dashboard – again, I wish I could see more detail of what’s on screen. I can see some of the stuff you’d expect, like metrics on CPU usage and query response times. It’s all done in Excel Services again – I wonder if there’s a cube behind it all storing the performance data?
  • 3:46 Upgrading and formalising a popular app. But notice that the option is ‘upgrade to PerformancePoint’…? The focus is on upgrading for better maintenance and management rather than performance; I guess in PerformancePoint you’ve got IT control over the report design. Possibly, when server-side SSAS gets the Gemini storage engine, you’ll be able to push the Gemini cube into an instance of SSAS. But when you’ve done this will you still be able to use the performance metrics dashboard we’ve just seen?

SSIS Trace File Source Adapter now available

Hurray! At long last, my friends at www.sqlis.com have got round to making their Trace File Source Adapter for SSIS available:
http://www.sqlis.com/post/Trace-File-Source-Adapter.aspx

What’s this got to do with Analysis Services? Well, as the post mentions, there are tons of cool things you can use this for when you’ve got some Analysis Services trace files: cache warming and usage monitoring among other things. It works for SQL Server relational engine traces too, if you care about that sort of thing…

Microsoft Solver Foundation

Via SoCalDevGirl, I’ve just discovered another interesting piece in the somewhat fragmented Microsoft Business Intelligence story: Microsoft Solver Foundation. Here’s the official website:
http://www.solverfoundation.com/Default.aspx

What is it then? Ahem, well, if you can’t make much sense of the blurb on the website (like me) it’s probably not aimed at you. Here’s a sample quote:

Solver Foundation is a Microsoft framework designed to deliver critical business insight tools to CxOs, quantitative analysts and developers of mission-critical systems. Traditionally referred to as mathematical programming, these tools provide business intelligence and planning support to organizations seeking maximal competitive advantage.

I suggest you read the full overview to get a better idea of what it does. What I do understand, though, is that anyone who uses this is going to be interested in using the rest of the Microsoft BI stack; I sincerely hope that the Solver Foundation team is talking to the other BI teams and that some kind of coherent BI strategy will emerge. If one does it’s clearly going to be Excel-centric (which makes a lot of sense): Solver Foundation has an Excel addin; there’s also the data mining addin; there’s SSAS’s own integration with Excel; and of course Gemini will be surfaced through Excel, tying up SSAS, some kind of data cleansing functionality, and possibly some data mining functionality too into one compelling package.

Speaking next week in Zurich

I do quite a lot of work in Switzerland: I lived in Basel for three years and still have a lot of friends and business contacts there as a result. I’ll be there next week, in fact, and while I’m there I’ll be speaking at the May meeting of the Swiss PASS chapter in Zurich. I’ll be doing the same session I did at PASS Europe last week (so no prep time needed, luckily) on ‘Designing Effective Aggregations in SSAS 2008’. All the details are here:
http://www.sqlpass.ch/

Hope to see some of you there…

SQLSentry Performance Advisor for Analysis Services

I’m currently in Germany at the PASS European Conference. I ran a pre-conf with Allan Mitchell two days ago on monitoring SSAS, SSRS and SSIS, an expanded version of the material I used at the PASS Summit last year; we talked about how you could build a monitoring solution for the MS BI stack yourself, but for the last few months I’ve also been talking to the guys at SQL Sentry – who are also here exhibiting – about their new product called Performance Advisor for Analysis Services which has just been made public. I’ve had an in-depth demo and my first impression is that this is the first Analysis Services monitoring solution that I would actually want to use; it looks really, really good. Here’s a screenshot:

I believe SQL Sentry have the first solution that will monitor the SQL Server relational engine, Analysis Services, and SSIS and SSRS as well. As I’ve said before, I can’t believe it’s taken this long for a big third party vendor to notice the MS BI monitoring market! You can find out more and sign up for the beta here:
http://www.sqlsentry.net/performance-advisor/sql-server-analysis-services.asp
I’ll be getting hold of a beta version myself soon, and I’ll blog in more detail then.

Of course, collecting this information is one thing – actually understanding what it tells us, and using that information effectively, is something else completely. To be honest, it’s only after the prep work I did for my precon that I’ve started to think about this problem properly. Indeed, it’s only really when this kind of monitoring data is easily accessible that best practices for what to monitor can emerge; perhaps the SQLCat team need to write a white paper on this subject? In our preconf Allan also did a few cool demos using data mining on perfmon data to predict when a server is going to ‘go pop’ (in his words), and I think there are a lot of interesting possibilities here too.

Using MDX to browse the file system

One very obscure feature of Analysis Services that I’ve only ever seen documented in the books “Microsoft SQL Server Analysis Services 2005” and its successor “Microsoft SQL Server 2008 Analysis Services Unleashed” (both highly recommended for advanced SSAS users, by the way) is the fact that you can use MDX to browse the file system of the server that Analysis Services is running on, with some restrictions. Full details can be found on page 804 of the 2008 book, but it neglects to give any real examples of how to do it, so since it took me a few minutes to work out the correct syntax I thought it was worth a blog post.

There are four MDX extensions you can use, which can be executed from SQL Management Studio just like any other MDX statement.

  • SystemGetLogicalDrives returns a list of the drives on your AS box, with the amount of free space.
  • SystemGetSubdirs returns a list of directories under a specified path. This only works on directories that you have listed in the AllowedBrowsingFolders server property for Analysis Services (which you can find in SQL Management Studio by right-clicking on your instance name in the Object Explorer and clicking Properties) and their parents, so on my machine the following query returns only the Program Files directory:
    SystemGetSubdirs 'C:\'
  • SystemGetFiles returns all the files in a given directory, again only working for the directories listed in AllowedBrowsingFolders. For example on my machine:
    SystemGetFiles 'C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Backup\'
    returns all the files in my backup directory because that’s automatically included in AllowedBrowsingFolders, whereas
    SystemGetFiles 'C:\'
    returns an empty result set.
  • SystemGetFileExists returns whether a file exists or not in a directory you’re allowed to browse. So, for example:
    SystemGetFileExists 'C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Backup\MyBackupFile.abf'
    will return 1 if the file MyBackupFile.abf exists, or 0 if it doesn’t.

Clearly this functionality is quite useful if you’re automating things like backups and you don’t have administrative rights to the server file system, although you need to be an Analysis Services administrator to do this.
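Putting the pieces together, here’s a sketch of the kind of check a backup routine could run from an MDX query window (or from whatever tool you use to submit MDX) before deciding whether a backup needs re-running. The path and file name are examples only, and the folder has to be listed in AllowedBrowsingFolders:

```mdx
-- Does last night's backup file exist? Returns 1 if it does, 0 if not.
-- Path and file name are hypothetical - substitute your own backup folder,
-- which must appear in the AllowedBrowsingFolders server property.
SystemGetFileExists 'C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Backup\MyBackupFile.abf'

-- And to see what backups are actually there:
-- SystemGetFiles 'C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Backup\'
```

If the first statement returns 0 you know the backup didn’t happen, all without needing any rights on the server’s file system itself.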