Will MDX Go Mainstream?

I always look out for Mark Whitehorn’s articles in the Register, if only because I’m tickled to see any mention of MDX on the front page of the UK’s foremost IT news site. Here’s his latest:
http://www.regdeveloper.co.uk/2007/10/22/mdx_intro/

While I agree with his judgement that "MDX will become a highly saleable skill" (it already is for me!), and while I agree with all his arguments that MDX is a good thing and better than SQL for BI queries, I can’t agree with his central argument that application developers are going to start learning it on a massive scale. I would really, really, really like to believe it but I can’t. A small minority of developers who build specialist analytical applications will need to learn a bit, but everyone else will rely on third-party products like Dundas OLAP Services (what’s going to happen to that with the MS acquisition of the Dundas products? Anyone know?) or the Intelligencia OLAP Controls.

It’s not that MDX is difficult per se, but that people who are used to thinking in SQL – and developers are always going to need to know some SQL – find it very difficult to start thinking in MDX, and that’s a big hurdle to overcome. And as the existence of not just the tools I mentioned but the entire AS client tool market proves, it’s relatively easy to write a generic MDX query generator that will work well on just about any cube; whereas you can’t write a generic SQL query generator that will work on any set of tables without building a metadata layer over the top (e.g. in the way Report Builder needs its Report Models), and once you’ve built that metadata layer you might as well have built a cube anyway.

Fun and Games with Schema Rowsets

One of the (few) new features in AS2008 is Data Management Views, which give you a table-like way of looking at all kinds of useful metadata and management-style information. However, as Darren points out here:
http://geekswithblogs.net/darrengosbell/archive/2007/08/13/SSAS-2008-Data-Management-Views.aspx
…this isn’t really anything you can’t get from schema rowsets at the moment, and the great set of functions that Darren wrote for the Analysis Services Stored Procedure Project (http://www.codeplex.com/ASStoredProcedures/Wiki/View.aspx?title=XmlaDiscover&referringTitle=Home) make it very easy to get at this information in MDX.

I’ve been playing around with schema rowsets a bit over the last week or so, mostly because I’m working on an updated version of my SSIS cache-warmer (more of which soon), but it’s given me a number of interesting ideas for other things that could be possible. For instance, one of the big missing features from AS at the moment, in my opinion, is a query governor: it’s all too easy for one user to bring your server to its knees by running the query from hell. In the article above, Darren talks about the information in the DISCOVER_SESSIONS and DISCOVER_CONNECTIONS rowsets, and how you can use the ids they return to kill sessions and connections. Both rowsets contain other interesting information (which probably needs some time spent researching to make sure I know exactly what it means) on the time since the last command, the CPU time taken by the last command, what the last command was, how long the last command took to run and so on. With this information I guess it would be possible to build a rudimentary query governor in SSIS: basically a package scheduled to run every ten seconds or so that checks the overall CPU usage (or other perfmon counters) on the machine and, if it’s greater than a certain threshold, looks at the sessions that are open, tries to work out which are the guilty parties and kills them.
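The decision logic for that package could be sketched out very simply. Here’s a rough illustration, assuming you’ve already pulled DISCOVER_SESSIONS into a list of rows; the column names used are based on that rowset’s schema, the thresholds are made up, and the actual killing would be done separately by sending an XMLA Cancel command with the offending SPID:

```python
# Sketch of a rudimentary query-governor check. Assumes DISCOVER_SESSIONS
# has already been queried and its rows turned into dicts; the thresholds
# below are arbitrary examples, not recommendations.

CPU_THRESHOLD_PCT = 90      # overall server CPU level that triggers the check
CPU_TIME_LIMIT_MS = 60_000  # per-command CPU time we treat as "the query from hell"

def sessions_to_kill(server_cpu_pct, sessions):
    """Return SPIDs of sessions whose last command has burned too much CPU,
    worst offender first. Returns [] if the server isn't under pressure."""
    if server_cpu_pct < CPU_THRESHOLD_PCT:
        return []
    offenders = [s for s in sessions
                 if s["SESSION_LAST_COMMAND_CPU_TIME_MS"] > CPU_TIME_LIMIT_MS]
    # Kill the most expensive queries first
    offenders.sort(key=lambda s: s["SESSION_LAST_COMMAND_CPU_TIME_MS"],
                   reverse=True)
    return [s["SESSION_SPID"] for s in offenders]
```

In the SSIS package this would translate into a perfmon check, a call to the rowset, and an Execute step that sends a Cancel for each returned SPID.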

There’s other useful stuff available from schema rowsets too that doesn’t seem to be documented anywhere. If you look at the DISCOVER_SCHEMA_ROWSETS rowset (try running call assp.Discover("DISCOVER_SCHEMA_ROWSETS") if you have ASSP installed) you can see what schema rowsets are available. One interesting one I found was DISCOVER_PARTITION_DIMENSION_STAT; for example run

call assp.Discover("DISCOVER_PARTITION_DIMENSION_STAT", "<DATABASE_NAME>Adventure Works DW</DATABASE_NAME><CUBE_NAME>Adventure Works</CUBE_NAME><MEASURE_GROUP_NAME>Internet Sales</MEASURE_GROUP_NAME><PARTITION_NAME>Internet_Sales_2004</PARTITION_NAME>")

…on Adventure Works. What it’s showing is the partitioning and indexing information for an individual partition, information discussed in this SQLCat blog entry from earlier this year:
http://blogs.msdn.com/sqlcat/archive/2007/03/05/ssas-partition-slicing.aspx

This article sends you off to the info*.xml files created when you process a partition to get this information, but here it is ready to use. The last three columns the rowset returns are the important ones: they tell you whether each attribute is indexed in the partition and if so what the maximum and minimum members in the partition are. These maximums and minimums are expressed as DataIDs, AS’s own internal surrogate keys, and you can find out what a member’s DataID is by using the (wait for it) DataID function as follows:

with member measures.showdataid as
DataID([Date].[Date].currentmember)
select measures.showdataid on 0,
[Date].[Date].members on 1
from [adventure works]

When I first found out about this I thought it would be really cool to be able to visualise this information somehow so you could check to see whether your partition slicing was in a healthy state or not. For instance, if you look at the 2001 partition from the Internet Sales measure group:

call assp.Discover("DISCOVER_PARTITION_DIMENSION_STAT", "<DATABASE_NAME>Adventure Works DW</DATABASE_NAME><CUBE_NAME>Adventure Works</CUBE_NAME><MEASURE_GROUP_NAME>Internet Sales</MEASURE_GROUP_NAME><PARTITION_NAME>Internet_Sales_2001</PARTITION_NAME>")

…you can see there are no indexes at all. This is because there is so little data in the partition that it falls below the Index Build Threshold server property, and this in turn means that every time you run a query that partition is going to be scanned. Another thing you might need to watch out for, if you’re not using the Slice property on each partition (and you really should be, even on MOLAP partitions, despite what some people say), is the scenario the SQLCat team describe where your partitions end up with overlapping DataID ranges, meaning that AS scans partitions unnecessarily when you run queries. I’m sure I could put together some kind of SSRS report showing all the partitions in a measure group side by side, perhaps using a stacked bar chart or something to represent the DataID ranges for a given attribute.
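Before building a report, the overlap check itself is trivial to sketch. Here’s a minimal illustration of the test the SQLCat post implies: given the min/max DataIDs that DISCOVER_PARTITION_DIMENSION_STAT reports for one attribute across a measure group’s partitions, flag any pair of partitions whose ranges overlap (these are the ones AS may scan unnecessarily). The tuple layout here is my own, not the rowset’s exact column names:

```python
# Flag pairs of partitions whose DataID ranges for a given attribute
# overlap. Input: list of (partition_name, min_dataid, max_dataid).

def overlapping_partitions(ranges):
    overlaps = []
    for i in range(len(ranges)):
        for j in range(i + 1, len(ranges)):
            name1, lo1, hi1 = ranges[i]
            name2, lo2, hi2 = ranges[j]
            # standard interval-overlap test
            if lo1 <= hi2 and lo2 <= hi1:
                overlaps.append((name1, name2))
    return overlaps
```

With cleanly sliced yearly partitions the list comes back empty; any pair it returns is a candidate for fixing the Slice property.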

SQLBits Sessions now available for download

If you did go to SQLBits (or even if you didn’t) then you may be interested to know that the slides from all the sessions are now available for download from the website:
http://www.sqlbits.com/

Over the next few weeks there’ll be some pictures going up too…

Asymmetric Sets on Columns in Reporting Services

Now I don’t know if I’ve blogged about this before – apologies if I have, but I can’t find any trace of it on Google, and since this is a fairly common performance-related problem it deserves to be mentioned. It’s not rocket-science MDX either, but sometimes I think I focus too much on fun-but-obscure problems at the expense of real-world scenarios…

As you know, Reporting Services only allows you to have members from the Measures dimension on columns in a dataset. This is rubbish at the best of times since you need to use a matrix control to pivot your results, but it can sometimes present a performance problem. Consider the following query:

select
{([Measures].[Internet Sales Amount], [Date].[Calendar Year].&[2004]),
([Measures].[Internet Sales Amount], [Date].[Calendar Year].&[2003]),
([Measures].[Internet Tax Amount], [Date].[Calendar Year].&[2004])}
on 0,
non empty
[Customer].[Customer].[Customer].members
*
[Product].[Product].[Product].members
on 1
from [Adventure Works]

On the AdventureWorks Simple db (kind of like Adventure Works but with some things removed – you can download it here: http://www.microsoft.com/downloads/details.aspx?FamilyID=975c5bb2-8207-4b4e-be7c-06ac86e24c13&DisplayLang=en) this runs in around 11 seconds and returns 55361*3=166083 cells on a cold cache. It’s typical of the kind of large report people for some reason like to run in SSRS, but the fact that you have an asymmetric set on columns becomes a problem when you try to convert the query for use with a matrix: when you pivot Year to appear on rows you automatically bring back extra data you didn’t want in your original query, in this case values for Internet Tax Amount in 2003:

select
{[Measures].[Internet Sales Amount],[Measures].[Internet Tax Amount]}
on 0,
non empty
[Customer].[Customer].[Customer].members
*
[Product].[Product].[Product].members
*
{[Date].[Calendar Year].&[2003],[Date].[Calendar Year].&[2004]}
on 1
from [Adventure Works]

In this case it doesn’t make much of a difference, but it can result in a big increase in the size of the resultset and a corresponding increase in query time especially if you have measures from different measure groups and your new query touches extra partitions as a result of the pivot.

You may think that the answer is to create a calculated measure to display the value of Internet Sales Amount for 2003 and put that on columns along with the regular Internet Sales Amount and Internet Tax Amount measures, and put the Year in the Where clause:

with member measures.inetsalesprevyear as
([Measures].[Internet Sales Amount],
[Date].[Calendar Year].currentmember.prevmember)
select
{[Measures].[Internet Sales Amount],
measures.inetsalesprevyear,
[Measures].[Internet Tax Amount]}
on 0,
non empty
[Customer].[Customer].[Customer].members
*
[Product].[Product].[Product].members
on 1
from [Adventure Works]
where([Date].[Calendar Year].&[2004])

But if you try running this you’ll find that performance is much, much worse: in fact on my laptop I killed the above query after it had run for more than a minute. The problem is that AS can’t optimise this query in the same way as it did the previous queries, and a quick look in Perfmon confirms this: you can see the Total Cells Calculated counter going through the roof when it runs.

What can we do? You might think Non_Empty_Behavior is the way to go, but I’ve not got it to make any difference even with a hard-coded tuple; in fact what we need to do is optimise the Non Empty rather than the calculated measure. Instead of looking for non empty rows where one of the columns represents a calculated measure, we can use the NonEmpty function to be specific about which rows we want to appear:

with
member measures.inetsalesprevyear as
([Measures].[Internet Sales Amount],
[Date].[Calendar Year].&[2003])
select
{[Measures].[Internet Sales Amount],
measures.inetsalesprevyear,
[Measures].[Internet Tax Amount]}
on 0,
nonempty(
[Customer].[Customer].[Customer].members
*
[Product].[Product].[Product].members
, {([Measures].[Internet Sales Amount],
[Date].[Calendar Year].&[2004].prevmember)
, ([Measures].[Internet Sales Amount],
[Date].[Calendar Year].&[2004])
, ([Measures].[Internet Tax Amount],
[Date].[Calendar Year].&[2004])})
on 1
from [Adventure Works]
where([Date].[Calendar Year].&[2004])

What I’m doing here is sticking with the same calculated measure but in the filter set for NonEmpty using the same set of tuples that I originally wanted on columns; the calculated measure is then only evaluated when we know that the row contains a non empty value for one of the three columns. Performance is exactly the same as the original query as far as I can see, and it can all be parameterised nicely.
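On the parameterisation point: the filter set passed to NonEmpty is just the original column set rewritten in terms of the year parameter, so it can be built up as a string before the query is sent to AS. Here’s a hypothetical sketch (the function name and the hard-coded measure list are mine, for illustration only) of the kind of string construction you might do in whatever code assembles the dataset query:

```python
# Hypothetical illustration: build the NonEmpty filter set from the
# unique name of the year member supplied as a report parameter.

def build_filter_set(year_unique_name):
    measures = ["[Measures].[Internet Sales Amount]",
                "[Measures].[Internet Tax Amount]"]
    # previous year's sales, plus current year's sales and tax -
    # the same three tuples the original query had on columns
    tuples = [f"({measures[0]}, {year_unique_name}.prevmember)"]
    tuples += [f"({m}, {year_unique_name})" for m in measures]
    return "{" + ", ".join(tuples) + "}"
```

Feeding in [Date].[Calendar Year].&[2004] reproduces the three-tuple set used in the query above.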

Now if Reporting Services had proper support for MDX we wouldn’t need to go into all this, but don’t get me started on that topic again…

Precision Considerations for Analysis Services white paper

Another week, another white paper from the SQLCat team (do these guys ever sleep?) on the topic of inconsistent numbers being returned from different queries against the same cube as a result of precision issues:
It’s not as bad as it sounds, but definitely something to read before you get hauled up in front of the end users when they notice this! 

Maximising Attribute Relationship Aggregations in Changing Dimensions article

Erik Veerman has written a nice article for the Solid Quality Mentors (who I also do training for) newsletter on attribute relationships and the RelationshipType property:
 
As he says on his blog, if you want to see more articles like this you can sign up for the Solid Quality newsletter here:

SAP to buy Business Objects

If you care about this, I’m sure you’ve already heard that SAP have agreed to buy Business Objects:
The general consensus seems to be that this is bad news for Business Objects customers and that the last few remaining independents (eg Cognos) will go the same way too. Will Microsoft be buying anyone? MS always gets mentioned as a potential buyer in these kinds of situations (see for example Mark Rittman) but I just don’t see it happening myself – it would be completely out of character. IBM on the other hand…

SQLBits

Today was the SQLBits conference in Reading and it went really well. We had 321 attendees, some great sessions, loads of freebies (on top of which everyone got a £30 book voucher, as well as the free evening event) and everyone seemed to be having a good time. Thanks are due to Care Parr of Microsoft for her help organising it, Tony, Simon and Martin for doing the majority of the leg work, and the sponsors. I think all the slide decks will be up on www.sqlbits.com early next week and there might be some photos posted there soon too.

Hopefully the bigger event we’ve got planned for June/July next year will be equally successful…

Contact details

A few months ago Live Spaces introduced a new feature which has allowed people to send messages via the blogs it hosts. Lots of people contact me in this way to ask me questions etc, and that’s fine – but at least 30% of messages I get I can’t reply to because of how the person sending the message has set up their Windows Live Profile. So, if you’d like to get in touch with me please email me direct (you can find my email address at http://www.crossjoin.co.uk/contact.html) instead! Peter, if you’re reading…
 
OK, after having received my latest message I can’t reply to, I’m just going to switch off this functionality. It’s a bit stupid of the Live Spaces team to let people send messages that can’t be replied to.

BI Virtual Conference

Looks like there’s going to be a ‘BI Virtual Conference’ on October the 4th:
Just a marketing exercise for PerformancePoint of course, but some of the presentations might be interesting.