SQL2008 July CTP Released

You can get it from Connect, as always:

https://connect.microsoft.com/SQLServer/content/content.aspx?ContentID=5395

I’m not going to install this CTP because a) I’m about to go on holiday and b) there isn’t a VPC available yet, although one is coming in a few weeks.

What’s new in AS? Well, according to the docs nothing much – just an improvement to the time series algorithm in data mining. Is that it, I hear you cry? Well at the moment there are two possibilities as far as I can see: either the AS team have done their usual bad job of telling anyone what they’ve implemented, or given the new policy of only releasing finished code in the CTPs they’ve not released anything because it’s not finished. I was expecting to see some new features in this release (along the lines of what Vidas Matelis blogs about here: http://www.ssas-info.com/VidasMatelisBlog/28_ssas-2008-katmai-upcoming-changes-in-the-next-ctp) but I’ll find out what’s happened.

Some news on the RS front though:

Reporting Services – New Server and Designer

Improvements represent the two major infrastructure changes for Reporting Services. Reporting Services enhances the processing engine and rendering extensions to enable new functionality, such as Tablix support, and scalability as well as remove the dependency on IIS.  Additionally, new report designer and configuration tool are provided that improve usability and workflow for RS customers.

Oh, and if you’re one of the few people who’ve been using Notification Services – bad luck, it’s dead:

 

5.1 SQL Server Notification Services Removed from SQL Server 2008

SQL Server Notification Services will not be included as a component of SQL Server 2008, but will continue to be supported as part of the SQL Server 2005 product support life-cycle. Moving forward, support for key notification scenarios will be incorporated into SQL Server Reporting Services. Existing Reporting Services functionality, such as data driven subscriptions, addresses some of the notification requirements. Features to support additional notification scenarios may be expected in future releases.

When are named sets in the WITH clause evaluated?

Following on from my postings of last week about dynamically-generated sets, I had a really interesting email yesterday from my colleague at Solid Quality Mentors Francesco De Chirico. He sent me a variant of one of my queries which worked when he wasn’t expecting it to work and to be honest when I looked at it, I couldn’t work out why it worked either. However after a bit of testing I realised he’d discovered something quite important that I at least didn’t know; as with my last post on sets in the MDX Script it all makes sense when you think about it though (see also Mosha’s comment on my last post for some explanation).

Consider the following query:

with
set mycheapset as {[Department].[Departments].&[2],[Department].[Departments].&[3]}
select {} on 0,
mycheapset on 1
from [Adventure Works]

It returns in a second on a cold cache on my laptop, as you’d expect. Now consider the same query with an extra, expensive set declaration in it that isn’t referenced in the query:

with
set myexpensiveset as bottomcount(
{[Department].[Departments].&[2],[Department].[Departments].&[3]}
*
{[Scenario].[Scenario].&[1],[Scenario].[Scenario].&[2] }
*
[Account].[Account].[Account].members
*
[Date].[Date].[Date].members
, 10,[Measures].[Amount])
set mycheapset as {[Department].[Departments].&[2],[Department].[Departments].&[3]}
select {} on 0,
mycheapset on 1
from [Adventure Works]

This now executes on a cold cache in 13 seconds on my laptop despite returning exactly the same results. Even a query which doesn’t reference any of these sets executes in a minimum of 13 seconds:

with
set myexpensiveset as bottomcount(
{[Department].[Departments].&[2],[Department].[Departments].&[3]}
*
{[Scenario].[Scenario].&[1],[Scenario].[Scenario].&[2] }
*
[Account].[Account].[Account].members
*
[Date].[Date].[Date].members
, 10,[Measures].[Amount])
set mycheapset as {[Department].[Departments].&[2],[Department].[Departments].&[3]}
select {[Measures].[Internet Sales Amount]} on 0
from [Adventure Works] 

As with named sets in the MDX Script, named sets in the WITH clause are being evaluated when the query runs regardless of whether they’re referenced or not.

This has obvious implications for performance tuning: if you’ve got an MDX query which has a lot of expensive set definitions in the WITH clause that aren’t used, then you really need to delete them! However you might have queries in custom-built applications or something like SSRS where sets are present which may or may not be necessary, depending on certain parameters. Here’s an example of the type of thing I’m talking about:

with
set myexpensiveset as extract(bottomcount(
{[Department].[Departments].&[2],[Department].[Departments].&[3]}
*
{[Scenario].[Scenario].&[1],[Scenario].[Scenario].&[2] }
*
[Account].[Account].[Account].members
*
[Date].[Date].[Date].members
, 10,[Measures].[Amount]), [Department].[Departments])

set mycheapset as {[Department].[Departments].&[2],[Department].[Departments].&[3]}

member measures.test as
iif([Department].[Departments].currentmember is [Department].[Departments].&[2],
count(myexpensiveset), count(mycheapset))

select {[Measures].test} on 0,
[Date].[Calendar Year].members on 1
from [Adventure Works]
where([Department].[Departments].&[3])

In this case the WHERE clause might be parameterised and could contain either Department 2 (which would return the count of the expensive set) or Department 3 (which would return the count of the cheap set). You can rewrite this to be much more efficient by taking advantage of the fact that sets are evaluated after the WHERE clause so you only evaluate the expensive set when you need to:

with
set myiifset as
iif([Department].[Departments].currentmember is [Department].[Departments].&[2],
extract(bottomcount(
{[Department].[Departments].&[2],[Department].[Departments].&[3]}
*
{[Scenario].[Scenario].&[1],[Scenario].[Scenario].&[2] }
*
[Account].[Account].[Account].members
*
[Date].[Date].[Date].members
, 10,[Measures].[Amount]), [Department].[Departments]),
{[Department].[Departments].&[2],[Department].[Departments].&[3]})

member measures.test as count(myiifset)

select {[Measures].test} on 0,
[Date].[Calendar Year].members on 1
from [Adventure Works]
where([Department].[Departments].&[3])

Generating Excel 2007 workbooks linked to SSAS using SSRS and OfficeWriter

Here’s the second of my (paid-for) articles for the folks at Softartisans:

http://officewriter.softartisans.com/officewriter-410.aspx

This time they wanted me to write something about using OfficeWriter with Analysis Services, and after a bit of thinking I came up with the idea of trying to generate Excel workbooks using OfficeWriter and Reporting Services that use the new Excel 2007 cube functions so that they have live links back to Analysis Services. After a lot of sweat I got it to work, and I have to thank Excel MVP Charley Kyd (the guy behind the Excel BI site http://www.exceluser.com/) for giving me the solution to the problem of OfficeWriter writing values to the cell Value rather than Formula property. Hopefully this is the sort of thing that could be made easier when/if OfficeWriter gets integrated with SSRS in Katmai.

Some Time Intelligence Wizard Calculations are Inefficient

Ahh, my old friend the Time Intelligence Wizard…. some of calculations it produces didn’t work at all at RTM, some were still buggy in SP1 and now I see from the following threads on the MSDN Forum started by David Beavonn it seems to be generating inefficient MDX:

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1285248&SiteID=1
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1290367&SiteID=1
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1290538&SiteID=1

To be fair, the reasons why the code is inefficient aren’t completely clear but there’s at least one important rule that has emerged and that is if you can hard-code a unique name instead of using the DefaultMember function you should. David says a bit more than just this though, and I’ll try to summarise.

If you do use the wizard to create a year-to-date calculation you’ll get a bit of MDX Script looking something like this:

/*
Begin Time Intelligence script for the [Dim Time].[Hierarchy] hierarchy.
*/

Create Member
CurrentCube.[Dim Time].[Hierarchy Dim Time Calculations].[Year to Date]
As "NA" ;

Scope(
{
[Measures].[Order Quantity],
[Measures].[Sales Amount]
}
) ;

// Year to Date
(
[Dim Time].[Hierarchy Dim Time Calculations].[Year to Date],
[Dim Time].[Calendar Year].[Calendar Year].Members,
[Dim Time].[Dim Time].Members
) =

Aggregate(
{ [Dim Time].[Hierarchy Dim Time Calculations].DefaultMember } *
PeriodsToDate(
[Dim Time].[Hierarchy].[Calendar Year],
[Dim Time].[Hierarchy].CurrentMember
)
) ;

End Scope ;

/*
End Time Intelligence script for the [Dim Time].[Hierarchy] hierarchy.
*/

After various experiments and hints from PSS, he says that you need to make all of the following changes to improve the performance of these calculations (note that I’ve not reproed the poor performance myself – it’s not apparent on AdventureWorks – but he seems to know what he’s talking about so I’ll take his word for it):

  • Replace any use of .DefaultMember with a hard-coded unique name
  • Replace the use of the * operator with the Crossjoin function. This is interesting: in one of the threads above Mosha mentions that * can either mean crossjoin or scalar multiplication, and in some circumstances what it’s meant to mean is ambiguous; resolving this ambiguity hurts performance. But as David rightly points out, in the MDX above the context surely isn’t ambiguous: the first parameter of Aggregate() always takes a set, we’ve got braces around the reference to the default member and on the right hand side the PeriodsToDate function also always returns a set. So I’m wondering whether it might be safer to always use Crossjoin…?
  • Replace the use of the Aggregate function with the Sum function. Mosha rightly points out that you can only do this when all of your measures are additive and is sceptical about whether it makes a significant impact on performance anyway.

As a result, the above section of script should look like this:

/*
Begin Time Intelligence script for the [Dim Time].[Hierarchy] hierarchy.
*/

Create Member
CurrentCube.[Dim Time].[Hierarchy Dim Time Calculations].[Year to Date]
As "NA" ;

Scope(
{
[Measures].[Order Quantity],
[Measures].[Sales Amount]
}
) ;

// Year to Date
(
[Dim Time].[Hierarchy Dim Time Calculations].[Year to Date],
[Dim Time].[Calendar Year].[Calendar Year].Members,
[Dim Time].[Dim Time].Members
) =

Sum(
Crossjoin(
{ [Dim Time].[Hierarchy Dim Time Calculations].&[Current Dim Time] },
PeriodsToDate(
[Dim Time].[Hierarchy].[Calendar Year],
[Dim Time].[Hierarchy].CurrentMember
)
)
) ;

End Scope ;

/*
End Time Intelligence script for the [Dim Time].[Hierarchy] hierarchy.
*/

I’d be interested to hear from anyone else out there who does manage to reproduce a massive improvement in performance after making these changes on their cube.

Second Blog Birthday

Today is my blog’s second birthday; two years is pretty ancient in blog years I think. Thanks everyone for sticking with me this long!

Professionally, this year has seen a lot of changes for me: I’ve left the world of permie work and set up my own company, been made an MVP and seen my name on the front cover of a book for the first time. Working as a freelancer has certainly provided me with a lot of good material for this blog (watch out for some interesting stuff on cache warming and query parallelism in the next few weeks) even if it’s meant I’ve had much less time to write it up; the blog has, in turn, proved to be a good source of advertising for my consultancy work. I’m really enjoying myself at the moment – I hope 2007 is as good as 2006 has been.

Resolutions for next year:

  • Post more on the Analysis Services MSDN Forum and microsoft.public.sqlserver.olap. I’ve been a bit slack over the last few months.
  • Get myself some of these new BI certifications that have come out.
  • Stretch myself more, technically. It’s too easy to stick around in my Analysis Services comfort zone. I need to improve my SSIS and C# skills for instance.
  • Get into PerformancePoint.
  • Lose weight. I was with a customer a few weeks ago and one of the guys there remarked that I looked older and fatter than I did in the picture on my blog (what Jon calls my ‘BI in the Jungle’ picture), although I looked thinner than the last time he’d seen me earlier in the year. Hmm, this is the result of working from home and not getting enough exercise.
  • Keep having fun…

PerformancePoint CTP

I was away last week, so I missed the news that the first CTP of PerformancePoint was available for download from Connect. Since the world and his dog have subsequently blogged about it I haven’t bothered until now; I have to admit that I’ve not got round to downloading or installing it yet either (though it will be providing me with justification for buying a new server to install it on in the sales). Other bloggers like Patrick Husting, Charlie Maitland and Ian Tien are probably better bets for the latest news in this area. I’m a little bit suspicious of the amount of hype surrounding it too… hmm, well, we’ll see what it’s like next summer.

Client Tools: The Next Generation

After the Proclarity acquisition earlier this year, my thinking was that while it was a good thing overall for the Microsoft BI community it would have the negative effect of reducing choice and innovation. But in the months since I’ve been surprised to see a steady stream of new third party applications enter the market. Some, like RSInteract, I’ve mentioned before, but since there are quite a few now that are worthy of mentioning I thought I’d round up a few in a single posting.

Radius, from 90 Degree Software, has got a bit of exposure over the last few days thanks to Russell Christopher’s blog entry. Russell sums up its features pretty well and I have to say I was quite impressed by the various webcasts I’ve seen – I like the idea of being able to reuse pieces of other people’s reports while building your own. I’m not sure it supports AS as a data source at the moment, alas, but I was told that they are planning on doing so soon. While the general rubbishness of Report Builder can only help the prospects of Radius it also highlights the point I want to make here quite well: although Microsoft have now killed the market for traditional, vanilla BI apps, if you are innovative and execute your ideas efficiently there are still going to be openings for you.

Of course visualisation has been another area where client tools have sought to set themselves apart from the pack over the last year. I blogged about Tableau a while ago and Fractal:Edge the other week, and more and more other tools are improving their visualisation capabilities. Take a look at the pie-chart tree report or the data mining report demos at http://www.reportportal.com/ for instance. It’s not surprising that the likes of Dundas (disclosure: I recently signed up to their partner program and got a freebie chess set in the post from them this morning) are also entering this space – Dundas OLAP 5.5 just got released, and has some cool features like the ability to draw freehand over or add comments and arrows to a report; I see Chart FX have something similar, Chart FX OLAP, but I’ve not checked it out yet.

Another way to distinguish your product is to focus on a particular niche. I saw a demo of Intelligencia the other week and initially couldn’t see what it offered beyond any other more general-purpose tool. However it’s the details that are important, and this tool is targetted specifically at financial users who want to create printable reports for regulatory (Sarbanes-Oxley etc) reasons. This is something that you’d struggle to do with, say, Reporting Services, and having the tool as a Word 2007 addin makes a lot of sense; the OLAP querying functionality is also aimed at financial-style reports, making it easy to build up asymmetric sets of tuples for example. Also in the financial apps space is the more general CPM tool Calumo; looks impressive and is probably more sophisticated than Biz#/PerformancePoint will be, and I suppose the advantage of being a small software company in this area is that you can respond quickly to your customers’ requests and try to stay one step ahead of Microsoft.

So, then, a thousand flowers are blooming and all that. Having cool features does not necessarily entail commercial success but I wish all these tools well and I’m pleased to see that there’s still a lot of diversity out there – it can only benefit us all. And remember, if you’ve got a product that uses Analysis Services and you want a bit of free publicity on this blog then please send me details…

Processing Dimensions By Attribute and By Table

It was recently pointed out to me (by Hugo Lopes and Paulo Faria of Enabler in Portugal, who I had the pleasure of working with recently) that setting the ProcessingGroup property of a dimension can have quite a big impact on the amount of time it takes to process a dimension. By default this property is set to By Attribute, which means that each attribute in the dimension is processed separately using its own SQL Select statement which returns the distinct values for each member on the attribute. Here’s an example query used to process an attribute based on the Adventure Works Product dimension:

SELECT
DISTINCT
[dbo_DimProduct].[Color] AS [dbo_DimProductColor0_0]
FROM [dbo].[DimProduct] AS [dbo_DimProduct]

If you set it to By Table, however, in some circumstances (when your dimension is built from a single table?) only one SQL Select statement per table in the dimension structure is issued, reading all the data in, with AS presumably finding the distinct values itself.

When do you use which setting? I suppose it depends on a lot of things (performance of your data source, size and structure of your dimension, whether you’re processing other objects at the same time) so it’s going to be one of those things you’ll have to experiment with. Obviously most dimensions process extremely quickly anyway, but if you’ve got a large dimension which you need to process as quickly as possible this might come in handy.

NON_EMPTY_BEHAVIOR and sets

Late last year, in the middle of an email correspondence with Mosha, I included the following piece of an MDX Script containing a calculated member definition generated by BIDS when working in form view:

CREATE MEMBER
CURRENTCUBE.[MEASURES].[Demo]
AS [Measures].[Sales]*2,
FORMAT_STRING = "#,#",
NON_EMPTY_BEHAVIOR = { [Sales] },
VISIBLE = 1  ;

Mosha commented that putting braces round the measure [Sales] in the NON_EMPTY_BEHAVIOR property in this case would ‘do more harm than good’ and, although he didn’t expand on why this was (a good subject for a blog entry Mosha?) ever since then I’ve dutifully removed the braces that BIDS puts in but never noticed much impact. Until yesterday, when a query I was tuning which was running in 45 seconds started running in 8 seconds simply as a result of doing this. Hmmm…

While we’re here, it’s a personal hobby horse of mine to insist on using full unique names in all MDX calculations. So, in this case, I would use [Measures].[Sales] rather than [Sales]. Not only is it more readable but if you’re using dimension security you might run into problems if you don’t, as the following thread on the MSDN Forum demonstrates:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=753483&SiteID=1

So, just to be clear, if you’re using NON_EMPTY_BEHAVIOR and have created your calculated member in form view, always be sure to change it from the format above to be something like this:

… NON_EMPTY_BEHAVIOR = [Measures].[Sales] …

UK BI User Group Evening – 29th November

As I mentioned the other week there’s going to be another BI User Group evening event at TVP in Reading next month, held in association with Tony Rogerson and the UK SQL Server User Community. I’ve finally got the agenda together and you can register here:
http://sqlserverfaq.com/?eid=83

First up we’ve got David Parker, a Visio MVP, speaking about the new BI features in Visio 2007. I’ve been interested in this subject ever since Nick Barclay first brought it to my attention; David is actually writing a book on the subject (see http://www.visualizinginformation.com/) which I’m looking forward to reading when it comes out. Next we’ve got Sanjay Nayyar of IMGroup talking about the various ways you can use Microsoft BI tools with SAP. Finally, we have Andrew Sadler of Proclarity/Microsoft talking about PerformancePoint Server, probably the hot topic of the moment. I’m sure there’ll be some interesting discussions afterwards…