SSRS and SSAS-Sourced Parameter Dataset Performance Issues

Ahhh, Reporting Services and Analysis Services integration – a never-ending source of pain and frustration! Although I shouldn’t complain, because it’s been quite lucrative for me in terms of consultancy work…

Anyway, recently I was tuning an SSRS report based on an SSAS cube for a customer. Looking at Profiler I could see a number of slow queries being executed, which was strange given that it was a fairly simple report. It turned out that during the design phase this report had had a number of parameters created in the SSAS query designer that were later deleted; however, when you delete a parameter in the SSAS query designer, BIDS does not delete the hidden dataset that the parameter was bound to. What’s worse, when an SSRS report is executed all dataset queries are also executed, even if the datasets aren’t used anywhere in the report, so you get the overhead of running the extra queries. It’s an easy mistake to make, and in this case the execution of unused datasets was adding several seconds to the report’s execution time.

You can reproduce this problem very easily by creating a simple report based on the Adventure Works cube. Once you’ve created the data source, open the query designer, drag the Internet Sales Amount measure onto the grid, drag the Customer hierarchy from the Customer dimension onto the filter pane and check the Parameters box:

image

Now close the query designer and reopen it, then remove the Customer hierarchy from the filter pane, close the query designer again and delete the report parameter. When you Preview the report you’ll see the following in Profiler:

image

The highlighted row shows the hidden dataset is being executed. What you need to do to fix this is to right-click on your data source and check the Show Hidden Datasets option:

image

You’ll then see the offending, automatically-generated, hidden dataset and you can delete it:

image

Luckily, BIDS Helper has functionality to find unused datasets in your reports for you:
http://bidshelper.codeplex.com/wikipage?title=Dataset%20Usage%20Reports&referringTitle=Home

And there’s more! What I found really interesting about this parameter dataset query was how long it was taking to execute. In this example it took 2.5 seconds even on a warm cache, which seems like a very long time to me, even allowing for the fact that there are a lot of members on the Customer hierarchy. Once the report is deployed that goes down to a consistent 2.1 seconds, and when I run the same query through SQL Management Studio it goes down to 1.5 seconds. Why the difference in execution times? I’m not sure, but I suspect it’s a combination of the connection string properties used and the use of a flattened rowset. In any case, 1.5 seconds is still slow, and it’s certainly not good if you actually do want to use a query like this in a dataset bound to a parameter.

Luckily, if our parameter datasets are causing performance problems, we can usually rewrite the queries involved to make them faster. Here’s the original query from the parameter in the example:

WITH
MEMBER [Measures].[ParameterCaption] AS
[Customer].[Customer].CURRENTMEMBER.MEMBER_CAPTION
MEMBER [Measures].[ParameterValue] AS
[Customer].[Customer].CURRENTMEMBER.UNIQUENAME
MEMBER [Measures].[ParameterLevel] AS
[Customer].[Customer].CURRENTMEMBER.LEVEL.ORDINAL
SELECT
{[Measures].[ParameterCaption],
[Measures].[ParameterValue],
[Measures].[ParameterLevel]}
ON COLUMNS ,
[Customer].[Customer].ALLMEMBERS
ON ROWS
FROM [Adventure Works]

If we decide that we can make do without the All Member and the level-based indenting that goes on in the parameter dataset (this is an attribute hierarchy, after all, so there’s just one level), we can use the following query in the dataset instead:

WITH MEMBER MEASURES.DUMMY AS NULL
SELECT
{MEASURES.DUMMY}
ON COLUMNS ,
[Customer].[Customer].[Customer].MEMBERS
DIMENSION PROPERTIES MEMBER_CAPTION, UNIQUE_NAME
ON ROWS
FROM [Adventure Works]

Once the first query above has been replaced with the second, and the report parameter has been hooked up to use the new fields, Profiler shows that the time taken to execute the parameter dataset has gone down to around 0.7 seconds:

image

That is, of course, almost 2 seconds faster than the original query in Preview mode and almost 1.5 seconds faster than the original query in the deployed report. Not a lot on its own but certainly noticeable, and if you have more than one large parameter the cumulative gain could be quite significant. If you create a separate OLEDB connection to the cube and use the second query in a dataset, the execution time is even faster, going down to around 0.45 seconds:

image

Incidentally, some of the more astute readers may be asking why I need to include MEASURES.DUMMY in the query above when I could use an empty set on the columns axis instead. Two reasons: one, if you use an empty set on columns with the OLEDB connection you get no rows returned; two, I noticed that when the query was executed in SSRS a Query Subcube event was raised, suggesting measure data was being requested from the cube – this didn’t happen when I ran the query in SQL Management Studio. I suspect both problems are something to do with SSRS using a flattened rowset, so I’ll investigate and post back here when I get an answer.
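For reference, the empty-set version of the query I’m talking about would look something like this – as mentioned above, over the OLEDB connection it returns no rows at all:

```
SELECT
{}
ON COLUMNS ,
[Customer].[Customer].[Customer].MEMBERS
DIMENSION PROPERTIES MEMBER_CAPTION, UNIQUE_NAME
ON ROWS
FROM [Adventure Works]
```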

NodeXL, Network Graphs and the Eurovision Song Contest

Via the Perceptual Edge blog, earlier this week I came across a cool new open-source Excel addin, developed by Microsoft Research and various people at universities around the world, called NodeXL. You can download it from Codeplex here:
http://nodexl.codeplex.com/
Marc Smith’s blog also has a lot of information on it here:
http://www.connectedaction.net/

Basically it’s a tool for displaying and analysing network graphs. That sounds a lot more complicated than it actually is – really all it means is that you can use it for analysing the relationships between things. For example, if you had a list of people who were registered on a social networking site like Facebook, you could use NodeXL to display which people were friends with which other people; in a business setting you could use it to display which products were bought together in the same ‘basket’. Although it’s somewhat buggy and crashed on me a few times, it’s a lot of fun to use and I can see that there are a lot of potential uses in the BI space, especially now that Excel is being pushed as Microsoft’s BI client tool of choice.

So anyway, after I downloaded it I had a think about what data I could use with it. There are built-in options to import data from Outlook (to analyse which email contacts appear together on the To and CC lines of emails) and from social networking sites like Twitter, Flickr and YouTube. But I wanted something a bit more fun – and then it suddenly occurred to me: why not use it to analyse voting patterns in the Eurovision Song Contest? Apologies to readers outside Europe who don’t know what this is: basically it’s an annual competition where each country in Europe enters the worst pop song it can possibly come up with, and the winner is chosen by a vote. What everyone knows, of course, is that countries vote for the other countries they are most friendly with rather than on the basis of the songs themselves – for example, Greece and Cyprus always vote for each other (see here and here for some examples of detailed analyses of voting patterns).

I downloaded the raw data for voting in the 2009 competition from here, and with a bit of hacking got it into the template that’s bundled with NodeXL. Here’s an example of the output, using the impressively-named Fruchterman-Reingold layout, with the UK highlighted to show who voted for the UK and who the UK voted for:

image 

I’ll admit this particular graph is a bit busy, but in NodeXL itself you can zoom in and out, filter and analyse the relationships very easily; even here, though, we can see things like the fact that the UK voted for Switzerland but Switzerland didn’t vote for the UK. Here’s a circular layout with the winners, Norway, highlighted:

image

Last of all, here’s the data filtered to show only Greece, with the points that Greece awarded to each other country shown on the vertices:

image

There are a lot of different options and this is a very complex product, so the fact it’s free is particularly amazing. It’s this kind of thing that makes Excel such a rich platform for data analysis – imagine using this with PowerPivot or the data mining addin, or other open-source tools like Sparklines for Excel.

Subselects and Calculated Members in R2

As Darren noted the other week, there was a recent thread on the MSDN Forum that detailed the few enhancements for traditional SSAS users in SQL 2008 R2. I thought I’d pick up on the one MDX-related change, which is to do with calculated members and subselects. From a pure language point of view you’d be right in thinking that this doesn’t sound all that exciting, but it does fix one long-running issue that has caused a lot of people a lot of pain over the last couple of years – namely, the way that Excel can’t handle calculated members on non-measures dimensions. For a bit of background, see:
http://sqlblog.com/blogs/marco_russo/archive/2007/01/31/excel-2007-pivottable-with-calculated-members.aspx
http://sqlblog.com/blogs/marco_russo/archive/2007/03/07/ssas-2005-sp2-breaks-excel-calculated-member-selection.aspx
http://sqlblog.com/blogs/marco_russo/archive/2008/12/08/ssas-2008-calculated-members-still-don-t-love-excel-2007.aspx
http://blogs.msdn.com/excel/archive/2008/02/05/common-questions-around-excel-2007-OLAP-PivotTables.aspx

If you’re using Excel 2010 with Analysis Services 2008 R2 you’ll now be able to filter on individual calculated members again. Frankly, it’s a disgrace that it’s taken this long to fix and that you have to use Office 2010 and R2 before it works (especially when every other client tool worth its salt does not have this problem), but I suppose we should be grateful that it at least has been fixed.

Anyway, let’s take a look at how this change has actually been implemented. First of all, you only get the new behaviour when you use the Subqueries connection string property. It’s been possible to set Subqueries=1, which allows the use of calculated members in subselects, since SSAS 2008 (subqueries=0, which is the default, does not allow this) but I’m told this didn’t solve all of the Excel team’s problems; therefore the new setting Subqueries=2 was introduced in R2.
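If you want to test this yourself, the property goes in the connection string like any other; a sketch (the server and catalog names here are just placeholders for whatever your environment uses) might look like the following, and in SQL Management Studio you can also type it into the Additional Connection Parameters tab of the connection dialog:

```
Provider=MSOLAP.4;Data Source=localhost;
Initial Catalog=Adventure Works DW 2008;Subqueries=2
```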

Now let’s add a calculated member to the Adventure Works cube as follows:

CREATE MEMBER CURRENTCUBE.[Date].[Calendar].[Calendar Year].&[2004].CALC1
AS 111;

As you can see, it’s on the Calendar hierarchy of the Date dimension, on the Calendar Semester level underneath the year 2004.

If we run the following query with no special connection string properties set:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Date].[Calendar].[Calendar Semester].ALLMEMBERS
ON 1
FROM
(SELECT
[Date].[Calendar].[Calendar Year].&[2004].[CALC1]
ON 0
FROM [Adventure Works])

We get the error “A set has been encountered that cannot contain calculated members”. However, as you would expect, when you set Subqueries=1 or Subqueries=2 the query runs successfully and you get the following results:

image

Now, if we change the query to ask for all the members at the Year level as follows:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Date].[Calendar].[Calendar Year].MEMBERS
ON 1
FROM
(SELECT
[Date].[Calendar].[Calendar Year].&[2004].[CALC1]
ON 0
FROM [Adventure Works])

When we use Subqueries=1 we get an empty set returned on rows:

image

When we use Subqueries=2 we get the year 2004, the parent of the calculated member, and a null for the measure value:

image

Why the null? Although there is data for 2004 in the cube, in our subselect we’ve only included a calculated member child of 2004, and calculated members’ values don’t aggregate up to their parents.

Now consider the following query, where the year is in the subselect and the semesters are on rows:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Date].[Calendar].[Calendar Semester].ALLMEMBERS
ON 1
FROM
(SELECT
[Date].[Calendar].[Calendar Year].&[2004]
ON 0
FROM [Adventure Works])

For both Subqueries=1 and Subqueries=2 you get the following result:

image

Interestingly, if you include a Year and Semester in the subselect as follows:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Date].[Calendar].[Calendar Semester].ALLMEMBERS
ON 1
FROM
(SELECT
{[Date].[Calendar].[Calendar Year].&[2004],
[Date].[Calendar].[Calendar Semester].&[2004]&[2]}
ON 0
FROM [Adventure Works])

For Subqueries=1 you get this:

image

And for Subqueries=2 you get this:

image

I won’t go on (though there are more examples I could give) because I’m sure no-one outside the Excel team will ever care about any of this, but it’s interesting to note nonetheless and I doubt it will ever get properly documented anywhere. I’ve also been told there are some changes to how the DrillDownX family of functions work with regard to calculated members in R2, but I’ll save that for another post.

Tableau Public

In a clever marketing move, Tableau have just released a new free version of their tool called Tableau Public. I’ve been a fan of Tableau since I first saw it a while back (was it really 2005?) but have never seen it actually used at one of my customers, I suppose because of its hefty price. Anyway, what Tableau have done is create a basic version which can only connect to Access, Excel and text files and can only save to Tableau’s own web gallery; however, it’s then possible to embed the visualisations you create in your own website, blog etc to share with the world. The stated aim is for this site to be the YouTube of data visualisation; coming from Tableau this is a half-credible claim, because they really do understand this area and have some great tools. Also, I’m sure the extra publicity it generates for the company will do no harm for sales of the paid version either.

And I’m sure it will be great when they get over their teething difficulties, because at the time of writing I can’t actually save anything… maybe they underestimated the amount of interest this would generate?

OK, it’s working now. But of course Windows Live Spaces doesn’t allow me to embed a ‘viz’ in a blog post (grr); instead, here’s a simple example I just uploaded using stats on the last 30 days of traffic on the front page of this blog:
http://public.tableausoftware.com/views/Blogstats/Sheet1

Here’s what it looks like in the client:

image

So Tuesday is the best day for page loads, but Thursday is slightly better for unique visitors… and so on. But it’s a cool tool and definitely worth checking out.

SQLBits VI Dates Announced

It’s SQLBits time yet again! After a long absence we’re back in the South East (of England, that is) and we’ll be going to central London for the first time: Church House Conference Centre. It’s also going to be a one day event, but still free, and it’s going to be on Friday April 16th rather than a Saturday. As always, check http://www.sqlbits.com/ for more details in the coming weeks…

OData and Microsoft BI

I first came across OData last year when Jamie Thomson blogged about it; since then I hadn’t really thought about it much until I came across Douglas Purdy’s blog and specifically his post from yesterday which really brought home how important it is to the future of Microsoft BI. I would urge you to watch the video “OData: The Movie” that he mentions in his post because it gives a really good introduction to the way that OData can be used in BI scenarios; if you don’t have the time, all you really need to know is that it’s OData that makes it possible for PowerPivot to consume data from SSRS reports and Sharepoint lists.

Just watching this video made my mind boggle with the possibilities of OData, although since I’m a long way from being an expert in this area I won’t bother to detail all of these fantasies as they’re probably rubbish. However, just consider how much easier life would be for the PowerPivot user of the future if the internet was full of sites that supported OData; certainly, when I’ve looked at the new UK government websites such as the recently-launched http://data.gov.uk/ and http://data.london.gov.uk/, or even sites like the Guardian Data Store that I blogged about last year, I’ve felt that the lack of a standard format to consume this data (a badly-formatted Google spreadsheet is not ideal) has seriously limited these sites’ usefulness. And what if Bing came up with a service like the sadly-useless Google Squared, where you could search for data on the web and return it in a structured, OData format?

I’d also love to see Analysis Services support OData too, in some shape or form. Perhaps it could be used to solve some of the same problems with XMLA that Julian Hyde lists in his recent post on xmla4js. Wouldn’t it also be cool if there was a standard interface for publishing an MDX query up to SSAS from any client tool, and then be able to consume it via OData (similar to what I suggested here)? You’d then enable scenarios like this: power user creates query using an advanced SSAS client tool, then publishes it up to SSAS whereupon it becomes something similar to a SQL stored proc with the results available as an OData feed, and therefore can be consumed either in your own code (eg in a website), by PowerPivot users, or by something like SSRS (which would then know nothing about the query used but would just be used to format the resultset).

SQLBits V Videos Now Available

Those nice people at MS sent a film crew to record all of the sessions at SQLBits V last November, and we’re pleased to announce that the videos are now available on the site for viewing/download:

http://www.sqlbits.com/News.aspx?Title=SQLBits%20Videos%20available%20now%20available

Enjoy!

PASS European Conference and the Microsoft Architect Conference

Two new speaking engagements to mention: firstly, I’m happy to announce I’ll be doing two sessions at the PASS European Conference, which is taking place on April 21st-23rd in Dusseldorf, on DAX and SSAS cache-warming. You can view the whole agenda here:
http://www.sqlpass.org/summit/eu2010/Agenda.aspx

I’m also going to be speaking on PowerPivot and how it fits into the wider corporate BI picture at the Microsoft Architect Insight Conference in London on the 31st of March. You can see the agenda for that event here:
http://msdn.microsoft.com/en-gb/architecture/ee959262.aspx

As always, say hello if you see me…

Connecting to SQL Azure from Excel 2007

Sometimes I have an idea, spend a bit of time testing it out, and it ends up being a dead end. That’s what happened this evening but there’s at least one interesting bit of info that resulted so I thought I’d blog anyway…

My original thought was this:
* Excel 2007 can connect to SQL Server to retrieve data for use in reports in a worksheet
* SQL Azure is SQL Server in the cloud
* Office Web Apps gives us the ability to view Excel spreadsheets in the cloud
* So can I create an Excel spreadsheet that connects to SQL Azure, build a report using data from there, upload it to the Excel Web App and then refresh the connection so that my spreadsheet in the cloud displays live data from the cloud?

The short answer is no, at least not at the moment. But after a bit of trial-and-error I did get halfway there and manage to hook Excel 2007 up to SQL Azure (nb I’m not talking about using PowerPivot, which does work with SQL Azure, but the built-in Excel functionality). Here’s how:

  • None of the built-in functionality for connecting to SQL Server from Excel 2007 seems to work. However, it is possible to connect to ODBC data sources from Excel, and SQL Azure can be exposed as an ODBC data source.
  • So, in Excel, go to the Data tab and click “From Other Data Sources”, then “From Data Connection Wizard”.
  • Select “Other/Advanced”, then the Microsoft OLE DB Provider for ODBC Data Sources, and click Next.
  • Select the “Use Connection String” option and paste the ODBC connection string that SQL Azure generates for you into the box.
  • Paste the value of the Uid property into the User name box, and put your password in the Password box. Delete the Uid and Pwd properties from the connection string.
  • Type the name of your database into the Initial Catalog box, then click OK.
  • Finally, a list of tables in your SQL Azure database appears; choose one, create an Excel data source, and then create either a table or a pivot table from the data. Click OK, enter your password one last time, and bingo!
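To illustrate, here’s roughly what the connection string looks like after the Uid and Pwd properties have been removed – the server and database names below are placeholders, not real ones, so substitute your own:

```
Driver={SQL Server Native Client 10.0};
Server=tcp:yourserver.database.windows.net;
Database=yourdatabase;Encrypt=yes;
```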

Unfortunately, as I said, when I uploaded the resulting spreadsheet to the Excel Web App, I got the following error message:

image

Shame – I can understand why it makes sense for most external data connections not to be supported, but in this case, when the external data you’re connecting to is also in the cloud, it would be nice if an exception could be made.

One day, though, I’m sure a scenario like this will work. When I think about what Microsoft’s story for cloud BI might be like, the Excel Web App is the obvious candidate for the reporting tool. Whether you’re reporting direct from relational data stored in SQL Azure, or from some kind of cube (PowerPivot in the cloud is another obvious direction), Excel is going to be the easiest way to do it for the largest number of people. I do see a role for some kind of SSRS in the cloud too, but even in the Microsoft BI stack at the moment there’s a lot of overlap between SSRS and Excel/Excel Services for reporting; I wonder if this will be rationalised at some point? For example Report Builder has never really caught on as a way of letting end-users build their own reports, so why not forget it, develop Excel for this purpose and somehow extend SSRS’s rich functionality for managing and scheduling reports to work with Excel-based reports? Just a thought.

Cell Security and Calculated Members Defined in the WITH Clause or Session

I was asked an interesting question today about cell security – how can you get it to work with calculated members defined in the WITH clause or the session? If, for example, you create a role on the Adventure Works database with the following expression in the Read Permissions box for cell security:

[Geography].[Geography].currentmember is [Geography].[Geography].[Country].&[Australia]
AND
( [Measures].currentmember IS [Measures].[Reseller Sales Amount]
OR
[Measures].currentmember IS [Measures].[Reseller Order Count])

If you then run the following query when connecting via the role:

WITH
MEMBER MEASURES.TEST AS
[Measures].[Reseller Sales Amount] * [Measures].[Reseller Order Count]
SELECT
{MEASURES.TEST, [Measures].[Reseller Sales Amount], [Measures].[Reseller Order Count], [Measures].[Reseller Gross Profit]}
ON 0,
[Geography].[Geography].MEMBERS
ON 1
FROM [Adventure Works]

You’ll see the following results:

image

This is pretty much what you’d expect – you only see values for Australia for Reseller Sales Amount and Reseller Order Count. The problem here is how you can grant access so the user can see the measure TEST which has been defined in the WITH clause. If you try the following expression in the role:

[Geography].[Geography].currentmember is [Geography].[Geography].[Country].&[Australia]
AND
( [Measures].currentmember IS [Measures].[Reseller Sales Amount] OR [Measures].currentmember IS [Measures].[Reseller Order Count] OR [Measures].currentmember IS [Measures].TEST)

You see the following error in a few cells in the query above:
#Error CellPermission (3, 162) The member ‘[TEST]’ was not found in the cube when the string, [Measures].[TEST], was parsed.

And in any case, how can you know in advance what the names of the calculated measures you want to grant access to are going to be anyway?

What we actually want to do is retain control of all measures, calculated or otherwise, that are defined on the cube, but allow access to any calculated measures defined in the WITH clause or the session. The way to do it is to rely on the fact that cell security is evaluated after the MDX Script. First, define a named set at the end of the cube’s MDX Script, something like this:

CREATE SET CURRENTCUBE.ALLMEASURES AS MEASURES.ALLMEMBERS;

This set contains all of the measures, including calculated measures, that have been created in the MDX Script – except those that have their Visible property set to False (these you’d have to add to the set manually).
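If you do have hidden calculated measures that need to stay under the control of cell security, you could add them to the set explicitly. A sketch – [Measures].[My Hidden Measure] here is a hypothetical hidden measure, standing in for whatever hidden measures your cube actually has:

```
CREATE SET CURRENTCUBE.ALLMEASURES AS
{MEASURES.ALLMEMBERS, [Measures].[My Hidden Measure]};
```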

Then use an expression such as this in your role:

[Geography].[Geography].currentmember is [Geography].[Geography].[Country].&[Australia]
AND
( [Measures].currentmember IS [Measures].[Reseller Sales Amount] OR [Measures].currentmember IS [Measures].[Reseller Order Count] OR
Count(Intersect({[Measures].currentmember},{ALLMEASURES}))=0)

The last line is the new part: it checks to see whether the currentmember on Measures is included in the set we’ve just defined. If it is, we can be sure it’s defined on the cube. If it isn’t, it must be defined either in the WITH clause or in the session so we can grant access to it. If you then run the query again, you get the following results:

image

As you can see, we can now see the value for TEST. Of course you also need to be extremely careful that you’re not opening a security hole by doing this; test thoroughly and use Read Contingent permissions as appropriate.