BI.Quality

Here’s something interesting that I’ve just seen on Hilmar Buchta’s blog: a new, open source testing suite for the Microsoft BI stack (including SSAS) called BI.Quality. It’s available for download from Codeplex here:
http://biquality.codeplex.com/

Unfortunately, at the time of writing, the documentation’s only in German although an English user manual is promised. From what I can make out, though, it looks like it could be really useful.

Google Public Data Explorer

Google made yet another tentative step into the world of web-based BI with the launch of Public Data Explorer, a tool for analysing at least some of those public data sets that have been appearing thick and fast over the last year or so. Although it’s fairly basic compared to other tools out there like (to pick two examples from many) Tableau Public or Timetric, it’s actually quite fun and much better than other Google efforts in this area like Fusion Tables. There’s a very limited number of data sets available at the moment and there aren’t many options for slicing and dicing, but the functionality that is there is quite slick – I especially like the way you can zoom in and out along the time dimension. Here’s a screenshot showing some European unemployment data:

[screenshot: European unemployment data in Google Public Data Explorer]

Here’s a link to that analysis if you’d like to play with it yourself.

Of course, this will remain a toy until you can get more data into it: all the public data that’s available anywhere on the web, data that’s in Google spreadsheets, and maybe any data in more ‘difficult’ formats such as html tables in web pages (though that’s when you really need a complex tool like Kapow to extract it) or even any data that’s returned when you run a search (which Google has been doing to a limited extent for a year or so now; Public Data Explorer builds on this).

I’m surprised Wolfram Alpha hasn’t done something like this already; I wonder if Microsoft has something up its sleeve here too? After all, it has a partnership with Wolfram Alpha to get data from there, and, in Bing, a search engine in search of differentiating features. Maybe Live Labs Pivot is part of the story? And given that it’s likely there’ll be some kind of cloud-based Analysis Services in the future, which I’m willing to bet would actually be more like a cloud-based PowerPivot and which will presumably work with Excel on the web, I can see Microsoft could have quite a strong story if it wanted. But this is all conjecture, of course. I wonder if there’s anyone on the Bing team who understands BI? No, I’m really making wild guesses now, so I think I’ll finish here…!

Thoughts on how PowerPivot and SSAS could work together

After yesterday’s stream of consciousness on how PowerPivot could be used in SSRS, here’s a follow-up post on how PowerPivot and ‘traditional’ SSAS could be integrated. Hold on, you say, surely that’s a no-brainer? Surely all that would need to happen would be that Vertipaq would become a new storage mode inside SSAS, along with MOLAP, ROLAP and HOLAP, and everyone would be happy? Well, maybe. But here’s an alternative idea that I bounced off some friends a while back and got good feedback on, which I thought I’d air here.

Before I go on, let me state my position on some things:

  • I like PowerPivot, and the more I use PowerPivot the more I like it.
  • I really like the power of the Vertipaq engine, and I want to be able to use it in a corporate BI environment.
  • I really like DAX, and I want to be able to use it in a corporate BI environment.
  • BUT SSAS as we have it today is a very mature, rich tool that I don’t want to lose. PowerPivot models will always be a little rough-and-ready; a good SSAS cube is a lot more ‘finished’ and user-friendly (I always liken building a cube to building a UI). SSAS dimension security is, for example, an absolute killer feature in many corporate BI solutions; PowerPivot won’t have anything like this until at least the next version, whenever that will be.
  • I also love MDX and I don’t want to lose it. MDX Script assignments, calculated members on non-measures dimensions, all of the things that PowerPivot and DAX can’t do (and probably won’t ever do) are things that I use regularly and in my opinion are essential for many complex, enterprise BI implementations.
  • I don’t want the dev team to abandon corporate SSAS, and neither do I want the dev team to waste time re-implementing things in PowerPivot that we already have in corporate SSAS. Already people are asking when they can have security and partitioning in PowerPivot. I want new stuff though!

So, like all users I want absolutely everything possible. How could it be done? Here’s my basic idea: let us be able to build regular SSAS cubes using PowerPivot models as data sources, with SSAS working in something similar to ROLAP mode so every request for data from the cube is translated into an MDX (or SQL – remember SSAS, and presumably PowerPivot, supports a basic version of SQL) query against the PowerPivot model.

In more detail, let’s imagine we have an instance of SSAS running in Vertipaq mode and an instance of SSAS running in normal mode. You’d be able to do the following:

  • Fire up BIDS and create a new SSAS project.
  • Create a data source, which was a PowerPivot database on your Vertipaq instance of SSAS.
  • Create a new Data Source View, which showed all of the tables in your PowerPivot database already joined. Alternatively, here I can imagine connecting to other data sources like SQL Server, creating a Data Source View as normal and then taking the DSV and deploying it as a PowerPivot model onto the Vertipaq instance of SSAS. So in effect, the DSV designer becomes a development environment for PowerPivot models.
  • Create a regular SSAS cube in the usual way, only using the PowerPivot tables in the DSV.
  • Set the storage mode of your dimensions and partitions to the new ROLAP-like storage mode; each SSAS partition could then be based on a separate PowerPivot table. This would mean that when you queried the cube, the SSAS instance issued MDX or SQL queries against the Vertipaq instance of SSAS, just as it issues SQL queries in ROLAP mode today. I suppose though there would be an overhead to making an out-of-process call, so maybe it would be better if you only had one instance of SSAS that could host both Vertipaq and regular SSAS databases at the same time, so all these requests could stay in-process.
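To make that last point a bit more concrete, here’s a very rough sketch of the kind of MDX the ROLAP-like storage mode might push down to the Vertipaq instance. All of the object names are hypothetical – I’m assuming a PowerPivot model containing a Sales table and a Date table, exposed as a cube I’ve called Sandbox here – so this is purely illustrative, not a description of how the engine would actually behave:

//hypothetical query issued by the regular SSAS instance against the PowerPivot model
SELECT
{[Measures].[Sum of Sales Amount]} ON COLUMNS,
//an attribute hierarchy generated from a column of the Date table
[Date].[Calendar Year].[Calendar Year].MEMBERS ON ROWS
FROM [Sandbox]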

The first, obvious, point here is that with this approach we get the traditional, rich SSAS cubes that we know and love and the raw speed of Vertipaq. So one objective is achieved. But I think there would be a lot of other benefits:

  • You’d get two cubes for the price of one: the PowerPivot cube and the SSAS cube. You could choose which one to query depending on your needs.
  • The ability to turn DSVs into PowerPivot models also gives you a proper development environment for creating PowerPivot models, integrated with BIDS and Visual Studio (so you also get source control). The current Excel-based UI is all very well, but us developer types want a nice visual way of creating relationships between tables.
  • You’re able to use all of the new data sources that PowerPivot can work with in traditional SSAS. Imagine being able to create a planning and budgeting solution where users wrote values into an Excel Services spreadsheet, which then fed into PowerPivot via the new Excel Services REST API, which then in turn fed into a SSAS planning and budgeting cube complete with custom rollups and all the complex financial calculations you can only do in MDX.
  • If your users have already built an existing PowerPivot model that they like and want to turn into an ‘official’ BI solution, you can very easily take that model as the starting point for building your cube by importing it into a DSV.
  • It would also make it relatively easy to upgrade existing SSAS projects to use PowerPivot storage – you’d just convert your existing DSV into a PowerPivot model.
  • SSAS drillthrough would be much, much faster because you’d be drilling through to the PowerPivot model and not the underlying relational source.
  • You’d also have the possibility of working in something like HOLAP mode. Vertipaq may be fast, but with really large data volumes some pre-calculated aggregations are always going to be useful.
  • You could define calculated measures in DAX in the PowerPivot model, and then expose them as measures in the SSAS cube (see the DAX sketch after this list). Probably you’d need some special way of handling them so they didn’t get aggregated like regular measures, but in some cases you’d want to take a calculated measure and sum it up like a regular measure (kind of like SQL calculations defined in named calculations today); many more calculations, like year-to-dates, can be treated as semi-additive measures. Effectively this means you are performing some multidimensional calculations outside the Formula Engine, in the SSAS Storage Engine (which in this case is PowerPivot), in the same way I believe that measure expressions work at the moment.
  • For such additive and semi-additive calculations, it also opens up the possibility of parallelism since these calculations can be done in parallel in each partition and the result summed at the end. It also means you get the option to use either DAX or MDX, and can choose the right language for the job.
  • There’s no duplication of dev work needed. For users of PowerPivot who want features like security, partitioning or parent/child relationships, you tell them they have to upgrade to regular SSAS; PowerPivot becomes something like SSAS Express. For users of SSAS who want the speed of Vertipaq, you tell them they have to use a PowerPivot database as their data source. The two complement each other nicely, rather like twins… now where have I heard that analogy before?
  • You also have a convincing story for BI professionals who are sceptical/hostile to PowerPivot to win them over: traditional, corporate SSAS does not go away but is able to build on the new features of PowerPivot.
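As promised above, here are two examples of the kind of DAX measure definitions I mean, using made-up table and column names – a plain additive measure and a year-to-date measure, the latter being the sort of calculation that could be treated as semi-additive when it’s surfaced in the SSAS cube. This is just the familiar PowerPivot measure syntax, nothing new:

Total Sales: =SUM(Sales[SalesAmount])
Year To Date Sales: =TOTALYTD(SUM(Sales[SalesAmount]), Dates[Date])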

So there we have it, another fantasy on the future of the MS BI stack sketched out. You may be wondering why I’ve taken the time to write these two posts – after all, I don’t work for Microsoft and I’m sure plenty of people on the dev team have their own ideas on what features they want to implement for Denali. Well, as the saying goes, if you don’t ask you don’t get! And with Kilimanjaro almost out of the door now’s the time to ask. If you agree with what I’ve said here, or you disagree, or you have a better idea, please leave a comment…

PowerPivot/Excel/Sharepoint and SSRS – should they merge?

I’ve been doing a fair amount of work with SSRS over the last few days, and with PowerPivot also fresh in my mind it got me thinking about the amount of overlap between SSRS and the PowerPivot/Excel/Sharepoint stack. Of course anyone who’s had to try to sell a MS BI solution to a potential customer over the last few years will have had to deal with conversations like this:

Customer: So, what is Microsoft’s solution for building BI dashboards?
Consultant: Well there’s SSRS, or if you want to build an SSAS cube you can use PerformancePoint, or maybe Excel and Excel Services, or you can go with any of these 50 third-party tools…it depends…
Customer: I’m confused already!

But just about any large software company has a certain amount of overlap between their products, that’s just life. However, that doesn’t mean that sometimes some rationalisation of products isn’t a good idea.

Let’s take a look at some of the things you’d want to do when building a dashboard, and how you can achieve them with both stacks:

Requirement: Get data from a number of different sources
  • SSRS: Create data sources and then datasets to return the data you want.
  • PowerPivot/Excel/Sharepoint: Import data into PowerPivot ‘tables’ from Excel, RDBMSes and OData feeds.
  • Comments: There’s a slight difference between the data sources here, but the most important case is always going to be getting data from an RDBMS, which both do well. The key difference, though, is that in general with SSRS you get data on demand through parameterised queries, whereas with PowerPivot you import all the data you’re ever likely to need up front.

Requirement: Integrate that data
  • SSRS: No real solution here, though SSRS developers have wanted to be able to do joins on datasets for a while. The new R2 Lookup functions partly address this (see the sketch after this table).
  • PowerPivot/Excel/Sharepoint: Create joins between PowerPivot tables.
  • Comments: PowerPivot has the obvious advantage here, although for SSRS you can argue that in most cases any data integration should be happening upstream in your ETL.

Requirement: Perform calculations on that data
  • SSRS: Use SSRS expressions.
  • PowerPivot/Excel/Sharepoint: Use DAX calculations.
  • Comments: I’d say that SSRS expressions, while not as powerful as DAX, are easier for most people to understand; however there are a lot of things that only DAX can do.

Requirement: Create reports from that data
  • SSRS: Use BIDS if you’re a developer, or Report Builder if you’re a power user.
  • PowerPivot/Excel/Sharepoint: Use Excel or any client tool that speaks MDX (including SSRS).
  • Comments: For developers, BIDS is a great tool for creating reports. However SSRS has always struggled with Report Builder – in my experience users find it too difficult. And that’s where Excel comes into its own: it’s a powerful tool and most end-users are familiar with it.

Requirement: Publish reports to a wider audience
  • SSRS: Deploy up to your SSRS server.
  • PowerPivot/Excel/Sharepoint: Publish to Excel Services/Sharepoint.
  • Comments: The advantage SSRS has here is that most companies have no problem with the IT department setting up an SSRS server. On the other hand, Sharepoint is a Big Deal. If your company has a Sharepoint strategy, and is planning on installing Sharepoint 2010 Enterprise Edition, you’ll be fine with PowerPivot. If not, and I guess many companies are in this position, you have a problem.

Requirement: Export reports to a variety of formats
  • SSRS: SSRS handles exporting to a lot of different formats.
  • PowerPivot/Excel/Sharepoint: Export to Excel isn’t a problem, but other formats are a bit trickier (though doable).
  • Comments: SSRS has the clear advantage here.

Requirement: Schedule report refresh
  • SSRS: SSRS has a lot of options for controlling when reports are refreshed.
  • PowerPivot/Excel/Sharepoint: PowerPivot’s functionality for scheduling when data is refreshed is a bit v1.0.
  • Comments: SSRS has the advantage again.
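As mentioned above, here’s a quick sketch of what the new R2 Lookup function looks like in an SSRS expression: it retrieves a value from a second dataset where a key matches, which is about as close as SSRS currently gets to a join. The dataset and field names below are invented purely for the example:

=Lookup(Fields!ProductKey.Value, Fields!ProductKey.Value, Fields!ProductName.Value, "ProductDataset")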

Anyway, you get the idea – there’s a fair amount of overlap and some things are done better by one tool, some things are done better by the other. Isn’t it, though, a bit of a waste of Microsoft’s time, money and energy to develop two parallel BI stacks? If they could merge in some way, it would mean less effort spent developing duplicate functionality and a more coherent message for customers to hear.

How could this be done, you ask? Well here are some vague ideas I had about what you’d need:

  • Inside SSRS – BIDS as well as Report Builder – in addition to the existing functionality for bringing data into datasets, and possibly in the long term as a replacement for it, you get the option of building a PowerPivot model to act as the main source of data for your reports. For Report Builder especially I think this would be a winner, given that the PowerPivot UI for building models is already aimed at the same power users that Report Builder is aimed at.
  • The fact that you need to load all of your data into a PowerPivot model upfront is both an advantage and a disadvantage, depending on your scenario. When you know the data’s not going to change much, or you’ve got relatively small amounts of data, it’s good because you get joins and fast query performance. But if the data changes a lot or you don’t want the overhead of loading it into PowerPivot then you’d need the option to pass requests straight through PowerPivot back to your sources – so maybe PowerPivot would need something like ROLAP mode, or proactive caching, or the ability to make its tables work like existing SSRS datasets and send parameters to them.
  • Include proper support for MDX queries in SSRS reports (my old hobby horse). This would involve developing a proper, fully-functional MDX query builder (not the rubbish one SSRS has right now – a standard MDX query generator across all MS products which was also available as a control for custom development would be ideal) and the ability to bind the results of an MDX query direct to a tablix (and no messing around with mapping field names to rowgroupthingies in the tablix control please). If power users didn’t have to worry about tablixes and could just build their queries as easily as they could in an Excel pivot table, Report Builder would be a much more popular tool. I think many developers would appreciate it too. Once all the data you need for your report is in a PowerPivot model, and you have full MDX support in SSRS, the business of report design is much easier. You also no longer need to join datasets because the data is already joined in PowerPivot, you have a powerful calculation language in DAX, and query performance is extremely fast. Oh, and with this functionality in place you could probably kill off PerformancePoint too and no-one would complain…
  • Blur the line between Excel and SSRS. There’s been talk about being able to author SSRS reports in Excel for a long time (whatever happened to the Softartisans technology MS licensed?), but nothing’s ever come of it. Why not also have the option within SSRS to take a range from Excel Services and make that the body of your report (see the example URL after this list)? So your report is essentially still a fragment of an Excel worksheet, but it’s just surfaced via SSRS which then handles the refreshing and rendering to different formats.
  • You’d also need SSRS to be able to schedule the refresh of your PowerPivot model, but that should be very doable; it would be great if it could refresh different parts of the model at different times. SSRS would also maintain control over report security, rendering, folders etc etc.
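On the Excel Services point above, it’s worth noting that the Excel Services REST API in Sharepoint 2010 already lets you address a range in a published workbook with a simple URL, which is exactly the kind of building block this idea would need. The server, document library, workbook and range names here are all hypothetical:

http://myserver/_vti_bin/ExcelRest.aspx/Shared%20Documents/SalesReport.xlsx/Model/Ranges('Sheet1!A1%7CC10')?$format=html

Asking for $format=atom instead returns the same range as an Atom feed, which is the kind of thing PowerPivot can consume.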

The end result would be that this PowerPivot/Excel/SSRS hybrid would give you the best of both worlds. I also have some ideas about how PowerPivot and SSAS should be integrated which I might get round to blogging about soon too, that would fit nicely with this vision of the future.

What are the chances of all this happening? Practically zero. It would involve the SSRS team, the SSAS team and the Excel team setting aside their differences, co-operating, and possibly sacrificing large chunks of different products. But it’s a nice thought to kick around…

SSRS and SSAS-Sourced Parameter Dataset Performance Issues

Ahhh, Reporting Services and Analysis Services integration – a never ending source of pain and frustration! Although I shouldn’t complain because it’s been quite lucrative for me in terms of consultancy work…

Anyway, recently I was tuning a SSRS report based on a SSAS cube for a customer. Looking at Profiler I could see a number of slow queries being executed, which was strange given that it was a fairly simple report. It turned out that during the design phase this report had had a number of parameters created in the SSAS query designer that had later been deleted; however, when you delete a parameter in the SSAS query designer BIDS does not delete the hidden dataset that it is bound to. What’s worse is that when an SSRS report is executed all dataset queries are also executed, even if the datasets aren’t used anywhere in the report, which means you get the overhead of running extra queries. It’s an easy mistake to make and in this case the execution of unused datasets was adding several seconds to the execution time of the report.

You can reproduce this problem very easily by creating a simple report based on the Adventure Works cube. Once you’ve created the data source, open the query designer, drag the Internet Sales Amount measure onto the grid, drag the Customer hierarchy from the Customer dimension onto the filter pane and check the Parameters box:

[screenshot: the SSAS query designer with the Customer hierarchy in the filter pane and the Parameters box checked]

Now close the query designer and reopen it, then remove the Customer hierarchy from the filter pane, close the query designer again and delete the report parameter. When you Preview the report you’ll see the following in Profiler:

[screenshot: Profiler trace with the hidden parameter dataset query highlighted]

The highlighted row shows the hidden dataset is being executed. What you need to do to fix this is to right-click on your data source and check the Show Hidden Datasets option:

[screenshot: the Show Hidden Datasets option on the data source’s right-click menu]

You’ll then see the offending, automatically-generated, hidden dataset and you can delete it:

[screenshot: the automatically-generated hidden dataset, ready to be deleted]

Luckily, BIDS Helper has functionality to find unused datasets in your reports for you:
http://bidshelper.codeplex.com/wikipage?title=Dataset%20Usage%20Reports&referringTitle=Home

And there’s more! What I found really interesting about this parameter dataset query was how long it was taking to execute. In this example 2.5 seconds, even on a warm cache, seems like a very long time to me even though there are a lot of members on the Customer hierarchy. Once the report is deployed that goes down to a consistent 2.1 seconds, and when I run the same query through SQL Management Studio it goes down to 1.5 seconds. Why the difference in execution times? I’m not sure, but I suspect it’s a combination of the connection string properties used and the use of a flattened rowset. In any case, 1.5 seconds is still slow and it’s certainly not good if you actually do want to use a query like this in a dataset bound to a parameter.

Luckily, if our parameter datasets are causing performance problems, we can usually rewrite the queries involved to make them faster. Here’s the original query from the parameter in the example:

WITH
MEMBER [Measures].[ParameterCaption] AS
[Customer].[Customer].CURRENTMEMBER.MEMBER_CAPTION
MEMBER [Measures].[ParameterValue] AS
[Customer].[Customer].CURRENTMEMBER.UNIQUENAME
MEMBER [Measures].[ParameterLevel] AS
[Customer].[Customer].CURRENTMEMBER.LEVEL.ORDINAL
SELECT
{[Measures].[ParameterCaption],
[Measures].[ParameterValue],
[Measures].[ParameterLevel]}
ON COLUMNS ,
[Customer].[Customer].ALLMEMBERS
ON ROWS
FROM [Adventure Works]

If we decide that we can make do without the All Member and the level-based indenting that goes on in the parameter dataset (this is an attribute hierarchy, after all, so there’s just one level), we can use the following query in the dataset instead:

//a dummy null measure – see the note below on why an empty set on columns isn’t used
WITH MEMBER MEASURES.DUMMY AS NULL
SELECT
{MEASURES.DUMMY}
ON COLUMNS ,
//just the real Customer members (no All Member), with only the properties the parameter needs
[Customer].[Customer].[Customer].MEMBERS
DIMENSION PROPERTIES MEMBER_CAPTION, UNIQUE_NAME
ON ROWS
FROM [Adventure Works]

Once the first query above has been replaced with the second, and the report parameter has been hooked up to use the new fields, Profiler shows that the time taken to execute the parameter dataset has gone down to around 0.7 seconds:

[screenshot: Profiler trace showing the rewritten parameter query executing in around 0.7 seconds]

That is, of course, almost 2 seconds faster than the original query in Preview mode and almost 1.5 seconds faster than the original query in the deployed report. Not a lot on its own but certainly noticeable, and if you have more than one large parameter the cumulative gain could be quite significant. If you create a separate OLEDB connection to the cube and use the second query in a dataset, the execution time is even faster, going down to around 0.45 seconds:

[screenshot: Profiler trace showing the query executing in around 0.45 seconds over the OLEDB connection]
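For reference, by a ‘separate OLEDB connection’ I mean an SSRS data source that uses the OLE DB provider for Analysis Services rather than the built-in Analysis Services data source type, with a connection string along these lines (the server and database names are just examples):

Provider=MSOLAP.4;Data Source=localhost;Initial Catalog=Adventure Works DW 2008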

Incidentally, some of the more astute readers may be asking why I need to include MEASURES.DUMMY in the query above when I can use an empty set on the columns axis instead. Two reasons: one, if you use an empty set on columns in the OLEDB connection you get no rows returned; two, I noticed when the query was being executed in SSRS a Query Subcube event was raised suggesting measure data was being requested from the cube – this didn’t happen when I ran the query in SQL Management Studio. I suspect both problems are something to do with SSRS using a flattened rowset, so I’ll investigate and post back here when I get an answer.
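Just to be explicit, the empty-set variant of the query I’m referring to would look like this:

SELECT
{} ON COLUMNS ,
[Customer].[Customer].[Customer].MEMBERS
DIMENSION PROPERTIES MEMBER_CAPTION, UNIQUE_NAME
ON ROWS
FROM [Adventure Works]

It’s perfectly valid MDX and runs fine in SQL Management Studio, but as described above it causes problems both over the OLEDB connection and in SSRS.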

NodeXL, Network Graphs and the Eurovision Song Contest

Via the Perceptual Edge blog, earlier this week I came across a cool new open-source Excel addin, developed by Microsoft Research and various people at universities around the world, called NodeXL. You can download it from Codeplex here:
http://nodexl.codeplex.com/
Marc Smith’s blog also has a lot of information on it here:
http://www.connectedaction.net/

Basically it’s a tool for displaying and analysing network graphs. That sounds a lot more complicated than it actually is – really all it means is that you can use it for analysing the relationships between things. For example, if you had a list of people who were registered on a social networking site like Facebook, you could use NodeXL to display which people were friends with which other people; in a business setting you could use it to display which products were bought together in the same ‘basket’. Although it’s somewhat buggy and crashed on me a few times, it’s a lot of fun to use and I can see that there are a lot of potential uses in the BI space, especially now that Excel is being pushed as Microsoft’s BI client tool of choice.

So anyway, after I downloaded it I had a think about what data I could use with it. There are built-in options to import data from Outlook (to analyse which email contacts appear together on the To and CC lines of emails) and social networking sites like Twitter, Flickr and YouTube. But I wanted something a bit more fun – and then it suddenly occurred to me, why not use it to analyse voting patterns in the Eurovision Song Contest? Apologies to readers outside Europe who don’t know what this is: basically it’s an annual competition where each country in Europe enters the worst pop song they can possibly come up with, and the winner is chosen by a vote; what everyone knows, of course, is that countries vote for the other countries they are most friendly to rather than on the basis of the songs themselves, for example with Greece and Cyprus always voting for each other (see here and here for some examples of detailed analyses of voting patterns).
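For anyone who wants to try the same thing: as far as I can tell, all NodeXL really needs is an edge list on its Edges worksheet – one row per relationship, with optional extra columns for things like the points awarded. The rows below are made up purely to show the shape of the data, not the real 2009 scores:

Vertex 1            Vertex 2            Points
Greece              Cyprus              12
Cyprus              Greece              12
United Kingdom      Ireland             8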

I downloaded the raw data for voting in the 2009 competition from here, and with a bit of hacking got it into the template that’s bundled with NodeXL. Here’s an example of the output, using the impressively-named Fruchterman-Reingold layout, with the UK highlighted to show who voted for the UK and who the UK voted for:

[screenshot: NodeXL graph, Fruchterman-Reingold layout, with the UK highlighted]

I’ll admit this particular graph is a bit busy, but in NodeXL itself you can zoom in and out, filter and analyse the relationships very easily; even here, though, we can see things like the fact that the UK voted for Switzerland but Switzerland didn’t vote for the UK. Here’s a circular layout with the winners, Norway, highlighted:

[screenshot: NodeXL circular layout with Norway highlighted]

Last of all, here’s the data filtered to show only Greece, with the points that Greece awarded to each other country shown on the vertices:

[screenshot: NodeXL graph filtered to show only Greece, with the points Greece awarded shown on the vertices]

There are a lot of different options and this is a very complex product, so the fact it’s free is particularly amazing. It’s this kind of thing that makes Excel such a rich platform for data analysis – imagine using this with PowerPivot or the data mining addin, or other open-source tools like Sparklines for Excel.

Subselects and Calculated Members in R2

As Darren noted the other week, there was a recent thread on the MSDN Forum that detailed the few enhancements for traditional SSAS users in SQL 2008 R2. I thought I’d pick up on the one MDX-related change, which is to do with calculated members and subselects. From a pure language point of view you’d be right in thinking that this doesn’t sound all that exciting, but it does fix one long-running issue that has caused a lot of people a lot of pain over the last couple of years – namely the way that Excel can’t handle calculated members on non-measures dimensions. For a bit of background, see:
http://sqlblog.com/blogs/marco_russo/archive/2007/01/31/excel-2007-pivottable-with-calculated-members.aspx
http://sqlblog.com/blogs/marco_russo/archive/2007/03/07/ssas-2005-sp2-breaks-excel-calculated-member-selection.aspx
http://sqlblog.com/blogs/marco_russo/archive/2008/12/08/ssas-2008-calculated-members-still-don-t-love-excel-2007.aspx
http://blogs.msdn.com/excel/archive/2008/02/05/common-questions-around-excel-2007-OLAP-PivotTables.aspx

If you’re using Excel 2010 with Analysis Services 2008 R2 you’ll now be able to filter on individual calculated members again. Frankly, it’s a disgrace that it’s taken this long to fix and that you have to use Office 2010 and R2 before it works (especially when every other client tool worth its salt does not have this problem), but I suppose we should be grateful that it at least has been fixed.

Anyway, let’s take a look at how this change has actually been implemented. First of all, you only get the new behaviour when you use the Subqueries connection string property. It’s been possible to set Subqueries=1, which allows the use of calculated members in subselects, since SSAS 2008 (Subqueries=0, which is the default, does not allow this) but I’m told this didn’t solve all of the Excel team’s problems; therefore the new setting Subqueries=2 was introduced in R2.
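In case you’re wondering where you actually set this: it’s a connection string property, so you can type it into the Additional Connection Parameters tab when connecting in SQL Server Management Studio, or append it to a connection string along these lines (the server and database names here are just examples):

Data Source=localhost;Initial Catalog=Adventure Works DW 2008R2;Subqueries=2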

Now let’s add a calculated member to the Adventure Works cube as follows:

CREATE MEMBER CURRENTCUBE.[Date].[Calendar].[Calendar Year].&[2004].CALC1
AS 111;

As you can see, it’s on the Calendar hierarchy of the Date dimension, on the Calendar Semester level underneath the year 2004.

If we run the following query with no special connection string properties set:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Date].[Calendar].[Calendar Semester].ALLMEMBERS
ON 1
FROM
(SELECT
[Date].[Calendar].[Calendar Year].&[2004].[CALC1]
ON 0
FROM [Adventure Works])

We get the error “A set has been encountered that cannot contain calculated members”. However, as you would expect, when you set Subqueries=1 or Subqueries=2 the query runs successfully and you get the following results:

[screenshot: query results]

Now, if we change the query to ask for all the members at the Year level as follows:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Date].[Calendar].[Calendar Year].MEMBERS
ON 1
FROM
(SELECT
[Date].[Calendar].[Calendar Year].&[2004].[CALC1]
ON 0
FROM [Adventure Works])

When we use Subqueries=1 we get an empty set returned on rows:

[screenshot: query results – an empty set on rows]

When we use Subqueries=2 we get the year 2004, the parent of the calculated member, and a null for the measure value:

[screenshot: query results – the year 2004 with a null measure value]

Why the null? Although there is data for 2004 in the cube, in our subselect we’ve only included a calculated member child of 2004, and calculated members’ values don’t aggregate up to their parents.

For the following query, where the year is in the subselect and the semesters are on rows:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Date].[Calendar].[Calendar Semester].ALLMEMBERS
ON 1
FROM
(SELECT
[Date].[Calendar].[Calendar Year].&[2004]
ON 0
FROM [Adventure Works])

For both Subqueries=1 and Subqueries=2 you get the following result:

[screenshot: query results]

Interestingly, if you include a Year and Semester in the subselect as follows:

SELECT {[Measures].[Internet Sales Amount]} ON 0,
[Date].[Calendar].[Calendar Semester].ALLMEMBERS
ON 1
FROM
(SELECT
{[Date].[Calendar].[Calendar Year].&[2004],
[Date].[Calendar].[Calendar Semester].&[2004]&[2]}
ON 0
FROM [Adventure Works])

For Subqueries=1 you get this:

[screenshot: query results with Subqueries=1]

And for Subqueries=2 you get this:

[screenshot: query results with Subqueries=2]

I won’t go on (though there are more examples I could give) because I’m sure no-one outside the Excel team will ever care about any of this, but it’s interesting to note nonetheless and I doubt it will ever get properly documented anywhere. I’ve also been told there are some changes to how the DrillDownX family of functions work with regard to calculated members in R2, but I’ll save that for another post.

Tableau Public

In a clever marketing move, Tableau have just released a new free version of their tool called Tableau Public. I’ve been a fan of Tableau since I saw it a while back (was it really 2005?) but have never seen it actually used at one of my customers, I suppose because of its hefty price. Anyway, what Tableau have done is create a basic version which can only connect to Access, Excel and text files and can only save to Tableau’s own web gallery; however, it’s then possible to embed the visualisations you create in your own website/blog etc for sharing with the world. The stated aim is for this site to be the YouTube of data visualisation; coming from Tableau this is a half-credible claim because they really do understand this area and have some great tools. Also, I’m sure the extra publicity it will generate for the company will do no harm for sales of the paid version either.

And I’m sure it will be great when they get over their teething difficulties, because at the time of writing I can’t actually save anything… maybe they underestimated the amount of interest this would generate?

OK, it’s working now. But of course Windows Live Spaces doesn’t allow me to embed a ‘viz’ in a blog post (grr); instead, here’s a simple example I just uploaded using stats on the last 30 days of traffic on the front page of this blog:
http://public.tableausoftware.com/views/Blogstats/Sheet1

Here’s what it looks like in the client:

[screenshot: the blog traffic viz in the Tableau Public client]

So Tuesday is the best day for page loads, but Thursday is slightly better for unique visitors… and so on. Anyway, it’s a cool tool and definitely worth checking out.

SQLBits VI Dates Announced

It’s SQLBits time yet again! After a long absence we’re back in the South East (of England, that is) and we’ll be going to central London for the first time: Church House Conference Centre. It’s also going to be a one day event, but still free, and it’s going to be on Friday April 16th rather than a Saturday. As always, check http://www.sqlbits.com/ for more details in the coming weeks…

OData and Microsoft BI

I first came across OData last year when Jamie Thomson blogged about it; since then I hadn’t really thought about it much until I came across Douglas Purdy’s blog and specifically his post from yesterday which really brought home how important it is to the future of Microsoft BI. I would urge you to watch the video “OData: The Movie” that he mentions in his post because it gives a really good introduction to the way that OData can be used in BI scenarios; if you don’t have the time, all you really need to know is that it’s OData that makes it possible for PowerPivot to consume data from SSRS reports and Sharepoint lists.
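If you’ve not come across OData before, a concrete example might help: Sharepoint 2010 exposes its list data through a REST endpoint, and PowerPivot’s data feed import can point straight at URLs like the ones below (the server and list names are hypothetical), with standard OData query options such as $filter and $top tacked on if you only want a subset of the data:

http://myserver/_vti_bin/ListData.svc/SalesLeads
http://myserver/_vti_bin/ListData.svc/SalesLeads?$filter=Region%20eq%20'Europe'&$top=100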

Just watching this video made my mind boggle with the possibilities of OData, although since I’m a long way from being an expert in this area I won’t bother to detail all of these fantasies as they’re probably rubbish. However, just consider how much easier life would be for the PowerPivot user of the future if the internet was full of sites that supported OData; certainly, when I’ve looked at the new UK government websites such as the recently-launched http://data.gov.uk/ and http://data.london.gov.uk/, or even sites like the Guardian Data Store that I blogged about last year, I’ve felt that the lack of a standard format to consume this data (a badly-formatted Google spreadsheet is not ideal) has seriously limited these sites’ usefulness. And what if Bing came up with a service like the sadly-useless Google Squared, where you could search for data on the web and return it in a structured, OData format?

I’d also love to see Analysis Services support OData too, in some shape or form. Perhaps it could be used to solve some of the same problems with XMLA that Julian Hyde lists in his recent post on xmla4js. Wouldn’t it also be cool if there was a standard interface for publishing an MDX query up to SSAS from any client tool, with the results then available for consumption via OData (similar to what I suggested here)? You’d then enable scenarios like this: a power user creates a query using an advanced SSAS client tool, then publishes it up to SSAS, whereupon it becomes something similar to a SQL stored proc with the results available as an OData feed, which can then be consumed either in your own code (eg in a website), by PowerPivot users, or by something like SSRS (which would then know nothing about the query used but would just be used to format the resultset).