SDS: the new relational features announced

After all the rumours, here’s the official announcement of the new relational features that are coming to SQL Data Services:
http://blogs.msdn.com/ssds/archive/2009/03/10/9469228.aspx

Given that the team have already made noises about adding BI features to SDS soon, I can’t wait to see what form they’ll take. Of course there are already lots of ways of doing BI with data stored online, as my last blog entry showed; there are also a couple of startups like Birst and GoodData who do very sophisticated BI things in the cloud already. But I hope Microsoft has something up its sleeve, and that I can run an MDX query against it…

Guardian Data Store – free data, and some ideas on how to play with it

I was reading the Guardian (a UK newspaper) online today and saw that they have just launched something called Open Platform, basically a set of tools that allow you to access and build applications on top of their data and content. The thing that really caught my eye was the Data Store, which makes available, in Google Spreadsheet format, all of the numeric data they would usually publish in tables and graphs in the paper. Being a data guy I find free, interesting data irresistible: I work with data all day long, and building systems to help other people analyse data is what I do for a living, but usually I’m not that interested in analysing the data I work with myself because it’s just a company’s sales figures or something equally dull. However, give me information on the best-selling singles of 2008 or crime stats, for example, and I start thinking of the fun stuff I could do with it. If you saw Donald Farmer’s fascinating presentation at PASS 2008, where he used data mining to analyse the Titanic passenger list to see if he could work out the rules governing who survived and who didn’t, you’ll know what I mean.

Given that all the data’s in Google Spreadsheets anyway, the first thing I thought of doing was using Panorama’s free pivot table gadget to analyse the data OLAP-style (incidentally, if you saw it when it first came out and thought it was a bit slow, like I did, take another look – it’s got a lot better in the last few months). Using the data I mentioned above on best-selling singles, here’s what I did to get the gadget working:

  1. Opened the link to the spreadsheet: http://spreadsheets.google.com/pub?key=phNtm3LmDZEP4i_r7RdVhUg
  2. Followed the link at the very bottom of the page to edit the page.
  3. In the new window, clicked File/Create a Copy on the menu to open yet another window, this time with a version of the data that can be edited (the previous window contained only read-only data)
  4. Right-clicked on column J and selected Insert 1 Right, to create a new column on the right-hand side.
  5. Added a column header, typed Count in the header row, and then filled the entire column with the value 1 by typing 1 into the first row and then dragging it down. I needed this column to create a new measure for the pivot table.
  6. Edited the ‘Artist(s)’ column to be named ‘Artist’ because apparently Panorama doesn’t like brackets
  7. Selected the whole data set (the range I used was Sheet1!B2:K102) and then went to Insert/Gadget and chose Analytics for Google Spreadsheets. It took me a moment to work out I had to scroll to the top of the sheet to see the Panorama dialog that appeared.
  8. Clicked Apply and Close, waited a few seconds while the cube was built, ignored the tutorial that started, spent a few minutes learning how to use the tool the hard way having ignored the tutorial, and bingo! I had my pivot table open. Here’s a screenshot showing the count of singles broken down by gender and country of origin.
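The Count column of 1s in step 5 is the classic trick for giving a flat list of rows a countable measure. The same cross-tab can be sketched in pandas; the column names and rows below are made up for illustration, not taken from the actual Guardian sheet:

```python
import pandas as pd

# Hypothetical rows standing in for the best-selling-singles data
singles = pd.DataFrame([
    {"Artist": "A", "Gender": "Female", "Country": "UK"},
    {"Artist": "B", "Gender": "Male",   "Country": "US"},
    {"Artist": "C", "Gender": "Female", "Country": "UK"},
    {"Artist": "D", "Gender": "Group",  "Country": "UK"},
])

# The Count measure: every row contributes 1, just like the column of 1s
singles["Count"] = 1

# Count of singles broken down by gender and country of origin
pivot = singles.pivot_table(index="Gender", columns="Country",
                            values="Count", aggfunc="sum", fill_value=0)
print(pivot)
```

The pivot table the Panorama gadget builds is doing essentially this aggregation, just interactively and OLAP-style.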

image 

Of course, this isn’t the only way you can analyse data in Google spreadsheets. Sisense Prism, which I reviewed here a few months ago, has a free version which can connect to Google spreadsheets and work with limited amounts of data. I still have it installed on my laptop, so I had a go connecting – it was pretty easy so I won’t go through the steps, although I didn’t work out how to get it to recognise the column headers as column headers and that polluted the data a bit. Here’s a screenshot of a dashboard I put together very quickly:

image

Lastly, having mentioned Donald Farmer’s Titanic demo I thought it would be good to do some data mining. The easiest way for me was obviously to use the Microsoft Excel data mining addin. There are two flavours of this: the version (available here) that needs to be able to connect to an instance of Analysis Services, and the version that can connect to an instance of Analysis Services in the cloud (available here; Jamie MacLennan and Brent Ozar’s blog entries on this are worth reading, and there’s even a limited web-based interface for it too). Here’s what I did:

  1. Installed the data mining addin, obviously
  2. In the copy of the spreadsheet, I clicked File/Export/.xls to export to Excel, then clicked Open
  3. In Excel, selected the data and on the Home tab of the ribbon clicked the Format as Table button
  4. When the Table Tools tab automatically appeared on the ribbon, I pressed the Analyze Key Influencers button
  5. In the dialog that appeared, I chose Genre from the dropdown to try to work out which of the other columns influenced the genre of the music
  6. Clicked I Agree and Do Not Remind Me Again on the Connecting to the Internet dialog
  7. Added a report comparing Pop to Rock

Here’s what I got out:

image

From this we can see very clearly that if you’re from the UK or under 25 you’re much more likely to be producing Pop, that groups are more likely to produce Rock, and various other interesting facts.
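The kind of comparison the Key Influencers report performs – working out which attribute values occur disproportionately often in one class versus another – can be sketched very crudely like this. The rows below are toy data invented for illustration, and this is only a frequency comparison, not the actual algorithm the addin uses:

```python
from collections import Counter, defaultdict

# Toy rows standing in for the singles data (hypothetical, for illustration)
rows = [
    {"Genre": "Pop",  "Country": "UK", "Type": "Solo"},
    {"Genre": "Pop",  "Country": "UK", "Type": "Solo"},
    {"Genre": "Pop",  "Country": "US", "Type": "Group"},
    {"Genre": "Rock", "Country": "US", "Type": "Group"},
    {"Genre": "Rock", "Country": "US", "Type": "Group"},
    {"Genre": "Rock", "Country": "UK", "Type": "Group"},
]

def influencers(rows, target, class_a, class_b):
    """For each (column, value) pair, compare its relative frequency in the
    two target classes - a crude version of a 'Pop vs Rock' report."""
    counts = defaultdict(Counter)   # (column, value) -> Counter of classes
    totals = Counter(r[target] for r in rows)
    for r in rows:
        for col, val in r.items():
            if col != target:
                counts[(col, val)][r[target]] += 1
    report = {}
    for key, c in counts.items():
        # Positive score favours class_a, negative favours class_b
        report[key] = c[class_a] / totals[class_a] - c[class_b] / totals[class_b]
    return report

report = influencers(rows, "Genre", "Pop", "Rock")
```

On this toy data, ("Country", "UK") scores positive (favours Pop) and ("Type", "Group") scores negative (favours Rock), which is the shape of the report shown above.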

So, lots of fun certainly (at least for a data geek like me), but everything I’ve shown here is intended as a serious business tool. It’s not hard to imagine that, in a few years’ time, when more and more data is available online through spreadsheets or cloud-based databases, we’ll be doing exactly what I’ve demonstrated here with that boring business data you and I have to deal with in our day jobs.

Analysis Services and the System File Cache

Earlier this week Greg Galloway sent me an email about some new code he’d added to the Analysis Services Stored Procedure Project to clear the Windows system file cache:
http://asstoredprocedures.codeplex.com/Wiki/View.aspx?title=FileSystemCache

I thought this was quite interesting: several times when I’ve been doing performance tuning I’ve noticed that the same query, running on a cold Analysis Services cache, runs much slower when the cube has just been processed. This I put down to some form of caching happening at the OS level or below that; I also vaguely knew that it was a good idea to limit the system file cache, having seen the following on Jesse Orosz’s blog:
http://jesseorosz.spaces.live.com/blog/cns!E322FD91218E57CF!295.entry

Anyway, doing some more research on this subject I came across the following blog entry that discusses the problem of excessive caching in more detail:
http://blogs.msdn.com/ntdebugging/archive/2009/02/06/microsoft-windows-dynamic-cache-service.aspx
…and announces a new tool called the Microsoft Windows Dynamic Cache Service that aims to provide a better way of managing the system file cache:
http://www.microsoft.com/downloads/details.aspx?FamilyID=e24ade0a-5efe-43c8-b9c3-5d0ecb2f39af&displaylang=en

Has anyone got any experience with this? From what I can see, installing the Dynamic Cache Service on a 64-bit SSAS box with a big cube on it looks like a good idea. If you’ve tried it, or are willing to, can you let me know how you get on? Comments are welcome…

Madison and SSAS?

On Monday Microsoft announced "SQL Server Fast Track", a set of reference architectures for data warehousing with SQL Server. This has all been blogged very well by others, so if you’re interested in finding out more I suggest you read Peter Koller:
http://peterkol.spaces.live.com/Blog/cns!68755AEAC31F9A6C!1022.entry
and Curt Monash:
http://www.dbms2.com/2009/02/23/microsoft-sql-server-fast-track/

A couple of accompanying white papers have also been released, though, and I was reading this one:
http://msdn.microsoft.com/en-us/library/dd458815.aspx
when I noticed the following statement:
Project code name "Madison" is the upcoming Microsoft scale-out solution for very large data warehouses (VLDW). Madison is based on the MPP technology developed by DATAllegro and the proven SQL Server 2008 database. Madison expands the DATAllegro hub-and-spoke solution to include not only MPP appliances but also standard symmetric multi-processing (SMP) instances of SQL Server 2008 and SQL Server Analysis Services, (SSAS), allowing either to be viewed as nodes within a grid.

and also:
With the upcoming release of Madison, MPP scalability and grid connectivity can be taken to a new level. Madison expands the DATAllegro hub-and-spoke solution to include not only MPP appliances but also standard SMP instances of SQL Server 2008 and SSAS to be viewed as nodes within a grid. A grid of SMP databases and MPP appliances can be used as the basis for any large-scale data warehouse environment or architecture. However, it is particularly suitable for a hub-and-spoke architecture.

So Analysis Services is clearly going to be supported as part of Madison somehow, as a ‘node within a grid’. What does this mean exactly? I’m not really sure. The focus of the paper is the ‘Hub and Spoke’ architecture and how Madison will enable this through its ability to transfer large amounts of data quickly via ‘high speed, parallel transfers’ over its grid. The following DATAllegro white paper offers some more detail on this:
http://www.datallegro.com/grid/faq_datallegro_grid.pdf

Maybe I’m reading too much into the specific references to SSAS above, but it does seem like something is afoot with Madison and SSAS even if it is just that we’ll get a quick way of moving SSAS databases around. I suppose we’ll find out soon enough…

SQL Server Conference in Italy

I’m doing some work with Marco Russo and Alberto Ferrari at the moment, and for the benefit of any Italian (or Italian-speaking) readers of this blog I thought I’d mention that they are involved in organising a SQL Server conference near Milan. You can find out more and register here:
http://www.sqlconference.it/

It looks like they’ve got a lot of good BI/Analysis Services content…

Implementing Real Analysis Services DrillDown in a Reporting Services Report

Sean Boon recently blogged about an approach to implementing drilldown on charts in Reporting Services when Analysis Services is used as the data source, and it got me thinking about ways to implement drilldown in Reporting Services in general. There are two standard, widely-known methods of doing this:

  • The first can be described as "fetch all the data you’re ever going to need to display and then hide the stuff that hasn’t been drilled down on yet" – this article describes it well, albeit for SQL data sources. It’s easy to implement but has one big problem: if the amount of data your report could ever possibly display is massive (for example, if your dataset query returns millions of rows) then the report will be very slow to run.
  • The second is more scalable: you have a separate report for each level of granularity you want to display, and when you drill down or drill up you click on a member in the report and pass it as a parameter to another report. This also works well but has a different problem: you now have multiple copies of what is essentially the same report to maintain and keep in sync. This approach can also only display one level of granularity at a time, and sometimes it’s nice to be able to see multiple granularities in the same report.

Wouldn’t it be good to have drilldown capabilities in Reporting Services just like you have in any other Analysis Services client? That’s to say, you’d see a list of members on rows in your report; you’d click on one and see all its children; then you’d click again and its children would disappear. Well, it is possible, and it’s a problem I’ve tackled numerous times myself. The last time was when I was writing the samples for Intelligencia Query, but I’ve just come up with an even better approach which I thought I’d blog about. I’ve implemented it for the standard Analysis Services data source, although I’ll be honest: it took me a few goes to get it to work properly (there would have been far fewer hacky workarounds if I’d been using Intelligencia Query!) and I’m not sure it’s 100% robust. Hopefully someone will find it useful though.

What I’ve done is basically a variation on the second approach above, but instead of using multiple reports I’ve created a single report which calls itself when you drill down on a member. The really tricky part is how you manage the set of members you’ve drilled up and down on, and this is where I’d struggled in the past – the solution I’ve got here uses a hidden parameter to manage that set, which is then passed to the main dataset and used with the DrillDownMember function.
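The toggle logic at the heart of this – add the clicked member to the drilled-down set if it isn’t already there, remove it if it is, and ignore clicks on leaf members – is easier to see stripped down to a few lines of Python. The member names here are just placeholder strings:

```python
def next_drilldowns(previous, clicked, is_leaf):
    """Work out the new set of drilled-down members, mirroring the
    UNION/EXCEPT logic used in the report's hidden-parameter dataset."""
    if is_leaf:
        # Clicked member has no children, so ignore the click
        return previous
    if clicked in previous:
        # Member was already drilled down on: drill up (remove it)
        return [m for m in previous if m != clicked]
    # Otherwise drill down (add it)
    return previous + [clicked]

state = ["[All Customers]"]
state = next_drilldowns(state, "[Australia]", is_leaf=False)        # drill down
state = next_drilldowns(state, "[South Australia]", is_leaf=False)  # drill down
state = next_drilldowns(state, "[Australia]", is_leaf=False)        # drill up
```

Note that drilling up on Australia removes only Australia from the set; South Australia stays in it, but since DrillDownMember works recursively from the All Member it simply won’t be reached, so it disappears from the report anyway.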

Here are the steps to get it working:

  1. Create a new Reporting Services report with a data source pointing to Adventure Works.
  2. Create three new report parameters in this order:
    1. MemberClicked – tick "Allow Blank Values" and set the default value to [Customer].[Customer Geography].[All Customers]. This parameter will hold the unique name of the member the user clicked on to drill down.
    2. PreviousDrillDowns – again tick "Allow Blank Values" and set the default value to [Customer].[Customer Geography].[All Customers], and tick "Allow Multiple Values". This parameter will hold the list of members the user drilled down on before the last drill down.
    3. DrillDowns – again tick "Allow Blank Values" and tick "Allow Multiple Values". This parameter will hold the complete list of members drilled down on for the current report.
  3. Create a new Dataset in the report called DrillDowns. Use the following MDX for the query:

    WITH
    MEMBER MEASURES.CUSTUNAME AS
    [Customer].[Customer Geography].CURRENTMEMBER.UNIQUENAME
    SET DRILLDOWNS AS
    UNION({[Customer].[Customer Geography].[All Customers]},
    IIF(
    //CLICKED MEMBER HAS NO CHILDREN, SO IGNORE IT
    ISLEAF(STRTOMEMBER(@MemberClicked)), STRTOSET(@PreviousDrillDowns),
    IIF(
    COUNT(INTERSECT(
    STRTOSET(@PreviousDrillDowns),STRTOSET(@MemberClicked)
    ))=0,
    //DRILL DOWN
    UNION(STRTOSET(@PreviousDrillDowns),STRTOSET(@MemberClicked)),
    //DRILL UP
    EXCEPT(STRTOSET(@PreviousDrillDowns),STRTOSET(@MemberClicked))))
    )
    SELECT {MEASURES.CUSTUNAME} ON 0,
    DRILLDOWNS ON 1
    FROM [Adventure Works]

    What this does is take the set of previously drilled down members, and if the member we’ve just drilled down on is not in there return the set of all previously drilled down members plus the new member (for drilling down); if it is present, return the set of all previously drilled down members except the new member (for drilling up). If the member we’ve clicked on is a leaf member, we can ignore the click and just return the set of all previously drilled down members.

    You’ll need to hook up the two parameters @PreviousDrillDowns and @MemberClicked to the report parameters you’ve previously declared. To do this, first of all in the query designer declare the parameters but just fill in the names and a default, such as [Customer].[Customer Geography].[All Customers] (see here, towards the end, for more detailed steps). Then exit the query designer but stay in the Dataset Properties dialog and create two dataset parameters with the names PreviousDrillDowns and MemberClicked and hook them up to the appropriate report parameters.

  4. Go to the report parameter called DrillDowns and set the default value to be the CUSTUNAME field from the dataset you’ve just created.
  5. Create a second dataset called DisplayQuery with the following MDX:

    WITH
    MEMBER MEASURES.CUSTNAME AS
    Space([Customer].[Customer Geography].CURRENTMEMBER.LEVEL.ORDINAL) +
    [Customer].[Customer Geography].CURRENTMEMBER.NAME
    MEMBER MEASURES.CUSTUNAME AS
    [Customer].[Customer Geography].CURRENTMEMBER.UNIQUENAME
    SELECT {MEASURES.CUSTNAME, MEASURES.CUSTUNAME, [Measures].[Internet Sales Amount] } ON COLUMNS,
    DRILLDOWNMEMBER({[Customer].[Customer Geography].[All Customers]},
    StrToSet(@DrillDowns, CONSTRAINED), RECURSIVE)
    DIMENSION PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON ROWS
    FROM [Adventure Works]

    This query simply displays the measure Internet Sales Amount on columns, and on rows uses the DrillDownMember function to drilldown on the All Member on Customer Geography plus any other visible members that are present in the set returned from the DrillDowns parameter.

    Once again you’ll have to hook up the @DrillDowns parameter to the DrillDowns report parameter.

  6. Now, in the report, create a table and bind it to the DisplayQuery dataset. Only use the CUSTNAME field to display the members for the Customer Geography hierarchy on rows – this means you have a single field that can contain members from all levels of the hierarchy.
    image
  7. Finally, open the Textbox Properties dialog for the cell bound to the CUSTNAME field and set up an Action to jump to the report we’re currently building. We also need to pass two parameters: one that sends the value of the CUSTUNAME field (note this is the unique name of the member clicked on, not the CUSTNAME field, which is just the member name) to the MemberClicked parameter, and one that sends the value of the DrillDowns parameter to the PreviousDrillDowns parameter. It’s not actually obvious how to pass the values of a multi-value parameter through an Action, but I found the answer here; the expression you’ll need to use for this report is:
    =Split(Join(Parameters!DrillDowns.Value, ","),",")
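The Split(Join(…)) trick works because joining the selected values into one comma-delimited string and then splitting it again yields an array that SSRS will accept for a multi-value parameter. The round trip is easy to see sketched in Python rather than SSRS expression syntax:

```python
# Simulating the =Split(Join(Parameters!DrillDowns.Value, ","), ",") expression
drilldowns = ["[Customer].[Customer Geography].[All Customers]",
              "[Customer].[Customer Geography].[Country].&[Australia]"]

joined = ",".join(drilldowns)   # Join(Parameters!DrillDowns.Value, ",")
passed = joined.split(",")      # Split(..., ",")

# The array survives the round trip, so it can be fed to PreviousDrillDowns
assert passed == drilldowns
```

One caveat: this assumes no member unique name itself contains a comma, since a comma inside a unique name would be split in the wrong place.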

Here’s what you’d expect to see when you first run the report:

image

Click on Australia and then South Australia and you get this:

image

Click on Australia again and you’d go back to what’s shown in the first screenshot.

I realise these steps are pretty complex, so I’ve created a sample report in SSRS2008 format and uploaded it to my SkyDrive here:
http://cid-7b84b0f2c239489a.skydrive.live.com/embedrowdetail.aspx/Public/RealDrilldown.zip

I dream of the day when SSRS will do all this stuff for me automatically…

UPDATE: you can now view the sample report online (minus the indenting for members on different levels, for some reason) here –
http://reportsurfer.com/CustomContentRetrieve.aspx?ID=170467

SQLBits IV Registration Open!

Registration for SQLBits IV (the UK’s – and perhaps the world’s – largest free SQL Server tech conference), which will be taking place on March 28th in Manchester, is now open:
http://www.sqlbits.com/

We’ve got four tracks of top-notch presentations including some very strong BI sessions. I’ll be speaking, and among other speakers we’ve got SSIS-superstar Jamie ‘twoblogs’ Thomson for the first time. You can see the agenda here:
http://www.sqlbits.com/information/NewAgenda.aspx

I’m also doing a pre-conference seminar "Introduction to MDX":
http://www.sqlbits.com/Information/TrainingDay.aspx?seminar=Introduction%20to%20MDX
It’s basically day one of the MDX training course that I’ve run successfully as a private course for the last few years (see http://www.crossjoin.co.uk/training.html). We’ll be covering the basics of MDX – sets, tuples, members and popular functions – right up to building the most common types of calculated member; we won’t be covering any advanced stuff like MDX Script assignments or performance tuning. So if you’ve always meant to learn MDX but been thoroughly confused by it, come along!

Songsmith and Data Audiolization for BI

Data audiolization is clearly a real subject that someone, somewhere is researching… and after the fad for data visualisation, why shouldn’t we be thinking about how to represent data with sound? Anyway, I’ll cut to the chase. This video has been doing the rounds on Facebook; it made me laugh, and if I didn’t have a hundred better things to be doing I’d be downloading a copy of Microsoft Songsmith right now and working out how to hook it up to Analysis Services:
http://www.youtube.com/watch?v=2-BZfFakpzc
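Purely for fun: the core of any audiolization, mapping a series of measure values onto musical pitches, is only a few lines. This is just a toy sketch with made-up sales figures, and has nothing to do with how Songsmith actually works:

```python
# Toy 'audiolization': map monthly sales figures onto MIDI note numbers
sales = [120, 340, 560, 230, 450, 610]

def to_midi_notes(values, low=48, high=84):
    """Scale each value linearly into a MIDI note range (C3-C6 by default)."""
    lo, hi = min(values), max(values)
    span = hi - lo or 1  # avoid division by zero if all values are equal
    return [low + round((v - lo) * (high - low) / span) for v in values]

notes = to_midi_notes(sales)
```

Feed the resulting note numbers to any MIDI library and you’ve got your cube singing; the lowest sales month plays the lowest note, the best month the highest.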


Adventure Works the musical, anyone?

More on Oracle 11g and MDX

Following on from reports last year that Simba Technologies had built a 2005-flavour OLEDB for OLAP provider for Oracle’s OLAP option, here are some more details:
http://blogs.simba.com/simba_technologies_ceo_co/2009/02/mdx-query-language-support-for-oracle-olap-11g.html
and here’s a slide deck on it:
http://www.simba.com/docs/Oracle-OLAP-MDX-Driver.pdf

One other interesting point made on their slides is that they’re planning to do the same thing for Cognos and SAP/Business Objects too.

PASS European Conference 2009 and Analysis Services Monitoring

Last November, at the PASS Summit in Seattle, I presented a session on building a monitoring solution for Analysis Services, Integration Services and Reporting Services which seemed to go down pretty well. I was lucky in that the SQLCat team presented a very similar session, although just covering Analysis Services, the next day – so at least I got to present first! Anyway, I see that they’ve just got round to publishing their material on this subject here:
http://sqlcat.com/toolbox/archive/2009/02/05/a-solution-for-collecting-analysis-services-performance-data-from-many-sources-for-performance-analysis.aspx

Meanwhile, I’ll be working on expanding my material (which was a bit rough-and-ready) into a full day pre-conference seminar which I’ll be presenting with Allan Mitchell at the PASS European Conference, on April 22nd-24th in Neuss in Germany:
http://www.european-pass-conference.com/default.aspx

Allan, being the SSIS expert, will be covering that side of things and rewriting my packages so they’re rather more robust; that will allow me to concentrate on the SSAS/SSRS side of things, which I know better. We have a vague plan to release all of our code on Codeplex or somewhere similar; I know a lot of people are also interested in this area.

With a bit of luck I’ll also be speaking at the main conference, but I don’t think the full agenda has been decided yet. I had a good time there last year and hopefully I’ll see some of you there this year too!