Metadata? Complex Event Processing?

In part one of today’s ‘Interesting BI-related stuff I saw on the web today’ posts…

After MDM finally reared its head, it seems like Microsoft is working on some kind of metadata tool as well:
More news later this year apparently. Interesting comment:
"He did disclose that Microsoft's still-percolating metadata management effort will encompass both its MDM and search assets."

AND it seems like Microsoft is entering the Complex Event Processing market:
Since other CEP vendors support some kind of OLAP on top of their data (e.g. SQLStream/Mondrian, Aleri) I wonder if Microsoft have a story for SSAS and CEP?

UPDATE: more details on MS CEP here:

SQL2008 R2 Site Live

So the announcements are starting to flow at TechEd – for instance, Microsoft’s long-awaited master data management solution, now called Master Data Services, will be available as part of the SQL2008 R2 (what was known as Kilimanjaro) release. More details on this and other BI-related features can be found here:

Looks like SSRS will be getting some new stuff – certainly the collaboration features brought in by the 90 Degree Software acquisition look like they’re going to be added to Report Builder. Perhaps we’ll finally see the Officewriter/SSRS functionality too?

UPDATE: one other thing, mentioned by Teo here:

Gemini will be able to source data from SSRS reports, and SSRS will be able to expose data as ‘data feeds’ (ie have a new RSS/ATOM rendering extension?).

UPDATE #2: Rob Kerr has a very good write-up and analysis of what was shown of Gemini here:

It looks like there’s been the official announcement of a feature I’ve heard rumours about, namely that Gemini will have its own language for defining multidimensional calculations called DAX. As Rob says, it’ll be interesting to see whether it suffers the same fate as that other attempt to simplify MDX, PEL…


Guardian Data Store – free data, and some ideas on how to play with it

I was reading the Guardian (a UK newspaper) online today and saw that they have just launched something called Open Platform, basically a set of tools that allow you to access and build applications on top of their data and content. The thing that really caught my eye was the Data Store, which makes available all of the numeric data they would usually publish in tables and graphs in the paper in Google Spreadsheet format. Being a data guy I find free, interesting data irresistible: I work with data all day long, and building systems to help other people analyse data is what I do for a living, but usually I’m not that interested in analysing the data I work with myself because it’s just a company’s sales figures or something equally dull. However, give me information on the best-selling singles of 2008 or crime stats, for example, and I start thinking of the fun stuff I could do with it. If you saw Donald Farmer’s fascinating presentation at PASS 2008, where he used data mining to analyse the Titanic passenger list to see if he could work out the rules governing who survived and who didn’t, you’ll know what I mean.

Given that all the data’s in Google Spreadsheets anyway, the first thing I thought of doing was using Panorama’s free pivot table gadget to analyse the data OLAP-style (incidentally, if you saw it when it first came out and thought it was a bit slow, like I did, take another look – it’s got a lot better in the last few months). Using the data I mentioned above on best-selling singles, here’s what I did to get the gadget working:

  1. Opened the link to the spreadsheet:
  2. Followed the link at the very bottom of the page to edit the page.
  3. On the new window, clicked File/Create a Copy on the menu to open yet another window, this time with a version of the data that can be edited (the previous window contained only read-only data)
  4. Right-clicked on column J and selected Insert 1 Right, to create a new column on the right-hand side.
  5. Added a column header, typed Count in the header row, and then filled the entire column with the value 1 by typing 1 into the first row and then dragging it down. I needed this column to create a new measure for the pivot table.
  6. Edited the ‘Artist(s)’ column to be named ‘Artist’ because apparently Panorama doesn’t like brackets
  7. Selected the whole data set (the range I used was Sheet1!B2:K102) and then went to Insert/Gadget and chose Analytics for Google Spreadsheets. It took me a moment to work out I had to scroll to the top of the sheet to see the Panorama dialog that appeared.
  8. Clicked Apply and Close, waited a few seconds while the cube was built, ignored the tutorial that started, spent a few minutes learning how to use the tool the hard way having ignored the tutorial, and bingo! I had my pivot table open. Here’s a screenshot showing the count of singles broken down by gender and country of origin.
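The aggregation the pivot table is doing with that Count column can be sketched in a few lines of Python – note that the column names and sample rows below are invented for illustration, not taken from the actual spreadsheet:

```python
from collections import Counter

# Hypothetical rows from the best-selling singles sheet: each record
# has Gender and Country columns, plus the helper Count column of 1s
# added in step 5 to act as a measure.
rows = [
    {"Artist": "A", "Gender": "Male",   "Country": "UK", "Count": 1},
    {"Artist": "B", "Gender": "Female", "Country": "US", "Count": 1},
    {"Artist": "C", "Gender": "Male",   "Country": "UK", "Count": 1},
]

# Pivot: sum the Count measure over every (Gender, Country) pair
pivot = Counter()
for r in rows:
    pivot[(r["Gender"], r["Country"])] += r["Count"]

print(pivot[("Male", "UK")])  # 2
```

This is exactly why the column of 1s is needed: summing it over each cell of the crosstab gives you a count of singles.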


Of course, this isn’t the only way you can analyse data in Google spreadsheets. Sisense Prism, which I reviewed here a few months ago, has a free version which can connect to Google spreadsheets and work with limited amounts of data. I still have it installed on my laptop, so I had a go at connecting – it was pretty easy so I won’t go through the steps, although I didn’t work out how to get it to recognise the column headers as column headers, and that polluted the data a bit. Here’s a screenshot of a dashboard I put together very quickly:


Lastly, having mentioned Donald Farmer’s Titanic demo I thought it would be good to do some data mining. The easiest way for me was obviously to use the Microsoft Excel data mining addin, of which there are two flavours: the version (available here) that needs to be able to connect to an instance of Analysis Services, and the version that can connect to an instance of Analysis Services in the cloud (available here; Jamie MacLennan and Brent Ozar’s blog entries on this are worth reading, and there’s even a limited web-based interface for it too). Here’s what I did:

  1. Installed the data mining addin, obviously
  2. In the copy of the spreadsheet, I clicked File/Export/.xls to export to Excel, then clicked Open
  3. In Excel, selected the data and on the Home tab on the ribbon clicked the Format as a Table button
  4. When the Table Tools tab appeared automatically on the ribbon, pressed the Analyze Key Influencers button
  5. In the dialog that appeared, I chose Genre from the dropdown to try to work out which of the other columns influenced the genre of the music
  6. Clicked I Agree and Do Not Remind Me Again on the Connecting to the Internet dialog
  7. Added a report comparing Pop to Rock

Here’s what I got out:


From this we can see very clearly that if you’re from the UK or under 25 you’re much more likely to be producing Pop, that Groups are more likely to produce Rock, and various other interesting facts.

So, lots of fun certainly (at least for a data geek like me), but all of the tools I’ve shown here are intended as serious business tools. It’s not hard to imagine that, in a few years’ time when more and more data is available online through spreadsheets or cloud-based databases, we’ll be doing exactly what I’ve demonstrated here with that boring business data you and I have to deal with in our day jobs.

R and F#

One of my new year’s resolutions – or at least, something that got added to my list of stuff to do in the unlikely event I’ve got some time spare and can be bothered – was to learn more about statistics. I’ve only got a very basic grasp of the subject but, like data mining, it’s one of those things that seems to promise to be incredibly useful in my line of work. However it’s interesting to ponder that I’ve been working in BI for almost a decade and never so far needed to learn much beyond basic stats; my theory is that stats, like data mining, only tends to be used by highly skilled quantitative analysts, whereas the people I work with are business people whose maths skills are very average and who quite rightly don’t trust analysis done using methods they can’t understand.

Anyway, in my browsing on the subject I came across the all-of-a-sudden popular topic of R, the statistical programming language. I thought it might make an interesting blog entry, but today I saw John Brookmyre beat me to it so I’ll just link to him instead:

I also got interested in learning about F#, the functional programming language that will be included in VS2010. I was struck by some similarities with MDX and began to wonder about how it could be applied to BI; and yet again, a quick Google revealed Aaron Erickson had had the same idea and blogged extensively and intelligently on the subject:

It’ll be interesting to watch the uptake of F# in BI; from what I can see there’s already a lot of activity in the area of data manipulation and stats for F# (see for example Luca Bolognese’s blog) and I’m sure it’s only going to grow. The only complaint I’ve got is that here’s yet another addition to the Microsoft BI toolset, and I’m yet to be convinced there’s any kind of company-wide strategy aimed at shaping all these tools into a coherent whole. F# won’t be the language of BI in the way that Aaron wants; it’s more likely to end up as a technology island in the way Aaron specifically doesn’t want. But hey, the .NET guys have arrived at the party! The more the merrier.

Netezza launches data integration strategy for Microsoft BI

Interesting press release from Netezza here:

At the moment it only looks like there’s an OLE DB provider available, but the release says this is only the first part of the strategy. I wonder if Netezza is being considered as a supported data source for Analysis Services so it could be used with cubes in ROLAP mode, as with Teradata today?

Interesting stuff on Richard Tkachuk’s site

Just had one of my occasional looks at Richard Tkachuk’s site, and there’s some interesting new information on the home page: an article on how to handle time intervals in AS (I’ve got some ideas on a different way of handling this, but I’d need to test them out to see how they perform), a note on how the Aggregate function works with calculated measures, and a draft of the AS 2008 Performance Guide that is a must-read: