Microsoft Codename “Social Analytics”

I’ve just seen there’s a new release on the SQL Azure Labs site – Codename “Social Analytics” (that’s a bad codename by the way – did MS finally run out of place names in Washington state?). Full details are here:

It’s a service that allows you to collect social web data and then either use it for business analysis or automate responses; it has an OData endpoint so the obvious tool for doing the analysis is PowerPivot, and I guess it could also be a data source for Data Explorer when that comes along. At the moment it’s very limited in that you can only analyse the results of one of two feeds that aggregate data about either Bill Gates or Windows 8, which is a shame, but it should still be fun to play with.

Scoped Assignments and Multiselect

Something interesting to note regarding how scoped assignments behave with multiselect…

On the Adventure Works cube, add the following code to the MDX Script:


SCOPE(MEASURES.TESTCALC, [Date].[Calendar Year].[Calendar Year].MEMBERS);
THIS = 2;

SCOPE(MEASURES.TESTCALC, [Date].[Calendar Year].[All Periods]);
THIS = 3;

Then run the following query:


FROM [Adventure Works]

WHERE([Date].[Calendar Year].&[2001])

It returns the value 2 as you would expect. Now run the following query where there is a set in the Where clause, giving a multiselect on 2001 and 2002:


FROM [Adventure Works]

WHERE({[Date].[Calendar Year].&[2001],[Date].[Calendar Year].&[2002]})

It returns the value 1 – which, strangely, is the value of the original TESTCALC calculated measure before any of the scoped assignments were applied, even though it would seem that the two scoped assignments between them should cover the cells returned by this query.

Now delete the code you added to the MDX Script and add the following:



THIS = 4;


When you try to run the two queries above you get the value 4 returned in both cases. When I first saw this I thought the results returned by two scenarios were inconsistent and that there was something buggy going on in the first one, but Jeffrey told me there is a specific rule in the engine that dictates this behaviour, so it’s how it’s intended to work – as a result, it’s something that needs to be understood and allowed for in any scoped assignments you write.

Self-service ETL with Data Explorer

One of the most interesting things I saw last week at the PASS Summit was Data Explorer, the cloud-based data transformation and publishing tool that was demoed in the keynote on day 1. While it was roundly dismissed as ‘yet more Excel’ by the disgruntled DBA faction I thought it showed some potential (you can see a walkthrough of what was shown here) – even if the fact that it was a SQL Azure Labs project suggested it was not destined to be a real product.

Today, however, I came across this post on Tim Mallalieu’s blog with a 10 minute video demo of an Excel addin version of Data Explorer, made earlier this year. Tim notes in his blog that:

We still have both the client and the cloud service but we only showed the cloud bits at PASS last week.

I would urge you to go and watch the video, because what’s shown is a very substantial, capable tool: an Excel addin for doing self-service ETL. Tellingly the name of the tool in the demo is “PowerImport” – and although Tim suggests in his blog that “some names and concepts have evolved quite a bit since March”, the choice of name speaks volumes. It looks like this could be to SSIS what PowerPivot is to SSAS, and a big selling point for Microsoft’s self-service BI story if it does get released.

Why has all the data in my cube disappeared?

Here’s an issue that I’ve encountered many, many times over the years on the newsgroup and the SSAS MSDN Forum but which, for some reason, I’ve never blogged about until now. It happens from time to time that when people are developing a cube they find, mysteriously, that all the data has disappeared from it; however, there is data present in the source database and there are no key errors while processing (see here for a recent example). What’s going on?

In almost all cases the cause is that the Calculate statement at the beginning of the cube’s MDX Script has been deleted or commented out by accident when editing other calculations. To simulate this problem, open up the Adventure Works project and check to see that you can browse the cube and see data in there. Assuming you can, then go to the Calculations tab in the Cube Editor, make sure you’re in Script View and you’ll see something like the following:


The first statement in the MDX Script of every cube should be the Calculate statement, shown above. It’s a bit of a hangover from functionality that I remember from beta versions of SSAS 2005 – you could do some interesting things with a Calculate statement back then but the functionality in question got dropped before RTM. It nonetheless still has to be present though, because when SSAS encounters it when the MDX Script is evaluated it triggers the aggregation of data in all the real measures of the cube up from the very lowest level of detail up to the highest.

If you delete it or comment it out, like so:


…then, when you deploy and browse the cube, no aggregation will take place for the measures in the cube. It will look as if there’s no data in the cube at all but that’s not true: if you browse down to the very lowest level of every dimension in a given measure group, you’ll find that there’s data present. For example in the Adventure Works cube the Exchange Rates measure group is dimensioned by the Date and Destination Currency dimensions and if you browse the cube after commenting out the Calculate statement and look at the Average Rate measure you’ll see no values at first (I’ve got the Show Empty Cells option turned on here):


…but if you make sure you’re looking at data from the Date hierarchy of the Date dimension, and the Destination Currency Code hierarchy of the Destination Currency dimension, the two key attributes of the dimensions, you’ll see values are in fact present:


So the moral of this tale is: be careful not to delete or comment out your Calculate statement! After all there’s a good reason why the following warning is put before it on every new cube:

The CALCULATE command controls the aggregation of leaf cells in the cube.
If the CALCULATE command is deleted or modified, the data within the cube is affected.
You should edit this command only if you manually specify how the cube is aggregated.

[Incidentally, I would argue that the last line here is dangerous – I’ve never found a good reason to edit or delete the Calculate statement or even put MDX Script statements before it]

PASS Summit 2011–Day 3 Keynote

The third and final keynote at the PASS Summit, and this morning I’ve been given a space on the official blogger table at the keynote! Actually this just means I’ve got a table to rest my laptop on and access to a power strip, but it’s an honour nonetheless.

There are several things that I saw yesterday that are worth mentioning. Probably the most interesting session was from Cathy Dumas about Tabular: among other things she demoed a DAX Editor plugin for BIDS SQL Server Data Tools that is going to make everyone’s life soooo much easier; it will give us something like an MDX Script editor, intellisense, colour coding and so on. She has blogged about it here and I can’t wait for it to be made available. Also I came across the Data Explorer team site and blog; if you are interested in getting to play with it when it’s ready then you get your email address added to the invite list.

Anyway, back to the keynote and today it’s PASS Summit favourite Dr David DeWitt covering Big Data. It’s not a standard marketing session, more of a lecture, and all the better for it; DeWitt is a very talented speaker and more importantly takes a balanced approach to describing the SQL vs NoSQL debate. Interesting points to note:

  • He thinks that the arrival of NoSQL is not a paradigm shift, in the way that the move from hierarchical databases to relational databases was. The assertion that SQL is not dead, not surprisingly, goes down well with this audience.
  • Hadoop. I’m not even going to try to summarise this section of the talk but it is an excellent introduction to how it works, and if you’re even vaguely interested in Hadoop (which you should be given Thursday’s announcements) then you need to watch this – I believe will be available to view on demand somewhere (the slide deck is here). It is, honestly, the best explanation of all this I’ve ever seen and there are several thousand people here in this room who agree with me…
  • He does a comparison of Hive vs Parallel Data Warehouse on the same queries, same data and same hardware, and shows that PDW can outperform Hive by up to 10x. This demonstrates that a parallel database still has advantages over a NoSQL approach as far as performance goes in many cases, although of course each has its own strengths and weaknesses and performance isn’t the only consideration.

This was not only an enthralling and educational talk, but it was also great marketing from Microsoft’s point of view. You can announce Hadoop for Windows to a room full of SQL Server types and however many whizzy demos you do, and however much woo-hooing goes on, if we don’t really understand the technology we’ll go back to our day jobs and ignore it. On the other hand, teach us what the technology actually does and you’ll get us interested enough to try it out for ourselves and maybe even use it on a project.

Finally, if you’re at the Summit today come and find me at the Birds of a Feather lunch on the SSAS Performance Tuning table, or later on the BI Expert Pod this afternoon.

PASS Summit 2011–Day 2 Keynote

It’s only day 2 of the PASS Summit and I’m already feeling conferenced-out, although that might have something to do with my alcohol consumption over the last few days, the jetlag, and the early starts. I saw a few good sessions yesterday, although for some reason all the BI talks were allotted ridiculously small rooms so I had to stand or sit on the floor for them. Probably the most impressive from my point of view was a talk I saw yesterday afternoon about Power View (Crescent). Now, I’ve seen a lot of Power View demos over the last year or so but it seems to me that over the last few months since CTP3 the product has turned a corner and gone from being something that was kind-of-good-but-I’ll-wait-for-the-next-release to something that is genuinely worth getting excited about. I can’t really put my finger on what has happened – maybe it’s just reached a critical mass in terms of functionality and it doesn’t seem like a typical Microsoft Version 1.0 any more; certainly it was the first time that I’d seen it and thought that I’d be willing to take the pain of installing Sharepoint to use it.

Anyway, back to the stuff I picked up from the keynote…

  • Some nice demos of SSRS, DQS, column store indexes and other non-BI-related stuff, but nothing new and interesting (at least from my point of view)
  • Appliances. There’s now a Dell version of PDW, but in general big boxes with flashing lights don’t get me all that excited.
  • More on PDW and its roadmap. The next update (pre SQL2012) will get various enhancements including a distributed cost-based query optimiser and limited support for stored procedures. I suspect this time next year I’ll be watching a demo of Tabular in DirectQuery mode working on the next full version of PDW and we’ll have a great story for ad-hoc BI on truly large data volumes.
  • Semantic search. Looks very cool; I wonder how this can be integrated into the wider BI stack beyond a few basic SSRS reports? I’ll have to take a closer look at this…
  • An announcement !? It seems like BIDS, ie BI Development Studio, is no longer a separate thing but should be thought of as part of SQL Server Data Tools (aka Juneau). Wonder what this means for the BIDS Helper guys? Somehow “BI bits of SQL Server Data Tools Helper” isn’t such a catchy name.
  • SQL Azure. The Azure Management Portal is getting a Metro UI; there’ll be a new max size for a SQL Azure DB of 150GB.

So nothing massively exciting here from a BI point of view – clearly all the good stuff was announced yesterday. I’ll be back tomorrow; in the meantime, if you’re at PASS make sure you come to my lightning talk this afternoon because I’m doing some MDX and it would be nice to have some people in the audience who have a clue what I’m talking about!

PASS Summit 2011- Day 1 Keynote

So, here we are again in Seattle. I’ve been here for a couple of days already and only just over my jetlag – spending all day inside in the convention centre and meeting rooms is not the best way to adjust to a new time zone. Anyway, there are a lot of cool BI-related announcements coming over the next few days so I thought I’d live blog the keynotes over the next few days; today it’s the turn of Ted Kummert, so let’s see what he’s got to say…

  • The official name of Project Crescent is going to be called “Power View”. I like this name – it has echoes of PowerPivot and (ahem) QlikView.
  • Denali will be known as SQL Server 2012 and will release in the first half of next year
  • Microsoft will be offering its own Hadoop distribution on Windows; MS have forged a partnership with HortonWorks to do this. I guess this means the end for Dryad/LINQ to HPC as a product, but it’s a good decision – the market doesn’t want another MS me-too product, it wants Hadoop. There will also be an ODBC driver and addin for Excel for Apache Hive, so you will be able to get data from Hadoop directly into PowerPivot and SSAS Tabular without having to stage it in a relational database. It’ll be available as an on-premises solution and also there’ll be a CTP of an Azure-based solution by the end of the year. This is today’s first big announcement, clearly. I have a few customers with the kind of data volumes that mean they’ll be interested in this, especially now it’s coming in a friendly, MS-packaged format. Denny Lee has more details on all this here.
  • There’s a new thing coming from SQL Azure Labs called Data Explorer that will be available at the end of this year. This is a web-based data integration tool for working with data from a number of sources: I can see SQL Azure, Excel, Access and the usual sources are supported, and it also generates recommendations of data from the Azure Datamarket that you might be interested in. It allows you to mash up data from various different sources then publish the result as an OData feed – very similar to Yahoo Pipes, as far as I can see, but the cool thing is that there’s only one type of (very PowerPivot friendly) end point. Can’t wait to play with this but, however cool it looks, you have to remember that this is coming from SQL Azure Labs so it’s not going to be a real product any time soon.
  • Amir Netz (just promoted to be a Technical Fellow at MS – a very prestigious role, and great for the profile of BI inside the company I think) is now on stage for a Crescent demo. He confirmed that export to PowerPoint will make it into RTM.
  • …and now Amir moves onto mobile BI. He shows Power View on Windows Mobile and… an IPad and an Android phone! Hurray! It seems like, at last, we might have a credible mobile BI strategy from Microsoft. He spends a lot of time demoing on a Windows tablet as well – I wonder why? The audience wants to see the iPad again; maybe the iPad stuff isn’t quite ready yet, but if you remember that Power View is a Silverlight app then you can imagine that this must have involved quite a lot of hard work to do.

Overall, some very exciting news. The Hadoop and mobile BI announcements fill some glaring gaps in the MS BI story, and I’m certainly a lot happier than I was this time last year..! The DBA crowd here, though, are a bit grumpy because all the announcements have been BI related – my heart bleeds for them, truly.

One last thing: I’ve finally started using Twitter, and if you want more up-to-date news of what’s happening at PASS then I’ll be tweeting throughout the conference. I’m @Technitrain

UPDATE: all references to PowerView changed to “Power View” with a space. That’s the official name.

SQL Server training in London

Over the last few months I’ve been working on setting up a new business to provide advanced SQL Server training courses in London, similar to what’s available at the SQLBits training day but in a more traditional classroom environment, lasting several days and with practical exercises. I’m not quite ready to launch yet (watch this space – the website will be ready soon) but I do have two courses to promote right now. Both are run by guys from COEO, and both cost £990 plus VAT. Here are the details:

SQL Server Developer Workshop with Gavin Payne
1st – 2nd November 2011

A 2-day interactive workshop that drills down into new features, tools and best practices for developers working with SQL Server 2008 or 2008 R2.
More details and registration here.

SQL Server Internals and Troubleshooting Workshop with Christian Bolton
6th – 7th December 2011

The Advanced Troubleshooting Workshop for SQL Server 2005, 2008 and R2 provides attendees with SQL Server internals knowledge, practical troubleshooting skills and a proven methodical approach to problem solving. The workshop will enable attendees to tackle complex SQL Server problems with confidence.
More details and registration here.

I’ll have some more courses, including runs of my MDX and SSAS cube design courses, to announce soon…

Project Trinity

I had an interesting chat with Gary Short at SQLBits last week about social network analysis, and he pointed out that Microsoft Research’s graph database, Project Trinity, was now available to download. You can find out more about it here:

Gary suggested that Trinity might, at some point in the future, follow F# in being a MS Research project that becomes a commercial project and to be honest I think he might be right. If Analysis Services is a database that is optimised for multidimensional data, why not add Trinity to the SQL Server stack as a specialised database for graph data? After all, social network analysis is only going to become more and more important in corporate BI in the future, and that’s just one use case for a graph database. And if Trinity becomes a product, why not include features of a tool like NodeXL in Project Crescent?

%d bloggers like this: