Madison CTP

The release of the first Madison CTP was announced on the Data Platform Insider blog earlier this week:
http://blogs.technet.com/dataplatforminsider/archive/2009/08/24/microsoft-ships-the-first-technology-preview-for-project-code-named-madison.aspx

It didn’t generate much interest in the blogosphere – probably because the CTP isn’t publicly available, and even if the likes of me could download it, it’s not the kind of thing I could install on a VM on my laptop…

Anyway, the question I’ve always had about Madison was whether integration with Analysis Services in ROLAP mode would be a priority for the first release. I now have it on good authority that it will be, which should mean that with SSAS + ROLAP + Madison we’ll be able to create multi-terabyte (or larger!) ROLAP cubes that are super-fast. The new in-memory (IMBI) storage mode that Gemini features, and which we can assume will appear in the next full release of Analysis Services, will certainly increase the scalability of Analysis Services beyond the terabyte or so of data that’s currently feasible, but apparently Madison will be the answer for the really large volumes. I wonder if HOLAP is an option here too? MS will certainly need to do something about the rubbish SQL that SSAS generates for its ROLAP queries before it can get the best out of Madison, though.

Oh, and if anyone out there is on the Madison CTP and wants someone to help out testing it with SSAS, I’d be happy to help!

SQLBits V: Registration Open!

Registration for SQLBits V, which will be taking place on November 19th/20th/21st in Newport in Wales, is now open. As always, full details and information on how to register are on the site:
http://www.sqlbits.com

It’s a three day event this time. Thursday November 19th will be a day of pre-conference seminars, as we had in Manchester, and I would like to point out that I’ll be running my ever-popular ‘Introduction to MDX’ course once again:
http://www.sqlbits.com/information/TrainingDay.aspx?seminar=Introduction%20to%20MDX 

Alternatively, you might be interested in attending Donald Farmer’s one-day seminar on Gemini and self-service BI:
http://www.sqlbits.com/information/TrainingDay.aspx?seminar=Self%20Service%20Business%20Intelligence%20–%20making%20it%20real
or one of the other seminars that are running.

Friday the 20th is a new departure for us: a multi-track day similar to the traditional SQLBits event, but with a distinct theme of 2008 and R2, and hand-picked speakers. You’ll have to pay for this (we couldn’t afford to run two free days, unfortunately) but it’s very modestly priced (£99 if you register before the end of September) and it will have a lot of exclusive content, such as more sessions from Donald Farmer. Full details are here:
http://www.sqlbits.com/information/Friday.aspx

Saturday the 21st is the regular SQLBits, free-to-attend community day. You can register here:
https://www.regonline.com/SQLBitsV 

There’s a list of sessions already submitted here:
http://www.sqlbits.com/information/PublicSessions.aspx
Session submissions are still open and we really want more abstracts. Remember, we actively encourage people with no previous speaking experience to submit and we always make a point of including new names on the agenda, so if you’re thinking of submitting a session, go for it! We’d also like to hear from anyone who’s interested in sponsoring what is now the largest SQL Server-related conference in Europe:
http://www.sqlbits.com/information/Sponsorship.aspx

Interesting Gemini links

I thought I’d do a quick round-up of some of the interesting links I’ve found concerning Gemini:

Swiss SQL Server Saturday, Zurich, September 19th

I’m doing a lot of speaking this autumn. The first event I’m going to is the first-ever Swiss SQL Saturday, which will be taking place in Zurich on September the 19th. You can find out more about it here:
http://www.sqlsaturday.ch/

It’s a free-to-attend event following the well-known SQL Saturday model and there’s a very strong line-up of speakers; credit is due to Charley Hanania for organising it. I’ll be doing a session on cache-warming strategies for SSAS 2008 – it should be a spur for me to update some of the work I did on this subject a while ago. Hopefully I’ll see some of you there!

Gemini First Thoughts

So after almost a year of hype I’ve finally got my hands on the first CTP of Gemini! I’m currently on holiday (and yes, I take my laptop on holiday, though at least my wife does too so we’re as bad as each other) but I couldn’t resist downloading it and taking a look. Here are my first impressions… and as soon as I get back home I’ll post something more detailed.

  • Installation was pretty straightforward on my Windows 7/Excel 2010 VM. Note that you do need Excel 2010 to use Gemini, as I suspected. The good thing is that it’s an add-in rather than native Excel functionality, so at least Gemini isn’t tied to the overall Office release cycle. I wonder how long it will be between versions?
  • When you open Excel, you see a new Gemini tab on the ribbon that looks like this:
    [screenshot: the Gemini tab on the Excel ribbon]
  • The ‘Load and Prepare Data’ button starts the fun and allows you to pull in data from various sources. You can either select entire tables or write your own SQL, and again I found it all very easy to do what I wanted; clearly a lot of what’s been learned from the SSAS cube design wizard has been applied here to make the process as smooth as possible. You can also get data from SSRS reports using the new Data Feed rendering functionality that Teo discusses here, paste data in from the clipboard, and link a Gemini table to an Excel table (this is what the Create Linked Table button in the screenshot above does).

    I can’t see much evidence yet of data preparation rather than just plain old loading, but that may well be yet to come. I’d also like to see a visual way of managing the relationships between tables, as you get with the SSAS DSV. Gemini doesn’t handle parent/child relationships yet; I’m not quite sure it handles other, more complex types of relationship either, but I need to play around a bit more here.

    [screenshot: the Gemini ‘Load and Prepare Data’ window]

  • You can create new columns in each of the tables that you load into Gemini and define the values they display using the new Data Analysis eXpressions (DAX) language. MS seem keen to stress that DAX isn’t an MDX replacement, and I suppose that technically that’s true, but let’s be honest: it’s doing the same job as MDX while trying to be more Excel-user-friendly. I’ve not had a chance to go deep into it at all yet but it certainly looks like there’s a lot to learn here. In the meantime, from the tutorial doc I’ve got, here’s an example of a DAX calculation that returns an average of Sales per State:
    =CALCULATE(
        AVERAGE('Total Sales'[SalesAmount]),
        ALLEXCEPT('Total Sales', 'Total Sales'[State or Province])
    )
    Is this easier than MDX? Will power users be able to write this? I honestly don’t know yet – there’s a rough MDX comparison after this list.

    You can also control whether DAX calculations are updated automatically or only when you click the ‘Calculate’ button on the ribbon.

  • Once this is done it’s back to the worksheet to create a pivot table, and I’ve already blogged about what’s new here. Unlike an SSAS-bound pivot table, however, you can use any column as a measure and choose how it’s aggregated, using Sum, Count, Min, Max or Average (data type permitting). From playing around with it a bit more, the new Slicer functionality really does come in useful here. You can flip back and forth between the worksheet and the Gemini UI very easily.
    [screenshot: a pivot table built on Gemini data]
  • The ‘Options and Diagnostics’ button doesn’t do much except allow you to dump the current state of the Gemini cube to a .trc trace file for further analysis.
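
Since I asked above whether DAX is easier than MDX, here’s a rough point of comparison. This is not a literal translation of the tutorial’s DAX calculation – I don’t have its Total Sales table to hand, so the names below come from Adventure Works instead – but it is a comparable ‘average within the current state’ calculation written as an MDX query; treat it as a sketch for putting the two syntaxes side by side, nothing more:

with member [Measures].[Avg Customer Sales] as
avg(
existing [Customer].[Customer].[Customer].members,
[Measures].[Internet Sales Amount]
)
select [Measures].[Avg Customer Sales] on 0,
[Customer].[State-Province].[State-Province].members on 1
from [Adventure Works]

The DAX version is arguably shorter, but whether power users will find AVERAGE and ALLEXCEPT more intuitive than avg and existing is exactly the open question.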

Overall, there aren’t really any surprises. As I said, DAX is something I’m going to need to study in a lot more detail; I can’t really comment on scalability and performance because I’m running on a VM and don’t have a large dataset handy; and I don’t have SharePoint installed so I haven’t checked out the integration there (which in any case is under NDA at the moment). So far I like it; it’s also less of a threat to the kind of SSAS/MDX work I do than I thought it might be – it’s a lot simpler than I’d expected and it doesn’t feel cube-like at all, but much more relational.

Sets in the Where Clause and Autoexists

I don’t usually blog about bugs, but there are some cases where the dividing line between what’s a bug and what is ‘by design’ is unclear – and in these cases, a warning to the user community is always helpful. This is one of those cases…

The other day I was talking to Peder Ekstrand of DSPanel and he showed me a pair of queries running on SSAS 2008 that had him confused and that, to be honest, looked clearly buggy to me. I managed to repro the behaviour on Adventure Works on 2008 (friends reproed it on 2005 too) and here are my queries. The first one returns a single cell containing the value $14,477.34, the value of Internet Sales Amount on July 1st 2001, as you’d expect:

select [Measures].[Internet Sales Amount] on 0,
[Date].[Fiscal].[All Periods] on 1
from [Adventure Works]
where([Date].[Calendar].[Date].&[20010701])

The second returns the value $29,358,677.22, the value of Internet Sales Amount across all time periods:

select [Measures].[Internet Sales Amount] on 0,
[Date].[Fiscal].[All Periods] on 1
from [Adventure Works]
where({[Date].[Calendar].[Date].&[20010701]})

…which, incidentally, is the same value you’d get if you removed the Where clause completely. To me this second result makes no sense whatsoever and is extremely confusing.

The only difference between the two queries is that in the first the Where clause contains a single member, whereas in the second that member is enclosed in braces, meaning it is now a set containing a single member. The second important thing to point out is that we have members from different hierarchies on the Date dimension on Rows and in the Where clause, meaning that auto-exists comes into play (see this section in BOL for an explanation of what auto-exists is).
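
As a quick aside, you can see auto-exists at work on its own by cross-joining two hierarchies from the same dimension. The following query is my own illustration rather than part of the repro: it returns only the fiscal year that July 1st 2001 actually belongs to, not every fiscal year, because auto-exists removes the combinations that don’t exist:

select [Measures].[Internet Sales Amount] on 0,
{[Date].[Calendar].[Date].&[20010701]} * [Date].[Fiscal Year].[Fiscal Year].members on 1
from [Adventure Works]

On Adventure Works that’s FY 2002, since the fiscal year starts on July 1st.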

The third thing to note is that it only happens in some cases. So, for example, when you run the following query, which has a member from the Calendar Year hierarchy rather than the Date hierarchy in the set:

select [Measures].[Internet Sales Amount] on 0,
[Date].[Fiscal].[All Periods] on 1
from [Adventure Works]
where({[Date].[Calendar Year].&[2001]})

…you get the value you’d expect, i.e. the Internet Sales Amount for Calendar Year 2001. This query, with a Fiscal Year on Rows, also returns the ‘correct’ result, $14,477.34:

select [Measures].[Internet Sales Amount] on 0,
[Date].[Fiscal Year].&[2002] on 1
from [Adventure Works]
where({[Date].[Calendar].[Date].&[20010701]})

Most interestingly, the following query, where the members on Rows and in the Where clause from the second query above are swapped, also returns the ‘correct’ result:

select [Measures].[Internet Sales Amount] on 0,
[Date].[Calendar].[Date].&[20010701] on 1
from [Adventure Works]
where({[Date].[Fiscal].[All Periods]})

What’s going on here? Clearly it’s something to do with sets in the Where clause, auto-exists and probably attribute relationships, and it could easily cause a lot of confusion for users and cube developers alike. I’ve been told that the current behaviour is ‘by design’ but that the dev team are aware it’s less than ideal; apparently it’s to do with maintaining consistency with how sets in the Where clause behave in some other scenarios. But for this query:

select [Measures].[Internet Sales Amount] on 0,
[Date].[Fiscal].[All Periods] on 1
from [Adventure Works]
where({[Date].[Calendar].[Date].&[20010701], [Date].[Calendar].[Date].&[20010702]})

…I would expect to see the aggregate of Internet Sales Amount for July 1st and July 2nd 2001.
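
Incidentally, if you want that aggregate-across-the-set behaviour today, one workaround that I believe works – though a subselect isn’t semantically identical to a set in the Where clause, so treat this as a sketch – is to move the set into a subselect, where visual totals give the All Periods member the total for just those two dates:

select [Measures].[Internet Sales Amount] on 0,
[Date].[Fiscal].[All Periods] on 1
from (select {[Date].[Calendar].[Date].&[20010701], [Date].[Calendar].[Date].&[20010702]} on 0
from [Adventure Works])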

Anyway, here’s the Connect item I opened about this:
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=481774

The more votes it gets, the more likely it’ll get fixed!