I’ve now had a chance to watch the demos and read all the first-hand accounts of what was announced yesterday (see Marco, Mosha, Tim Kent, Jeremy Kashel, Richard Tkachuk). Here are some unstructured thoughts and questions:
- As per my comment yesterday about Qlikview, self-service BI is undoubtedly what many end users want – but as BI professionals we know only too well that it can be dangerous; in fact just about every blog entry I’ve read on Gemini has made this point. The question of whether it’s a good idea to let your power users do what Gemini lets them do is likely to cause all kinds of heated religious dispute: I was involved in an argument about this at SQLBits recently, and over on Brent Ozar’s blog (see here and here) you can see the same discussion being had. Although I completely understand the scenario that MS describes in its demos of users needing to work with data that isn’t and will never be in a data warehouse, I lean slightly to the side of those who see self-service BI vendors as selling "snake oil". But being a BI consultant I would, wouldn’t I? All this talk of Gemini representing the ‘constellation of twins’, power users and the IT department working together happily, is something of a fairy tale…
- In a comment on my blog yesterday, Mosha stated that there was no cube wizard needed for Gemini. But looking at the demo there’s certainly a step needed where you connect to data sources and choose the tables and fields you want to work with, so whether or not you call it a cube wizard in the strictest sense, you need to have some understanding of your data before you can do anything with it. And whatever the demo says, the application you’re using can only take you part of the way; there’s no way a model can be 100% inferred. What happens if fields that mean the same thing have two different names in two different data sources, or if there are two fields which mean different things which have the same name? And, even for many power users, the question of what a table or a join or even a database actually is will still need some explanation.
- While we’re at it – and I know this is a bit of a tangent – expecting power users to understand basic technical concepts is one thing but in many cases (as this excellent blog entry points out) "people have no way of knowing which questions are meaningful ones to ask, and which are meaningless". Not that I’m saying your average BI consultant/IT guy has a better idea either, far from it.
- I was pleased to see mention of data cleaning functionality in the Gemini addin. Is this coming from Zoomix?
- Certainly the Gemini pivot table demo was very impressive. Is this what pivot tables will look like in Office.Next? If so, are we going to see Excel finally grow up to being a full-featured AS client tool for power users in the same way Proclarity Desktop was?
- Moving on, on one hand we’ve got Project Madison, which gives us in SQL Server the ability to query vast amounts of data very quickly. Since this is in SQL Server, I would expect to be able to use AS in ROLAP mode on top. On the other hand we have Project Gemini which will give us a super-fast in-memory storage mode for AS but for slightly smaller data volumes. Where do the two meet? Will we be able to create a HOLAP like solution where your raw data stays in SQL Server/Madison and you can create Gemini-mode aggregations? And can you persist the data in Gemini to disk easily, in case of hardware failure? How long does it take to load data into Gemini?
- Apart from Qlikview, the other product being mentioned in the same breath as Gemini is TM1, which is of course primarily used for financial apps. So what will the benefits of Gemini be for PerformancePoint and home-grown AS financial cubes? Not only faster storage engine queries, but also faster calculations (although I know only too well that sometimes you can have poor query performance due to calculations even on a warm storage engine cache, even in AS2008). And will you be able to do writeback on a Gemini partition? Now that would be a major performance benefit.
- Having said that the need to be able to write MDX will keep people like me in a job, it’s worth noting that it should indeed be possible to make it easy to write many MDX calculations in Excel. Indeed, one of the cool features of the Intelligencia Query MDX generator is precisely this: the ability to turn spreadsheet style formulas into MDX calculations. And yes, Andrew is in the process of getting this functionality patented.
- I love the idea of Gemini being AS, but I can imagine that some more relationally orientated people would want the ability to query this new data store with SQL. Of course AS actually can be queried with SQL but it’s a very limited subset; it would be great to see tighter integration between AS and the relational engine (along the lines of Oracle’s new cube-based materialised views) so the performance gains that AS gives you can be made available to the relational engine.
- Which thought in turn leads onto whether Madison style MPP can be applied to the Analysis Services engine itself (as I wondered here), either directly or if AS was more tightly integrated with the relational engine. So many permutations of these technologies are possible…
- As with PerformancePoint and Excel Services, there seems to be yet another dependency on Sharepoint here for the management of Gemini models. Of course some central repository is necessary and it makes sense to use Sharepoint rather than reinvent the wheel, but as Microsoft Watch points out this cross-dependency helps MS sell more licenses. And as anyone who has tried to sell a MS BI solution will tell you, selling more server products can be a problem – it’s not necessarily the licence cost but the perfectly valid "we don’t use Sharepoint here, we use X and we don’t want to have to support Sharepoint just for this" response that has to be overcome. I think this issue part-explains why I’ve seen so little use of Excel Services with Analysis Services in my work when it seems such a compelling proposition for almost all companies.
- Lastly, given the current financial crisis, something tells me that when the first CTPs of all this appear next year consultants like me will have plenty of free time to test it out. I know pundits out there are saying that the BI industry will weather any recession because companies will want to compete on information, but I’m sceptical – in my experience most companies don’t make rational decisions in circumstances like these (is that heresy coming from a BI consultant?), they just cut budgets and fire staff without thinking much. And IT consultants, perceived as a cost and of lesser importance to the health of the business than things like, say, the CEO’s bonus, always feel the pain first. Ho hum.
In Gemini the matching of the data sources requires no cube wizard: the Gemini client does that for you based on different heuristics, such as column names, data matching etc. This was covered in the talk by Amir Netz on the second day of the conference. We were told that, as a user, you would have the ability to change the connections.
My point is that this can never be a process that is 100% automated, and indeed I remember reading somewhere about Amir’s demo that it required ‘minimal human intervention’ – so some kind of user input will be required sometimes. And since building a Gemini model is in effect building an AS local cube, I’d argue that whatever UI is in place to create a Gemini model is a ‘cube wizard’ of some sort. It’s just playing with words, but anyone claiming they have a process that can work out joins on data successfully every time (and real world data is a lot messier than Adventure Works) without any input from the user at all is definitely a snake oil salesman in my book.
When I commented on "there is no cube wizard", I wanted to make the point that the user will never operate with cube concepts such as dimensions, measures, levels, attributes etc. This holds true. What you seem to call a "cube wizard" looks more like a "DSV wizard", i.e. some UI that will help to define table relationships etc. No argument about it – no software will be able to infer a schema automatically. So "cube wizard" – no, "DSV wizard" – yes, of some sort.
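To make the point about inferred relationships a bit more concrete, here is a rough Python sketch of the two kinds of heuristic mentioned above (column names and data matching). It is purely illustrative and says nothing about how Gemini actually works: the `Column` class, the `suggest_joins` function, the 0.8 overlap threshold and the sample data are all made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Column:
    table: str
    name: str
    values: list


def name_match(a: Column, b: Column) -> bool:
    """Heuristic 1: same column name (case-insensitive) in different tables."""
    return a.table != b.table and a.name.lower() == b.name.lower()


def value_overlap(a: Column, b: Column) -> float:
    """Heuristic 2: fraction of a's distinct values that also appear in b."""
    distinct_a, distinct_b = set(a.values), set(b.values)
    if not distinct_a:
        return 0.0
    return len(distinct_a & distinct_b) / len(distinct_a)


def suggest_joins(columns, min_overlap=0.8):
    """Return candidate (left, right, reason) joins for a user to confirm."""
    suggestions = []
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            if a.table == b.table:
                continue  # only look for relationships across tables
            if name_match(a, b):
                suggestions.append((a, b, "same name"))
            elif value_overlap(a, b) >= min_overlap:
                suggestions.append((a, b, "overlapping values"))
    return suggestions


# Hypothetical sample data: two sources that name the customer key differently,
# and two unrelated "Code" columns (product code vs. region code).
example_columns = [
    Column("sales", "CustomerID", [1, 2, 3, 3]),
    Column("crm", "CustID", [1, 2, 3, 4]),
    Column("sales", "Code", ["A", "B"]),
    Column("crm", "Code", ["N", "S"]),
]

suggestions = suggest_joins(example_columns)
for left, right, reason in suggestions:
    print(f"{left.table}.{left.name} -> {right.table}.{right.name} ({reason})")
```

In this toy data the value-overlap heuristic correctly pairs `CustomerID` with `CustID` even though the names differ, but the name heuristic also proposes joining the two `Code` columns, which mean different things. That second suggestion is exactly the kind of case where someone who understands the data has to step in.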