Google, Panorama and the BI of the Future

The blog entry I posted a month or so ago about XLCubed, where I had a pop at Microsoft for their client tool strategy, certainly seemed to strike a chord with a lot of people (see the comments, and also Marco’s blog entry here). It also made me think that it would be worth spending a few blog entries looking at some of the new third-party client tools that are out there… I’ve already lined up a few reviews, but if you’ve got an interesting, innovative client tool for Analysis Services that I could blog about, why not drop me an email?

So anyway, the big news last week was of course Google’s announcement of Chrome. And as several of the more informed bloggers (eg Nick Carr, Tim McCoy) have pointed out, the point of Chrome is to be not so much a browser as a platform for online applications, leading to a world where there is no obvious distinction between online and offline applications. And naturally when I think about applications I think about BI applications; and of course, thinking about online BI applications and Google, I thought of Panorama – who incidentally this week released the latest version of their gadget for Google Docs:
http://www.panorama.com/newsletter/2008/sept/new-gadget.html

Now, I’ll be honest and say that I’ve had a play with it and it is very slow, and there are still a few bugs around. But it’s a beta; I’m told it’s running on a test server and that performance will be better once it’s released, and anyway it’s only part of a wider client tool story (outlined and analysed nicely by Nigel Pendse here) which starts with the full Novaview client and involves the ability to publish views into Google Docs for a wider audience and for collaboration. I guess it’s a step towards the long-promised future where the desktop PC will have withered away into nothing more than a machine to run a browser on, and all our BI apps and all our data will be accessible over the web.

This all makes me wonder: what will BI be like in the future? Time for some wild, half-formed speculation:

  • Starting at the back, the first objection raised to a purely ‘BI in the cloud’ architecture is that you’ve got to upload your data to it somehow. Do you fancy trying to push what you load into your data warehouse every day up to some kind of web service? I thought not. So I think ‘BI in the cloud’ architecture is only going to be feasible when most of your source data lives in the cloud already, possibly in something like SQL Server Data Services or Amazon SimpleDB or Google BigTable, or possibly in a hosted app like Salesforce.com (there’s a rough sketch of what pushing data into one of these stores might look like after this list). This requirement puts us a long way into the future already, although for smaller data volumes and one-off analyses perhaps it’s not so much of an issue.
  • You also need your organisation to accept the idea of storing its most valuable data in someone else’s data centre. Now I’m not saying this as a kind of "why don’t those luddites hurry up and accept this cool new thing"-type comment, because there are some very valid objections to be made to the idea of cloud computing at the moment, like: can I guarantee good service levels? Will the vendor I choose go bust, or get bought, or otherwise disappear in a year or two? What are the legal implications of moving data to the cloud and possibly across borders? It will be a while before there are good answers to these questions and, even when there are, there’s going to be a lot of inertia that needs to be overcome.
    The analogy most commonly used to describe the brave new world of cloud computing is with the utility industry: you should be able to treat IT like electricity or water, a service you can plug into whenever you want and assume will be there when you need it (see, for example, "The Big Switch"). As far as data goes, though, I think a better analogy is with the development of the banking industry. At the moment we treat data in the same way that a medieval lord treated his money: everyone has their own equivalent of a big strong wooden box in the castle where the gold is kept, in the form of their own data centre. Nowadays the advantages of keeping money in the bank are clear – why worry about thieves breaking in and stealing your gold in the night, why go to the effort of moving all those heavy bags of gold around yourself, when it’s much safer and easier to manage and move money about when it’s in the bank? We may never physically see the money we possess but we know where it is and we can get at it when we need it. I think the same attitude will be taken to data in the long run, but it does need a leap of faith to get there (how many people still keep money hidden in a jam jar in a kitchen cupboard?).
  • Once your data’s in the cloud, you’re going to want to load it into a hosted data warehouse of some kind, and I don’t think that’s too much to imagine given the cloud databases already mentioned. But how to load and transform it? Not so much of an issue if you’re doing ELT, where the transformation happens in SQL inside the target database (see the sketch after this list), but for ETL you’d need a whole bunch of new hosted ETL services. I see Informatica has one in Informatica On Demand; I’m sure there are others.
  • You’re also going to want some kind of analytical engine on top – Analysis Services in the cloud, anyone? Maybe not quite yet, but companies like Vertica (http://www.vertica.com/company/news_and_events/20080513) and Kognitio (http://www.kognitio.com/services/businessintelligence/daas.php) are pushing into this area already; the architecture of this new generation of shared-nothing MPP databases surely lends itself well to the cloud model: if you need better performance you just reach for your credit card and buy a new node.
  • You then want to expose it to applications which can consume this data, and in my opinion the best way of doing this is of course through an OLAP/XMLA layer (there’s a sketch of an XMLA request after this list). In the case of Vertica you can already put Mondrian on top of it (http://www.vertica.com/company/news_and_events/20080212), so you can have this today if you want it, but I suspect that you’d have to invest as much time and money to make the OLAP layer scale as you had invested to make the underlying database scale, otherwise it would end up being a bottleneck. What’s the use of having a high-performance database if your OLAP tool can’t turn an MDX query, especially one with lots of calculations, into an efficient set of SQL queries and perform the calculations as fast as possible? Think of all the work that has gone into AS2008 to improve the performance of MDX calculations – the performance improvements compared to AS2005 are massive in some cases, and the AS team haven’t even tackled the problem of parallelism in the formula engine at all yet (and I’m not sure if they even want to, or if it’s a good idea). Again, there’s been a lot of buzz recently about the implementation of MapReduce by Aster and Greenplum to perform parallel processing within the data warehouse; although that aims to solve a slightly different set of problems, it nonetheless shows that the problem is being thought about.
  • Then it’s onto the client itself. Let’s not talk about great improvements in usability and functionality, because I’m sure badly designed software will be as common in the future as it is today. It’s going to be delivered over the web via whatever the browser has evolved into, and will certainly use whatever modish technologies are the equivalent of today’s Silverlight, Flash, AJAX etc. But will it be a stand-alone, specialised BI client tool, or will there just be BI features in online spreadsheets (or whatever online spreadsheets have evolved into)? Undoubtedly there will be good examples of both, but I think the latter will prevail. Even today, Excel is where most users eventually want to work with their data; the trend would move even faster if MS pulled their finger out and put some serious BI features in Excel…
    In the short term this raises an interesting question though: do you release a product which, like Panorama’s gadget, works with the current generation of clunky online apps in the hope that you can grow with them? Or do you, like Good Data and Birst (which I just heard about yesterday, and will be taking a closer look at soon), create your own complete, self-contained BI environment which gives a much better experience now but which could end up being an online dead-end? It all depends on how quickly the likes of Google and Microsoft (which is supposedly going to be revealing more about its online services platform soon) can deliver usable online apps; they have the deep pockets to be able to finance these apps for a few releases while they grow into something people want to use, but can smaller companies like Panorama survive long enough to reap the rewards? Panorama has a traditional BI business that could certainly keep it afloat, although one wonders whether they are angling to be acquired by Google.
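To make the first point above a bit more concrete, here’s a rough sketch of what pushing a few rows of warehouse data into a cloud data store might look like, using Amazon SimpleDB via the boto Python library. The domain name, item names and attribute values are all made up for illustration, and you’d need your own AWS credentials – treat it as a sketch of the idea rather than anything you’d run in production:

```python
import boto

# Connect to Amazon SimpleDB (boto can also pick credentials up from the
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables)
conn = boto.connect_sdb('YOUR_ACCESS_KEY', 'YOUR_SECRET_KEY')

# A 'domain' in SimpleDB is roughly analogous to a table
domain = conn.create_domain('sales_fact')

# Hypothetical rows extracted from an on-premise data warehouse;
# SimpleDB stores everything as strings
rows = [
    {'order_id': '1001', 'order_date': '2008-09-01', 'amount': '152.50'},
    {'order_id': '1002', 'order_date': '2008-09-01', 'amount': '89.99'},
]

# Push each row up as an item; doing this one HTTP call at a time is exactly
# why uploading a full daily warehouse load this way looks so unappealing
for row in rows:
    domain.put_attributes(row['order_id'], row)
```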
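On the ELT point: the idea is simply that you copy the raw data into the hosted warehouse first and then do the transformation with SQL inside it, so you don’t need a separate hosted transformation engine at all. Here’s a minimal sketch of that pattern, using SQLite purely as a stand-in for whatever cloud-hosted database you might actually be using; the table and column names are invented:

```python
import sqlite3

# SQLite is just a stand-in here for a hosted cloud data warehouse
conn = sqlite3.connect(':memory:')

# 'Load': copy the raw, untransformed data straight into a staging table
conn.execute('CREATE TABLE staging_orders (order_id TEXT, order_date TEXT, amount TEXT)')
raw_rows = [('1001', '2008-09-01', '152.50'), ('1002', '2008-09-01', '89.99')]
conn.executemany('INSERT INTO staging_orders VALUES (?, ?, ?)', raw_rows)

# 'Transform': do the cleansing and shaping inside the database with SQL,
# rather than in a separate ETL tool before loading
conn.execute('''
    CREATE TABLE fact_sales AS
    SELECT order_id,
           date(order_date)     AS order_date,
           CAST(amount AS REAL) AS sales_amount
    FROM staging_orders
''')

print(conn.execute('SELECT * FROM fact_sales').fetchall())
```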
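And on the OLAP/XMLA point: whatever the engine turns out to be, a client consumes it by POSTing a SOAP Execute request containing an MDX statement to an XMLA endpoint. Here’s a rough sketch of what that looks like from Python (2008-era Python 2, hence urllib2); the endpoint URL, catalog and cube names are hypothetical, and in reality you’d use a proper XMLA client library and parse the returned XML rather than just printing it:

```python
import urllib2

# Hypothetical XMLA endpoint for a hosted OLAP engine
XMLA_URL = 'http://example.com/olap/msmdpump.dll'

mdx = """
SELECT {[Measures].[Sales Amount]} ON COLUMNS,
       [Date].[Calendar Year].MEMBERS ON ROWS
FROM [Sales]
"""

# Minimal XMLA Execute request wrapped in a SOAP envelope
soap_body = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Execute xmlns="urn:schemas-microsoft-com:xml-analysis">
      <Command><Statement><![CDATA[%s]]></Statement></Command>
      <Properties>
        <PropertyList>
          <Catalog>Sales</Catalog>
          <Format>Multidimensional</Format>
        </PropertyList>
      </Properties>
    </Execute>
  </soap:Body>
</soap:Envelope>""" % mdx

request = urllib2.Request(XMLA_URL, soap_body, {
    'Content-Type': 'text/xml',
    'SOAPAction': 'urn:schemas-microsoft-com:xml-analysis:Execute',
})
response = urllib2.urlopen(request)
print(response.read())
```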

So there we go, just a few thoughts I had. Anyone got any comments? I like a good discussion!

UPDATE: some more details on Panorama’s future direction can be found here:
http://www.panorama.com/blog/?p=118

In the months to come, Panorama plans to release more capabilities for its new Software as a Service (SaaS) offering and its solution for Google Apps.  Some of the new functionality will include RSS support, advanced exception and alerting, new visualization capabilities, support for data from Salesforce, SAP and Microsoft Dynamics, as well as new social capabilities.


6 thoughts on “Google, Panorama and the BI of the Future”

  1. As a Mondrian contributor (and also as someone who spends time making sure it works well with LucidDB), I’d be interested to hear more about where you’ve seen problems with efficiency of generated SQL, or MDX-level calculation efficiency. There are definitely areas where improvements are needed to catch up to AS2008 calculation enhancements (e.g. in-memory-indexed running sum), but in most cases I’ve seen, the challenge is in getting the underlying warehouse/app schema optimized, not in Mondrian itself. In the long run, I’m most interested in getting more pushdown to the DB (e.g. to take advantage of SQL/OLAP; GROUPING SETS support got added in Mondrian 2.4) since it’s best to be able to focus tuning efforts at that tier.
     
     A related point in the "BI stack of the future" question is the underlying OS. For the cloud, Windows may not be the OS of choice in many cases, so having to introduce it just to accommodate a single component such as AS2008 can be a non-starter from a deployment perspective. (Perhaps Microsoft’s acquisition of DATAllegro helps with this to some extent by increasing the available components.) That’s one big advantage to a cross-platform offering such as Mondrian; also, there’s less lock-in with open source (but it would be fibbing to say there’s no lock-in at all, especially given the lack of MDX standardization).

  2. Hi John,
     
     It was a bit of a throw-away comment, to be honest, and I had no intention to say that Mondrian was bad in this respect or that AS2008 was in any way better. Really the point I was trying to make is that the more MDX work I do, the more examples I see of really, really complex MDX calculations – calculations that not only have some difficult logic, but which involve several calculations overlapping and where the overall amount of MDX runs into thousands of lines of code. Typically this happens in financial applications. It seems to me that the more people learn MDX and the faster the calculation engine, the more people want to push it to the limit. The running sum calculations you mention are pretty trivial compared to some of the horrors I’ve seen…!
     
     You’re right in pointing out that in most cases it’s the performance of the underlying RDBMS that makes the most difference, and I agree that the more you can push down to that layer the better. But improving the efficiency of the generated SQL is a slightly different area from the one I’m talking about, which, as I think you realised, is improving the efficiency of the calculation engine when it’s working out what exactly it wants from the RDBMS and then doing the calculations once it’s got the data. This type of financial application is still not that common on AS and I suspect even less so on Mondrian (it’s more what Essbase and TM1 are aimed at, I think), so it’s less of a worry.
     
     I completely agree with your point about the OS too, and Mondrian’s advantage in terms of cross-platform support. For me Mondrian has another, bigger advantage in terms of its support and optimisation for many relational databases too, something AS hasn’t focussed on because I guess so far it’s always been better from a performance point of view to use its native MOLAP storage over ROLAP with SQL Server etc. With the new generation of databases like LucidDB emerging, though, I’m sure Mondrian is going to be able to piggy-back on their advances in scalability and performance and prove itself to be a significant power in the OLAP world.

  3. Hi Chris & John – we are actually doing some work with both engines; each has its pros and cons. Chris, as one of the best consultants out there, did you ever get to see a real-life (@ a customer site) comparison between the two engines? Can you share more insight?

  4. Hi Chris, a short comment on the original post’s issue. I share your disbelief that enterprises will upload their sensitive data to the cloud any time soon; there are just too many technical and security issues. But what I could imagine is moving some parts of the BI "presentation layer" to the cloud, and uploading only the data needed to support it – for a dashboard, for example. Along these lines, we set up a process that uploads our team’s pending-task statistics daily to a Google spreadsheet via the Python-based Google Data API (a rough sketch of this kind of upload is below). Then we look at this via a chart gadget embedded in an internal web page or iGoogle page. It works great.
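For what it’s worth, here’s a rough sketch of the kind of daily upload process described in the comment above, using the gdata Python client library to talk to the Google Spreadsheets Data API. The credentials, spreadsheet key, worksheet id and column names are all placeholders, and the exact calls may vary between versions of the library:

```python
import datetime
import gdata.spreadsheet.service

# Log in to the Google Spreadsheets Data API (placeholder credentials)
client = gdata.spreadsheet.service.SpreadsheetsService()
client.email = 'someone@example.com'
client.password = 'password'
client.ProgrammaticLogin()

# Keys identifying the target spreadsheet and worksheet (placeholders;
# 'od6' is typically the id of the first worksheet)
SPREADSHEET_KEY = 'your-spreadsheet-key'
WORKSHEET_ID = 'od6'

# One row of pending-task statistics; the keys must match the lower-cased
# column headers in the first row of the worksheet
row = {
    'date': datetime.date.today().isoformat(),
    'pendingtasks': '42',
}

client.InsertRow(row, SPREADSHEET_KEY, WORKSHEET_ID)
```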
