Bringing It All Together In The Cloud and Excel

A few things got released in the past week or so that I would normally have devoted a short blog post to; now I’ve finally made it onto Twitter I tend to just tweet about them instead, but I still think a blogging is the best way for me to get my thoughts together about what they actually mean. Let’s consider this ‘stuff’ in isolation first:

  • http://www.excelmashup.com/ went live. It’s a JavaScript API for the Excel Web App which is useful, but as Jamie Thomson notes here it’s not the API he and I have been asking for for a long time, alas.
  • The first CTP of Hadoop on Azure went live. Denny Lee’s blog post is probably the best place to start to learn about it; the video in the post is a very useful quick overview of what it does too. I noticed that it supports importing data from the Windows Azure Marketplace (what used to be called the Azure Datamarket).
  • The Azure Marketplace also got a speed upgrade, as Boyan Penev notes here. This is important because whenever I’d tried to use it in the past its appalling performance had stopped me doing anything interesting with it at all. This, plus the fact that you can now publish your own data there, turns what was something of a white elephant into what could be an integral part of the Microsoft cloud BI platform.
  • Version 2 of Google BigQuery went live, although it’s still in beta.
  • The CTP of Data Explorer got released, which of course I’ve blogged about already here and which Jamie blogged about here and here.
  • Microsoft announced Office 365 compliance with leading EU and US standards for data protection and security, which means less of those legal worries about whether you’re allowed to put all that interesting data you want to use for BI into the cloud.

From this it’s clear that Microsoft’s cloud BI platform is beginning to take shape, as are competing cloud BI platforms (if we assume that Google actually has a cloud BI strategy, and I think it has), and I think it’s fair to say Microsoft is well placed. There’s also yet more evidence, as if it was not blindingly obvious already, that Excel is at the heart of Microsoft’s BI strategy. Look at how data from Hadoop on Azure can be imported directly into Excel, and how this is flagged up as an important selling point, in the video on Denny’s blog post. However I think Microsoft needs to make even more of an effort to make everything Excel-like: now that it’s building a new BI platform from scratch it has a very rare opportunity to do this and to produce a truly coherent set of tools rather than the traditional grab-bag of technologies that make up the current Microsoft BI stack and which the cloud platform could also end up as too. Actually I’d like to go further and say that rather than have a bunch of separate cloud BI tools MS should make everything BI a feature of Excel, whether it be Excel on the desktop or Excel in the cloud. This might seem a controversial thing to say, but if MS is committed to true mass-market, self-service BI then Excel has to be the platform and MS needs to base its cloud BI strategy on it 100%.

Here are a couple of the things I’d like to see happen to achieve this:

  • Data Explorer’s expression language needs to be made consistent with Excel syntax, in the way that DAX is consistent with Excel syntax. While I like what Data Explorer can do at the moment I hate having to learn a new syntax and a new set of functions for each tool I use (think SSRS expressions, SSIS expressions, MDX and so on) and it makes me much less productive when I do. I want to use the same syntax to write formulas in my spreadsheet, calculations in DAX and calculations/transformations in Data Explorer too – after all, we want to do a lot of the same things in ETL and reporting like lookups, aggregation and pivoting.
  • Hadoop on Azure is an important tool to have, not least because Hadoop is already so widely used in the wider world so it adds some credibility to the MS BI stack, but I’d like to be able to control massively parallel calculations from inside Excel and not just consume the results of them there. I’m thinking something like DataScope or the way you can scale out Excel calculations on Windows HPC, maybe driven though a PowerPivot-like interface with calculations expressed in DAX, or Data Explorer (which links back to my last point, because if Data Explorer expressions were DAX it needn’t be an either/or choice). It could of course still be Hadoop at the back-end with a layer on top to make it consistent with Excel.
  • We desperately need a proper API for the Excel Web App (yes, here we go again…). An OData API for importing and exporting data from spreadsheets on the web is only the first step; in the long term I’d want the Excel Web App to have capabilities like Project Dirigible, so that Excel on the desktop could become a kind of client tool for a massively scalable Excel Server in the cloud (note that I didn’t say Excel Services, which is more like Excel-on-the-server than a real Excel Server in my opinion). I’d want to be able to store data in the cloud and automatically synchronise it with Excel on multiple desktops or elsewhere in the cloud. I’d also want to create Excel spreadsheets in the cloud that acted like functions (similar to Project Dirigible), where these functions could be called from other Excel spreadsheets which again could be on the desktop or be in the cloud.
  • We need Analysis Services in the cloud. We’ve known it’s been coming for a long time, but not what form it will take. Again, I’d like to see tight integration with Excel similar to how PowerPivot works on the client or tighter, where Analysis Services would not be a separate service but just be the functionality within Excel for manipulating tables of data for reporting purposes. I’d want Data Explorer to be able to create and load these tables.
  • We also need the full BI functionality of desktop Excel – pivot tables and cube formulas – in the Excel Web App. I would guess this is in the pipeline anyway since it’s a pretty obvious requirement if Microsoft’s cloud BI strategy is going to work. I also don’t see the point of Power View being a separate app – I’d like to see it become a data visualisation feature of Excel.
  • Finally, I’d like to see some way of tying all the data held in this gigantic data store together. You’d want to be able to search it, certainly, but also understand where it’s come from and what it actually represents. It sounds like maybe this is what Project Barcelona is aiming for

Coordinating the work of multiple teams at Microsoft in the way this would demand is a particularly thankless task, I know. But maybe, just maybe, someone senior (maybe a Technical Fellow like Amir…?) could pull off something this ambitious?

Microsoft Codename “Social Analytics”

I’ve just seen there’s a new release on the SQL Azure Labs site – Codename “Social Analytics” (that’s a bad codename by the way – did MS finally run out of place names in Washington state?). Full details are here:

http://www.microsoft.com/en-us/sqlazurelabs/labs/socialanalytics.aspx

It’s a service that allows you to collect social web data and then either use it for business analysis or automate responses; it has an OData endpoint so the obvious tool for doing the analysis is PowerPivot, and I guess it could also be a data source for Data Explorer when that comes along. At the moment it’s very limited in that you can only analyse the results of one of two feeds that aggregate data about either Bill Gates or Windows 8, which is a shame, but it should still be fun to play with.

Excel DataScope

Jamie Thomson just tipped me off about something new and very interesting – one week after I had a moan about Microsoft doing nothing about Excel and the cloud, here comes Excel DataScope:
http://research.microsoft.com/en-us/projects/azure/datascope.aspx

Here’s the blurb from the site:

From the familiar interface of Microsoft Excel, Excel DataScope enables researchers to accelerate data-driven decision making. It offers data analytics, machine learning, and information visualization by using Windows Azure for data and compute-intensive tasks. Its powerful analysis techniques are applicable to any type of data, ranging from web analytics to survey, environmental, or social data.

There are yet more tantalising details in the video and the two pdfs here:

http://research.microsoft.com/apps/video/?id=149888
http://research.microsoft.com/en-us/projects/azure/exceldatascope_11_poster1.pdf
http://research.microsoft.com/en-us/projects/azure/exceldatascope_11_poster2.pdf

I’m currently trying to find out more about all this, but there’s clearly a  ton of cool stuff here:

  • You can use the Live Labs Pivot control for visualisation.
  • It does data mining in the cloud. Is this the successor to the old data mining addin? The functionality is clearly the same.
  • There’s a section on Map Reduce running on Windows Azure on one of the posters. Is this using Dryad?

Is this a first glimpse of a new cloud-based BI platform from Microsoft? Will SSAS in the cloud form part of it? Before we all get too excited (or at least I get too excited) it’s worth noting that this is coming from the eXtreme Computing Group and not the SQL Server team, it’s clearly aimed at scientific rather than business users, and is described as “an ongoing research and development project”, ie it is not a commercial product. The potential is obvious though, and I hope it becomes something significant.

The problem of power users, DAX and difficult calculations

Ever since PowerPivot got released, one of the questions I’ve heard debated over and over is whether it’s easy for non-IT users to learn and use DAX or not. The stock answer from Microsoft, and I agree with them here, is that anyone with basic Excel knowledge can do simple calculations in DAX, but the more complex calculations (for example, those which need to use the CALCULATE() function) are probably only ever going to be written and understood by BI professionals. Even then there’s plenty of evidence that even BI and Excel pros struggle to get to drips with DAX, as blog posts like this and this suggest. This is certainly good news for me professionally – my feeling is that for more complex calculations, DAX is just as difficult as MDX although conceptually very different, and I’ve made a good living out of MDX over the years – but on the other hand I can’t help but feel this represents a capitulation on the part of Microsoft. After all, isn’t the target PowerPivot user someone like an accountant, actuary or statistician, someone who is mathematically literate and capable of creating amazing complicated spreadsheets? How can self-service BI be truly self-service if certain calculations are still too difficult for anyone but IT professionals?

The problem isn’t the design of DAX as such, I think, but the fact that DAX exists as a language. I’m sure most BI professionals have seen worked examples of calculations created in Excel that we have been expected to translate into SQL/MDX/DAX/whatever tool we’re using. It’s the act of translating the calculation into an expression that’s the difficult thing: end-users understand how the calculations work, but they can’t speak the languages that BI tools use whereas IT people can. So why can’t our BI tools allow users to express calculations in a way that users can understand – as a series of spreadsheet formulas?

One example of a tool that tries to do this already is the Intelligencia OLAP control. However, what prompted these thoughts was Project Dirigible, a new web-based spreadsheet from the people who produce Resolver One. The key feature that sets it apart from regular spreadsheets-in-a-browser is that it’s designed for scenarios where users want to scale out complex calculations over a large number of CPUs in the cloud. What makes this possible is the run_worksheet() function, documented here, that allows you to write a calculation that overwrites the values in cells on another sheet, recalculates that sheet, and then retrieves values from cells in that sheet – as this blog post explains, it allows you to use a spreadsheet as a function from within another spreadsheet. I think this is a deeply cool idea.

Here’s an extremely simple example of how it works. The following sheet:
http://www.projectdirigible.com/user/cwebb/sheet/1254/
Contains an example of a tax calculation, with three cells: the value before tax, the tax rate, and the value after tax.

image

This second sheet then contains a number of sales records and calls the first sheet to calculate the value for each of the sales after tax:
http://www.projectdirigible.com/user/cwebb/sheet/1264/

image

Here’s the function call I’ve used to calculate tax:

=run_worksheet("http://www.projectdirigible.com/user/cwebb/sheet/1254/", {(2,3) -> c3}).d3.value

Apart from the parallelism, there are a number of obviously good things that follow on from breaking calculations out into separate sheets like this: the ability to reuse this calculation across multiple spreadsheets; the ability to completely change how this calculation works at some later date so long as the parameters remain the same; and the fact that how it works is easily comprehensible to non-IT users because it’s expressed in a very visual way. I think this general approach could be applied to DAX: in DAX, most advanced calculations involve manipulating tables of data, filtering them and aggregating values in different columns. We have tables in Excel, these tables can be filtered, we can create calculated fields inside these tables and we can aggregate and perform other calculations on the data in these tables elsewhere in the sheet, so everything we can do in a DAX calculation can already be represented in the form of a spreadsheet. So if it was possible for the Vertipaq engine to understand calculations expressed in spreadsheet form instead of as DAX expressions then non-IT users would have far fewer limits on the kind of calculations they could write. Of course, this is a very big ‘if’ and there’s only a slim chance that something like this might happen – it would require a lot of development effort and probably that PowerPivot became a native feature of Excel.

Sadly, PowerPivot apart, I’m not sure I see much evidence of exciting new ideas in the world of Excel; indeed it seems like Excel has barely got to grips with the idea of the internet, let alone a truly cloud-based model. Why can’t all the interesting parallelisation stuff that’s possible with Windows HPC Server 2008 R2 be translated to the cloud and be made available to all Excel users? Why doesn’t the Excel Web App have any kind of API yet? I’m sure all the usual responses about lack of time and resources can be reused here, but I’d hate to see Excel become yet another IE6, neglected by MS while the competition innovates – and after all, isn’t Office one of Microsoft’s cash-cows, and so shouldn’t it be getting the kind of time and money lavished on stuff like Windows Phone and Bing…?

SQL Azure Reporting: Reporting Services in the Cloud

So SSRS in the cloud has just been announced! See this post on the SQL Azure team blog:
http://blogs.msdn.com/b/sqlazure/archive/2010/10/28/10082293.aspx

…and also this 20 minute video from PDC giving a lot more detail:
http://player.microsoftpdc.com/Session/5007e9c3-03cd-41b4-9e1c-4eb17cd60e37

Basically it’s the SSRS you know and love with only a few limitations: for example it only supports SQL Azure as a data source and there’s none of the developer extensibility options (like custom data extensions) available yet.

I can’t wait for SSAS in the cloud…

Hadoop on Azure?

Here’s something interesting I’ve just seen on James Dixon’s blog: apparently Microsoft is preparing to provide Hadoop on Windows Azure. Here’s the article James links to:

http://www.sdtimes.com/link/34319

I wonder if this is just a stopgap, in response to customer demand, as the article suggests it might be? How does this fit with the recently-announced Technical Computing Initiative? Was Project Dryad a dead end?

SQL Azure BI – now hiring

Hmmm, once again Microsoft’s job ads give away a few details on upcoming products. The SQL Azure team mentioned on their blog they were hiring, so I had a look what jobs were on offer. And lo-and-behold, there were a few jobs there for developers and testers for an Azure BI team. Here are two examples:

https://careers.microsoft.com/JobDetails.aspx?ss=&pg=0&so=&rw=1&jid=14197&jlang=EN
https://careers.microsoft.com/JobDetails.aspx?ss=&pg=0&so=&rw=2&jid=14206&jlang=EN

I quote from the first:

The SQL Azure team leads the expansion of the existing SQL Server business to the Cloud. As part of Microsoft’s comprehensive Azure Services offering we entered commercial availability earlier this year with a relational database service. We are extending the platform to bring Microsoft’s market-leading Business Intelligence services as part of a self-service BI solution.

and

Prior experience with SQL Server Reporting Services (SSRS) and SQL Server Analysis Services (SSAS) or other business intelligence technologies is desired

To be fair, we already knew that SSAS and SSRS in the cloud were coming at some point – it was on several publicly-available slide decks. Also, with PowerPivot being flavour-of-the-month and the cloud being the cloud, it’s hardly surprising that it’s a self-service BI solution. I dare say Dallas and OData fit in there somewhere too (incidentally, while we’re on the subject of OData, check out the Sesame Data Browser). The fact that all these ads have gone up recently and some of them talk about the ‘new’ BI team suggests development hasn’t started yet though, so I guess we won’t be seeing PowerPivot-in-the-cloud for another year or two; perhaps in time for Denali?

Tableau Public

In a clever marketing move, Tableau have just released a new free version of their tool called Tableau Public. I’ve been a fan of Tableau since I saw it a while back (was it really 2005?) but never seen it actually used at one of my customers, I suppose because of its hefty price. Anyway, what Tableau have done is created a basic version which can only connect to Access, Excel and Text files and can only save to Tableau’s own web gallery; however, it’s then possible to embed the visualisations you create in your own website/blog etc for sharing with the world. The stated aim is for this site to be the YouTube of data visualisation; coming from Tableau this is a half-credible claim because they really do understand this area and have some great tools. Also, I’m sure the extra publicity it will generate for the company will do no harm for sales of the paid version either.

And I’m sure it will be great when they get over their teething difficulties, because at the time of writing I can’t actually save anything… maybe they underestimated the amount of interest this would generate?

OK, it’s working now. But of course Windows Live Spaces doesn’t allow me to embed a ‘viz’ in a blog post (grr); instead, here’s a simple example I just uploaded using stats on the last 30 days of traffic on the front page of this blog:
http://public.tableausoftware.com/views/Blogstats/Sheet1

Here’s what it looks like in the client:

image

So Tuesday is the best day for page loads, but Thursday is slightly better for unique visitors… and so on. But it’s a cool tool and definitely worth checking out.

Pinpoint and Dallas

Interesting news from PDC: Microsoft has announced two new services – Pinpoint and Dallas.

You can find Pinpoint here: http://pinpoint.microsoft.com

Here’s the blurb from the site:

Pinpoint is the fast, easy way for business customers to find experts, applications, and professional services to meet their specific business needs—and build on the software they already have.

At the same time, Pinpoint helps developers and technology service providers quickly and easily get software applications and professional services to market—and engage customers who need what they offer.

Pinpoint is the largest directory of qualified IT companies and their software solutions built on Microsoft technologies.
  • More than 7,000 software application offerings.
  • More than 30,000 Microsoft-technology experts.
  • The largest, most diverse set of Microsoft business platform offerings in the industry in a central location.
  • Direct links between applications and the services that support them.

Whether you’re searching for expert help or offering it, Pinpoint helps you easily find and engage the right people and technologies to get the job done.

 

Much, much more interesting from a BI point of view is Dallas, which is part of Pinpoint: http://pinpoint.microsoft.com/en-US/Dallas

It’s Microsoft’s marketplace for data, all built on Azure. Again from the blurb:
Microsoft Codename “Dallas” is Microsoft’s Information Services business, enabling developers and information workers to instantly find, purchase, and manage Web services and datasets to power the next set of killer applications on any platform.

The Register has the best write-up of what this is here: http://www.theregister.co.uk/2009/11/17/microsoft_dallas_data_service/

From that article:
Dave Campbell, a Microsoft technical fellow, demonstrated Dallas at PDC. He showed a list of data provides from the partners such as infoUSA, subscriptions, the ability to store structured and unstructured data, and to explore the data without needing to parse it, to preview the data in ATOM, invoke the data as a Rest service and analyze the data using PowerPivot in Microsoft’s Excel spreadsheet program.

Note my emphasis on the last sentence! Here at last is the ability to buy that third party data that’s been a part of every Powerpivot demo. I’ve worked with a lot of companies that sell data in my career, and this looks like it could be a very significant development for them. I’d even heard vague rumours that MS were interested in buying commercial data providers at one point, several years ago – if they were prepared to go this extreme then it would certainly go a long way to making this strategy a success.

Now just think how cool it would be if SSAS or PowerPivot could be hosted on the cloud, so all you needed was Excel to analyse this data. Maybe one day…

Live Blogging @PASS – SQL Server BI in the Cloud

Some notes/thoughts while I’m listening to John Welch’s session here at PASS on “SQL Server BI in the Cloud”. The room is packed… full marks to John for picking such a hot topic to speak on!

  • Summary of reasons why the cloud is interesting for BI – easy scaling, setup, sizing etc.
  • Distinction between ‘virtualised’ and ‘hosted’ services.
    • Virtualised = pay on usage, instant scale, reduced scaling concerns
    • Hosted = buy a set capacity
  • Azure – making the point that, unlike most other cloud offerings, you can leverage your existing (SQL Server) skills
  • Notes that other parts of the BI stack, apart from the relational engine, have been promised for the future. My feeling is that when/if SSAS in the cloud appears, it’s more likely to be PowerPivot in the cloud; note also that SSRS in the cloud has kind of already appeared with Access Services.
  • BI scenarios not really considered so far by the Azure team. I echo John’s response of “Why???”
  • Description of the Azure architecture. I was talking to someone last night about the way Azure requires use of SQL authentication (which MS have discouraged us from using for years!); SSAS of course only supports Windows authentication, which would be a problem for SSAS in the cloud, so I wonder if in the future we’ll get username/password authentication for SSAS?
  • Limitations of Azure: 10Gb max data, query limit of 5 minutes, insert/update slow. Though for some, smaller, short-lived BI solutions Azure is a perfectly good solution; sharding is an option too.
  • Shows SSRS (locally) working against data from Azure. Works better in CTP2 but still occasional bug.
  • Before the presentation started I asked John if he’d tried using SSAS in ROLAP mode against Azure; he said he had and it worked, but it was v. slow (as you’d expect).
  • Using SSAS in MOLAP mode, since processing queries are v. slow and there’s a query timeout of 5 mins, you need to create lots of small partitions  to ensure processing queries finish as quickly as possible. Proactive caching can’t use automatic notifications.
  • SSIS out of the box support coming in R2. At the moment, SSIS doesn’t support bulk insert operations to ADO.Net destinations. 
%d bloggers like this: