Building a Simple BI Solution in Excel 2013, Part 1

Unsurprisingly there’s been a lot of interest in Office 2013 since it was announced yesterday, and I’m certainly very excited by all of the new BI features in it. Luckily I wasn’t working today, so I had the chance to have a proper play with it, and I thought it would be useful to walk through the process of building a simple BI solution in Excel 2013 to show off some of the new features.

Let’s start with a blank workbook:

image

Some people don’t like the new look for Office, but I quite like it. You can even set custom backgrounds: you may just be able to see some wispy clouds in the top right-hand corner of the screenshot above. But anyway, to business. To import some data, I first need to go to the Data tab as normal:

image

As you can see there are some new options available here, and I’m going to go to the Windows Azure Marketplace. Actually, I’m going to cheat a little and just say that I’m going to import the UK weather forecast from 12:00am today (July 17th) along with some related tables as described in this blog post. The UI for the import wizard is plain but functional:

image

Interestingly, the data is always imported when the wizard finishes, even if I check the ‘Only Create Connection’ option on the last step.

Once the data from all the tables has been imported, I need to specify some relationships. I can either do this by clicking on the Relationships button in the Data tab or (better still) by going to the PowerPivot window and using the Diagram View. As I said yesterday, the xVelocity in-memory engine is now integrated into Excel, but I still need to use the PowerPivot add-in in some cases. The add-in comes bundled with Excel 2013 but isn’t enabled by default, so I first need to enable it; the steps to do this are detailed here. I can then click on the Manage button in the PowerPivot tab to open the PowerPivot window:

image

And then build some relationships between my tables in Diagram View, which can be reached by clicking on the small Diagram button in the very bottom right-hand corner of the PowerPivot window. Once in the Diagram View, building relationships is simply a matter of dragging one column onto another:

image

I can also build hierarchies here; Duncan Sutcliffe shows how to do this here, and also how to use hierarchies in Power View, which is new functionality. I’ve created a hierarchy going from Region to Weather Station.

With that done I can of course build a PivotTable. The first thing I noticed is that there is no longer a separate type of PivotTable for PowerPivot, which is a bit of a shame because it means you don’t seem to be able to create Slicers from the Field List any more and instead have to go up to the Ribbon. It’s good for consistency, though, I suppose. Measure aggregation and number formats are all handled in the Value Field Settings dialog:

image

There’s also a new type of Slicer for dates, called the Timeline, which Raphael has a lot of good detail on here.

Here’s what the new Quick Explore option looks like when I hover over a Region on Rows – it suggests where you can drill to and which charts to create:

image

Really, though, the thing to do is to create a Power View sheet. This can be accomplished by going to the Insert tab and clicking the Power View button:

image

This is where things get exciting. Power View in Excel now supports maps, and because my weather data contains the latitude and longitude of all of the weather stations in the UK it’s very easy to plot each weather station on a map and visualise the temperature and predicted weather for each station. To do this I just need to drop the Region Name column onto the Filters area (choosing a region reduces the number of weather stations displayed to a manageable number for the purposes of this demo), drop the Latitude and Longitude columns onto the main canvas, turn the resulting table into a map, and then set the size of the markers to show temperature and their colour to show the type of weather:

image

I’ll be speaking at SQL South West this week so what’s the weather like down there at the moment?

image

From this I can see that today there’s fog in the Scilly Islands (shown by the green marker in the bottom left hand corner) and that it’s very cold and foggy in Liscombe (shown by the small red marker in the middle of the map). This dashboard was ridiculously easy to create, by the way, and I am already completely in love with Excel 2013 simply because of Power View – it’s a killer feature in my opinion.

At this point we’ve got a working self-service BI solution, made very quickly in Excel 2013, and it’s something that every competent Excel user would be able to achieve.

In Part 2, you’ll see how you can share this solution with your co-workers in the Office Web App.

OData Extensions for Data Aggregation

I was just reading the following blog post on the OASIS OData Technical Committee Call for Participation:
http://www.odata.org/blog/2012/6/11/oasis-odata-technical-committee-call-for-participation

…when I saw this:

In addition to the core OData version 3.0 protocol found here, the Technical Committee will be defining some key extensions in the first version of the OASIS Standard:

OData Extensions for Data Aggregation – Business Intelligence provides the ability to get the right set of aggregated results from large data warehouses. OData Extensions for Analytics enable OData to support Business Intelligence by allowing services to model data analytic "cubes" (dimensions, hierarchies, measures) and consumers to query aggregated data

Follow the link in the quoted text – it’s very interesting reading! Here’s just one juicy quote:

OData services expose a data model that describes the schema of the service in terms of the Entity Data Model (EDM), an Entity-Relationship model that describes the data and then allows for querying that data. The responses returned by an OData provider are based on that exposed data model and retain the relationships between the entities in the model. Adding the notion of aggregation to OData, without changing any of the base principles in OData as is, has two sides to it:

1. Means for the server to describe an “analytical shape” of the data represented by the service
2. Means for the client to query an “analytical shape” on top of any given data model (for sufficiently capable servers/services)

It’s important to notice that, while each of these two sides might be valuable in its own right and can be used independently of the other, their combination provides additional value for clients. The descriptions provided by the server will help a consumer understand more of the data structure looking at the service’s exposed data model from an analytics perspective, whereas the query extensions allow the clients to express an “analytical shape” for a particular query. The query extensions will also allow clients to refer to the server-described “analytical shape” as shorthand.

One of the authors, Siva Harinath, I know from the SSAS dev team, and it’s easy to see how this extension for OData would be incredibly useful for Microsoft BI. Not only could you imagine cloud-based SSAS (when it finally arrives) being queryable via OData, but it would also allow an OData provider to supply not only raw data to PowerPivot but the design of the PowerPivot model itself.

Importing UK Weather Data from Azure Marketplace into PowerPivot

I don’t always agree with everything Rob Collie says, much as I respect him, but his recent post on the Windows Azure Marketplace (part of which used to be known as the Azure Datamarket) had me nodding my head. The WAM has been around for a while now and up until recently I didn’t find anything much there that I could use in my day job; I had the distinct feeling it was going to be yet another Microsoft white elephant. The appearance of the DateStream date dimension table (see here for more details) was for me a turning point, and a month ago I saw something really interesting: detailed weather data for the UK from the Met Office (the UK’s national weather service) is now available there too. OK, it’s not going to be very useful for anyone outside the UK, but the UK is my home market and for some of my customers the ability to do things like use weather forecasts to predict footfall in shops will be very useful. It’s exactly the kind of data that analysts want to find in a data market, and if the WAM guys can add other equally useful data sets they should soon reach the point where WAM is a regular destination for all PowerPivot users.

Importing this weather data into PowerPivot isn’t completely straightforward though – the data itself is quite complex. The Datamarket guys are working on some documentation for it but in the meantime I thought I’d blog about my experiences; I need to thank Max Uritsky and Ziv Kaspersky for helping me out on this.

The first step in the process of importing this data is to go to the Azure Marketplace and construct a query to get the slice of data that you want – this is a big dataset and you won’t want to get all of it. Once you’ve signed in, go to https://datamarket.azure.com/dataset/0f2cba12-e5cf-4c6d-83c9-83114d44387a, subscribe to the dataset and then click on the “Explore this Dataset” link:

image

This takes you to the query builder page, where you get to explore the data in the different tables that make up this dataset:

image

You choose the table you want to explore in the confusingly-named ‘Query’ dropdown box on the right-hand side of the screen. The available tables are:

  • ThreeHourlyForecast, a fact table containing three-hourly weather forecasts
  • ThreeHourlyForecastArchive, a fact table containing aggregated, averaged values for the various forecasts for a given date and time
  • SignificantWeather, a dimension table containing the different types of weather that can be forecast
  • Visibility, a dimension table containing the different levels of visibility
  • DailyForecast, a fact table containing daily weather forecasts
  • Observations, a fact table containing observed weather
  • Site, a dimension table containing all the UK’s weather stations

As far as I can tell, this data is more or less the same as what’s available through the Met Office’s own DataPoint service, and the documentation for this is here: http://www.metoffice.gov.uk/public/ddc/datasets-documentation.html

Once you’ve selected a table you can construct a filter by entering values in the Optional Parameters boxes below the query dropdown. These changes are then reflected in the URL shown at the top of the screen:

image

This URL represents an OData query. One thing I didn’t notice initially is that the query generated here includes a top 100 filter, which you’ll need to remove (by deleting &$top=100 from the end of the URL) if you want to do anything useful with the data; you might also want to build a more complex query than the query builder supports, and you can learn how to do this by reading this article.
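
Just to illustrate the edit (this is a made-up example URL for this dataset, not one generated for a real query), a query builder URL ending like this:

https://api.datamarket.azure.com/Data.ashx/DataGovUK/MetOfficeWeatherOpenData/ThreeHourlyForecast?$filter=PredictionTime%20eq%200&$top=100

needs to be trimmed to this before you use it:

https://api.datamarket.azure.com/Data.ashx/DataGovUK/MetOfficeWeatherOpenData/ThreeHourlyForecast?$filter=PredictionTime%20eq%200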

In my case I decided to look at the full three-hourly forecast data. As I said, this is a big dataset – initially I thought I’d download the whole thing, but 18 million rows and several hours later I cancelled the import into PowerPivot. Instead I opted to look at data for the whole of the UK for just one forecast made on one day, which worked out at a more manageable 250,000 rows. What’s not clear from any of the current documentation is what all of the columns in the three-hourly forecast fact table represent:

  • Date is the date the forecast is issued
  • StartTime is the time the forecast is issued and is either 0, 6, 12 or 18, representing midnight, 06:00, 12:00 and 18:00 – new forecasts are issued every 6 hours
  • PredictionTime is the time that an incremental update to a forecast is issued; these incremental updates appear every hour. PredictionTime is an actual time value going from 0 to 23 representing the hour the incremental update was issued.
  • TimeStep is an offset in hours from the StartTime, and represents the time that the forecast is predicting the weather for. It ranges in value from 0 to 120, going up in 3s (so the values go 0, 3, 6, 9… 120), meaning we have weather predictions for 5 days into the future for each forecast.

Therefore, for any given row in the ThreeHourlyForecast table, if the Date is April 10th 2012, StartTime is 6, PredictionTime is 8 and TimeStep is 9, then this is data from a forecast that was issued on April 10th 2012 at 8am (the second incremental update to the 6am forecast) and this row contains the prediction for the weather for the time StartTime+TimeStep = 6 + 9 = 15:00 on April 10th 2012.

Here’s the OData url I used to grab data for the three hourly forecast issued on April 10th at midnight (StartTime=0 and PredictionTime=0) for all weather stations and all time steps:

https://api.datamarket.azure.com/Data.ashx/DataGovUK/MetOfficeWeatherOpenData/ThreeHourlyForecast?$filter=Date%20eq%20datetime%272012-04-10T00%3a00%3a00%27%20and%20PredictionTime%20eq%200
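
Decoded, the filter in that URL is just the following OData filter expression (I’ve simply unescaped the URL encoding to make it easier to read):

$filter=Date eq datetime'2012-04-10T00:00:00' and PredictionTime eq 0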

To use this URL in PowerPivot, you need to create a new PowerPivot workbook, open the PowerPivot window and then click the From Azure DataMarket button:

image

Then enter your query URL and Account Key (which you can find on the Query Explorer page by clicking on the Show link, as indicated in the screenshot above):

image

Having imported this data I also imported the whole of the Site table (renamed here to Weather Stations) and the SignificantWeather table, to give the following PowerPivot model:

image

Here are the joins I used:

image

I also created a few calculated columns, including one called ActualPredictionForStart, which adds the StartTime and the TimeStep to the Date to get the actual date and time that the prediction is for:

=[Date] + (([StartTime] + [TimeStep])/24)
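
As a quick sanity check, using the example row described earlier (Date of April 10th 2012, StartTime of 6 and TimeStep of 9), this column works out as:

10 April 2012 00:00 + ((6 + 9)/24 days) = 10 April 2012 15:00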

With this all done, I was able to find out what this (at the time of writing) two-day-old forecast predicted for the current time in my home town:

image

…and do all the usual PivotTable-y and chart-y things you can do with data once it’s in Excel:

image

Incidentally, the forecast is wrong – it’s not raining outside right now!

PivotTables and Excel charts are all very well, but there’s a better way of visualising this data when it’s in Excel – and in my next post I’ll show you how…

 

UPDATE: First of all, I owe an apology to the Met Office – as soon as I hit publish on this post it started pouring with rain, so they were right after all. Secondly, in a weird coincidence, Rob Collie posted about using the other weather dataset in the DataMarket on his blog: http://www.powerpivotpro.com/2012/04/download-10000-days-of-free-weather-data-for-almost-any-location-worldwide/

Bringing It All Together In The Cloud and Excel

A few things got released in the past week or so that I would normally have devoted a short blog post to; now I’ve finally made it onto Twitter I tend to just tweet about them instead, but I still think blogging is the best way for me to get my thoughts together about what they actually mean. Let’s consider this ‘stuff’ in isolation first:

  • http://www.excelmashup.com/ went live. It’s a JavaScript API for the Excel Web App which is useful, but as Jamie Thomson notes here it’s not the API he and I have been asking for for a long time, alas.
  • The first CTP of Hadoop on Azure went live. Denny Lee’s blog post is probably the best place to start to learn about it; the video in the post is a very useful quick overview of what it does too. I noticed that it supports importing data from the Windows Azure Marketplace (what used to be called the Azure Datamarket).
  • The Azure Marketplace also got a speed upgrade, as Boyan Penev notes here. This is important because whenever I’d tried to use it in the past its appalling performance had stopped me doing anything interesting with it at all. This, plus the fact that you can now publish your own data there, turns what was something of a white elephant into what could be an integral part of the Microsoft cloud BI platform.
  • Version 2 of Google BigQuery went live, although it’s still in beta.
  • The CTP of Data Explorer got released, which of course I’ve blogged about already here and which Jamie blogged about here and here.
  • Microsoft announced Office 365 compliance with leading EU and US standards for data protection and security, which means fewer of those legal worries about whether you’re allowed to put all that interesting data you want to use for BI into the cloud.

From this it’s clear that Microsoft’s cloud BI platform is beginning to take shape, as are competing cloud BI platforms (if we assume that Google actually has a cloud BI strategy, and I think it has), and I think it’s fair to say Microsoft is well placed. There’s also yet more evidence, as if it was not blindingly obvious already, that Excel is at the heart of Microsoft’s BI strategy. Look at how data from Hadoop on Azure can be imported directly into Excel, and how this is flagged up as an important selling point, in the video on Denny’s blog post. However, I think Microsoft needs to make even more of an effort to make everything Excel-like: now that it’s building a new BI platform from scratch it has a very rare opportunity to do this and to produce a truly coherent set of tools, rather than the traditional grab-bag of technologies that makes up the current Microsoft BI stack and which the cloud platform could also end up becoming. Actually I’d like to go further and say that rather than have a bunch of separate cloud BI tools, MS should make all BI a feature of Excel, whether it be Excel on the desktop or Excel in the cloud. This might seem a controversial thing to say, but if MS is committed to true mass-market, self-service BI then Excel has to be the platform and MS needs to base its cloud BI strategy on it 100%.

Here are a couple of the things I’d like to see happen to achieve this:

  • Data Explorer’s expression language needs to be made consistent with Excel syntax, in the way that DAX is consistent with Excel syntax. While I like what Data Explorer can do at the moment I hate having to learn a new syntax and a new set of functions for each tool I use (think SSRS expressions, SSIS expressions, MDX and so on) and it makes me much less productive when I do. I want to use the same syntax to write formulas in my spreadsheet, calculations in DAX and calculations/transformations in Data Explorer too – after all, we want to do a lot of the same things in ETL and reporting like lookups, aggregation and pivoting.
  • Hadoop on Azure is an important tool to have, not least because Hadoop is already so widely used in the wider world that it adds some credibility to the MS BI stack, but I’d like to be able to control massively parallel calculations from inside Excel and not just consume the results of them there. I’m thinking of something like DataScope or the way you can scale out Excel calculations on Windows HPC, maybe driven through a PowerPivot-like interface with calculations expressed in DAX, or Data Explorer (which links back to my last point, because if Data Explorer expressions were DAX it needn’t be an either/or choice). It could of course still be Hadoop at the back-end with a layer on top to make it consistent with Excel.
  • We desperately need a proper API for the Excel Web App (yes, here we go again…). An OData API for importing and exporting data from spreadsheets on the web is only the first step; in the long term I’d want the Excel Web App to have capabilities like Project Dirigible, so that Excel on the desktop could become a kind of client tool for a massively scalable Excel Server in the cloud (note that I didn’t say Excel Services, which is more like Excel-on-the-server than a real Excel Server in my opinion). I’d want to be able to store data in the cloud and automatically synchronise it with Excel on multiple desktops or elsewhere in the cloud. I’d also want to create Excel spreadsheets in the cloud that acted like functions (similar to Project Dirigible), where these functions could be called from other Excel spreadsheets which again could be on the desktop or be in the cloud.
  • We need Analysis Services in the cloud. We’ve known it’s been coming for a long time, but not what form it will take. Again, I’d like to see tight integration with Excel similar to how PowerPivot works on the client or tighter, where Analysis Services would not be a separate service but just be the functionality within Excel for manipulating tables of data for reporting purposes. I’d want Data Explorer to be able to create and load these tables.
  • We also need the full BI functionality of desktop Excel – pivot tables and cube formulas – in the Excel Web App. I would guess this is in the pipeline anyway since it’s a pretty obvious requirement if Microsoft’s cloud BI strategy is going to work. I also don’t see the point of Power View being a separate app – I’d like to see it become a data visualisation feature of Excel.
  • Finally, I’d like to see some way of tying all the data held in this gigantic data store together. You’d want to be able to search it, certainly, but also understand where it’s come from and what it actually represents. It sounds like maybe this is what Project Barcelona is aiming for.

Coordinating the work of multiple teams at Microsoft in the way this would demand is a particularly thankless task, I know. But maybe, just maybe, someone senior (maybe a Technical Fellow like Amir…?) could pull off something this ambitious?

Microsoft Codename “Social Analytics”

I’ve just seen there’s a new release on the SQL Azure Labs site – Codename “Social Analytics” (that’s a bad codename by the way – did MS finally run out of place names in Washington state?). Full details are here:

http://www.microsoft.com/en-us/sqlazurelabs/labs/socialanalytics.aspx

It’s a service that allows you to collect social web data and then either use it for business analysis or automate responses; it has an OData endpoint so the obvious tool for doing the analysis is PowerPivot, and I guess it could also be a data source for Data Explorer when that comes along. At the moment it’s very limited in that you can only analyse the results of one of two feeds that aggregate data about either Bill Gates or Windows 8, which is a shame, but it should still be fun to play with.

Excel DataScope

Jamie Thomson just tipped me off about something new and very interesting – one week after I had a moan about Microsoft doing nothing about Excel and the cloud, here comes Excel DataScope:
http://research.microsoft.com/en-us/projects/azure/datascope.aspx

Here’s the blurb from the site:

From the familiar interface of Microsoft Excel, Excel DataScope enables researchers to accelerate data-driven decision making. It offers data analytics, machine learning, and information visualization by using Windows Azure for data and compute-intensive tasks. Its powerful analysis techniques are applicable to any type of data, ranging from web analytics to survey, environmental, or social data.

There are yet more tantalising details in the video and the two pdfs here:

http://research.microsoft.com/apps/video/?id=149888
http://research.microsoft.com/en-us/projects/azure/exceldatascope_11_poster1.pdf
http://research.microsoft.com/en-us/projects/azure/exceldatascope_11_poster2.pdf

I’m currently trying to find out more about all this, but there’s clearly a ton of cool stuff here:

  • You can use the Live Labs Pivot control for visualisation.
  • It does data mining in the cloud. Is this the successor to the old data mining addin? The functionality is clearly the same.
  • There’s a section on Map Reduce running on Windows Azure on one of the posters. Is this using Dryad?

Is this a first glimpse of a new cloud-based BI platform from Microsoft? Will SSAS in the cloud form part of it? Before we all get too excited (or at least I get too excited) it’s worth noting that this is coming from the eXtreme Computing Group and not the SQL Server team, it’s clearly aimed at scientific rather than business users, and is described as “an ongoing research and development project”, i.e. it is not a commercial product. The potential is obvious though, and I hope it becomes something significant.

The problem of power users, DAX and difficult calculations

Ever since PowerPivot got released, one of the questions I’ve heard debated over and over is whether it’s easy for non-IT users to learn and use DAX or not. The stock answer from Microsoft, and I agree with them here, is that anyone with basic Excel knowledge can do simple calculations in DAX, but the more complex calculations (for example, those which need to use the CALCULATE() function) are probably only ever going to be written and understood by BI professionals. Even then there’s plenty of evidence that even BI and Excel pros struggle to get to grips with DAX, as blog posts like this and this suggest. This is certainly good news for me professionally – my feeling is that for more complex calculations, DAX is just as difficult as MDX although conceptually very different, and I’ve made a good living out of MDX over the years – but on the other hand I can’t help but feel this represents a capitulation on the part of Microsoft. After all, isn’t the target PowerPivot user someone like an accountant, actuary or statistician, someone who is mathematically literate and capable of creating amazingly complicated spreadsheets? How can self-service BI be truly self-service if certain calculations are still too difficult for anyone but IT professionals?
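
To give a flavour of the kind of expression I mean (a purely illustrative sketch with a hypothetical Sales table and columns, not a calculation from any real model), here’s a typical CALCULATE-based measure that returns the total sales for all regions regardless of which region is currently selected – the sort of thing you need as the denominator in a share-of-total calculation:

=CALCULATE(SUM(Sales[SalesAmount]), ALL(Sales[Region]))

It’s exactly this kind of filter context manipulation that people seem to struggle with.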

The problem isn’t the design of DAX as such, I think, but the fact that DAX exists as a language. I’m sure most BI professionals have seen worked examples of calculations created in Excel that we have been expected to translate into SQL/MDX/DAX/whatever tool we’re using. It’s the act of translating the calculation into an expression that’s the difficult thing: end-users understand how the calculations work, but they can’t speak the languages that BI tools use whereas IT people can. So why can’t our BI tools allow users to express calculations in a way that they can understand – as a series of spreadsheet formulas?
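
To take a trivial, hypothetical example (the table and column names are invented for illustration): an Excel user might happily write something like

=SUMIFS(Sales[Amount], Sales[Region], "South West")

against a table in a worksheet, but to get the equivalent number out of PowerPivot they would have to translate it into a DAX measure along the lines of

=CALCULATE(SUM(Sales[Amount]), Sales[Region] = "South West")

The logic is the same in both cases; it’s the translation step that causes the problems.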

One example of a tool that tries to do this already is the Intelligencia OLAP control. However, what prompted these thoughts was Project Dirigible, a new web-based spreadsheet from the people who produce Resolver One. The key feature that sets it apart from regular spreadsheets-in-a-browser is that it’s designed for scenarios where users want to scale out complex calculations over a large number of CPUs in the cloud. What makes this possible is the run_worksheet() function, documented here, that allows you to write a calculation that overwrites the values in cells on another sheet, recalculates that sheet, and then retrieves values from cells in that sheet – as this blog post explains, it allows you to use a spreadsheet as a function from within another spreadsheet. I think this is a deeply cool idea.

Here’s an extremely simple example of how it works. The following sheet:
http://www.projectdirigible.com/user/cwebb/sheet/1254/
contains an example of a tax calculation, with three cells: the value before tax, the tax rate, and the value after tax.

image

This second sheet then contains a number of sales records and calls the first sheet to calculate the value for each of the sales after tax:
http://www.projectdirigible.com/user/cwebb/sheet/1264/

image

Here’s the function call I’ve used to calculate tax:

=run_worksheet("http://www.projectdirigible.com/user/cwebb/sheet/1254/", {(2,3) -> c3}).d3.value

Apart from the parallelism, there are a number of obviously good things that follow on from breaking calculations out into separate sheets like this: the ability to reuse this calculation across multiple spreadsheets; the ability to completely change how this calculation works at some later date so long as the parameters remain the same; and the fact that how it works is easily comprehensible to non-IT users because it’s expressed in a very visual way. I think this general approach could be applied to DAX: in DAX, most advanced calculations involve manipulating tables of data, filtering them and aggregating values in different columns. We have tables in Excel, these tables can be filtered, we can create calculated fields inside these tables and we can aggregate and perform other calculations on the data in these tables elsewhere in the sheet, so everything we can do in a DAX calculation can already be represented in the form of a spreadsheet. So if it was possible for the Vertipaq engine to understand calculations expressed in spreadsheet form instead of as DAX expressions then non-IT users would have far fewer limits on the kind of calculations they could write. Of course, this is a very big ‘if’ and there’s only a slim chance that something like this might happen – it would require a lot of development effort and probably that PowerPivot became a native feature of Excel.
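
As a sketch of the pattern I mean (again, hypothetical table and column names): a DAX expression such as

=SUMX(FILTER(Sales, Sales[Quantity] > 10), Sales[Amount])

just filters a table and then sums a column over the filtered rows – exactly the sort of thing an Excel user could express by filtering a table on a worksheet, adding a calculated column if necessary, and summing the result.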

Sadly, PowerPivot apart, I’m not sure I see much evidence of exciting new ideas in the world of Excel; indeed it seems like Excel has barely got to grips with the idea of the internet, let alone a truly cloud-based model. Why can’t all the interesting parallelisation stuff that’s possible with Windows HPC Server 2008 R2 be translated to the cloud and be made available to all Excel users? Why doesn’t the Excel Web App have any kind of API yet? I’m sure all the usual responses about lack of time and resources can be reused here, but I’d hate to see Excel become yet another IE6, neglected by MS while the competition innovates – and after all, isn’t Office one of Microsoft’s cash-cows, and so shouldn’t it be getting the kind of time and money lavished on stuff like Windows Phone and Bing…?
