Importing UK Weather Data from Azure Marketplace into PowerPivot

I don’t always agree with everything Rob Collie says, much as I respect him, but his recent post on the Windows Azure Marketplace (part of which used to be known as the Azure Datamarket) had me nodding my head. The WAM has been around for a while now and up until recently I didn’t find anything much there that I could use in my day job; I had the distinct feeling it was going to be yet another Microsoft white elephant. The appearance of the DateStream date dimension table (see here for more details) was for me a turning point, and a month ago I saw something really interesting: detailed weather data for the UK from the Met Office (the UK’s national weather service) is now available there too. OK, it’s not going to be very useful for anyone outside the UK, but the UK is my home market and for some of my customers the ability to do things like use weather forecasts to predict footfall in shops will be very useful. It’s exactly the kind of data that analysts want to find in a data market, and if the WAM guys can add other equally useful data sets they should soon reach the point where WAM is a regular destination for all PowerPivot users.

Importing this weather data into PowerPivot isn’t completely straightforward though – the data itself is quite complex. The Datamarket guys are working on some documentation for it but in the meantime I thought I’d blog about my experiences; I need to thank Max Uritsky and Ziv Kaspersky for helping me out on this.

The first step in the process of importing this data is to go to the Azure Marketplace and construct a query to get the slice of data that you want – this is a big dataset and you won’t want to get all of it. Once you’ve signed in, go to https://datamarket.azure.com/dataset/0f2cba12-e5cf-4c6d-83c9-83114d44387a, subscribe to the dataset and then click on the “Explore this Dataset” link:

SNAGHTML190a951

This takes you to the query builder page, where you get to explore the data in the different tables that make up this dataset:

You choose the table you want to explore in the confusingly-named ‘Query’ dropdown box on the right-hand side of the screen. The available tables are:

  • ThreeHourlyForecast, a fact tables containing three hourly weather forecasts
  • ThreeHourlyForecastArchive, a fact table containing aggregated, averaged values for the various forecasts for a given date and time
  • SignificantWeather, a dimension table containing the different types of weather that can be forecast
  • Visibility, a dimension table containing the different levels of visibility
  • DailyForecast, a fact table containing daily weather forecasts
  • Observations, a fact table containing observed weather
  • Site, a dimension table containing all the UK’s weather stations

As far as I can tell, this data is more or less the same as what’s available through the Met Office’s own DataPoint service, and the documentation for this is here: http://www.metoffice.gov.uk/public/ddc/datasets-documentation.html

Once you’ve selected a table you can construct a filter by entering values in the Optional Parameters boxes below the query dropdown. These changes are then reflected in the URL shown at the top of the screen:

This URL represents an OData query. One thing I didn’t notice initially is that the query that is generated here includes a top 100 filter in it which you’ll need to remove (by deleting &$top=100 from the end of the URL) if you want to do anything useful with the data; you might also want to build a more complex query than is supported by the query builder, and you can learn how to do this by reading this article.

In my case I decided to look at the full three hourly forecast data. As I said, this is a big dataset – initially I thought I’d download the whole thing, but 18 million rows and several hours later I cancelled the import into PowerPivot. Instead I opted to look at data for the whole of the UK for just one forecast made on one day, which worked out at a more manageable 250000 rows. What’s not clear from any of the current documentation is what all of the columns in the three hourly forecast fact table represent:

  • Date is the date the forecast is issued
  • StartTime is the time the forecast is issued and is either 0, 6, 12 or 18, representing midnight, 06:00, 12:00 and 18:00 – new forecasts are issued every 6 hours
  • PredictionTime is the time that an incremental update to a forecast is issued; these incremental updates appear every hour. PredictionTime is an actual time value going from 0 to 23 representing the hour the incremental update was issued.
  • TimeStep is an offset in hours from the StartTime, and represents the time that the forecast is predicting the weather for. It ranges in value from 0 to 120, going up in 3s (so the values go 0, 3, 6, 9… 120), meaning we have weather predictions for 5 days into the future for each forecast.

Therefore, for any given row in the ThreeHourlyForecast table, if the Date is April 10th 2012, StartTime is 6, PredictionTime is 8 and TimeStep is 9, then this is data from a forecast that was issued on April 10th 2012 at 8am (the second incremental update to the 6am forecast) and this row contains the prediction for the weather for the time StartTime+TimeStep = 6 + 9 = 15:00 on April 10th 2012.

Here’s the OData url I used to grab data for the three hourly forecast issued on April 10th at midnight (StartTime=0 and PredictionTime=0) for all weather stations and all time steps:

https://api.datamarket.azure.com/Data.ashx/DataGovUK/MetOfficeWeatherOpenData/ThreeHourlyForecast?$filter=Date%20eq%20datetime%272012-04-10T00%3a00%3a00%27%20and%20PredictionTime%20eq%200

To use this URL in PowerPivot, you need to create a new PowerPivot workbook, open the PowerPivot window and then click the From Azure DataMarket button:

Then enter your query URL and Account Key (which you can find on the Query Explorer page by clicking on the Show link, as indicated in the screenshot above):

Having imported this data I also imported the whole of Site (renamed here to Weather Stations) and SignificantWeather tables to give the following PowerPivot model:

Here are the joins I used:

I also created a few calculated columns, including one called ActualPredictionForStart which added the TimeStep to the Start Time and the Date to get the actual date and time that the prediction is for:

=[Date] + (([StartTime] + [TimeStep])/24)

With this all done, I was able to find out what the predicted weather for the current time and my home town was in this (as of the time of writing) two-day old forecast:

…and do all the usual PivotTable-y  and chart-y things you can do with data once it’s in Excel:

Incidentally, the forecast is wrong – it’s not raining outside right now!

PivotTables and Excel charts are all very well, but there’s a better way of visualising this data when it’s in Excel – and in my next post I’ll show you how…

 

UPDATE: First of all, I owe an apology to the Met Office – as soon as I hit publish on this post it started pouring with rain, so they were right after all. Secondly, in a weird co-incidence, Rob Collie posted about using the other weather dataset in the DataMarket on his blog: http://www.powerpivotpro.com/2012/04/download-10000-days-of-free-weather-data-for-almost-any-location-worldwide/


Discover more from Chris Webb's BI Blog

Subscribe to get the latest posts to your email.

18 thoughts on “Importing UK Weather Data from Azure Marketplace into PowerPivot

  1. Great read; I used your post as much as a Marketplace tutorial as I did anything else. The current Marketplace weather offerings for the U.S. are predictably paltry (and expensive!) — though I look forward to this evolving as time goes on.

    Please keep your ear to the Marketplace ground for us! Let us know if you hear of any new or interesting datasets that might be useful to the BI community at large.

    1. Chris, I’ve been experimenting with Marketplace (and a lot of the new 2012 tools) all day, using your blog post as a reference, and I’m trying to determine how one might circumvent PowerPivot and import this information directly into SQL Server. Am I thinking about things wrong? Do I need to have an Azure account to proceed in that direction? If so, it would surprise me, considering I’m able to access this UK weather data so easily via PowerPoint.

      1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
        Chris Webb says:

        Hi Shawn,

        You can also export data from the Datamarket in CSV format – just click on the Export button in the Explore screen. That should be the easiest way of getting the data into SQL Server. You don’t need an Azure account to use this data, you just need a Windows Live ID.

        Chris

      2. Hi Chris,

        (I’m replying to my own question, as it appears nested comments stop three levels down.)

        Going the CSV route makes sense for data such as DateStream, which would be a one-off pull — but when with forecast data, which by definition is quite volatile and must be pulled often, I would very much want to automate that feed for use in downstream scorecards.

        Might I then infer that I, the BI developer, am not the primary demographic for WAM? Or is it more accurate to assume they are forcing my hand to sign up for an Azure account?

        -Shawn

      3. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
        Chris Webb says:

        Hi Shawn,

        I don’t know what you mean exactly by an ‘Azure account’ – the only account you need is a Windows Live ID. For developers it’s pretty easy to use OData in code for automating updates – there are lots of examples here: http://www.odata.org/

        Chris

  2. Chris, interesting arcticle. I wanted to play around with it a little bit without using PowerPivot. Is it possible to retrieve the raw OData by calling the url? If I log in to the marketplace with my WindowsLiveID and then call the url “https://api.datamarket.azure.com/Data.ashx/DataGovUK/MetOfficeWeatherOpenData/Observation?$top=100” it does not work. I’m asked for a Login. Perhaps I have to add the Account key to the query in some way?

    Frank

      1. Thank You for the link. One has to set the credentials correctly.

        A small program works like this:

        1. Create a C# console application and add a service reference for the Service-Uri “https://api.datamarket.azure.com/DataGovUK/MetOfficeWeatherOpenData/”. In my example the name of this reference is “ServiceReference1”.

        2. This code reads the first 100 observations:

        Uri ServiceUri = new Uri(“https://api.datamarket.azure.com/DataGovUK/MetOfficeWeatherOpenData/”);

        ServiceReference1.DataGovUKMetOfficeWeatherOpenDataContainer Context;

        Context = new ServiceReference1.DataGovUKMetOfficeWeatherOpenDataContainer(ServiceUri);

        Context.Credentials = new NetworkCredential(MY_WINDOWS_LIVE_ID, PRIMARY_ACCOUNT_KEY_OF_THE_QUERY);

        Uri QueryUri = new Uri(“https://api.datamarket.azure.com/Data.ashx/DataGovUK/MetOfficeWeatherOpenData/Observation?$top=100”);

        IEnumerable Query;

        Query = Context.Execute(QueryUri);

        IList List = Query.ToList();

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      In a couple of ways. First and foremost, PowerPivot is a fully supported product. Secondly, PowerPivot stores all of its data in-memory locally; DataScope, as I understand it, is all about keeping data in the cloud.