Calling A Web Service From Data Explorer (Power Query), Part 1

NOTE: This post was written before Data Explorer was renamed as Power Query. All of the content is still relevant to Power Query.

Last week I showed how you could use the WebService() function in Excel 2013 to bring location data into a worksheet. Since this is a topic I have a particular interest in, this week I’ll show you how to do the same thing in Data Explorer.

First of all, a simple example. In that previous post I used the Google Distance Matrix API to calculate the distance between two points; for example the following call shows how long it would take me to drive from my home to Buckingham Palace to see the Queen (52 minutes in case you’re wondering):
http://maps.googleapis.com/maps/api/distancematrix/xml?origins=HP66HF&destinations=SW1A1AA&mode=driving&sensor=false

The following post on the Data Explorer forum from James Terwilliger gives some helpful tips on how to consume web services from within Data Explorer:
http://social.msdn.microsoft.com/Forums/en-US/dataexplorer/thread/069b50e3-ab9e-4ee4-99a9-23440fcfc768

…but it’s not altogether straightforward. For example if you paste the link above into the From Web data source, you do get something returned but it’s extremely hard to find any useful data. Instead, I found the following steps worked:

  • First, hit From Web and enter something in the URL box:
    image
  • This gives you a new web query, but you want to discard any auto-generated code in the first step. Instead, paste the following expression:

    = Xml.Document(
        Web.Contents("http://maps.googleapis.com/maps/api/distancematrix/xml",
            [Query = [origins = "HP66HF", destinations = "SW1A1AA", mode = "driving", sensor = "false"]]))

    image

    This uses Web.Contents() to call the web service (as described in that forum reply) with the appropriate parameters. Xml.Document() is then used to interpret the response as an XML document.

  • With this done, it’s quite easy to navigate through the XML by clicking on the Table links in each step to find the useful data:
    image
  • And finally hit Done to surface it in the worksheet:
    image
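
The click-through navigation above can also be written as a single query. Here’s a rough sketch of what that might look like – element names like “row”, “element” and “duration” come from the Distance Matrix XML response, and the exact drill-down syntax may differ from what the UI generates for you:

    let
        Source = Xml.Document(
            Web.Contents("http://maps.googleapis.com/maps/api/distancematrix/xml",
                [Query = [origins = "HP66HF", destinations = "SW1A1AA",
                          mode = "driving", sensor = "false"]])),
        // Each step below drills into a child element table by its Name
        Root = Source{0}[Value],
        Row = Root{[Name = "row"]}[Value],
        Element = Row{[Name = "element"]}[Value],
        Duration = Element{[Name = "duration"]}[Value],
        // The <text> element holds the human-readable duration
        DurationText = Duration{[Name = "text"]}[Value]
    in
        DurationText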

Some thoughts at this point: I don’t like the way the DE formula language is case-sensitive, and I suspect that in the long run it will either have to be hidden or replaced with something like VBA, Excel formula language or DAX if it’s going to be used even by Excel power users. It is very, very powerful though, and luckily the UI is good enough that 99% of users will never need to write DE formula language anyway.

The next question: I’ve hard-coded my origins and destinations in this example, but how can I read these values from the worksheet without my users having to open Data Explorer and edit the query? Tune in for Part 2 to find out!

Importing Data From Multiple Log Files Using Data Explorer (Power Query)

NOTE: This post was written before Data Explorer was renamed as Power Query. All of the content is still relevant to Power Query.

It’s only been two days since the official Preview release of Data Explorer and already the blog posts about it are coming thick and fast. Here are some of the more interesting ones that I’ve seen that show what’s possible with it:
http://sqlblog.com/blogs/jamie_thomson/archive/2013/02/28/traversing-the-facebook-graph-using-data-explorer.aspx
http://www.mattmasson.com/index.php/2013/03/access-the-windows-azure-marketplace-from-data-explorer/
http://community.altiusconsulting.com/best-oscar-winning-film-my-first-data-explorer-adventure/
http://www.spsdemo.com/blog/Lists/Posts/Post.aspx?List=c67861cd-a0d9-4ed8-9d9d-9b29652a516f&ID=371&Web=f74569c2-ae3f-42c6-a3fa-9f099dfaeb7f

Obviously I can’t let everyone else have all the fun, so I thought I’d show how you can use Data Explorer to import data from multiple files, clean it, load it into a single table and then report on it.

First of all, the data. Like all bloggers I have an unhealthy interest in my blog stats, and one of the ways I monitor the hits on this site is using Statcounter. I’m also a bit of a miser, though, so I only use their freebie service and that means that I only get to see stats on the last 500 site visits. How can I analyse this data then? Well, Statcounter allow you to download log data as a csv file, so at about 2:30pm I downloaded one file and at 8:30pm I downloaded another.

Now, the first cool thing to show about Data Explorer is that you can import and merge data from multiple files with the same structure if they’re in the same folder. With both of my files in a folder called Blog Logs, and Excel open, the first thing you need to do is to go to the Data Explorer tab and hit From File/From Folder:

image

The next step is to enter the name of the folder containing the files in the dialog:

image

With that done, a new Query screen appears with a list of the files in the folder:

image

You then need to hit the icon with the two down arrows and a horizontal line that I’ve highlighted in the screenshot above, next to the Content heading. This then shows the data in the files (obviously I’ve had to scrub out the sensitive data here):

image

You can then use the first row as the column headers:

image

Filter the data so that the row with the second set of column headers is removed (I wonder if there’s a way to do this automatically when importing multiple csv files?) by clicking on the Date and Time column and deselecting the value “Date and Time” as shown:

image

Right-click on each column you don’t want to import (such as IP Address) and select Hide:

image

Right-click on the Date and Time column and select Remove Duplicates to remove any records that appear in both log files (I’m assuming that there were no cases where two people hit a page at exactly the same date and time, which of course may not be completely correct):

image

And force the Date and Time column to be treated as a Date/Time type:

image

And bingo, you’re done. Here are all the steps in the import, all of which can be edited, deleted, reordered etc:

image
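
For reference, a query doing the equivalent of the steps above might look something like this. This is a hand-written sketch rather than the auto-generated code: the folder path is made up, and Csv.Document’s default options may need adjusting for Statcounter’s files:

    let
        // Get every file in the folder (the path here is hypothetical)
        Source = Folder.Files("C:\Blog Logs"),
        // Parse each file's binary Content as csv and append them together
        Combined = Table.Combine(
            List.Transform(Source[Content], each Csv.Document(_))),
        // Use the first row as the column headers
        Promoted = Table.PromoteHeaders(Combined),
        // Remove the second file's header row, which now appears as data;
        // columns you don't want (IP Address etc) could be dropped here
        // with Table.RemoveColumns
        Filtered = Table.SelectRows(Promoted,
            each [#"Date and Time"] <> "Date and Time"),
        // Remove records that appear in both log files
        Deduped = Table.Distinct(Filtered, {"Date and Time"}),
        // Treat the Date and Time column as a Date/Time type
        Typed = Table.TransformColumnTypes(Deduped,
            {{"Date and Time", type datetime}})
    in
        Typed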

The data is then loaded into a table in a worksheet (though you can turn that off), and by clicking “Load to data model” in the Query Settings pane you can load the data into the Excel data model:

image

(NB I found some issues with loading date data into the data model and US/European date formats that I’ve reported here, but don’t forget this is beta software so there are bound to be problems like this)

You can build cool Power View reports using this data:

image

Or even explore it on a 3D map with GeoFlow:

image

Fun, isn’t it?

Public Preview of Data Explorer

The Public Preview of Data Explorer (which, as some of you know, I’ve been following for a while, since it first appeared in SQL Azure Labs) is now available for download. You can get it here:
http://www.microsoft.com/en-us/download/details.aspx?id=36803

There’s also a good video overview here:

Data Explorer

In a nutshell, Data Explorer is self-service ETL for the Excel power user – it is to SSIS what PowerPivot is to SSAS. In my opinion it is just as important as PowerPivot for Microsoft’s self-service BI strategy.

I’ll be blogging about it in detail over the coming days (and also giving a quick demo in my PASS Business Analytics Virtual Chapter session tomorrow), but for now here’s a brief list of things it gives you over Excel’s native functionality for importing data:

  • It supports a much wider range of data sources, including Active Directory, Facebook, Wikipedia, Hive, and tables already in Excel
  • It has better functionality for data sources that are currently supported, such as the Azure Marketplace and web pages
  • It can merge data from multiple files that have the same structure in the same folder
  • It supports different types of authentication and the storing of credentials
  • It has a user-friendly, step-by-step approach to transforming, aggregating and filtering data until it’s in the form you want
  • It can load data into the worksheet or direct into the Excel model

There’s a lot to it, so download it and have a play! It’s supported on Excel 2013 and Excel 2010 SP1.

UPDATE: Check out the following blogs/links for Data Explorer:
http://blogs.msdn.com/b/dataexplorer/archive/2013/02/27/announcing-microsoft-data-explorer-preview-for-excel.aspx
http://social.msdn.microsoft.com/Forums/en-US/dataexplorer/
http://office.microsoft.com/en-us/excel-help/learn-about-data-explorer-formulas-HA104003958.aspx?CTT=5&origin=HA104003813
http://blogs.msdn.com/b/mllopis/archive/2013/02/28/get-microsoft-quot-data-explorer-quot-preview-for-excel-today.aspx

The Future of Data Explorer

You might have seen me mention Data Explorer a few times over the last year in various blog posts; it’s a self-service ETL tool that is currently available via SQL Azure labs:
http://www.microsoft.com/en-us/sqlazurelabs/labs/dataexplorer.aspx

I’ve had a lot of fun using it and so I was pleased, and quite surprised, to see the new version of it being used in the day 2 keynote here at the PASS Summit. After a few behind-the-scenes enquiries, I can now confirm that the ‘Data Explorer experience’ is currently being worked on by Microsoft, and a public preview of ‘the new Excel-based experiences’ (ie what was shown in the keynote) will hopefully be available pretty soon. Which is very good news.

Using Google Docs, Data Explorer and PowerPivot for Questionnaires

You may have already seen that the labs release of Data Explorer is now publicly available; there’s a whole load of really useful resources available on the learning page too if you’re interested in finding out more about it.  I’ve been very lucky to have had early access to Data Explorer, and to test it out I put together a simple demo using the cloud service that shows off a typical use-case.

The first thing I did was to use Google Docs (just to have a cross-platform demo, not because I love Google in any way, honest…) to create a simple questionnaire using Google Forms. Before you read any further, please go and fill out the questionnaire I created here:

https://docs.google.com/spreadsheet/viewform?formkey=dDRnNi1fbkotLVd6Q0g4MmhsdFV2OGc6MQ

Don’t worry, there are only three questions and it’s all non-personal data! For those of you reading offline, here’s a screenshot:

image

Now when you create a questionnaire like this in Google Forms, the responses get put inside a Google Docs spreadsheet. Here’s the link to the spreadsheet behind my questionnaire:

https://docs.google.com/spreadsheet/ccc?key=0Akv4XYo6s_Z2dDRnNi1fbkotLVd6Q0g4MmhsdFV2OGc

image

The good thing about Google Docs (unlike, ahem, the Excel Web App) is that it has an API. The contents of this sheet could easily be exported to a number of formats including csv, which means I could get the data into PowerPivot very easily. But there was a problem: the last question is multiple choice, and for the results of that question I got a comma-delimited list of values in a single cell in the spreadsheet (see the above screenshot) – which was not going to be very useful for analysis purposes. What I really wanted was all this data split out into separate columns, one column for each version and containing a boolean value to show if that version has been checked, so if I was going to analyse my responses by version I clearly needed to do some ETL work. I could do this with a calculated column inside PowerPivot of course, but the problem with this is that every time someone wanted to work with this data in a new PowerPivot model they’d have to repeat all this work, which is a pain, and clearly some users wouldn’t have the DAX skills to do this. The best thing to do would be to perform the ETL somewhere up in the cloud so everyone could benefit from it…

Enter Data Explorer. I created a simple mashup with the following steps:

  • Imported the data from the Google spreadsheet as a csv file
  • Cast that data as a table
  • Split the Timestamp column into two separate Date and Time columns
  • Added new columns to the table for each version of SSAS that contained the value True if that version had been checked in a particular response, False if not

image
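
Sketched in DE formula language, those steps might look something like the following. The csv export URL and the Google Forms column names are placeholders and guesses, so treat this as an outline rather than the real mashup:

    let
        // Import the spreadsheet's csv export (URL is a placeholder)
        Source = Csv.Document(Web.Contents("https://docs.google.com/...")),
        Promoted = Table.PromoteHeaders(Source),
        // Split the Timestamp column into separate Date and Time columns
        WithDate = Table.AddColumn(Promoted, "Date",
            each Date.From(DateTime.FromText([Timestamp]))),
        WithTime = Table.AddColumn(WithDate, "Time",
            each Time.From(DateTime.FromText([Timestamp]))),
        // One True/False column per version of SSAS, for example:
        WithOLAP = Table.AddColumn(WithTime, "Used OLAP Services",
            each Text.Contains(
                [#"What versions of Analysis Services have you used?"],
                "OLAP Services"))
    in
        WithOLAP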

Apart from the usual struggles that go with learning a new language, it was pretty straightforward and I was impressed with how easy it was to use. Here’s an example of an expression that adds a new column to show whether the respondent checked the “OLAP Services” box in the final question:

= Table.AddColumn(#"Rename Date Time", "Used OLAP Services", each if Text.Contains([#"What versions of Analysis Services have you used?"],"OLAP Services") then "True" else "False")

Finally, I published the output of the mashup publicly. This page contains all of the links to download the live data in various different formats:

https://ws18615032.dataexplorer.sqlazurelabs.com/Published/Chris%20Webb%20Questionnaire%20Demo

image

If you filled in the questionnaire you should be able to find your responses in there because it’s a live feed.

And you can of course import the data into PowerPivot now very easily, for example by using the OData feed from Data Explorer. First, start Excel, go into the PowerPivot window and click on the “From Data Feeds” button:

image

Then, in the wizard, enter the URL of the OData feed:

image

And you should then have no problems importing the data:

image

…and then analysing the responses. To do this, you will want to create a simple measure to count the number of responses, with a definition something like this:

=COUNTROWS('Questionnaire Data')

image
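
If you want to go further, one measure per version column works the same way; for example (assuming the “Used OLAP Services” column created in the mashup stores the text “True”, and that your table is called ‘Questionnaire Data’):

    Used OLAP Services Count :=
    CALCULATE (
        COUNTROWS ( 'Questionnaire Data' ),
        'Questionnaire Data'[Used OLAP Services] = "True"
    )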

I’m looking forward to seeing the data come flooding in!

This approach could easily be applied to scenarios such as analysing feedback from user group meetings or events, and what with the number of online data sources out there there must be hundreds of other potential applications as well. And given that anyone can now publish and sell data on the Windows Azure Marketplace there must be ways of making money from this too…
