Power Query/Excel 2016 VBA Examples

In Excel 2016, Power Query is no longer an Excel add-in but a native feature of Excel, and what’s more, you can now use VBA to create and manage Power Query queries.

I’ve found two sources of information about how to use VBA with Power Query in Excel 2016. First, there are some code samples on the TechNet Gallery here:
https://gallery.technet.microsoft.com/VBA-to-automate-Power-956a52d1#content
…and Gil Raviv, a Program Manager at Microsoft, has also asked for feedback on this functionality on this thread:
https://social.technet.microsoft.com/Forums/en-US/1eac9c36-b6e4-48f0-a51a-fa92b24cf1d9/vba-and-power-query-in-excel-2016-preview-lets-get-started?forum=powerquery

Secondly, I was contacted recently by Tycho Grouwstra who shared with me some of the interesting work he has done using VBA and Power Query in the Excel 2016 Preview, and who has very kindly allowed me to blog about it here. His work is much more representative of how I think most people will want to use this feature.

Tycho sent me a .xlsm file containing all of the VBA code, which you can download here. Obviously the code only works in the Excel 2016 Preview, but you can still open the file and look at the code in Excel 2013. However, if you’re worried about downloading a workbook with macros in it, I extracted the code to a text document which you can see here. If you want to copy the code to use in your own workbook, you’ll need to go to the VBA Editor, select Tools/References and add a reference to “Microsoft ActiveX Data Objects 6.1 Library”.

image

The VBA code includes examples of how to:

  • Delete all the Power Query queries in a workbook
  • Export/import the M code for all queries to/from another Excel workbook
  • Export/import the M code for all queries to text files
  • Refresh all the Power Query queries in the workbook
  • Load a query to an Excel table

A few bugs/features in the Preview are also pointed out, namely:

  • Imported queries don’t always show up in the Workbook Queries pane; the workaround is to close and reopen the workbook
  • Functions aren’t recognised as functions (ie they don’t have the fx icon) until you open them in the Query Editor and then click Close & Load
  • Query groups aren’t supported yet – a bit of an oversight, in my opinion, and unfortunately the forum thread linked to above indicates this won’t be addressed before RTM
  • Loading the output of a query into an Excel table using the code given here doesn’t seem to have the same result as loading a query to a table in the worksheet using the Power Query UI: it creates a blue, rather than green, table that doesn’t always retain row order.

I can imagine a lot of serious Power Query users will create workbooks containing a library of their most useful queries and functions, and use VBA code to copy these queries and functions into new workbooks as and when necessary. We’ll have to wait and see what Microsoft’s plans for sharing Power Query queries are: whether they’ll go beyond what’s already been seen in Office 365 Power BI, whether they’ll be part of a bigger bundle of services, and what the cost will be.

Incidentally, the sample workbook contains a lot of interesting, generally useful Power Query queries and functions written by Tycho and others, which are also available in the following GitHub repository: https://github.com/tycho01/pquery

Drawing Lines On Maps With Power Map And Power Query

Recently, I was working with a customer who wanted to be able to draw lines between two points in Power Map. At the time, I thought the only way it might be possible was to use Power Query to generate a table of data containing a series of points that were so close together that they looked like a single line, and then plot these points in Power Map (similar to what I show in the screenshot here). Soon after, the new custom regions functionality was released in Power Map (there’s no documentation I can find right now, but this blog post is reasonably detailed) and I wondered whether now it might be possible to draw lines. Unfortunately not: Power Map can now import SHP and KML files, but it doesn’t support all the features of KML – only polygons (and even then not all the features of polygons, although inner and outer boundaries work fine). I guess this is ok for the primary use-case of Power Map, which is plotting BI data on a map, but it would be nice to see more KML features supported so that Power Map can show richer supporting information for the data: things like arrows showing direction of travel, and so on.

Anyway, I then thought – why not use polygons to draw these lines? Again, I hit a slight problem: I wanted to generate the polygons for the lines in code, and Power Map can only import SHP or KML data from files. It would be really useful if we could use shape data stored in the Excel Data Model… but we can’t. However, it is possible to use Power Query to generate KML and then copy and paste this code into a file, which can then be imported into Power Map. So, just for the fun of it, I put together a proof-of-concept workbook containing Power Query queries to generate all the tables and KML code needed to draw lines between two places, and a Power Map tour that shows the output. Here’s what the end result looks like:

image

You can download my example workbook that contains all the code, plus all the supporting files, here. You will need to update some file paths in the M code to get it all to work.

The starting point is two tables on the worksheet, one containing the single starting point for the lines, the other all of the destinations:

image

There’s a lot of M code so I’m not going to include it in this post, but here’s an overview of what each query does:

  • GetLatLong is a function that calls the Bing Maps REST API to find the latitude and longitude for each place in the tables above. You will need your own Bing Maps account key if you want to use this code yourself – you can get one at https://www.bingmapsportal.com/ (a sketch of what a function like this might look like follows this list)
  • Starting Point and Ending Points simply load the data from the Excel tables
  • StartingPointLatLong gets the latitude and longitude of the starting point by calling GetLatLong
  • StartEndPoints gets the latitude and longitude of all the ending points by calling GetLatLong, adds custom columns to show the starting point name, latitude and longitude against each ending point, and loads the result to the Excel Data Model. You have to have some data in the Excel Data Model for Power Map to display the lines, and it’s important that Power Map can match the values in one column in this table with the names of objects in the KML file.
  • BasicPolygonTemplate loads a fragment of KML, containing the definition of a polygon, from a text file. This contains two ‘parameters’, @Name and @Coordinates, which will be overwritten using Text.Replace() later on when the actual KML is being generated (see the second sketch after this list)
  • GetCoordinateList is a function to draw a rectangular polygon that represents the line between the starting point and an ending point. I had a lot of fun trying to get the code for this working properly (I wish I could remember any of the trigonometry that I learned after the age of 13…) and I’m still not convinced the rectangles are properly rectangular, but they’re good enough.
  • KML generates the KML for all of the polygons. The output of this query must be copied from the Power Query query window into a text file with the .kml extension, for example Test.kml. There’s no need to load the output of this query to anywhere.
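
For illustration, here’s a minimal sketch of what a function like GetLatLong might look like – the real code is in the downloadable workbook, so take this as a hedged approximation: the endpoint and the navigation through the response follow the Bing Maps REST Locations API, and MyBingMapsKey is a placeholder for your own account key.

(placename as text) as record =>
let
    //Call the Bing Maps REST Locations API for the given place name
    //MyBingMapsKey is a placeholder - substitute your own key
    Source = Json.Document(
        Web.Contents(
            "http://dev.virtualearth.net/REST/v1/Locations",
            [Query=[query=placename, key="MyBingMapsKey"]])),
    //Navigate to the coordinates of the first matching location
    Coordinates = Source[resourceSets]{0}[resources]{0}[point][coordinates],
    //Return a record containing the latitude and longitude
    Output = [Latitude=Coordinates{0}, Longitude=Coordinates{1}]
in
    Output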

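To give an idea of how those two ‘parameters’ get swapped out, here’s a hedged, self-contained sketch of the Text.Replace() technique: the template string below is a simplified stand-in for the fragment loaded by BasicPolygonTemplate, and the coordinates (longitude,latitude,altitude triples, space-separated as KML requires) are made up.

let
    //A simplified stand-in for the KML fragment loaded
    //by the BasicPolygonTemplate query
    Template =
        "<Placemark><name>@Name</name><Polygon><outerBoundaryIs>" &
        "<LinearRing><coordinates>@Coordinates</coordinates>" &
        "</LinearRing></outerBoundaryIs></Polygon></Placemark>",
    //Overwrite the @Name 'parameter' for one ending point
    InsertName = Text.Replace(Template, "@Name", "London"),
    //Overwrite @Coordinates with lon,lat,alt triples
    InsertCoordinates = Text.Replace(
        InsertName,
        "@Coordinates",
        "-0.1,51.5,0 -0.11,51.51,0 -0.09,51.52,0 -0.1,51.5,0")
in
    InsertCoordinates
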
image

With all of that done, you now need to open Power Map and create a new tour. Choose EndingPoint as the sole Geography column, then choose Custom Region (.kml, .shp) from the dropdown list below and click Yes to import custom regions.

image

Select the .kml file you created earlier, containing the output of the KML Power Query query, and then click Import:

image

Finally, change the visualisation type to Region and optionally add Ending Point to Category to make the lines different colours:

image

And bingo, you see the lines:

image

Support for custom regions is a massive step forward for Power Map in my opinion: rather than just being a toy for creating flashy demos, it’s now able to handle a lot more real-world requirements. However, having some way of programmatically creating regions and shapes (either through Power Query as I’ve done here, or using VBA or some other API), being able to load shape data from the Excel Data Model, or even just being able to draw shapes on a map manually, would be welcome. I’m no mapping expert, but I’ve come across a few frustrated MapPoint users (the product was discontinued at the end of 2014) who would like to use Power Map but find that it can’t do everything they need. The code in this post shows what’s possible, but it’s still way too complex for most users and hardly an elegant solution.

Working With International Date And Number Formats In Power Query

One problem that can really drive you crazy when working with data from text files is the many different formats that dates and numbers can be stored in around the world. For example, take a look at the contents of a tab-delimited file shown in the following screenshot:

image

There’s a single row of data and two columns, Date and Sales. What number and date are shown though? For me, living in the UK, the date shown is the 2nd of March 2015 (in the UK, like most of the world, we use the DD/MM/YYYY date format) and the number is one hundred thousand and two (in the UK we use the . sign as a decimal separator and the , as a thousands separator). However, if I was from the US and used the MM/DD/YYYY format I’d read the date as the 3rd of February 2015; and if I was from France and used a comma as a decimal separator, I’d read the number as a value just a tiny amount over one hundred. Of course, if you’re working in one country and getting data sent from another, which uses a different date or number format, you need to take all of these variations into account.

The good news is that Power Query has a really easy way of doing this for you, even if it’s not immediately obvious where this functionality is. You don’t need to change your Windows locale or split dates into their constituent parts and rearrange them in the correct order, or anything like that.

When you first load a csv file like the one shown above into Power Query, it will try to guess the data types of each column. Here’s the code generated by Power Query for this file:

let
    Source = Csv.Document(
                       File.Contents(
                        "C:\Users\Chris\Documents\Power Query demos\PQDateNumberFormats.csv"),
                        null,"#(tab)",null,1252),
    #"First Row as Header" = Table.PromoteHeaders(Source),
    #"Changed Type" = Table.TransformColumnTypes(
                                     #"First Row as Header",
                                     {{"Date", type date}, {"Sales", Int64.Type}})
in
    #"Changed Type"

As you can see, it has created three steps in the query automatically:

  1. Source loads the data from the CSV file given and sees that it’s tab delimited
  2. First Row as Header uses the first row of data as the column headers
  3. Changed Type sets the Date column to be type date, and Sales to be an integer

image

To change how the column data types are interpreted though, you first need to delete the Changed Type step by clicking on the cross icon next to it, shown above. Then, right-click on the Date column, choose Change Type then Using Locale…

image

When you do this, the Change Type With Locale dialog appears:

image

A locale is simply a name for the set of rules for date formats, number formats and more that are associated with a particular language and region. So, for example, setting the Date column to be interpreted using the English (United States) locale means that when the data is loaded into Power Query on my machine I see the date 3/2/2015.

image

What’s happened is this: the csv file contains the date “2/3/2015”; I have told Power Query that the data source uses a US English date format; Power Query has therefore assumed this date is in MM/DD/YYYY format, loaded the data for me, and shown the date in my own locale (which is English (United Kingdom)) in DD/MM/YYYY format as 3/2/2015. The date itself hasn’t changed, just the way it is displayed.

Using the same technique to import the Sales column as a decimal value using the English (United States) locale, like so:

image

…gives the following final result, where the comma is assumed to be a thousands separator:

image

Here’s the new code for the query:

let
    Source = Csv.Document(
                      File.Contents(
                       "C:\Users\Chris\Documents\Power Query demos\PQDateNumberFormats.csv"),
                       null,"#(tab)",null,1252),
    #"First Row as Header" = Table.PromoteHeaders(Source),
    #"Changed Type with Locale" = Table.TransformColumnTypes(
                        #"First Row as Header", 
                        {{"Date", type date}}, "en-US"),
    #"Changed Type with Locale1" = Table.TransformColumnTypes(
                        #"Changed Type with Locale", 
                        {{"Sales", type number}}, "en-US")
in
    #"Changed Type with Locale1"

However, if I change the locale used to set the types of these two columns to French (France), I see the following:

image

You will see that I have told Power Query to interpret the value “100,002” as a decimal number using a French format, where a comma is used as a decimal separator, and it has therefore imported and displayed it as 100.002 in the English (United Kingdom) format with a decimal point used as the decimal separator.
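
If you want to see the same rule in isolation, outside the Change Type With Locale dialog, the optional culture argument of Number.FromText() behaves the same way. Here’s a trivial sketch:

let
    //The value exactly as it appears in the source file
    SourceText = "100,002",
    //With the French (France) locale the comma is a decimal
    //separator, so this returns 100.002
    FrenchNumber = Number.FromText(SourceText, "fr-FR"),
    //With the English (United States) locale the comma is a
    //thousands separator, so this returns 100002
    USNumber = Number.FromText(SourceText, "en-US")
in
    [French = FrenchNumber, US = USNumber]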

Nested Calculations In Power Query

Quite often, in Power Query, you want to ‘nest’ calculations and transformations – apply them not across the whole table, but repeat the same calculation or transformation across multiple groups of rows within that table. Let me give you an example…

Take the following input table:

image

Imagine you wanted to add a column showing the rank of each row by Sales. In Power Query you would just need to:

  1. Load the data
  2. Sort the table in descending order by Sales
  3. Add an index column starting at 1, which is the rank

You would end up with the following output:

image

…and here’s the M code, all of which is generated by the UI:

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    #"Sorted Rows" = Table.Sort(Source,{{"Sales", Order.Descending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Rank", 1, 1)
in
    #"Added Index"

Now imagine you wanted to rank Sales within each month. That’s to say you want to apply the same steps as shown in the previous query but for each month individually to get an output like this:

image

One way to tackle this, and problems like it, is to do the following. First, do a Group By on the column you want to repeat the calculation over, in this case Month, and use the All Rows aggregation operation. This will result in a table with one row for each month and a column containing nested tables, as shown below:

image

Each one of these nested tables contains the rows from the original table for that month.

You can then take your original transformation and turn it into a function, either in a separate query or as a step in your current query. Here’s an example of how the query above can be turned into a function that takes a table and returns a table with a rank column added:

(tabletorank as table) as table =>
     let
      SortRows = Table.Sort(tabletorank,{{"Sales", Order.Descending}}),
      AddIndex = Table.AddIndexColumn(SortRows, "Rank", 1, 1)
     in
      AddIndex

Next, you need to pass each nested table to this function. You could do that in a calculated column, but I think the most elegant way is to use the Table.TransformColumns() function, which takes a function and applies it to every value in a column (see here for another example of how to use it).

Finally, you get the final output by clicking on the Expand icon in the AllRows column and then choosing to expand all the columns in the nested table except the ones you originally grouped on:

image

image

Here’s the full M code:

let
    //Get data from Excel
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    //Group by Month
    Grouped = Table.Group(Source, {"Month"}, {{"AllRows", each _, type table}}),
    //Declare a function that adds a Rank column to a table
    RankFunction = (tabletorank as table) as table =>
     let
      SortRows = Table.Sort(tabletorank,{{"Sales", Order.Descending}}),
      AddIndex = Table.AddIndexColumn(SortRows, "Rank", 1, 1)
     in
      AddIndex,
    //Apply that function to the AllRows column
    AddedRank = Table.TransformColumns(Grouped, {"AllRows", each RankFunction(_)}),
    //Expand the tables in the AllRows column again
    ExpandAgain = Table.ExpandTableColumn(AddedRank, "AllRows", 
       {"Product", "Sales", "Rank"}, {"Product", "Sales", "Rank"})
in
    ExpandAgain

You can download the example workbook here.

This pattern could also be applied to other types of calculation that need to be nested, for example running totals or shares.
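
For example, here’s a hedged sketch of a function that could be swapped in for RankFunction above to calculate running totals within each month instead, assuming the same Sales column and the same grouping:

(tabletototal as table) as table =>
let
    //Buffer the Sales values so they aren't re-evaluated per row
    SalesValues = List.Buffer(tabletototal[Sales]),
    //Add a 0-based index to identify each row's position
    AddIndex = Table.AddIndexColumn(tabletototal, "Index", 0, 1),
    //Sum the Sales values up to and including the current row
    AddRunningTotal = Table.AddColumn(AddIndex, "Running Total",
        each List.Sum(List.FirstN(SalesValues, [Index] + 1))),
    //Remove the helper index column
    RemoveIndex = Table.RemoveColumns(AddRunningTotal, {"Index"})
in
    RemoveIndex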

Improving Power Query Calculation Performance With List.Buffer()

I saw an interesting post the other day on the Power Query TechNet forum which showed how the List.Buffer() function could be used to improve calculation performance. This is something I’d seen hinted at in other places, so I thought it was worth a bit of investigation.

Consider the following query:

let
    //Connect to SQL Server
    Source = Sql.Database("localhost", "adventure works dw"),
    //Get first 2000 rows from FactInternetSales
    dbo_FactInternetSales = Table.FirstN(
          Source{[Schema="dbo",Item="FactInternetSales"]}[Data], 
          2000),
    //Remove unwanted columns
    RemoveColumns = Table.SelectColumns(
          dbo_FactInternetSales,
          {"SalesOrderLineNumber", "SalesOrderNumber","SalesAmount"}),
    //Get sorted list of values from SalesAmount column
    RankValues = List.Sort(RemoveColumns[SalesAmount], Order.Descending),
    //Calculate ranks
    AddRankColumn = Table.AddColumn(RemoveColumns , "Rank", 
          each List.PositionOf(RankValues,[SalesAmount])+1)
in
    AddRankColumn

It gets the first 2000 rows from the FactInternetSales table in the Adventure Works DW database, removes most of the columns, and adds a custom column that shows the rank of the current row based on its Sales Amount.

image

On my laptop it takes around 35 seconds to run this query – pretty slow, in my opinion, given the amount of data in this table.

However, using the List.Buffer() function in the RankValues step like so:

let
    //Connect to SQL Server
    Source = Sql.Database("localhost", "adventure works dw"),
    //Get first 2000 rows from FactInternetSales
    dbo_FactInternetSales = Table.FirstN(
          Source{[Schema="dbo",Item="FactInternetSales"]}[Data], 
          2000),
    //Remove unwanted columns
    RemoveColumns = Table.SelectColumns(
          dbo_FactInternetSales,
          {"SalesOrderLineNumber", "SalesOrderNumber","SalesAmount"}),
    //Get sorted list of values from SalesAmount column
    //And buffer them!
    RankValues = List.Buffer(List.Sort(RemoveColumns[SalesAmount], Order.Descending)),
    //Calculate ranks
    AddRankColumn = Table.AddColumn(RemoveColumns , "Rank", 
          each List.PositionOf(RankValues,[SalesAmount])+1)
in
    AddRankColumn

Makes the query run in just 2 seconds. The List.Buffer() function stores the sorted list of values used to calculate the rank in memory which means it will only be evaluated once; in the original query it seems as though this step and those before it are being evaluated multiple times. Curt Hagenlocher’s comment (on this thread) on what List.Buffer() does for a similar calculation is telling:

The reason for this is that M is both functional and lazy, so unless we buffer the output of List.Select, we’re really just building a query that needs to be evaluated over and over. This is similar to the Enumerable functions in LINQ, if you’re familiar with those.

The Table.Buffer() and Binary.Buffer() functions also exist, and do similar things for tables and binary values respectively.
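
As a trivial, self-contained sketch of the same idea applied to a table:

let
    //Pin a small inline table in memory so that the steps
    //referencing it read the buffered copy rather than
    //re-evaluating the source
    Source = Table.Buffer(#table({"A"}, {{1}, {2}, {3}})),
    RowCount = Table.RowCount(Source),
    Total = List.Sum(Source[A])
in
    [RowCount = RowCount, Total = Total]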

A few other points to make:

  • This is not necessarily the optimal way to calculate ranks in Power Query – it’s just an example of how List.Buffer() can be used.
  • In the first query above, query folding is not taking place; if it had been, performance would probably have been better. Since using List.Buffer() explicitly prevents query folding from taking place, in many cases it could actually make performance worse rather than better.
  • I’m 100% certain you’ll get much better performance for a rank calculation by loading the table to the Excel Data Model/Power Pivot and writing the calculation in DAX. You should only really do calculations like this in Power Query if they are needed for other transformations in your query.

Using Function.InvokeAfter() In Power Query

Last year I blogged about how to introduce a delay between web service requests in M code. Since then a new function has been added to Power Query which makes this a lot easier: Function.InvokeAfter(). This function doesn’t seem to be documented anywhere apart from the Power Query language reference (downloadable from here); the signature is as follows:

Function.InvokeAfter(function as function, delay as duration) as any

It invokes a function after waiting for a given amount of time. Here’s a simple example of how it can be used that declares a function which returns the current date/time as text, then calls it twice with a five second delay in between:

let
    GetTimeAsText = ()=> DateTime.ToText(DateTime.LocalNow()),
    Output = GetTimeAsText() & " " 
                   & Function.InvokeAfter(GetTimeAsText, #duration(0,0,0,5))
in
    Output

The output of this query (at the time of writing) is:

28/04/2015 23:06:38 28/04/2015 23:06:43

One thing that did confuse me a bit was the fact that Function.InvokeAfter() doesn’t allow you to pass a list of arguments for the function you’re invoking in the way Function.Invoke() does. The nice people at Microsoft helped me out with this though, and here’s a slightly more complicated example showing how to use Function.InvokeAfter() with a function that prepends “Hello ” to a person’s name:

let
    SayHello = (personname as text) as text => "Hello " & personname,
    Output = Function.InvokeAfter(()=>SayHello("Chris"), #duration(0,0,0,5))
in
    Output
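
Tying this back to the original use case of introducing a delay between web service requests, here’s a hedged sketch (the URLs are placeholders) that waits five seconds before each call by wrapping it in Function.InvokeAfter():

let
    //Placeholder URLs - substitute real web service calls
    Urls = {"http://example.com/api/1", "http://example.com/api/2"},
    //Wrap each request so it only fires after a five second delay
    SlowFetch = (url as text) =>
        Function.InvokeAfter(
            () => Text.FromBinary(Web.Contents(url)),
            #duration(0, 0, 0, 5)),
    //Call the web service for each URL in turn
    Results = List.Transform(Urls, SlowFetch)
in
    Results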

Analysing SSAS Extended Event Data With Power Query: Part 2, Storage Engine Activity

In part 1 of this series I showed how to use Power Query to extract Extended Event data generated by SSAS. Having done that, I now want to show the first (of many, I hope) examples of how this data can be used for performance tuning: analysing activity in the Storage Engine, the part of SSAS that reads data from disk and aggregates it.

I won’t go into the technical details of how I’ve used Power Query to crunch this data; you can download the sample workbook here and see for yourself. There’s nothing particularly complex going on. In brief, what I’ve done is the following:

  • Called the function shown in part 1 to get the raw Extended Event data
  • Filtered that data so that only the Query End, Query Subcube Verbose and Progress Report End events are left
  • Calculated the start time of each event relative to the start time of the earliest recorded event, to make plotting these events on a waterfall chart possible (see the sketch after this list)
  • Built an Excel report, including various Power Pivot measures, some normal slicers to make it easy to filter the data, some disconnected slicers for filtering so you only see events that started within a given time range, and a PivotChart showing the waterfall chart (since Excel doesn’t support this type of chart natively, I’ve used this technique to reproduce a waterfall chart with a stacked bar chart)
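
Here’s a hedged sketch of the filtering and relative start time steps. GetExtendedEventData stands in for the function from part 1, and the EventClass and StartTime column names are assumptions – the real names depend on what that function returns.

let
    //GetExtendedEventData is a stand-in for the part 1 function;
    //EventClass and StartTime are assumed column names
    RawEvents = GetExtendedEventData(),
    //Keep only the three event classes of interest
    FilteredEvents = Table.SelectRows(RawEvents,
        each List.Contains(
            {"Query End", "Query Subcube Verbose", "Progress Report End"},
            [EventClass])),
    //Find the start time of the earliest recorded event
    EarliestStart = List.Min(FilteredEvents[StartTime]),
    //Express each event's start time relative to that, in seconds
    AddRelativeStart = Table.AddColumn(FilteredEvents, "Relative Start",
        each Duration.TotalSeconds([StartTime] - EarliestStart))
in
    AddRelativeStart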

Here’s an example screenshot of the result, showing Storage Engine activity for a single query:

image

Though it’s hard to see the details at this resolution, the yellow line is the Query End event associated with the query, the grey lines are the Query Subcube Verbose events associated with the query, and the brown lines are the Progress Report End events associated with each Query Subcube Verbose event.

What could this be used for? Here are some ideas:

  • Looking for times when there are a lot of queries running simultaneously – and which, as a result, may be performing poorly.
  • Looking for long-running Query Subcube Verbose and Progress Report End events which could be optimised by the creation of aggregations.
  • Visualising the amount of parallelism inside the Storage Engine, in particular the number of Progress Report End events that are running in parallel. This would be very interesting for queries using distinct count measures when you are testing different ways of partitioning your measure group.
  • Highlighting situations where calculations are being evaluated in cell-by-cell mode. When this happens you typically see a very large number of Query Subcube Verbose events being fired off within a query.

I’d like to stress once again that the object of this exercise is not to show off a ‘finished’ tool, but to show how Power Query, Power Pivot and Excel can be used for self-service analysis of this data. This workbook is just a starting point: if you wanted to use this on your own data it’s extremely likely you’d need to change the Power Query queries, the Power Pivot model and the report itself. Hopefully, though, this workbook will save you a lot of time if you do need to understand what’s going on in the Storage Engine when you run an MDX query.