Specifying Allowed Values, Sample Values and Descriptions For Function Parameters In Power Query–Part 1

The other day when I was thinking about Power Query functions (as you do) it occurred to me that they would be a lot more usable if, when you invoked them, you could show your users a dropdown box for each parameter where they could choose from all of the allowed parameter values, rather than expecting them to type a value which may or may not be correct. I thought this was such a good idea that I sent an email to the Power Query team, and within a few hours I got a reply from Matt Masson… saying that they had already implemented this some time ago but hadn’t documented it yet. He also very kindly sent me a code example and allowed me to blog about it (thanks Matt!). In this post I’m going to look at a very simple example of how to do this; in part 2 I’ll show a more complex scenario involving SQL Server.

Consider the following Power Query function that multiplies two numbers:

(FirstNumber as number, SecondNumber as number) as number =>
FirstNumber * SecondNumber


Enter this into a new blank query and you’ll see the following in the Query Editor:

image

Clicking either the Invoke Function or Invoke button will make the Enter Parameters dialog appear:

image

You can then enter two numbers, click OK, and the query will invoke the function with those two numbers and show the result:

image

Now it turns out that there are three properties you can set on a function parameter to control how it is displayed in the Enter Parameters dialog:

  • A text description of the parameter
  • A sample value
  • A list of allowed values that can be passed to the parameter, that can be selected from a dropdown box

Unfortunately the way to set these properties isn’t exactly straightforward, but the basic process is this:

  • For each parameter create a custom type, and then use a metadata record to assign the property values for that type
  • Then create a custom function type, where each parameter uses the custom types created in the previous bullet
  • Finally take your existing function and cast it to your new function type using the Value.ReplaceType() function.

OK, I know, this sounds scary – and it is, a bit. Here’s how the original function can be rewritten:

let
    //declare a function
    MultiplyTwoNumbersBasic = 
                             (FirstNumber as number, SecondNumber as number) 
                             as number =>
                             FirstNumber * SecondNumber,
    //declare custom number types with metadata for parameters
    MyFirstNumberParamType = type number
                        meta
                        [Documentation.Description = "Please enter any number",
                         Documentation.SampleValues = {8}],

    MySecondNumberParamType = type number
                        meta
                        [Documentation.Description = "Please choose a number",
                         Documentation.AllowedValues = {1..5}],
    //declare custom function type using custom number types
    MyFunctionType = type function(
                                           FirstNumber as MyFirstNumberParamType, 
                                           SecondNumber as MySecondNumberParamType) 
                                    as number,
    //cast original function to be of new custom function type
    MultiplyTwoNumbersV2 = Value.ReplaceType(
                                                   MultiplyTwoNumbersBasic, 
                                                   MyFunctionType)
in
    MultiplyTwoNumbersV2 


Now, when you invoke the function, the Enter Parameters dialog looks like this:

image

Note how for the FirstNumber parameter the description and the sample value are displayed; and how for the SecondNumber parameter the dropdown list shows all of the allowed values we declared (remember that {1..5} returns the list {1,2,3,4,5} in M) and it is no longer possible to type a value here.

Three things to mention last of all:

  • Although Documentation.SampleValues is a list, only the first value in the list seems to be displayed
  • Documentation.AllowedValues doesn’t actually prevent you from calling the function with a value not in this list, it just controls what values are seen in the dropdown list – the sketch after this list demonstrates this
  • The Workbook Queries pane no longer recognises your function as a function when you do all this – it doesn’t have the special function icon, and there is no Invoke option on the right-click menu. This is a real shame because, while it doesn’t affect how the function works, it does make it much less intuitive to use and the whole point of this exercise is usability. Hopefully this gets fixed in a future build.
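
Note that casting the function to the new type doesn’t change how it behaves when called from code. Here’s a minimal sketch – assuming the query containing the function above has been named MultiplyTwoNumbers – showing both that the function still works and that AllowedValues is not enforced:

let
    //Call the type-cast function from another query.
    //10 is outside the {1..5} AllowedValues list for SecondNumber,
    //but the call still succeeds because AllowedValues only affects the dialog
    Result = MultiplyTwoNumbers(3, 10)
in
    Result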

You can download a sample workbook containing all the code in this post here.

Power Query Now Works With SSAS

I normally don’t bother blogging when one of the Power Query monthly updates gets released – the posts on the Power BI blog have all the details and there’s nothing more to add. This month’s update, though, sees Power Query able to use SSAS Multidimensional and Tabular models as a data source (you can read the announcement here and watch the video here) and I thought that since these are my two favourite Microsoft products I should comment.
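
As a rough illustration of what this looks like in M, here’s a minimal sketch of a query against SSAS (the server and database names, and the way you drill into the navigation table, are assumptions – the exact steps depend on your model):

let
    //Connect to a local SSAS instance and get a navigation table of its databases
    Source = AnalysisServices.Databases("localhost"),
    //Drill into one database by name
    AdventureWorksDW = Source{[Name="Adventure Works DW"]}[Data]
in
    AdventureWorksDW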

In no particular order, some observations/thoughts:

  • Power Query is generating MDX queries in the background here. And yes, query folding is taking place: where possible the filtering and sorting are translated back to MDX, so the work happens on the server. I’ve had a look at the MDX being generated and while it’s a bit strange in places it’s basically good and should perform well.
  • In order to get Power Query to work I had to install the 2012 version of ADOMD, even though I already had the 2014 version installed. Hmmm.
  • It also doesn’t display calculated measures that aren’t associated with a measure group, although this is going to be fixed in a later release. In fact I experienced several errors associated with calculated measures when I first installed the new release, but these went away after I cleared the Power Query cache and rebooted.
  • This update does not support custom MDX queries, unlike the equivalent SQL Server data source; again this is on the roadmap though. This will be useful for two reasons:
    • For complex queries, or queries that have to be written in a certain way to perform well, sometimes you have to have complete control of the query
    • It would allow you to run DAX, DMX and SQL queries on SSAS, and also let you query SSAS DMVs. Custom DAX queries can perform a lot better than the equivalent MDX in some cases, and querying DMVs would allow you to build SSAS monitoring dashboards in Excel.
  • The Excel Data Model (which lots of people call Power Pivot, but I’m going to stick to my guns on this one because the Excel Data Model is not the same thing as Power Pivot) is not supported as a data source yet either, even though it can be queried in MDX just like SSAS. This would also be useful to have, although I can see it’s harder to implement, given that the Excel Data Model is both a data source and a data destination.
  • There are a whole bunch of new (well, not really new because they have been around since the support for SAP Business Objects came out this summer) M functions to generate those MDX queries. They probably deserve their own blog post at some point in the future.
  • Apart from the obvious reason that it allows you to combine data from SSAS with other data sources and/or load it into the Excel Data Model, why is it useful to have Power Query connect to SSAS? What does it give you over and above the existing PivotTable, Excel Cube Function and Query Table connectivity? Why do we even need to extract data from a cube in the first place?
    • This may only be a minor reason, but a lot of people still do get caught by the problem of having multiple cubes where they should have one, and need to build reports that incorporate data from multiple cubes. Power Query gives them an easy way of doing this.
    • I believe the step-based approach of Power Query makes it much easier for users to build complex queries than other SSAS front-ends, such as PivotTables. Being able to see the effect of filters and other transformations as they are applied, and being able to delete them and rearrange the order that they are applied in, is a huge benefit. Think of Power Query as being a custom MDX query builder where the output is a regular table in Excel rather than a PivotTable.
    • This last point leads me on to a topic that I’ve been thinking about a lot recently, and which will be the subject of a whole series of blog posts soon, namely that it is wrong to think of Power Query as simply a self-service ETL tool. It is that, but I also think it has a lot of potential as a self-service reporting tool. For both relational databases and SSAS, I think Power Query could be very useful as a means of creating SSRS-like detail-level reporting solutions inside Excel. Problems with the MDX generated by PivotTables mean that queries with large numbers of hierarchies crossjoined together perform badly; the MDX that Power Query generates does not suffer from this problem. Functions can be used to parameterise Power Query queries (in fact earlier this evening I learned something immensely cool about function parameters that I’ll blog about later this week and which will make functions even more useful!) and can therefore be used in almost exactly the same way as datasets in SSRS.

Combining Data From Multiple Excel Workbooks With Power Query–The Easy Way!

If there’s one feature of Power Query that’s guaranteed to get Excel users very, very excited indeed it’s the ability to combine data from multiple workbooks into a single table. The bad news is that this is something that Power Query can’t do through the user interface (although so many people have asked for it I wouldn’t be surprised if it gets added to the product soon) and it’s not obvious how to do it.

This is a topic that has been blogged about many times over the past year or so (see DutchDataDude, Mike Alexander, Ken Puls, Miguel Escobar – apologies to anyone I’ve missed) so why should I write about it? Well, all these other posts show you the steps you have to go through to build your own function and then use that function inside a query, which is fine, but it involves a lot of clicking and typing code each time you want to do it – very time-consuming if you don’t know Power Query that well, and not something a regular Excel user would want to do. I’ve got an easier way though: a generic function that can combine data from workbooks in any folder you point it at. Once you’ve created it it’s very easy for anyone to use, it can be reused over and over, and of course you can share this function through the Power BI Data Catalog if you have a Power BI for Office 365 subscription.

Steps to add the Power Query function to your workbook

You can either follow the steps below to add the function to your workbook, or instead just download the sample workbook containing the function here – which is a lot quicker!

1) Copy the following code onto the clipboard

//Define function parameters
(#"Directory containing Excel files to combine" as text, 
optional #"Name of each Excel object to combine" as text, 
optional #"Use first rows as headers" as logical) =>
let
    //If the optional Excel object name parameter is not set, then default to Sheet1
    ExcelName = if #"Name of each Excel object to combine" = null 
                then "Sheet1" 
                else #"Name of each Excel object to combine",
    //If the optional Use first rows as headers parameter is not set, then default to true
    UseFirstRowsAsHeaders = if #"Use first rows as headers"= null 
                            then true 
                            else #"Use first rows as headers",
    //Get a list of all the files in the folder specified
    Source = Folder.Files(#"Directory containing Excel files to combine"),
    //Filter these to only get Excel files
    OnlyGetExcelFiles = Table.SelectRows(Source, 
                              each ([Extension] = ".xlsx") 
                              or ([Extension] = ".xls")),
    //Find the full path of each file
    FullPath = Table.CombineColumns(
                    OnlyGetExcelFiles ,
                    {"Folder Path", "Name"},
                    Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged"),
    //Get a list containing each file path
    ExcelFiles = Table.Column(FullPath, "Merged"),
    //Define a function to get the data from the specified name in each Excel workbook
    GetExcelContents = (FileName as text) =>
     let
      //Connect to the workbook
      Source = Excel.Workbook(File.Contents(FileName), UseFirstRowsAsHeaders),
      //Get a table of data from the name specified
      //If the name doesn't exist catch the error and return null
      ExcelData = try Source{[Item=ExcelName]}[Data] 
                           otherwise try Source{[Name=ExcelName]}[Data] 
                           otherwise null
     in
      ExcelData,
    //Call the above function for each Excel file
    ReadAllWorkbooks = List.Transform(ExcelFiles, each GetExcelContents(_)),
    //Remove any null values resulting from errors
    IgnoreNulls = List.RemoveNulls(ReadAllWorkbooks),
    //Combine the data from each workbook into a single table
    CombineData = Table.Combine(IgnoreNulls)
in
   CombineData


2) Open Excel and go to the Power Query tab on the ribbon. Click on the From Other Sources button and then click Blank Query.

image

3) The Power Query query editor window will open. Go to the View tab and click on the Advanced Editor button.

image

4) The Advanced Editor will open. Delete all the code in the main textbox and replace it with the code above. Click OK to close the Advanced Editor.

image

5) In the Query Settings pane on the right-hand side of the Query Editor, change the name of the query to CombineExcel, then go to the Home tab on the ribbon and click the Close & Load button. The Query Editor will close.

image

image

6) You can now see your function in the Workbook Queries pane in Excel! It should look like this:

image


Using the function to combine data from multiple workbooks

To use the function, double-click on it in the Workbook Queries pane or right-click and select Invoke. The following dialog will appear:

image

You can enter three parameters here:

  • The path of the directory containing the Excel workbooks that you want to read data from. The function can read from xlsx and xls files (though for the latter to work you need the Access 2010 engine installed, a free download if you only have Excel 2013) and will ignore any other files in the folder. The function will also read any Excel files in any subfolders.
  • Where you want to get data from in each Excel workbook. This can be the name of a worksheet (for example you could enter Sheet2 here) or a named range or a table. It’s an optional parameter so if you leave it blank it will get data from the worksheet Sheet1. If a workbook doesn’t contain the name you enter here it will be ignored. If the format of the data in each worksheet is not consistent (for example if you have different column names) then be warned: you may get some strange results.
  • Whether data on your worksheet (if you’re getting data from a worksheet) contains headers. Enter true here if your data does have a header row in every worksheet; false otherwise. This is also an optional parameter and if you leave this box empty the default value is true.

When you click OK, a new Power Query query will be created, the Query Editor window will open and you’ll see all the data from all of the Excel workbooks combined. The first step of this query is a call to the CombineExcel() function and you can carry on working with your data in Power Query as normal.
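
Alternatively, if you’d rather call the function from another query than go through the dialog, a query like the following does the same thing (the folder path here is just an example):

let
    //Combine the data on Sheet1 of every workbook in the folder,
    //treating the first row on each sheet as a header row
    Source = CombineExcel("C:\CombineTest", "Sheet1", true)
in
    Source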

image

Disclaimer: I’ve done a reasonable amount of testing on this and I’m pretty sure it works well, but of course there will be bugs. Please leave a comment if you find a bug or can suggest any other improvements.

Returning Error Messages As Well As Results In Power Query

Back in September I wrote a post on handling situations where the data source for your Power Query query was unavailable. The great thing about that technique is that instead of seeing an error message when you refresh your query, you get an empty table with the same structure – which means that the Excel Data Model doesn’t get messed up, you don’t lose relationships and all your PivotTables remain intact. However it means you no longer see the error message (unless you return it in the table somehow, which isn’t ideal), which makes it hard to debug. What you really want is to get the empty table AND the error message somehow. But surely a Power Query query can only return a single value as an output?

Actually no. Every value in M can have a metadata record associated with it, and we can use this here to our advantage (I’ve been wondering whether there was a use for metadata ever since I read about it in the Language spec… and at long last I’ve found one!). Here’s a simple example of a query that shows how to associate a metadata record with a value:

let
    Source = "Some Random Value" meta [message = "Hello World", somenumber = 123],
    Output = Source
in
    Output


The output of this query is just the text “Some Random Value”:

image

At this point we’re setting the metadata but not doing anything with it. The following query uses the Value.Metadata() function to get the metadata record:

let
    Source = "Some Random Value" meta [message = "Hello World", somenumber = 123],
    Output = Value.Metadata(Source)
in
    Output


The output is now a record value:

image 

Basically, this means that in scenarios where we want to handle a data source error we can always return a table and at the same time return any error messages in the metadata.

Here’s a similar scenario to the one I showed in my original blog post, but where the query does a select distinct on the EnglishDayNameOfWeek column in the DimDate table in the Adventure Works DW database:

let
    //Connect to SQL Server
    Source = Sql.Database("localhost", "adventure works dw"),
    //Get DimDate table
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    //Remove all other columns except EnglishDayNameOfWeek
    #"Removed Other Columns" = Table.SelectColumns(dbo_DimDate,{"EnglishDayNameOfWeek"}),
    //Get distinct values from this column
    #"Removed Duplicates" = Table.Distinct(#"Removed Other Columns"),
    //Output in case of error
    AlternativeOutput=#table(type table [EnglishDayNameOfWeek=text], {}),
    //Does the Removed Duplicates set error?
    TestForError= try #"Removed Duplicates",
    Output = if TestForError[HasError] 
             then 
             //In case of error return empty table
             //and attach error message in metadata
             (AlternativeOutput meta [message = TestForError[Error]]) 
             else 
             //If there's no error
             //just return the table plus a message
             (#"Removed Duplicates" meta [message = "Query executed successfully"])
in
    Output


The output when the query (called DimDate) executes successfully, shown in an Excel table, is just a list of day names:

image

You can now create a second query in the same workbook with the following definition to get the metadata record associated with the output of this query:

let
    Source = Value.Metadata(DimDate),
    message = Source[message]
in
    message

The output of this query, when the first query is successful, is:

image

However, if you rename the EnglishDayNameOfWeek column in SQL Server, the first query now returns an empty table:

image

The second query now returns the error message when it’s refreshed:

image

One thing to be careful of, bearing in mind what we learned about Power Query and M in my last post, is that the output of the first query is not cached after it has been executed, so executing the second query to get the error message will involve at least a partial re-execution of the first query. However, in this particular case, if you look in Profiler you can see Power Query checking whether the EnglishDayNameOfWeek column exists in the DimDate table before it tries to run the select distinct SQL query, and that check alone is enough to know whether the query will fail. So, whether there is an error or not, running the second query to get the message does not result in the (potentially expensive) select distinct SQL query being executed.

You can download the sample workbook for this post here.

Timing Power Query Queries

If you’re interested in performance-tuning a Power Query query, you’re going to need a reliable way of telling how long your query takes to run. It’s not as easy as it seems to adapt your query to do this though, and that’s because of the way the M formula language works. This post does show you how to find how long a query takes to run, but I have to admit that I’m not 100% satisfied with the approach and I suspect there are other, better ways of achieving the same thing that do not involve any M code. However this post does highlight a lot of interesting points about Power Query and M so I think it’s worth reading anyway…

M language background

In my last post, showing how to create a Power Query function that waits a given number of seconds, I used the following query to demonstrate the function:

let
    //Find current time
    One = DateTime.LocalNow(),
    //Define Wait function
    Wait = (seconds as number, action as function) => 
            if (List.Count(
             List.Generate(
              () => DateTimeZone.LocalNow() + #duration(0,0,0,seconds), 
              (x) => DateTimeZone.LocalNow() < x, 
              (x) => x)
              ) = 0) 
               then null else action(),
    //Call Wait function to wait 5 seconds
    //then return the current time again
    Two = Wait(5,DateTime.LocalNow),
    //Display start time, finish time, and duration
    Output = "Start Time: " & DateTime.ToText(One) & 
                   " Finish Time: " & DateTime.ToText(Two) & 
                   " Difference: " & Duration.ToText(Two-One)
in
    Output


The output of this seems very straightforward:

image

However, consider the output when you change the last step in the query to be:

    Output = "Finish Time: " & DateTime.ToText(Two) & 
                   " Start Time: " & DateTime.ToText(One) & 
                   " Difference: " & Duration.ToText(Two-One)


The only difference is that the finish time is displayed before the start time, and yet the output is now:

image

The start time and the finish time are now the same, and the duration is 0. However the query does still take the same amount of time to run. What’s going on here?

Looking at the M code for a query, you’d be forgiven for thinking that the first step in a query is evaluated first, the second step is evaluated second, the last step is evaluated last, and so on. This is not the case though. From page 10 of the Language Reference document (which can be downloaded here):

List and record member expressions (as well as let expressions, introduced further below) are evaluated using lazy evaluation, which means that they are evaluated only as needed.

A Power Query query is usually a single let expression, and within a let expression steps are evaluated as and when they are needed; if a step’s output is not needed anywhere else then it may not be evaluated at all. Therefore, in the second example above, even though the step called One is the first step in the query, it is evaluated after the step called Two because that is the order that the values are needed in the expression calculated for the step Output.
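
Here’s a minimal sketch that demonstrates this: the step Unused below would raise an error if it were ever evaluated, but because no other step refers to it and it isn’t returned as the output, the query happily returns 2.

let
    //This step is never referenced anywhere else, so thanks to
    //lazy evaluation the error expression is never evaluated
    Unused = error "this step is never evaluated",
    //This step is the output, so it is evaluated
    Used = 1 + 1
in
    Used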

Another important point to understand is that (from page 30 of the Language Reference):

Once a value has been calculated, it is immutable, meaning it can no longer be changed. This simplifies the model for evaluating an expression and makes it easier to reason about the result since it is not possible to change a value once it has been used to evaluate a subsequent part of the expression.

In our example, this means that once One and Two have been evaluated for the first time, the values they return will not change however many times they are referenced afterwards.

The final thing to point out is that I’m using the DateTime.LocalNow() function here to get the system date and time at the point that the expression is evaluated. If you use DateTime.FixedLocalNow() it will return the same value every time you call it in the same query – which is obviously not going to be much use for our purposes.

How to time your queries

Now we know all this we can alter a query that does something useful so we can see how long it takes to execute. Getting status updates from Facebook is sufficiently slow for us to be able to measure a duration, and here’s an example query that shows how to measure how long the request takes:

let
    //Get the current time, which will be the query start time
    StartTime = DateTime.LocalNow(),
    //Make sure StartTime is evaluated just before we request data from Facebook
    Source = if StartTime <> null
             then
             Facebook.Graph("https://graph.facebook.com/me/statuses")
             else
             null,
    //Find the number of rows returned
    NumberOfRows = Number.ToText(Table.RowCount(Source)),
    //Get the current time, which will be the query end time
    EndTime = DateTime.LocalNow(),
    //Make sure PQ evaluates all expressions in the right order:
    //first, get the number of rows, which ensures that
    //the Source expression is evaluated, and in turn that StartTime is evaluated;
    //second, EndTime is evaluated as part of the duration calculation
    Output = "Query returned " & NumberOfRows 
             & " rows and took " 
             & Duration.ToText(EndTime - StartTime)
in
    Output

Here’s an example of the output of the query:

image

How does this work?

  • The Output step has the expression that returns the query output
  • The first value evaluated in the Output step is the NumberOfRows step, which finds the number of rows returned by Facebook and in turn forces the evaluation of the Source step. If we didn’t include an expression in the output that was in some way derived from the data loaded from Facebook, Power Query would not bother to evaluate the Source step.
  • The Source step gets data from Facebook, but before that happens there is an if expression that checks to see whether StartTime is null or not. Of course this always returns true, but the point here is that by making this check we are forcing StartTime to be evaluated.
  • The second value evaluated in Output is the duration. The duration is the EndTime (evaluated for the first time at this point) minus the StartTime (which has already been evaluated, remember), so we get the amount of time elapsed between the start of the request to Facebook and the end of the query.

With all of the other stuff stripped out, here’s the basic template I’ve used:

let
    StartTime = DateTime.LocalNow(),
    Source = if StartTime <> null
             then
             //do whatever we were going to do in our first step
             else
             null,

    //insert all other steps here

    NumberOfRows = Number.ToText(Table.RowCount(WhateverThePreviousStepIs)),
    EndTime = DateTime.LocalNow(),
    Output = "Query returned " & NumberOfRows 
             & " rows and took " 
             & Duration.ToText(EndTime - StartTime)
in
    Output

I’m making a pretty big assumption here, though, and that is that finding the number of rows in the NumberOfRows step will force the evaluation of all previous steps, which may not be the case! You might have to think carefully about how you write an expression that forces evaluation of all of your steps.

This is all very complicated (indeed, probably too complicated) so I’m also investigating what the Power Query trace file can tell you about things like query execution times. Hopefully I’ll be able to blog about that in a few weeks.

You can download an Excel workbook with some sample Power Query queries here.

Waiting Between Web Service Requests In Power Query

Often when you’re working with web services in Power Query you’ll be making a large number of requests to the web service in a small amount of time – and you’ll notice that your query seems to be a lot slower than you would expect. The reason is, of course, that many public websites and APIs throttle requests to stop you from making too many all at once, I guess to protect against denial-of-service attacks.

While looking into this issue I found a very interesting post on the Power Query forum, from Alejandro Lopez-Lago, containing the solution to this problem. In it he shows how to write a function that introduces a delay of a specified number of seconds into a Power Query query. Here’s an example of how to use it in a query:

let
    //Find current time
    One = DateTime.LocalNow(),
    //Define Wait function
    Wait = (seconds as number, action as function) => 
            if (List.Count(
             List.Generate(
              () => DateTimeZone.LocalNow() + #duration(0,0,0,seconds), 
              (x) => DateTimeZone.LocalNow() < x, 
              (x) => x)
              ) = 0) 
               then null else action(),
    //Call Wait function to wait 5 seconds
    //then return the current time again
    Two = Wait(5,DateTime.LocalNow),
    //Display start time, finish time, and duration
    Output = "Start Time: " & DateTime.ToText(One) & 
                   " Finish Time:" & DateTime.ToText(Two) & 
                   " Difference: " & Duration.ToText(Two-One)
in
    Output 


Here’s an example of the output:

image

The important step in this query is Wait, which defines the function. It works by using List.Generate() to keep on generating a list until a given number of seconds (passed in through the parameter seconds) have elapsed; when that has happened, it executes the function that is passed in through the parameter action. Very clever stuff!

UPDATE: There’s now a better way of doing all of this, using the Function.InvokeAfter() function https://cwebbbi.wordpress.com/2015/04/30/using-function-invokeafter-in-power-query/
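
For reference, here’s a minimal sketch of the Function.InvokeAfter() approach described in the linked post:

let
    //Invoke DateTime.LocalNow after a delay of five seconds;
    //Function.InvokeAfter takes a function and a duration
    Output = Function.InvokeAfter(() => DateTime.LocalNow(), #duration(0, 0, 0, 5))
in
    Output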

News On SSAS Data Source Support In Power BI

Yesterday we heard (again) that SSAS will be supported as a data source for cloud-based reports in Power BI. Today, in a session, two important new details on this emerged:

  • It will work with both Tabular and Multidimensional
  • It will connect as the user running the query, so SSAS security (e.g. dimension security) will work just the same as it does on-premises. No special setup will be needed; there were no details beyond the fact that it will work using the EffectiveUserName connection string property (see the example below).
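
For context, EffectiveUserName is an existing SSAS connection string property that lets a connection impersonate another user. A connection string using it looks something like this (the server, database and user names are of course just examples):

Provider=MSOLAP;Data Source=MySSASServer;Initial Catalog=Adventure Works DW;EffectiveUserName=MYDOMAIN\someuser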

I’m sure a lot of people will be interested to hear this…