Power Query/Get & Transform: The Future Of Data Loading In Excel

Last week, Dominic Petri shared a link on Twitter that contained some important news from Microsoft: as of the March release of the Office 365 click-to-run version of Excel 2016, the old methods of loading data into Excel will be replaced by the newer “Get & Transform” functionality that is better known as Power Query. All the details are here:

https://support.office.com/en-us/article/Unified-Get-Transform-ad78befd-eb1c-4ea7-a55d-79d1d67cf9b3?ui=en-US&rs=en-US&ad=US&fromAR=1

Up to now Excel 2016 users have had to deal with a rather confusing compromise whereby the older data loading functionality existed side-by-side with the new “Get & Transform” functionality on the Data tab of the ribbon. From the March update onwards though the Data tab will look like this:

Get & Transform options on the Data tab

There are also several other cool new features being introduced, such as the ability to export Power Query queries to .odc files so they can be shared between workbooks, and new UI for managing connections and queries. You’ll still be able to get the old data loading functionality back via the Options dialog if you want.

If you’re a regular reader of this blog you’ll know that I’m a massive fan of Power Query, so naturally I’m really happy to see this happen. I believe it’s a big step forward for Excel in terms of its BI and reporting capabilities and the fact that this functionality is almost identical to the data loading functionality in Power BI means that Microsoft’s overall BI story becomes even more compelling: the skills you learn in Excel are transferable to Power BI and vice versa. On the other hand I can guess this is going to cause confusion for some users – similar to what happened when the ribbon first appeared in Office 2007 – but I don’t think this can be avoided. Ultimately the benefits will outweigh any disruption caused as users have to learn the new way of doing things.

Exporting Your Queries’ M Code From Power Query And Power BI Using Copy/Paste

Two years ago I blogged about a method to export all the M code for all of your queries in Power Query using the Send A Frown button – useful if you need the code for documentation purposes. This trick doesn’t work with Power BI Desktop, unfortunately, but the good news is that there’s a better way to do this now in Power Query and Power BI Desktop using copy/paste. It’s pretty simple really: when you copy a query from the Power Query or Power BI Desktop Query Editor you can not only paste the query to another Query Editor (pasting from Power Query to Power BI and vice versa works too) but you can also paste the query to a text editor like Notepad and get the M code for the query. What’s more, you can also select more than one query in the Query Editor and when you paste you get all the code for all of the selected queries:

image

Remember that because the properties of each step in a query become comments in your M code, they get copied too.

Thanks to VossF for telling me about this on this thread.

The List.* M Functions And The equationCriteria Argument

Last week a reader left a comment on my blog asking what the third argument of the List.Contains() M function (somewhat cryptically called equationCriteria) does. I didn’t know, so I did some research and found out that lots of the List.* functions have the same argument. The documentation for List.Distinct() has a few examples but no real explanation of how they work. It also says:

For more information about equationCriteria, see Parameter Values.

…but there’s no link or indication where the documentation on ‘Parameter Values’ can be found. A bit more digging led me to the bottom of this page:

Equation criteria

Equation criteria for list values can be specified as either a

  • A function value that is either

    • A key selector that determines the value in the list to apply the equality criteria, or

    • A comparer function that is used to specify the kind of comparison to apply. Built in comparer functions can be specified, see section for Comparer functions.

  • A list value which has

    • Exactly two items

    • The first element is the key selector as specified above

    • The second element is a comparer as specified above.

Still not exactly helpful, is it? After a bit of time testing, though, I think I’ve worked out how what’s possible with the equationCriteria argument and this blog post will, I hope, help any future M coders who are struggling with the same question. Let’s see some examples…

The basics

First of all, the basics. The following expression using List.Contains() returns TRUE because the text value “apples” appears in the list {“apples”, “oranges”, “pears”}:

List.Contains({"apples", "oranges", "pears"}, "apples")

image

The following returns FALSE because the text value “grapes” does not appear in the list {“apples”, “oranges”, “pears”}:

List.Contains({"apples", "oranges", "pears"}, "grapes")

image

However there are lots of different ways that text values can be compared and the equationCriteria argument allows you to specify which rules to follow.

Case sensitivity and culture

If you’ve written any M code you’ll know that it is case sensitive. As a result, the following returns FALSE:

List.Contains({"apples", "oranges", "pears"}, "Apples")

What happens if you want to do a case-insensitive comparison though? This is where the Comparer functions come in. The Comparer.FromCulture() function returns a function that compares two values according to the rules of a given culture or locale and optionally ignore case, and can be used in the equationCriteria argument. The following example returns TRUE:

List.Contains(
	{"apples", "oranges", "pears"}, 
	"Apples", 
	Comparer.FromCulture("en-GB", true)
)

In this case Comparer.FromCulture("en-GB", true) returns a function that compares two values for the English – Great Britain culture (for a full list of culture values, see the Language Tag column of the table on this page); the second, optional argument here makes the function ignore case when making the comparison. The function that Comparer.FromCulture() returns is then used by List.Contains() to make the comparison.

Rather than specify a culture you can also use the Culture.Current function to return the current system culture. For me, Culture.Current returns the value “en-GB” because I live in Great Britain and have my PC configured to use a British English locale:

image

The following example shows how Culture.Current can be used with Comparer.FromCulture and also returns TRUE, at least for me:

List.Contains(
	{"apples", "oranges", "pears"}, 
	"Apples", 
	Comparer.FromCulture(
		Culture.Current, 
		true
	)
)

If you’re curious to see an example where different cultures produce different results here’s one I stole from this article on string comparisons and sorting in .NET. Apparently in English the character æ is treated the same as the combination of the two characters ae but this is not the case in Danish. As a result the following returns TRUE:

List.Contains(
	{"aepples", "oranges", "pears"}, 
	"æpples", 
	Comparer.FromCulture(
		"en-GB", 
		true
	)
)

Whereas this returns FALSE:

List.Contains(
	{"aepples", "oranges", "pears"}, 
	"æpples", 
	Comparer.FromCulture(
		"da-DK", 
		true
	)
)

Ordinal comparisons

If you don’t want all the uncertainty of cultures and case sensitivity you can just make an ordinal comparison, which will compare two strings by finding the unicode character value for each character in each string and compare those values. To do this you can use the Comparer.Ordinal() function. The following returns FALSE:

List.Contains(
	{"apples", "oranges", "pears"}, 
	"Apples", 
	Comparer.Ordinal
)

…because “a” is not the same unicode character as “A”, and so “apples” and “Apples” are not treated as equal.

Custom comparer functions

As the documentation hints you can also write your own function to do the comparison. A comparer function is just – as far as I can see – a function that has two arguments and returns a logical value. Here’s an example of a custom function that takes two text values, x and y, and returns true if the first three characters of x are the same as y:

(x as text, y as text)=>Text.Start(x,3)=y

It can be used with List.Contains() as in the following example, which returns TRUE:

List.Contains(
	{"apples", "oranges", "pears"}, 
	"app",
        (x as text, y as text)=>Text.Start(x,3)=y
)

What must be happening here is that the function is called three times, every value in the list {“apples”, “oranges”,”pears”} is being passed to the x argument and for each call “app” is passed to y; because the first three characters of “apples” are “app” the function returns true in this case, so List.Contains() returns true.

Key selectors

If you’re working with a list of records you might only want to do the comparison on one field in the record, and this is what key selectors allow you to do. The following example, which returns TRUE:

List.Contains(
	{[Fruit="apples", Colour="Red"], 
         [Fruit="oranges", Colour="Orange"],
         [Fruit="pears", Colour="Green"]}, 
	[Fruit="apples", Colour="Russet"],
        each [Fruit]  
)

…does so because it only compares the Fruit field in each record, and the Fruit fields in [Fruit=”apples”, Colour=”Red”] and [Fruit=”apples”, Colour=”Russet”] are indeed the same. However the following example returns FALSE:

List.Contains(
	{[Fruit="apples", Colour="Red"], 
         [Fruit="oranges", Colour="Orange"],
         [Fruit="pears", Colour="Green"]}, 
	[Fruit="apples", Colour="Russet"],
        each [Colour]  
)

…because the Colour “Russet” does not appear anywhere in the Colour field of any of the records in the first parameter.

Combining key selectors and comparison functions

Finally, as the documentation suggests, you can combine the above methods of comparison by passing a list containing two items to equationCriteria: the first item in the list must be a key selector, the second must be a comparer function. For example, the following returns TRUE:

List.Contains(
	{[Fruit="apples", Colour="Red"], 
         [Fruit="oranges", Colour="Orange"],
         [Fruit="pears", Colour="Green"]}, 
	[Fruit="Apples", Colour="Russet"],
        {each [Fruit], Comparer.FromCulture("en-GB", true)} 
)

…because it only looks at the Fruit field of each record, and it does a case-insensitive comparison using the en-GB culture, so “apples” and “Apples” are equal.

Record.AddField(), Functions And The Delayed Option In M

Today I was looking at the Record.AddField() M function and saw it had a mysterious – and badly documented – optional fourth argument called delayed. Of course I had to work out what it did, and having done so I thought I’d blog about it in case someone found it useful (and however obscure the topic, there’s always someone out there who finds this stuff useful).

Imagine you have an M function called ReturnANumber that has no arguments at all and returns the number 999:

() as number => 999

image

Now imagine that you want to return this function in a record (which is exactly what #shared does). The following expression:

[MyFunction = ReturnANumber]

…returns a record with one field whose value is of type function:

image

But what if you wanted the field to contain the number that the function returns, not a reference to the function itself? The delayed option of Record.AddField() allows you to do this: if you set it to true, you get the value the function returns.

Record.AddField([], "MyFunction", ReturnANumber, true)

image

Setting delayed to false does the same as the first example above:

Record.AddField([], "MyFunction", ReturnANumber, false)

image

Another way to get the same result as setting delayed to true is to use Function.Invoke():

[MyFunction = Function.Invoke(ReturnANumber,{})]

image

Now I need to think of a real-world use for this…

Web.Contents(), Caching And The ExcludedFromCacheKey Option In Power BI And Power Query

When you’re using the Web.Contents() M function to call a web service from Power Query or Power BI, you don’t necessarily get one HTTP request each time you call the function: some caching takes place, so that if you make the same request multiple times your query won’t waste time asking for the same data over and over. In this post I’m going to share the results of some tests I made to show how caching works with Web.Contents() and what factors influence it.

For my tests I built a simple web service in Microsoft Flow, similar to the one I blogged about here, that accepts a HTTP POST request and calls a stored procedure in an Azure SQL Database. The stored procedure then updates a table in the database and this in turn allows me to count the number of times the web service is called. Finally, the web service returns the value 0 if the stored procedure has executed successfully.

This web service can then be called from either Power Query or Power BI using the Web.Contents() function, something like this (because the URL for the web service is very long I stored it in a parameter called WebServiceURL):

let
    Source = Web.Contents(
	WebServiceURL,
	[Content=Text.ToBinary("Hello")]
	),
    #"Imported JSON" = Json.Document(Source,1252)
in
    #"Imported JSON"

The output of the query when run in Power Query and loaded to an Excel table is this:

image

The first important thing to point out is that the above query, when refreshed in the latest versions of Power Query (I’m running Excel 2016 build 7571.2109) and Power BI (build 2.41.4581.361- November 2016 release), results in a single call to the web service. It might seem like I’m stating the obvious but in the past I’ve seen plenty of cases where a data source has been queried multiple times by Power Query/Power BI even if I was only expecting it to be queried once.

Now, let’s look at a query that calls this web service several times. Here’s the query above converted to a function called fnCallWebService:

() => 
let
    Source = Web.Contents(
	WebServiceURL,
	[Content=Text.ToBinary("Hello")]),
    #"Imported JSON" = Json.Document(Source,1252)
in
    #"Imported JSON"

Here’s a query that calls this function once for each row of the following table:

image

let
    Source = Excel.CurrentWorkbook(){[Name="MyTable"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(
	Source,
	{{"Row", Int64.Type}}),
    #"Invoked Custom Function" = Table.AddColumn(
	#"Changed Type", 
	"fnCallWebService", 
	each fnCallWebService())
in
    #"Invoked Custom Function"

In the query above I used the Invoke Custom Function button to call the function for each row in the source table and put the value returned by the function in a new column. The output is this:

image

Even though the function is called four times, once for each row in the original table, that does not mean that the web service is called four times – it isn’t, it’s only called once. In this case Power BI/Power Query knows that each of the four calls to the function is making an identical request to the web service and so it only goes to the web service once, and thereafter uses a cached result the other three times.

One way to stop this caching from taking place is to add an HTTP header to the request to the web service and pass a different value to that header for each call. Here’s another version of my function, now called fnCallWebServiceWithHeaders, which this time takes a number as a parameter and then passes that number to the web service via a header called MyHeader:

(RowNum as number) => let
    Source = Web.Contents(
	WebServiceURL,
	[Content=Text.ToBinary("Hello"), 
	Headers=[MyHeader=Text.From(RowNum)]]),
    #"Imported JSON" = Json.Document(Source,1252)
in
    #"Imported JSON"

Now if I call this function for each row of the table, and for each call pass the value in the [Row] column through to the function like so:

image

let
    Source = Excel.CurrentWorkbook(){[Name="MyTable"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(
	Source,
	{{"Row", Int64.Type}}),
    #"Invoked Custom Function" = Table.AddColumn(
	#"Changed Type", 
	"fnCallWebServiceWithHeaders", 
	each fnCallWebServiceWithHeaders([Row]))
in
    #"Invoked Custom Function"

…the web service gets hit four times. The presence of a different value for the MyHeader header in each request is enough to prevent any caching from taking place.

It is possible, however, to get Power BI/Power Query to ignore one or more headers when working out whether caching should take place using the ExcludedFromCacheKey option in Web.Contents(). Here’s one more version of my function, now called fnCallWebServiceWithHeadersExlCache, which uses this option:

(RowNum as number) => 
let
    Source = Web.Contents(
	WebServiceURL,
	[Content=Text.ToBinary("Hello"), 
	Headers=[MyHeader=Text.From(RowNum)], 
	ExcludedFromCacheKey={"MyHeader"}]),
    #"Imported JSON" = Json.Document(Source,1252)
in
    #"Imported JSON"

The ExcludedFromCacheKey option takes a list of text values which represent the names of headers that are to be ignored when considering which requests can be cached. In the example above my list contains just the one header, MyHeader, and when this is used in a query like so:

let
    Source = Excel.CurrentWorkbook(){[Name="MyTable"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(
	Source,
	{{"Row", Int64.Type}}),
    #"Invoked Custom Function" = Table.AddColumn(
	#"Changed Type", 
	"fnCallWebServiceWithHeaders", 
	each fnCallWebServiceWithHeadersExlCache([Row]))
in
    #"Invoked Custom Function"

…even though the function is called four times, once for each row in the table, and even though each time Web.Contents() is called it is with a different value passed through to the MyHeader header, only one request is made to the web service and the three subsequent requests are answered from cache again.

In summary, if you’re calling a web service multiple times in a query and especially if you’re using the Headers option in Web.Contents(), this is important stuff to understand because caching can make a big difference to the performance of your queries.

First Thoughts On The Integration Of Power Query/M Into Analysis Services Tabular

Last Friday’s big news was the release of the first CTP for Analysis Services v.next. Among several major new pieces of functionality (Ragged hierarchies! Drillthrough that works properly, even for calculations! Table-level security!) probably the biggest is the integration of Power Query/M into Analysis Services. As you can probably guess, I’m incredibly pleased that my two favourite technologies have got together. The technical details are given in this blog post, which I suggest you read if you haven’t done so already, but what I think is missing is an explanation of why this is so important and what kind of opportunities it opens up – hence this post. Of course this is just my take on the subject and not what Microsoft may actually thinking; it’s also very early days, so as the functionality develops and I have more chance to think about this my opinions may change. If you have any ideas on this subject I would be interested to hear them so please leave a comment!

Why this had to happen: Power BI

There is an obvious reason why Microsoft decided to integrate Power Query/M into SSAS, and that is because it needs to support the conversion of Power BI models into Analysis Services Tabular models. There are two scenarios where this will be necessary.

The first is the ability to convert a Power BI model into an Azure Analysis Services Tabular model (listed as ‘planned’ here), something that will be a key selling point for Azure Analysis Services when it releases. The engine behind Power BI is essentially the same as the one used in Analysis Services so migrating the data model should be straightforward, but since Power BI uses Power Query/M to load data then a migrated Azure Analysis Services model will also have to use Power Query/M.

The second scenario is similar to the first. We now know that on-premises Power BI will be delivered through Reporting Services v.next, and it’s reasonable to assume Reporting Services will need a database engine to store the data for published Power BI reports. That engine will have to be an Analysis Services instance of some kind (either standalone or running in-process inside Reporting Services) and again for that to work Analysis Services will have to support the same data access mechanisms as Power BI.

Better support for a larger number of data sources

I’ve just argued why Microsoft was obliged to include this functionality in SSAS v.next but in fact there are many positive reasons for doing this too. The most obvious one is to do with support for more data sources. At the moment SSAS Tabular supports a pretty good range of data sources, but the world of BI is getting more and more diverse and in order to stay relevant SSAS needs to support far more than it does today. By using Power Query/M as its data access mechanism, SSAS v.next will immediately support a much larger number of data sources and this number is going to keep on growing: any investment that Microsoft or third parties make for Power BI in this area will also benefit SSAS. Also, because Power Query/M can query and fold to more than just relational databases, I suspect that in the future this will allow for DirectQuery connections to many of these non-relational data sources too.

Different data sources for partitions in the same table

Another benefit of this change is that we’ll have a lot more flexibility with partitioning tables in an SSAS Tabular model. As the blog post says:

As long as a partition’s M query adheres to the column mappings of the table, you are free to perform any transformations and pull in data from any data source defined in the model.

In SSAS 2016 the partitions in a table all have to get data from the same data source whereas in v.next we’ll be able to get data from different data sources in different partitions, and this opens up some interesting new possibilities. For example, I can imagine a simple budgeting application where the partitions in a table get data from different Excel workbooks stored in OneDrive for Business, and where the each partition gets processed automatically when changes are saved to one of these workbooks.

Does this replace SSIS and my data warehouse? 

The short answer is no. Power Query/M is not a full-featured ETL tool and I don’t think it ever will be; it certainly does not have the kind of functionality needed to perform enterprise-level ETL right now. My view is that Microsoft have built Power Query/M into SSAS for the reasons above and not to encourage enterprise SSAS users to do their own quick-and-dirty ETL when loading data (although there is a risk that that will happen anyway). That said, I think the dividing line between corporate and self-service BI will become increasingly blurred over the next few years as the Microsoft BI stack develops, and we’ll see Analysis Services being used in self-service scenarios as well as the more traditional corporate ones.

Centralised data source objects

One last thing to point out is that the way SSAS v.next makes a distinction between data sources and other queries is very interesting. In Power BI and Power Query it’s easy to end up with data source connection information duplicated across multiple queries unless you know what you’re doing, and this can cause no end of problems later on in a project. As far as I can see, in SSAS v.next a “data source object” is an M query that only contains the connection to external data, while all other queries have to reference a data source to be able to access external data. This means, as the blog post says:

Referring to data source objects helps to centralize data source settings for multiple queries and simplifies deployments and maintenance if data source definitions must be updated later on. When updating a data source definition, all M queries that refer to it automatically use the new settings.

I wonder whether this concept is coming to Power BI and Power Query at some point? I hope so – it makes a lot of sense.

Passing Parameters To SQL Queries With Value.NativeQuery() In Power Query And Power BI

I first came across the Value.NativeQuery() M function about six months ago, but it didn’t do anything useful then so I didn’t blog about it. I checked it again recently though and now it does something very handy indeed: it allows you to pass parameters to SQL queries. Before this, if you wanted to use parameters in your SQL, you had to do some nasty string manipulation in the way Ken Puls shows here. Now, with Value.NativeQuery(), you can handle SQL query parameters properly in M.

Here’s a simple example that shows how it works, passing two parameters to a SQL query on the Adventure Works DW database:

let
    Source = Sql.Database("localhost", "Adventure Works DW"),
    Test = Value.NativeQuery(
            Source, 
            "SELECT * FROM DimDate 
            WHERE EnglishMonthName=@MonthName 
            AND
            EnglishDayNameOfWeek=@DayName", 
            [MonthName="March", DayName="Tuesday"])
in
    Test

image

Some explanation of what’s happening here:

  • The Source step returns a reference to a SQL Server database, and this is passed to the first parameter of Value.NativeQuery().
  • The second parameter of the Value.NativeQuery() function is the SQL query to be executed. It contains two parameters called @MonthName and @DayName.
  • The parameters to the SQL query are passed using a record in the third parameter of Value.NativeQuery(). Note how the named of the fields in the records match the names of the parameters in the SQL query.

It looks like, eventually, this will be the way that any type of ‘native’ query (ie a query that you write and give to Power Query, rather than a query that is generated for you) is run against any kind of data source – instead of the situation we have today where different M functions are needed to run queries against different types of data source. I guess at some point the UI will be updated to use this function. I don’t think it’s ‘finished’ yet either, because it doesn’t work on Analysis Services data sources, although it may work with other relational data sources – I haven’t tested it on anything other than SQL Server and SSAS. There’s also a fourth parameter for Value.NativeQuery() that can be used to pass data source specific options, but I have no idea what these could be and I don’t think there are any supported for SQL Server. It will be interesting to see how it develops over the next few releases.