Query Folding And Writing Your Own SQL Queries In Power Query/Power BI/Excel Get & Transform

When you connect to a relational database like SQL Server in Power BI/Power Query/Excel Get & Transform you have two choices about how to get the data you need:

  1. You can choose a table from the database and then either use the Query Editor UI or write some M to get the data you need from that table. For example, you might choose a table that has one row for every product that your company sells and then, using the UI, filter that down to only the products that are red.
  2. You can enter a SQL query that gets the data you need.

Something that you might not realise is that if you choose the second option and then subsequently use the UI to apply even more filtering or transformation, then those subsequent steps will not be able to make use of query folding.

As an example of option (1), imagine you connect to the DimProduct table in the SQL Server Adventure Works DW database like so:

image

image

The following M query is generated by the Query Editor when you filter the table to only return the red products and remove all columns except EnglishProductName. That’s very easy to do so I won’t describe it, but here’s the M:

let
    Source = 
	Sql.Databases("localhost"),
    #"Adventure Works DW" = 
	Source{
		[Name="Adventure Works DW"]
	}[Data],
    dbo_DimProduct = 
	#"Adventure Works DW"{
		[Schema="dbo",Item="DimProduct"]
	}[Data],
    #"Filtered Rows" = 
	Table.SelectRows(
		dbo_DimProduct, 
		each ([Color] = "Red")
	),
    #"Removed Other Columns" = 
	Table.SelectColumns(
		#"Filtered Rows",
		{"EnglishProductName"}
	)
in
    #"Removed Other Columns"

image

Using the View Native Query option, you can find out that the following SQL is generated to get this data:

select [_].[EnglishProductName]
from [dbo].[DimProduct] as [_]
where [_].[Color] = 'Red'

image

image

It’s pretty clear that query folding is taking place for the filter on “red” and for the selection of the required column.

However, if you enter the following SQL query when you first connect to the database:

select * from dimproduct

image

And then, after that, filter the table and remove columns in exactly the same way, you get the following M query:

let
    Source = 
	Sql.Database(
		"localhost", 
		"Adventure Works DW", 
		[Query="select * from dimproduct"]),
    #"Filtered Rows" = 
	Table.SelectRows(
		Source, 
		each ([Color] = "Red")),
    #"Removed Other Columns" = 
	Table.SelectColumns(
		#"Filtered Rows",
		{"EnglishProductName"})
in
    #"Removed Other Columns"

If you now try to use the View Native Query option on either the Removed Other Columns or Filtered Rows steps you’ll find it’s greyed out, indicating query folding is not taking place for those steps:

image

The query you enter is run and then Power BI applies the filter and selects the column itself in the resultset that the SQL query returns.

This obviously has big implications for performance. The lesson here is that if you’re going to write your own SQL query in the Query Editor, you should make sure it does all of the expensive filters and transformations you need because anything else you do in the query will happen outside the database in Power BI or Excel.

Exporting Power Query/M Queries To ODC Files In Excel 2016 Get & Transform

A really useful new feature was added to Get & Transform (the functionality previously known as Power Query) in the latest updates for the Office 365 click-to-run version of Excel 2016: the ability to export and import ODC files containing M queries. This makes sense given that Get & Transform is the new default way for loading data into Excel, but it’s nonetheless very welcome.

It’s very straightforward to use, and all the details are available in the section on “How do I get data from recently used sources, Office Database Connection (ODC) files, other workbook connections, or tables?” halfway down this article:

https://support.office.com/en-us/article/Unified-Get-Transform-ad78befd-eb1c-4ea7-a55d-79d1d67cf9b3?ui=en-US&rs=en-US&ad=US&fromAR=1

You just need to right-click on your query in the Queries & Connections pane to export it:

image

…and after that you can import the ODC file in the normal way when you want to create a new connection.

As always, I want more though. Some ideas/requests:

  • Power BI Desktop should be able to import and export ODC files in this format too: it would make it much easier to reuse queries. Vote here if you agree.
  • Power BI should have a central repository for Power BI and Excel users to store these ODC files for easy sharing and reuse by report developers. It’s a shame that the Azure Data Catalog integration with Excel/Power BI hasn’t had any love recently because that would have been the obvious place to create such a repository.
  • We also desperately need some kind of source control for M queries inside Excel and Power BI (not quite the same requirement as the previous point). I know a lot of people are doing this manually with services like Git, but I would love to be able to check my code in and out directly from the Query Editor.

I’ve also noticed that the old “Load To” dialog (that I found incredibly confusing) has been replaced by the standard Excel Import Data dialog in this release – another improvement. After you hit Close & Load in the Query Editor, this is what you now see:

image

Data Privacy Settings In Power BI/Power Query, Part 2: Preventing Query Execution

In part 1 of this series I showed how the data privacy settings in Excel Power Query/Get & Transform and Power BI could impact the performance of your queries. In this post I’m going to show you how they can stop a query from running at all.

Let’s say you have the Excel workbook from part 1 of this series, but now instead of using the day name to filter data from a SQL Server table you want to pass that value to a web service. The web service I’m going to use for my examples is one that allows you to search for open data published by the UK government on https://data.gov.uk. It’s very simple: you give it a search term and it returns a JSON document with the search results in, no authentication or anything else required. For example:

https://data.gov.uk/api/3/action/package_search?q=Friday

In fact it doesn’t really matter what it does, just know that it is a web service that I can pass a text parameter to and get a result from.

Here’s a query that reads a single piece of text from the FilterDay table in my Excel workbook:

image_thumb2

…and then passes that value to the web service:

let
    ExcelSource = 
	Excel.Workbook(
		File.Contents("C:\FilterParameter.xlsx")
	, null, true),
    FilterDay_Table = 
	ExcelSource{[Item="FilterDay",Kind="Table"]}[Data],
    ChangedType = 
	Table.TransformColumnTypes(
		FilterDay_Table,
		{{"Parameter", type text}}
		),
    Day = ChangedType{0}[#"Parameter"],
    Output = 
	Web.Contents(
		"https://data.gov.uk/api/3/action/package_search", 
		[Query=[q=Day]]
	),
    ImportedJSON = Json.Document(Output,65001)
in
    ImportedJSON 

This query succeeds if any of the following conditions are true:

  • The data privacy levels of both the Excel workbook and the web service are set to Public
  • The data privacy levels of both the Excel workbook and the web service are set to Organizational
  • The data privacy level of the Excel workbook is set to None and the data privacy level of the web service is set to Public

[See here to find out how to set privacy levels for a data source. Interestingly the data privacy level of the web service cannot be set to None for this query – the UI always prompts for it to be set before the query will run]

Here’s the output of a successful run:

image

Any other combinations of data privacy settings, for example if both the Excel workbook and the web service are set to Private, result in the following error message:

Formula.Firewall: Query ‘WebFunctionSucceeds’ (step ‘ImportedJSON’) is accessing data sources that have privacy levels which cannot be used together. Please rebuild this data combination.

image

While it’s ok to send data from one Public data source to another Public data source, or from one Organizational data source to another Organizational data source, you cannot send data from one Private data source to any other data source, or even send data from a Public data source to a Private data source.

In the example in my previous post when the engine found it wasn’t allowed to send data from one source to another because of the data privacy rules used it was still able to run the query, but had to do so in a less efficient way. In this example there is no way to run this query without sending data from the Excel workbook to the web service – you can’t call this web service without sending a search term to it. As a result, if incompatible data privacy levels are set then the query returns the error shown.

Notice that in the query above I’m reading data from Excel and sending it to the web service in a single M query. This is deliberate! In the next post in this series I’ll be looking at examples where the engine can’t work out what it’s supposed to do and errors, even if the privacy levels used suggest the query should run.

Referencing Individual Cell Values From Tables In Power Query

[This blog post is relevant to Power Query in Excel 2010/2013, the Get & Transform section on the Data tab in Excel 2016, and the Get Data screen in Power BI Desktop. I’m going to use the term ‘Power Query’ in this post to refer to all of the previously mentioned functionality]

Sometimes, when you’re working with a table of data in Power Query, you want to be able to get the value from just one cell in that table. In this blog post I’ll show you how you can do this both in the UI and in M code, and talk through all of the more advanced options available in M. Incidentally this is a topic I covered in some detail in the M chapter of my Power Query book, but since that book is now somewhat out-of-date I thought it was worth covering again in a blog post.

Referencing Cell Values In The UI

Imagine your data source is an Excel table that looks like this:

image

If you import it into Power Query, and you want to get the value in the cell on the second row in ColumnB, then in the Query Editor window you just need to right-click on that cell and select Drill Down:

image

…and bingo, you have the value 5 returned:

image

Note that this is the value 5, and not the value 5 in a cell in a table – a Power Query query can return a value of any data type, and in this case it’s going to return a single integer value rather than a value of type table. If you load the output of this query into Excel you’ll still see it formatted as a table, but if you’re using the output of this query as an input for another query (for example, you might want to read a value from Excel and use that value as a filter in a SQL query) it’s much more convenient to have an integer value than a table with one column and one row.

Referencing Cell Values in M

You can see in the screenshot above the M code generated for the Drill Down by the UI, and probably guess how it works. Here’s a cleaned-up version of the query from the previous section for reference:

let
    Source = Excel.CurrentWorkbook(){[Name="SourceData"]}[Content],
    ChangeDataTypes = Table.TransformColumnTypes(
                                        Source,
                                        {{"ColumnA", Int64.Type}, 
                                         {"ColumnB", Int64.Type}, 
                                         {"ColumnC", Int64.Type}}),
    GetMiddleCell = ChangeDataTypes{1}[ColumnB]
in
    GetMiddleCell

There are three steps here:

  • Source reads the data from the Excel table
  • ChangeDataTypes sets the data types for the three columns in the table to be Whole Number
  • GetMiddleCell returns the value from the middle cell of the table returned by the ChangeDataTypes step

M allows you to refer to individual values in tables by a system of co-ordinates using the name of the column and the zero-based row number (tables in Power Query are always assumed to be sorted, therefore it making it possible to ask for a value from the nth row). So the expression
ChangeDataTypes{1}[ColumnB]

returns the value from the cell on the second row in the column called ColumnB of the table returned by ChangeDataTypes, which is 5. Similarly, the expression

ChangeDataTypes{0}[ColumnC]
returns the value 3, which is the value in the column ColumnC on the first row.

It’s also worth pointing out that the row and column reference can be in any order, so the expression

ChangeDataTypes{1}[ColumnB]

…returns the same value as

ChangeDataTypes[ColumnB]{1}

As you’ll see in a moment, the order that the row and column reference come in could be important.

Referring To Rows Or Columns That Don’t Exist

What happens if you write an expression that refers to a row and/or column that doesn’t exist? You get an error of course! So using our example query, the expressions

ChangeDataTypes{4}[ColumnB]

and

ChangeDataTypes{1}[ColumnD]

…will both return errors because there isn’t a fifth row in the table, and there isn’t a column called ColumnD.
image

However, instead of an error you can return a null value by using the ? operator after the reference. For example, the expression

ChangeDataTypes{1}[ColumnD]?

returns the value null instead of an error:

image

You have to be careful though! The expression

ChangeDataTypes{4}?[ColumnB]

still returns an error, not because there isn’t a fifth row but because the reference to the fifth row returns a null value and there is no column called ColumnB in this null value.

image

The solution here is to reverse the order of the references, like so:

ChangeDataTypes[ColumnB]{4}?

or even better use ? with both references:

ChangeDataTypes{4}?[ColumnB]?

image

Unfortunately using ? won’t stop you getting an error if you use a negative value in a row reference.

The Effect Of Primary Keys

Did you know that a table in Power Query can have a primary key (ie a column or columns whose values uniquely identify each row) defined on it? No? I’m not surprised: it’s not at all obvious from the UI. However there are several scenarios where Power Query will define a primary key on a table, including:

  • When you import data from a table in a relational database like SQL Server, and that table has a primary key on it
  • When you use the Remove Duplicates button to remove all duplicate values from a column or columns, which behind the scenes uses the Table.Distinct() M function
  • When you use the Table.AddKey() M function on a table

The presence of a primary key affects how the Drill Down functionality works, and gives you another way of referencing individual cells.

Consider the following Excel table, basically the same as the table you’ve seen already but with a new column that uniquely identifies each row:

image

If you load the table into Power Query and then right-click on the column MyKeyColumn and select Remove Duplicates, you will have set this key as a primary key:

image

(By the way, you can use the Table.Keys() function to see what keys are defined on a table in Power Query, and there’s an example query showing how to use this in the sample workbook for this post).

Having done this, if you use the Drill Down functionality to get the value from the second row in ColumnB once again, this time you’ll get a query something like this:

image

let
    Source = Excel.CurrentWorkbook(){[Name="SourceData3"]}[Content],
    ChangedDataTypes = Table.TransformColumnTypes(
                                      Source,
                                      {{"MyKeyColumn", type text}, 
                                      {"ColumnA", Int64.Type}, 
                                      {"ColumnB", Int64.Type}, 
                                      {"ColumnC", Int64.Type}}),
    RemovedDuplicates = Table.Distinct(
                                     ChangedDataTypes, 
                                     {"MyKeyColumn"}),
    SecondRow = RemovedDuplicates{[MyKeyColumn="SecondRow"]}[ColumnB]
in
    SecondRow

The last step is the important one to look at. The row in the reference is no longer by the row number but by the value from the primary key column instead:

RemovedDuplicates{[MyKeyColumn=”SecondRow”]}[ColumnB]

image

You can still use the previous row number-based notation, but when a table has a primary key column defined on it you can also use a value from the primary key column to identify a row.

A Last Warning About Performance

Being able to reference individual values like this is incredibly useful for certain types of query and calculation. However, bear in mind that there are often many different ways of solving the same problem and not all of them will perform as well as each other. One obvious use of the techniques I’ve shown in this post would be to write a previous period growth calculation, where you need to refer to a value in a previous row in a table – but my experience is that writing calculation using row and column references prevents query folding and leads to poor performance, and an alternative approach (maybe like the ones shown here and here involving joins) often performs much better. There aren’t any general rules I can give you though, you just need to make sure you test thoroughly.

You can download the sample workbook for this post here.