Query Folding And Writing Your Own SQL Queries In Power Query/Power BI/Excel Get & Transform

When you connect to a relational database like SQL Server in Power BI/Power Query/Excel Get & Transform you have two choices about how to get the data you need:

  1. You can choose a table from the database and then either use the Query Editor UI or write some M to get the data you need from that table. For example, you might choose a table that has one row for every product that your company sells and then, using the UI, filter that down to only the products that are red.
  2. You can enter a SQL query that gets the data you need.

Something that you might not realise is that if you choose the second option and then subsequently use the UI to apply even more filtering or transformation, then those subsequent steps will not be able to make use of query folding.

As an example of option (1), imagine you connect to the DimProduct table in the SQL Server Adventure Works DW database like so:

image

The following M query is generated by the Query Editor when you filter the table to only return the red products and remove all columns except EnglishProductName. That’s very easy to do so I won’t describe it, but here’s the M:

[sourcecode language=”text”]
let
Source =
Sql.Databases("localhost"),
#"Adventure Works DW" =
Source{
[Name="Adventure Works DW"]
}[Data],
dbo_DimProduct =
#"Adventure Works DW"{
[Schema="dbo",Item="DimProduct"]
}[Data],
#"Filtered Rows" =
Table.SelectRows(
dbo_DimProduct,
each ([Color] = "Red")
),
#"Removed Other Columns" =
Table.SelectColumns(
#"Filtered Rows",
{"EnglishProductName"}
)
in
#"Removed Other Columns"
[/sourcecode]

Using the View Native Query option, you can find out that the following SQL is generated to get this data:

[sourcecode language=”text”]
select [_].[EnglishProductName]
from [dbo].[DimProduct] as [_]
where [_].[Color] = ‘Red’
[/sourcecode]

It’s pretty clear that query folding is taking place for the filter on “red” and for the selection of the required column.

However, if you enter the following SQL query when you first connect to the database:

[sourcecode language=”text”]
select * from dimproduct
[/sourcecode]

And then, after that, filter the table and remove columns in exactly the same way, you get the following M query:

[sourcecode language=”text”]
let
Source =
Sql.Database(
"localhost",
"Adventure Works DW",
[Query="select * from dimproduct"]),
#"Filtered Rows" =
Table.SelectRows(
Source,
each ([Color] = "Red")),
#"Removed Other Columns" =
Table.SelectColumns(
#"Filtered Rows",
{"EnglishProductName"})
in
#"Removed Other Columns"
[/sourcecode]

If you now try to use the View Native Query option on either the Removed Other Columns or Filtered Rows steps you’ll find it’s greyed out, indicating query folding is not taking place for those steps:

The query you enter is run and then Power BI applies the filter and selects the column itself in the resultset that the SQL query returns.

This obviously has big implications for performance. The lesson here is that if you’re going to write your own SQL query in the Query Editor, you should make sure it does all of the expensive filters and transformations you need because anything else you do in the query will happen outside the database in Power BI or Excel.

15 thoughts on “Query Folding And Writing Your Own SQL Queries In Power Query/Power BI/Excel Get & Transform

  1. Hi Chris,

    Could I please ask you to have a look at the issues reported in the threads below? It’s a very interesting issue that I think has to do with query folding vs Power BI native query. Once you get to very large source tables that are accessed via a SQL query, and you have a subsequent merge in Power BI (in my case the merge was with an excel file), every refresh gives different results.

    Power BI seems to issue the query twice and then combines the results. The problem arises if there is no explicit sort in the original query. Power BI assumes an explicit sort exists, and simply combines the first x results returned from QuerySplit1 with the remaining total-x results from QuerySplit2

    As a result, some records are dropped, and others duplicated at random.

    Here is the thread that I created:
    https://community.powerbi.com/t5/Issues/Power-Query-Merge-yields-inconsistent-results/idi-p/119485
    In my report, I linked to other cases where the issue was reported. Subsequent to initially reporting it, I found a way to bypass the problem (as did many other people subsequently), but we all feel that this is an issue that will erode trust in Power BI due to the fact that the query issued is not behaving as expected, and that incorrect results are returned without any error.

    Recently there has been a number of people adding to this thread with the same issue.
    https://community.powerbi.com/t5/Desktop/Each-data-refresh-gives-different-data-results/td-p/60989

    We all would really appreciate your help in bringing this to the attention of someone at Microsoft. I would be very happy to discuss this in more detail if you need further info to replicate

    Thanks very much

    Ina Nortje

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Hi Ina – I will definitely pass this on. Thanks for letting me know!

    2. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      The dev team have told me they will take a look at it – if I hear anything back, I’ll let you know.

  2. Has anyone had a problem with rounding when dividing columns from SQL database?
    For some reason you get only 6 decimals in result if you want to fold it.
    convert(decimal(38,6), [number])
    If you use Value.Divide, then you get more decimals, bud it doesn’t fold.

    This is my sample https://imgur.com/a/Vq48E6Y

  3. @Milang I had exactly this problem today. There was a Power BI model which was converting between two currencies and one of the exchange rate had 22 decimal points. The calculations were done inside the query editor and because of query folding only 6 decimal places were used resulting in incorrect currency conversion. Since the data was being pulled from a SQL database I wrote a native query to solve the problem to give the correct currency conversion.

  4. Hi @Chris Webb, what would you suggest in the following scenarios , writing own sql, or import table then transform data in power bi:
    I have a few tables with more than 1 million rows, what I need for my report is only limited rows within a date range. I only need the rows in table 1 which are filtered by table 2 condition.
    For instance, there are 1 million rows in Sales table, 3 million rows in Sales item table.
    I only need rows from Sales table which are generated since year 2019, any rows before 2019 should not be imported to Power BI desktop; Also, I only need rows from Sales item table whose Sales is in Sales table.
    Therefore, only sales items is expected to import that Sales.SaleID = SalesItem.SaleID where Sales.Date > = ‘2019-01-01.

    So, should I write my query to achieve, or load both tables first, then merge them together to filter Sales Item table by Sales table?

  5. Excellent solution which, for my Power BI requirement, significantly reduced the time required to retrieve a few dozen rows of data from a View contains 3.3M records!

Leave a ReplyCancel reply