Power BI · Power Query

Power BI/Power Query And Nullable Columns

Recently I’ve been asked by colleagues with various different types of performance problems why Power BI is generating SQL in a particular way, and the answer has been the presence of nullable columns in the underlying database – whether it’s SQL Server, Snowflake or Databricks. Now I’m not a DBA or any kind of database tuning expert so I can’t comment on why a SQL query performs the way it does on any given platform, but what I can do is show you two examples of how the presence of nullable columns changes the way Power BI and Power Query generate SQL.

Consider the following table in a SQL Server table with a single, integer column that does not allow null values:

If you connect to this table in DirectQuery mode, drag the MyNumber field into a card in a Power BI report and select the Distinct Count aggregation type:

…here’s the TSQL that is generated:

SELECT 
COUNT_BIG(DISTINCT [t0].[MyNumber])
 AS [a0]
FROM 
(
(
select [$Table].[MyNumber] as [MyNumber]
from [dbo].[NotNullableColumn] as [$Table]
)
)
 AS [t0] 

Now if you do the same thing with a table that is identical in all respects but where the MyNumber column does allow null values:

…here’s the TSQL that Power BI generates:

SELECT 
(COUNT_BIG(DISTINCT [t1].[MyNumber]) 
+ MAX(CASE WHEN [t1].[MyNumber] IS NULL THEN 1 ELSE 0 END))
 AS [a0]
FROM 
(
(
select [$Table].[MyNumber] as [MyNumber]
from [dbo].[NullableColumn] as [$Table]
)
)
 AS [t1] 

Notice the extra code in the third line of this second query that has been added to handle the possible presence of null values.

It’s not just when you’re using DirectQuery mode that you can see a difference. Let’s say you’re using Import mode and you take each of these tables and join them to themselves in the Power Query Editor like so:

Here’s the M code for this query:

let
  Source = Sql.Databases("localhost"),
  FoldingTest = Source
    {[Name = "FoldingTest"]}
    [Data],
  dbo_NotNullableColumn = FoldingTest
    {
      [
        Schema = "dbo",
        Item   = "NotNullableColumn"
      ]
    }
    [Data],
  #"Merged Queries" = Table.NestedJoin(
    dbo_NotNullableColumn,
    {"MyNumber"},
    dbo_NotNullableColumn,
    {"MyNumber"},
    "dbo_NotNullableColumn",
    JoinKind.Inner
  ),
  #"Expanded dbo_NotNullableColumn"
    = Table.ExpandTableColumn(
    #"Merged Queries",
    "dbo_NotNullableColumn",
    {"MyNumber"},
    {"dbo_NotNullableColumn.MyNumber"}
  )
in
  #"Expanded dbo_NotNullableColumn"

Joining the table with the not nullable column to itself folds and results in the following TSQL query being generated:

select [$Outer].[MyNumber] as [MyNumber],
    [$Inner].[MyNumber2] as [dbo_NotNullableColumn.MyNumber]
from [dbo].[NotNullableColumn] as [$Outer]
inner join 
(
    select [_].[MyNumber] as [MyNumber2]
    from [dbo].[NotNullableColumn] as [_]
) as [$Inner] on ([$Outer].[MyNumber] = [$Inner].[MyNumber2])

If you do the same thing with the table with the nullable column, here’s the TSQL that is generated:

select [$Outer].[MyNumber] as [MyNumber],
    [$Inner].[MyNumber2] as [dbo_NullableColumn.MyNumber]
from [dbo].[NullableColumn] as [$Outer]
inner join 
(
    select [_].[MyNumber] as [MyNumber2]
    from [dbo].[NullableColumn] as [_]
) as [$Inner] on ([$Outer].[MyNumber] = [$Inner].[MyNumber2] 
or [$Outer].[MyNumber] is null and [$Inner].[MyNumber2] is null)

Once again you can see how the SQL generated for an operation on a nullable column is different to the SQL generated for an operation on a non-nullable column. Whether one SQL query performs significantly better or worse than the other is something you need to test.

The last thing to say is that there is no supported way in Power BI or Power Query to treat a nullable column as if it was not nullable. If you have a nullable column and the extra SQL to handle those nulls results in a performance problem then your only option is to alter the design of your table and make the column not nullable.

7 thoughts on “Power BI/Power Query And Nullable Columns

  1. Hi Chris
    We are working with Snowflake and have just encountered this problem.
    Joining a table with seven million rows to a table with 30 million rows via inner join.
    Writing a simple INNER JOIN on X.a = Y.B in SQL results in a query that takes a couple of seconds to execute in Snowflake.
    Power BI insists on generating this strange join SQL – and as a result the query takes a massive amount longer to execute. I cancelled it after 15 minutes – based on the rate of execution so far (from the Snowflake query profile) it would have taken days to complete.
    I suspect the result would be the same in SQL Server (would need to test to be certain). Database engines are generally highly optimised for simple joins via hash join algorithms that have been a standard DW join algorithm for decades – and including OR conditions in the criteria generally blocks the use of a hash join.
    This seems like a poorly thought out implementation in Power BI. I am amazed there isn’t a simple check box that says “Also match null values – may be much slower”.
    Generally speaking, anyone who has worked with SQL/databases would expect an inner join to exclude null values (null should not match null).
    I can see why Microsoft have implemented the SQL the way they have (they want to treat null like any other value so it joins) but it causes massive pain for database engines. It seems like a feature that has been implemented naively without appreciating the performance pain it will cause.
    Chris

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Can you post an example of the SQL you’re getting and the SQL you think should be generated?

  2. Hello cbailiss,

    I encountered the same problem on gcp. would you have found a solution to force powerbi to generate a simple inner join without doing a cross join with the addition of this piece of sql code OR (X.a IS NULL AND Y.b IS NULL) ?

    Best regards

Leave a ReplyCancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.