Why Is Power BI Running My SQL Query Twice?

When you import data from a relational database like SQL Server in Power BI you have the option of entering your own SQL query to use as a starting point:

NativeSQL

Here’s the M code for a query that does this:


let
Source = Sql.Database(
"localhost",
"AdventureWorksDW2017",
[Query
= "SELECT [DateKey]#(lf) ,[FullDateAlternateKey]#(lf) ,
[DayNumberOfWeek]#(lf) ,[EnglishDayNameOfWeek]#(lf)
FROM [AdventureWorksDW2017].[dbo].[DimDate]"]
)
in
Source

If you’re confident writing SQL this might seem like a good option, but as I said in this blog post it has the side-effect of disabling query folding inside the Power Query query, so if you add any other transformations they will always be performed inside the Power Query engine – which may be less efficient than performing them in the data source.

There’s also another drawback: when you refresh your dataset in Power BI Desktop (although not in the Power BI Service) you’ll see that your SQL query is run twice. Here’s the evidence from SQL Server Profiler showing what happens when the query above is refreshed in Power BI Desktop:

Profiler

If your query is slow, or if each query execution costs you money, then this is something you want to avoid.

Why is this happening? It turns out this is just another example of what I blogged about here: Power BI wants to know the schema of the table before the query actually runs, so it asks Power Query to return the top 0 rows. Unfortunately, in this case query folding can’t take place and the top 0 filter can’t be pushed back to the database, so the entire query gets run once to get the schema and once to get the data.

The solution is the same as the blog post I just mentioned too: use the Table.View M function to hard-code the schema returned by the query and implement query folding manually. Here’s the adapted version of the new query:


let
Source = Sql.Database(
"localhost",
"AdventureWorksDW2017",
[Query
= "SELECT [DateKey]#(lf) ,[FullDateAlternateKey]#(lf) ,
[DayNumberOfWeek]#(lf) ,[EnglishDayNameOfWeek]#(lf)
FROM [AdventureWorksDW2017].[dbo].[DimDate]"]
),
OverrideZeroRowFilter = Table.View(
null,
[GetType = () => type table[
DateKey = Int32.Type,
FullDateAlternateKey = DateTime.Type,
DayNumberOfWeek = Byte.Type,
EnglishDayNameOfWeek = Text.Type
], GetRows = () => Source, OnTake
= (count as number) =>
if count = 0
then #table(GetType(), {})
else Table.FirstN(Source, count)]
)
in
OverrideZeroRowFilter

Profiler2

Generally speaking, I think there’s a lot to be said for creating views (if possible) instead of embedding your own SQL into a Power BI dataset – it makes maintenance and tuning much easier, and of course if you can connect straight to the view without writing any SQL in Power BI, then query folding will work and Power BI Desktop will only query the view once when you refresh.

26 responses

  1. Thanks Chris, but I have question though. Does the same behaviour occur when using you own MDX query in the AnalysisServices.Database function?

  2. So if this is just a problem on the desktop why should I care? It would only impact the developer, not the report consumers that access it in the service, right?

  3. Thank you.
    I was using a SQL sentence to reduce the data to import by filtering by model parameter (year). I thought it was more efficient than filtering in PowerQuery. Now I see it maybe wasn’t the best approach…

  4. Does this same query behavior also happen when refreshing a dataset or dataflow in the Power BI service?

  5. Are you able to go over in more details the steps with that function to extract the fields in a query so that it can be plugged in? I tried to follow your directions but was not quite getting it. Our company uses almost exclusively SQL queries so that would be a great help Thank you!

      • So I paste that code in and then afterwords right click on the query and say “convert to function”. When I do that there is a dialogue box that pops up that says “The query does not reference any parameters. Are you sure you want to create the function without any parameters. If I say OK there, I don’t really see how to get the function to work to pass the name in. That’s where I’m getting lost. Do you happen to have a sample small PBIX file by chance?

      • You don’t need to right click and “convert to function” – the code is already a function, so when you paste it into a blank query you’ll have a function ready to use. When you click on the new query that returns the function you should see a dropdown box and be able to select any other query to pass in, and when you do this and invoke the function it will create yet another query that returns the results.

  6. OK…I appreciate you getting a sample file. That would be amazing. I don’t see the dropdown box you’re talking about after creating the query but I’ll keep digging. I just don’t get what I’m missing. Thanks again.

  7. Hello Chris,

    I tried to implement what you present. But I am having problems with null values. I get the error “The type of the value does not match the type of the column” in many rows. I used Text.Type and Decimal.Type. How do you manage null values ?

    Thank you.

  8. Hi Chris,

    I have just tried your examples and regarding the first example I have discovered the following:

    Even though Power BI Desktop starts the statement twice, it seems that the first statement is more a “tracer bullet” to make sure that the schema is correct, as the first statement is aborted. In SQL Profiler I included the EventClass SQL:BatchCompleted and the Column Error and it reveals that the first statement returns the error code “2 – Abort”.

    Then afterwards the second statement is executed and it completes with an “0 – OK” and the actual rows are returned.

    So it looks like Power BI Desktop somehow breaks the connection for the first statement. This of course have a price, but not as high as if the result set were returned twice.

    By the way I am using the June 2020 Power BI Desktop – 2.82.5858.1161

    – Your blog posts is very deep and interesting 🙂

    • Thanks! I didn’t catch that. I do know that for other data sources this is a much bigger problem, I guess because Power BI isn’t able to abort queries in the same way.

  9. Here’s a direct proof of Chris’ assertion.

    Create a table on SQL Server:

    CREATE TABLE [dbo].[test](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [timestamp] [datetime2](7) NULL,
    [machine] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    CONSTRAINT [PK_test] PRIMARY KEY CLUSTERED
    (
    [id] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    Run this Power Query code:

    Sql.Database(“server”, “database”, [Query=”insert test (machine) select ‘this one’#(lf)select * from test”, CreateNavigationProperties=false])

    This will insert two rows each time.

  10. Hi, Chris. I tried to do this in a Oracle connection but it didn’t work, actually sometime it works but not always. Could you help me understand why? Another thing is that my sql take 5 min just to connect, what can it be?

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: