Why Is Power BI Running My SQL Query Twice?

When you import data from a relational database like SQL Server in Power BI you have the option of entering your own SQL query to use as a starting point:

NativeSQL

Here’s the M code for a query that does this:


let
Source = Sql.Database(
"localhost",
"AdventureWorksDW2017",
[Query
= "SELECT [DateKey]#(lf) ,[FullDateAlternateKey]#(lf) ,
[DayNumberOfWeek]#(lf) ,[EnglishDayNameOfWeek]#(lf)
FROM [AdventureWorksDW2017].[dbo].[DimDate]"]
)
in
Source

If you’re confident writing SQL this might seem like a good option, but as I said in this blog post it has the side-effect of disabling query folding inside the Power Query query, so if you add any other transformations they will always be performed inside the Power Query engine – which may be less efficient than performing them in the data source.

There’s also another drawback: when you refresh your dataset in Power BI Desktop (although not in the Power BI Service) you’ll see that your SQL query is run twice. Here’s the evidence from SQL Server Profiler showing what happens when the query above is refreshed in Power BI Desktop:

Profiler

If your query is slow, or if each query execution costs you money, then this is something you want to avoid.

Why is this happening? It turns out this is just another example of what I blogged about here: Power BI wants to know the schema of the table before the query actually runs, so it asks Power Query to return the top 0 rows. Unfortunately, in this case query folding can’t take place and the top 0 filter can’t be pushed back to the database, so the entire query gets run once to get the schema and once to get the data.

The solution is the same as the blog post I just mentioned too: use the Table.View M function to hard-code the schema returned by the query and implement query folding manually. Here’s the adapted version of the new query:


let
Source = Sql.Database(
"localhost",
"AdventureWorksDW2017",
[Query
= "SELECT [DateKey]#(lf) ,[FullDateAlternateKey]#(lf) ,
[DayNumberOfWeek]#(lf) ,[EnglishDayNameOfWeek]#(lf)
FROM [AdventureWorksDW2017].[dbo].[DimDate]"]
),
OverrideZeroRowFilter = Table.View(
null,
[GetType = () => type table[
DateKey = Int32.Type,
FullDateAlternateKey = DateTime.Type,
DayNumberOfWeek = Byte.Type,
EnglishDayNameOfWeek = Text.Type
], GetRows = () => Source, OnTake
= (count as number) =>
if count = 0
then #table(GetType(), {})
else Table.FirstN(Source, count)]
)
in
OverrideZeroRowFilter

Profiler2

Generally speaking, I think there’s a lot to be said for creating views (if possible) instead of embedding your own SQL into a Power BI dataset – it makes maintenance and tuning much easier, and of course if you can connect straight to the view without writing any SQL in Power BI, then query folding will work and Power BI Desktop will only query the view once when you refresh.

29 responses

  1. Thanks Chris, but I have question though. Does the same behaviour occur when using you own MDX query in the AnalysisServices.Database function?

  2. So if this is just a problem on the desktop why should I care? It would only impact the developer, not the report consumers that access it in the service, right?

  3. Thank you.
    I was using a SQL sentence to reduce the data to import by filtering by model parameter (year). I thought it was more efficient than filtering in PowerQuery. Now I see it maybe wasn’t the best approach…

  4. Does this same query behavior also happen when refreshing a dataset or dataflow in the Power BI service?

  5. Are you able to go over in more details the steps with that function to extract the fields in a query so that it can be plugged in? I tried to follow your directions but was not quite getting it. Our company uses almost exclusively SQL queries so that would be a great help Thank you!

      • So I paste that code in and then afterwords right click on the query and say “convert to function”. When I do that there is a dialogue box that pops up that says “The query does not reference any parameters. Are you sure you want to create the function without any parameters. If I say OK there, I don’t really see how to get the function to work to pass the name in. That’s where I’m getting lost. Do you happen to have a sample small PBIX file by chance?

      • You don’t need to right click and “convert to function” – the code is already a function, so when you paste it into a blank query you’ll have a function ready to use. When you click on the new query that returns the function you should see a dropdown box and be able to select any other query to pass in, and when you do this and invoke the function it will create yet another query that returns the results.

  6. OK…I appreciate you getting a sample file. That would be amazing. I don’t see the dropdown box you’re talking about after creating the query but I’ll keep digging. I just don’t get what I’m missing. Thanks again.

  7. Hello Chris,

    I tried to implement what you present. But I am having problems with null values. I get the error “The type of the value does not match the type of the column” in many rows. I used Text.Type and Decimal.Type. How do you manage null values ?

    Thank you.

  8. Hi Chris,

    I have just tried your examples and regarding the first example I have discovered the following:

    Even though Power BI Desktop starts the statement twice, it seems that the first statement is more a “tracer bullet” to make sure that the schema is correct, as the first statement is aborted. In SQL Profiler I included the EventClass SQL:BatchCompleted and the Column Error and it reveals that the first statement returns the error code “2 – Abort”.

    Then afterwards the second statement is executed and it completes with an “0 – OK” and the actual rows are returned.

    So it looks like Power BI Desktop somehow breaks the connection for the first statement. This of course have a price, but not as high as if the result set were returned twice.

    By the way I am using the June 2020 Power BI Desktop – 2.82.5858.1161

    – Your blog posts is very deep and interesting 🙂

    • Thanks! I didn’t catch that. I do know that for other data sources this is a much bigger problem, I guess because Power BI isn’t able to abort queries in the same way.

  9. Here’s a direct proof of Chris’ assertion.

    Create a table on SQL Server:

    CREATE TABLE [dbo].[test](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [timestamp] [datetime2](7) NULL,
    [machine] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    CONSTRAINT [PK_test] PRIMARY KEY CLUSTERED
    (
    [id] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    Run this Power Query code:

    Sql.Database(“server”, “database”, [Query=”insert test (machine) select ‘this one’#(lf)select * from test”, CreateNavigationProperties=false])

    This will insert two rows each time.

  10. Hi, Chris. I tried to do this in a Oracle connection but it didn’t work, actually sometime it works but not always. Could you help me understand why? Another thing is that my sql take 5 min just to connect, what can it be?

  11. Dear Chris,

    I have some Power Queries embedded in Excel sheets which I use to do updates and inserts on a Microsoft SQL db. Today I faced an issue with the insert, where the SQL server was complaining that I was attempting to write two records with the same primary key, and I immediately realized that it was Power Query that was running twice my native query against the db.

    A few months ago I did not encounter this issue, and the Excel sheet (with the embedded query) was exactly the same. However, I was using Excel 2016, whereas recently my company switched to Excel 365. Are you aware of any change in the folding algorithm between these two versions of Excel?

    Thanks in advance and best regards,

    Enrico Menotti

    • Using Power Query to do inserts is not supported (I know I blogged about it once, but I think I was clear that you shouldn’t do it in the real world) precisely because of this issue.

  12. Right, I know it’s not supported, and now that you tell me I remember that the first time I googled about Power Query SQL updates I stepped upon that old post of yours. It was clear there’s this double-run issue, and also there’s a similar issue with previews, which run any query just when you move your mouse pointer over the query name in the “Query and connections” tab at the right of the Excel sheet. Not to talk about any edit in the editor and advanced editor, which run the query immediately.

    However, doing an update two times with the same value cannot do any harm to the db, as far as I know (please correct me if I’m wrong). What worries me is instead the chance of updating a field while the record is locked by another user from the ERP GUI; this is something I’ve never investigated on.

    Based on these considerations, I have successfully used Power Query for SQL updates, where in any case the use is just for non-ordinary mass modifications which cannot be done from our ERP GUI, and I’m the only user who does this. Whenever I do such things, I always first test thoroughly on a single record, and only after that go for mass modifications, and at the end always check thoroughly the results. Since my data always come from Excel spreadsheets, this technique is pretty useful and fast, and has the advantage of being quite minimal (it’s based on Excel’s built-in features only). Maybe one may try even with the old Microsoft Query which appears to be still at the basis of Excel external connections, but the M language provides nice functions for preparing the data to be sent to the SQL server: I use M to take care of localizations (the most tough thing to do, I have to say) and prepare a proper query text, so that when I run it the server is asked to create a temporary table (in an @TEMP variable) with the data, and this then serves as the source for the update statement.

    In any case, I’d think very well before even considering a chance to go to live production environments, for daily use by other people I mean, with such a technique. Too fragile, actually.

    For what concerns the insert statements, after many months of doing updates this way without encountering any minimal issue, I decided to try with inserts as well. Here what takes care of avoiding duplicates is the primary key, which cannot be duplicated. So the worst that may happen is an error message. And again, first I test with a single record, and at the end check all the results.

    Usually I don’t like “quick and dirty” things, but this is a case where being quick and dirty saves a lot of man hours, provided the end results are clean.

    Anyway, my question remains: why does Excel 365 raise an error where Excel 2016 doesn’t? And, just to close this comment, let me say that your suggestion to use table.view works great in removing the Excel 365 error. I go this way: by M, I prepare a string “Data” for asking SQL to prepare the @TEMP variable, and then do

    Dest = Sql.Database(“…”, “…”, [Query = ”

    declare @TEMP table
    (
    … [table structure]
    );
    insert into @TEMP values ” & Data & “;

    insert into … [name of the db table] select * from @TEMP;

    select * from @TEMP [this is just to see what has been actually sent to SQL]

    “])

    in

    Dest

    The way I have been able to remove the error has been by substituting the final “Dest” by:

    Table.View(null, [GetType = () => Value.Type(GetRows()), GetRows = () => Dest])

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: