Why Is Power BI Running My SQL Query Twice?

Chris Webb Power BI, Power Query July 5, 2020March 24, 2023 2 Minutes

When you import data from a relational database like SQL Server in Power BI you have the option of entering your own SQL query to use as a starting point:

Here’s the M code for a query that does this:

If you’re confident writing SQL this might seem like a good option, but as I said in this blog post it has the side-effect of disabling query folding inside the Power Query query, so if you add any other transformations they will always be performed inside the Power Query engine – which may be less efficient than performing them in the data source.


let
Source=Sql.Database(
"localhost",
"AdventureWorksDW2017",
[Query
="SELECT [DateKey]#(lf) ,[FullDateAlternateKey]#(lf) ,
[DayNumberOfWeek]#(lf) ,[EnglishDayNameOfWeek]#(lf)
FROM [AdventureWorksDW2017].[dbo].[DimDate]"]
)
in
Source

There’s also another drawback: when you refresh your dataset in Power BI Desktop (although not in the Power BI Service) you’ll see that your SQL query is run twice. Here’s the evidence from SQL Server Profiler showing what happens when the query above is refreshed in Power BI Desktop:

If your query is slow, or if each query execution costs you money, then this is something you want to avoid.

Why is this happening? It turns out this is just another example of what I blogged about here: Power BI wants to know the schema of the table before the query actually runs, so it asks Power Query to return the top 0 rows. Unfortunately, in this case query folding can’t take place and the top 0 filter can’t be pushed back to the database, so the entire query gets run once to get the schema and once to get the data.

The solution is the same as the blog post I just mentioned too: use the Table.View M function to hard-code the schema returned by the query and implement query folding manually. Here’s the adapted version of the new query:

let
Source = Sql.Database(
"localhost",
"AdventureWorksDW2017",
[Query
= "SELECT [DateKey]#(lf) ,[FullDateAlternateKey]#(lf) ,
[DayNumberOfWeek]#(lf) ,[EnglishDayNameOfWeek]#(lf)
FROM [AdventureWorksDW2017].[dbo].[DimDate]"]
),
OverrideZeroRowFilter = Table.View(
null,
[GetType = () =&amp;gt; type table[
DateKey = Int32.Type,
FullDateAlternateKey = DateTime.Type,
DayNumberOfWeek = Byte.Type,
EnglishDayNameOfWeek = Text.Type
], GetRows = () =&amp;gt; Source, OnTake
= (count as number) =&amp;gt;
if count = 0
then #table(GetType(), {})
else Table.FirstN(Source, count)]
)
in
OverrideZeroRowFilter

Generally speaking, I think there’s a lot to be said for creating views (if possible) instead of embedding your own SQL into a Power BI dataset – it makes maintenance and tuning much easier, and of course if you can connect straight to the view without writing any SQL in Power BI, then query folding will work and Power BI Desktop will only query the view once when you refresh.

UPDATE: there’s now a much better way of solving this problem, which involves using Value.NativeQuery function and the EnableFolding=true option https://blog.crossjoin.co.uk/2021/02/21/query-folding-on-sql-queries-in-power-query-using-value-nativequery-and-enablefoldingtrue/

Published by Chris Webb

My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel. View all posts by Chris Webb

Published July 5, 2020March 24, 2023

32 thoughts on “Why Is Power BI Running My SQL Query Twice?”

Luca says:

July 5, 2020 at 6:26 pm

Thank you Chris for your posts ! I’m learning a lot from you

Loading...

Reply
David Eldersveld says:

July 5, 2020 at 6:35 pm

Thanks for bringing some clarity to this Chris.

Loading...

Reply
Nikola Ilic says:

July 5, 2020 at 6:49 pm

Thanks for the explanation Chris, so useful! I didn’t have an idea that this happens in the background.

Loading...

Reply
Daryl says:

July 6, 2020 at 8:09 am

Thanks Chris, but I have question though. Does the same behaviour occur when using you own MDX query in the AnalysisServices.Database function?

Loading...

Reply
Amit says:

July 6, 2020 at 9:32 am

What is the syntax if the Column names have spaces in them?

Loading...

Reply
1. lbendlin says:
  
  July 7, 2020 at 4:05 am
  
  Same as usual, hash sign and quotes.
  
  #”Field with spaces in name” = Text.Type
  
  Loading...
  
  Reply
lbendlin says:

July 6, 2020 at 12:04 pm

So if this is just a problem on the desktop why should I care? It would only impact the developer, not the report consumers that access it in the service, right?

Loading...

Reply
1. Chris Webb says:
  
  July 6, 2020 at 12:11 pm
  
  Yes, but it can make your life as a developer harder and as I said, some data sources charge you per query.
  
  Loading...
  
  Reply
DAVID says:

July 6, 2020 at 12:19 pm

Thank you.
I was using a SQL sentence to reduce the data to import by filtering by model parameter (year). I thought it was more efficient than filtering in PowerQuery. Now I see it maybe wasn’t the best approach…

Loading...

Reply
lbendlin says:

July 6, 2020 at 2:44 pm

Do I create the view immediately after the initial source step, or do I do it when all other transforms are done?

Loading...

Reply
Bryan says:

July 7, 2020 at 3:19 pm

Does this same query behavior also happen when refreshing a dataset or dataflow in the Power BI service?

Loading...

Reply
1. Chris Webb says:
  
  July 7, 2020 at 3:20 pm
  
  No, as I said in the post, it doesn’t happen when you refresh in the Power BI Service.
  
  Loading...
  
  Reply
mcnater says:

July 7, 2020 at 4:23 pm

Are you able to go over in more details the steps with that function to extract the fields in a query so that it can be plugged in? I tried to follow your directions but was not quite getting it. Our company uses almost exclusively SQL queries so that would be a great help Thank you!

Loading...

Reply
1. Chris Webb says:
  
  July 7, 2020 at 5:15 pm
  
  You need to take the M codein this blog post https://blog.crossjoin.co.uk/2018/10/03/function-m-table-type/ and then paste it into a new blank query. This will give you a function; you can then pass the name of the query whose data types you want into this function.
  
  Loading...
  
  Reply
  1. mcnater says:
    
    July 7, 2020 at 6:49 pm
    
    So I paste that code in and then afterwords right click on the query and say “convert to function”. When I do that there is a dialogue box that pops up that says “The query does not reference any parameters. Are you sure you want to create the function without any parameters. If I say OK there, I don’t really see how to get the function to work to pass the name in. That’s where I’m getting lost. Do you happen to have a sample small PBIX file by chance?
    
    Loading...
  2. Chris Webb says:
    
    July 7, 2020 at 7:14 pm
    
    You don’t need to right click and “convert to function” – the code is already a function, so when you paste it into a blank query you’ll have a function ready to use. When you click on the new query that returns the function you should see a dropdown box and be able to select any other query to pass in, and when you do this and invoke the function it will create yet another query that returns the results.
    
    Loading...
  3. Chris Webb says:
    
    July 7, 2020 at 7:15 pm
    
    I’ll try to create a pbix file with an example in and post it here within the next day or so.
    
    Loading...
  4. mcnater says:
    
    July 14, 2020 at 5:34 pm
    
    By chance did you post the PBIX anywhere yet? Thanks!
    
    Loading...
  5. Chris Webb says:
    
    July 15, 2020 at 2:58 pm
    
    Sorry for the delay, you can get a sample pbix here: https://1drv.ms/u/s!AppIOcLysIR7gZgCJCs7vmAsB1pNRA?e=XOURY7 There are three queries here: the function to get types, a sample query that gets a table from a public data source, and a query that calls the function on this sample table and returns the data types.
    
    Loading...
2. mcnater says:
  
  July 16, 2020 at 8:39 pm
  
  Thank you! Grabbing it now…really appreciate this and all your work in blogging PBI examples. Cheers.
  
  Loading...
  
  Reply
mcnater says:

July 7, 2020 at 7:40 pm

OK…I appreciate you getting a sample file. That would be amazing. I don’t see the dropdown box you’re talking about after creating the query but I’ll keep digging. I just don’t get what I’m missing. Thanks again.

Loading...

Reply
Rtn says:

July 7, 2020 at 9:56 pm

Hello Chris,

I tried to implement what you present. But I am having problems with null values. I get the error “The type of the value does not match the type of the column” in many rows. I used Text.Type and Decimal.Type. How do you manage null values ?

Thank you.

Loading...

Reply
l9xdhz1qstw9kmx says:

July 9, 2020 at 1:14 pm

Hi Chris,

I have just tried your examples and regarding the first example I have discovered the following:

Even though Power BI Desktop starts the statement twice, it seems that the first statement is more a “tracer bullet” to make sure that the schema is correct, as the first statement is aborted. In SQL Profiler I included the EventClass SQL:BatchCompleted and the Column Error and it reveals that the first statement returns the error code “2 – Abort”.

Then afterwards the second statement is executed and it completes with an “0 – OK” and the actual rows are returned.

So it looks like Power BI Desktop somehow breaks the connection for the first statement. This of course have a price, but not as high as if the result set were returned twice.

By the way I am using the June 2020 Power BI Desktop – 2.82.5858.1161

– Your blog posts is very deep and interesting 🙂

Loading...

Reply
1. Chris Webb says:
  
  July 9, 2020 at 2:20 pm
  
  Thanks! I didn’t catch that. I do know that for other data sources this is a much bigger problem, I guess because Power BI isn’t able to abort queries in the same way.
  
  Loading...
  
  Reply
lbendlin says:

August 25, 2020 at 2:23 pm

Here’s a direct proof of Chris’ assertion.

Create a table on SQL Server:

CREATE TABLE [dbo].[test](
[id] [int] IDENTITY(1,1) NOT NULL,
[timestamp] [datetime2](7) NULL,
[machine] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
CONSTRAINT [PK_test] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Run this Power Query code:

Sql.Database(“server”, “database”, [Query=”insert test (machine) select ‘this one’#(lf)select * from test”, CreateNavigationProperties=false])

This will insert two rows each time.

Loading...

Reply
Laura says:

October 6, 2020 at 8:34 pm

Hi, Chris. I tried to do this in a Oracle connection but it didn’t work, actually sometime it works but not always. Could you help me understand why? Another thing is that my sql take 5 min just to connect, what can it be?

Loading...

Reply
Enrico Menotti says:

October 11, 2021 at 6:12 pm

Dear Chris,

I have some Power Queries embedded in Excel sheets which I use to do updates and inserts on a Microsoft SQL db. Today I faced an issue with the insert, where the SQL server was complaining that I was attempting to write two records with the same primary key, and I immediately realized that it was Power Query that was running twice my native query against the db.

A few months ago I did not encounter this issue, and the Excel sheet (with the embedded query) was exactly the same. However, I was using Excel 2016, whereas recently my company switched to Excel 365. Are you aware of any change in the folding algorithm between these two versions of Excel?

Thanks in advance and best regards,

Enrico Menotti

Loading...

Reply
1. Chris Webb says:
  
  October 13, 2021 at 6:47 pm
  
  Using Power Query to do inserts is not supported (I know I blogged about it once, but I think I was clear that you shouldn’t do it in the real world) precisely because of this issue.
  
  Loading...
  
  Reply
Enrico Menotti says:

October 14, 2021 at 11:06 am

Right, I know it’s not supported, and now that you tell me I remember that the first time I googled about Power Query SQL updates I stepped upon that old post of yours. It was clear there’s this double-run issue, and also there’s a similar issue with previews, which run any query just when you move your mouse pointer over the query name in the “Query and connections” tab at the right of the Excel sheet. Not to talk about any edit in the editor and advanced editor, which run the query immediately.

However, doing an update two times with the same value cannot do any harm to the db, as far as I know (please correct me if I’m wrong). What worries me is instead the chance of updating a field while the record is locked by another user from the ERP GUI; this is something I’ve never investigated on.

Based on these considerations, I have successfully used Power Query for SQL updates, where in any case the use is just for non-ordinary mass modifications which cannot be done from our ERP GUI, and I’m the only user who does this. Whenever I do such things, I always first test thoroughly on a single record, and only after that go for mass modifications, and at the end always check thoroughly the results. Since my data always come from Excel spreadsheets, this technique is pretty useful and fast, and has the advantage of being quite minimal (it’s based on Excel’s built-in features only). Maybe one may try even with the old Microsoft Query which appears to be still at the basis of Excel external connections, but the M language provides nice functions for preparing the data to be sent to the SQL server: I use M to take care of localizations (the most tough thing to do, I have to say) and prepare a proper query text, so that when I run it the server is asked to create a temporary table (in an @TEMP variable) with the data, and this then serves as the source for the update statement.

In any case, I’d think very well before even considering a chance to go to live production environments, for daily use by other people I mean, with such a technique. Too fragile, actually.

For what concerns the insert statements, after many months of doing updates this way without encountering any minimal issue, I decided to try with inserts as well. Here what takes care of avoiding duplicates is the primary key, which cannot be duplicated. So the worst that may happen is an error message. And again, first I test with a single record, and at the end check all the results.

Usually I don’t like “quick and dirty” things, but this is a case where being quick and dirty saves a lot of man hours, provided the end results are clean.

Anyway, my question remains: why does Excel 365 raise an error where Excel 2016 doesn’t? And, just to close this comment, let me say that your suggestion to use table.view works great in removing the Excel 365 error. I go this way: by M, I prepare a string “Data” for asking SQL to prepare the @TEMP variable, and then do

Dest = Sql.Database(“…”, “…”, [Query = ”

declare @TEMP table
(
… [table structure]
);
insert into @TEMP values ” & Data & “;

insert into … [name of the db table] select * from @TEMP;

select * from @TEMP [this is just to see what has been actually sent to SQL]

“])

in

Dest

The way I have been able to remove the error has been by substituting the final “Dest” by:

Table.View(null, [GetType = () => Value.Type(GetRows()), GetRows = () => Dest])

Loading...

Reply
mmichaels1970 says:

June 23, 2022 at 3:57 pm

This is astonishing. I modified my query in VBA for a power pivot table that returns 800k rows. The query takes about a minute to refresh. While watching the activity monitor on my SQL Server I watched Active Expensive Queries pop twice, which lead me to this blog. There are a bunch of columns so Table.View doesn’t seem viable at this point (perhaps it will be). Since my SQL executes a stored procedure, I fed the current user name in as a parameter then I check a table when the SP is executed to see if this is the first or second time the SP was executed by the user. If first, it returns only the schema from a table variable. If the second time, it populates the full table variable and returns the results. Kind of a nasty workaround and I don’t know if it will hold up. I hope there’s some sort of solution to the Top 0 issue.

Loading...

Reply
1. mmichaels1970 says:
  
  June 24, 2022 at 2:47 pm
  
  My workaround didn’t end up working because SOMETIMES the query is called twice, but sometimes it isn’t. It appears that when the query stays EXACTLY the same those two calls aren’t made. However, if you change a PARAMETER of the query it is executed twice. I noticed that in Excel Power Pivot that your code does not appear to fix the issue.
  
  I wrote an INSERT into my stored procedure to insert into a log table for every execution. Two entries were made when the parameters of my query were significantly changed.
  
  Following code works “sometimes”:
  
  = let
  JCCo = if #”ctJCCo”=null then “2” else #”ctJCCo”,
  Mth = if #”ctMth”=null then “6/1/2022″ else #”ctMth”,
  Job = if #”ctJob”=null then “null” else #”ctJob”,
  
  Source = Sql.Database(“MYONPREMISESQL”, “MYDATABASE”, [Query=”
  EXEC dbo.MyPowerQueryReport_PQ @JCCo=” & JCCo & “, @Mth=N'” & Mth & “‘, @Contract='” & Job & “‘
  “]),
  OverrideZeroRowFilter = Table.View(
  null,
  [GetType = () => type table[
  JCCo = Int64.Type,
  Job = Text.Type,
  JobDesc = Text.Type,
  Item = Text.Type,
  JobStatus = Int64.Type
  ], GetRows = () => Source, OnTake
  = (count as number) =>
  if count = 0
  then #table(GetType(),{})
  else Table.FirstN(Source,count)]
  )
  in
  OverrideZeroRowFilter
  
  This is awful. Power Pivot seems to be an absolute mess. I do all the work finding, gathering, prepping the data. I can write DAX queries to manipulate it. But MS can’t do the simple job of calling a query ONLY ONCE or at least providing some sort of value that I can use as a parameter to notify my query that they only want the schema on “this call”.
  
  Loading...
  
  Reply
Olga says:

February 11, 2023 at 9:37 pm

Hi Chris, thank you for this, it is amazing, I spend quite a long time googling for this. I am loading my data from Oracle database, and the described fix works well, but then in power query I need to append an excel document (there are manual adjustments and no possible workaround at this moment) and after the append the query is loading twice again, taking horrible long time. Do you have any experience with appending SQL and excel and avoiding the double loading? Many thanks, Olga

Loading...

Reply