Timing Power Query Queries

If you’re interested in performance-tuning a Power Query query, you’re going to need a reliable way of telling how long your query takes to run. It’s not as easy as it seems to adapt your query to do this though, and that’s because of the way the M formula language works. This post does show you how to find how long a query takes to run, but I have to admit that I’m not 100% satisfied with the approach and I suspect there are other, better ways of achieving the same thing that do not involve any M code. However this post does highlight a lot of interesting points about Power Query and M so I think it’s worth reading anyway…

M language background

In my last post, showing how to create a Power Query function that waits a given number of seconds, I used the following query to demonstrate the function:

[sourcecode language="text" padlinenumbers="true"]
let
    //Find current time
    One = DateTime.LocalNow(),
    //Define Wait function
    Wait = (seconds as number, action as function) =>
        if (List.Count(
            List.Generate(
                () => DateTimeZone.LocalNow() + #duration(0,0,0,seconds),
                (x) => DateTimeZone.LocalNow() < x,
                (x) => x)
            ) = 0)
        then null else action(),
    //Call Wait function to wait 5 seconds
    //then return the current time again
    Two = Wait(5,DateTime.LocalNow),
    //Display start time, finish time, and duration
    Output = "Start Time: " & DateTime.ToText(One) &
        " Finish Time: " & DateTime.ToText(Two) &
        " Difference: " & Duration.ToText(Two-One)
in
    Output
[/sourcecode]

 

The output of this seems very straightforward:

[Image: the query output, showing the start time, a finish time five seconds later, and a difference of 5 seconds]

However, consider the output when you change the last step in the query to be:

[sourcecode language="text"]
Output = "Finish Time: " & DateTime.ToText(Two) &
    " Start Time: " & DateTime.ToText(One) &
    " Difference: " & Duration.ToText(Two-One)
[/sourcecode]

 

The only difference is that the finish time is displayed before the start time, and yet the output is now:

[Image: the query output, showing identical start and finish times and a difference of 0 seconds]

The start time and the finish time are now the same, and the duration is 0. However the query does still take the same amount of time to run. What’s going on here?

Looking at the M code for a query, you’d be forgiven for thinking that the first step in a query is evaluated first, the second step second, and so on down to the last step. This is not the case though. From page 10 of the Language Reference document (which can be downloaded here):

List and record member expressions (as well as let expressions, introduced further below) are evaluated using lazy evaluation, which means that they are evaluated only as needed.

A Power Query query is usually a single let expression, and within a let expression steps are evaluated as and when they are needed; if a step’s output is not needed anywhere else then it may not be evaluated at all. Therefore, in the second example above, even though the step called One is the first step in the query, it is evaluated after the step called Two because that is the order that the values are needed in the expression calculated for the step Output.
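
To illustrate that last point, here’s a minimal sketch (my own, not part of the original example): the step called Unused is never referenced by the Output step, so under lazy evaluation it should never be evaluated and the error it would raise should never surface.

[sourcecode language="text"]
let
    //This step is never referenced anywhere else in the query, so lazy
    //evaluation means it should never be evaluated and this error
    //should never be raised
    Unused = error "this step was evaluated after all",
    Output = "The query returned successfully"
in
    Output
[/sourcecode]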

Another important point to understand is that (from page 30 of the Language Reference):

Once a value has been calculated, it is immutable, meaning it can no longer be changed. This simplifies the model for evaluating an expression and makes it easier to reason about the result since it is not possible to change a value once it has been used to evaluate a subsequent part of the expression.

In our example, this means that once One and Two have been evaluated for the first time, the values they return will not change however many times they are referenced later in the query.

The final thing to point out is that I’m using the DateTime.LocalNow() function here to get the system date and time at the point that the expression is evaluated. If you use DateTime.FixedLocalNow() it will return the same value every time you call it in the same query – which is obviously not going to be much use for our purposes.
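
To see the difference between the two functions, here’s a small illustrative sketch (reusing the Wait function from the previous post; this isn’t in the sample workbook). It calls DateTime.FixedLocalNow() twice, two seconds apart, and the difference between the two values should come out as zero:

[sourcecode language="text"]
let
    //Wait function from the previous post
    Wait = (seconds as number, action as function) =>
        if (List.Count(
            List.Generate(
                () => DateTimeZone.LocalNow() + #duration(0,0,0,seconds),
                (x) => DateTimeZone.LocalNow() < x,
                (x) => x)
            ) = 0)
        then null else action(),
    //First call to DateTime.FixedLocalNow()
    FixedOne = DateTime.FixedLocalNow(),
    //Second call, made at least two seconds later; checking FixedOne
    //forces it to be evaluated before the wait starts
    FixedTwo = if FixedOne <> null then Wait(2, DateTime.FixedLocalNow) else null,
    //Both values should be identical, so the difference should be zero
    Output = "First: " & DateTime.ToText(FixedOne) &
        " Second: " & DateTime.ToText(FixedTwo) &
        " Difference: " & Duration.ToText(FixedTwo - FixedOne)
in
    Output
[/sourcecode]

Swap DateTime.FixedLocalNow for DateTime.LocalNow in the FixedOne and FixedTwo steps and you should see a difference of around two seconds instead.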

How to time your queries

Now that we know all this, we can alter a query that does something useful so we can see how long it takes to execute. Getting status updates from Facebook is sufficiently slow for us to be able to measure a duration, and here’s an example query that shows how to measure how long the request takes:

[sourcecode language="text" padlinenumbers="true"]
let
    //Get the current time, which will be the query start time
    StartTime = DateTime.LocalNow(),
    //Make sure StartTime is evaluated just before we request data from Facebook
    Source = if StartTime<>null
        then
            Facebook.Graph("https://graph.facebook.com/me/statuses")
        else
            null,
    //Find the number of rows returned
    NumberOfRows = Number.ToText(Table.RowCount(Source)),
    //Get the current time, which will be the query end time
    EndTime = DateTime.LocalNow(),
    //Make sure PQ evaluates all expressions in the right order:
    //first, get the number of rows, which ensures that
    //the Source expression is evaluated, and in turn StartTime is evaluated;
    //second, EndTime is evaluated as part of the duration calculation
    Output = "Query returned " & NumberOfRows
        & " rows and took "
        & Duration.ToText(EndTime - StartTime)
in
    Output
[/sourcecode]

Here’s an example of the output of the query:

[Image: the query output, showing the number of rows returned and how long the query took]

How does this work?

  • The Output step has the expression that returns the query output
  • The first value evaluated in the Output step is the NumberOfRows step, which finds the number of rows returned by Facebook and in turn forces the evaluation of the Source step. If we didn’t include an expression in the output that was in some way derived from data loaded from Facebook, Power Query would not bother to evaluate the Source step.
  • The Source step gets data from Facebook, but before that happens there is an if expression that checks to see whether StartTime is null or not. Of course this always returns true, but the point here is that by making this check we are forcing StartTime to be evaluated.
  • The second value evaluated in Output is the duration. The duration is the EndTime (evaluated for the first time at this point) minus the StartTime (which has already been evaluated, remember), so we get the amount of time elapsed between the start of the request to Facebook and the end of the query.

With all of the other stuff stripped out, here’s the basic template I’ve used:

[sourcecode language="text"]
let
    StartTime = DateTime.LocalNow(),
    Source = if StartTime<>null
        then
            //do whatever we were going to do in our first step
        else
            null,

    //insert all other steps here

    NumberOfRows = Number.ToText(Table.RowCount(WhateverThePreviousStepIs)),
    EndTime = DateTime.LocalNow(),
    Output = "Query returned " & NumberOfRows
        & " rows and took "
        & Duration.ToText(EndTime - StartTime)
in
    Output
[/sourcecode]

I’m making a pretty big assumption here, though, and that is that finding the number of rows in the NumberOfRows step will force the evaluation of all previous steps, which may not be the case! You might have to think carefully about how you write an expression that forces evaluation of all of your steps.
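
One way round this, suggested in a couple of the comments below, is Table.Buffer(): buffering the output of the last "real" step loads the whole table into memory, which should force all of the steps it depends on to be evaluated before the end time is captured. Here’s a rough sketch of that variation on the template; the #table step is just a dummy stand-in for whatever your query really does, and bear in mind that Table.Buffer itself adds some overhead, so the measured duration includes the cost of buffering:

[sourcecode language="text"]
let
    StartTime = DateTime.LocalNow(),
    //Replace this dummy table with whatever your first step really does;
    //checking StartTime forces it to be evaluated before the source is read
    Source = if StartTime<>null
        then #table({"A"},{{1},{2},{3}})
        else null,

    //insert all other steps here

    //Table.Buffer loads the whole of the previous step's table into memory,
    //forcing all the steps it depends on to be evaluated
    Buffered = Table.Buffer(Source),
    //Checking Buffered before getting the end time makes sure the buffering
    //has finished before DateTime.LocalNow() is called again
    EndTime = if Buffered<>null then DateTime.LocalNow() else null,
    Output = "Query returned " & Number.ToText(Table.RowCount(Buffered))
        & " rows and took "
        & Duration.ToText(EndTime - StartTime)
in
    Output
[/sourcecode]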

This is all very complicated (indeed, probably too complicated) so I’m also investigating what the Power Query trace file can tell you about things like query execution times. Hopefully I’ll be able to blog about that in a few weeks.

You can download an Excel workbook with some sample Power Query queries here.

29 thoughts on "Timing Power Query Queries"

    1. Mr. Webb
      Could you please help me?

      I could not add to my query as shown below;

      let
          ArchivedData = (page as number) =>
              let
                  Source = Web.Page(Web.Contents("http://uzmanpara.milliyet.com.tr/borsa/gecmis-kapanislar/?Pagenum=" & Number.ToText(page) & "&tip=Hisse&gun=20&ay=2&yil=2015&Harf=-1")),
                  Data0 = Source{0}[Data],
                  #"Changed Type" = Table.TransformColumnTypes(Data0,{{"Menkul Adı", type text}})
              in
                  #"Changed Type",
          #"Invoked FunctionArchivedData" = ArchivedData(5)
      in
          #"Invoked FunctionArchivedData"

      Best regards

      1. Can you tell me what error you get? This query worked for me when I pasted it into Power Query, although I did have to change the double quotation marks in your query back to the straight " character after pasting it from the web.

      2. Mr Webb

        The error is: "Waiting for uzmanpara.milliyet.com.tr"
        I have waited a very long time but got no result.
        I think the web site blocks me due to the large number of requests in a short time.

  1. Hey Chris,

    I think that DateTime.FixedLocalNow() is the correct function for StartTime, since it is immutable for multiple calls within the script. DateTime.LocalNow() is mutable, and changes every time it is called within the script. After running the Facebook script with DateTime.LocalNow(), the timing was sub-one second. Using DateTime.FixedLocalNow(), the timing was between 1 and 2 seconds. I measured 2 seconds on my stopwatch, suggesting the real time was between 1 and 2 seconds.

    Could you do similar timing checks to see if you get similar results?

    1. Hi Colin, it’s difficult to know for sure, but I don’t see how using DateTime.FixedLocalNow() for just the start time would make any difference if it is only being called once anyway. When I test using the Facebook API I get a lot of variation in the response time, which means I can’t compare the two functions. There could also be all kinds of other stuff happening in Power Query and Excel after the query has finished executing that makes it even harder to measure. This is one for the dev team, I think.

  2. Hi Chris,

    After posting my last message, it occurred to me that I had a better way of testing the timing assumptions. I used a prime number generator (a tail-recursive version of the one that Curt posted on the TechNet forum). On my desktop, it takes about 15 sec to generate 70K prime numbers (stopwatch timing). Using your timing technique, I recorded around 14.3 sec. This number was roughly the same for StartTime set to either DateTime.LocalNow() or DateTime.FixedLocalNow(). That being said, I missed the mark in the following two key areas:

    1) You are absolutely correct in stating that StartTime is immutable after being set, period.

    2) You are also correct about the behaviour of DateTime.FixedLocalNow(). When both Start and End times are set to DateTime.FixedLocalNow(), the difference is zero, even though the function is being called at different times. So what is the single fixed time based on? The result would indicate that it’s set to the current date & time when first called during the query execution (when StartTime is called in this case). All subsequent calls to DateTime.FixedLocalNow() (EndTime in this case) use the first set value; it’s not recalculated at all until the query is refreshed. I completely misinterpreted the function’s description, although it now makes sense after re-reading it. 🙂

    Lastly, although I very much like your technique, I am somewhat concerned by the variations I get in milliseconds. The prime number generator is not influenced by any external source, so it’s not clear why successive refreshes should vary much at all.

  3. Ehren’s response is a revelation. I’ve confirmed that you can set StartTime to DateTime.FixedLocalNow(), and you don’t have to explicitly call it in the Source step! This is an exception to the normal workings of M.

  4. Hi Chris,
    In addition to the query time information, what is most needed is the data table itself. Could you please show me how to load the table onto the spreadsheet and add the query time as a footnote or report title? Thanks.

  5. Will definitely give this a shot at some point in the near future.

    Have also been reading about buffered tables and the Power Query execution model. So far I’ve been timing the execution time of my entire chain with a stopwatch and I have been seeing some pretty varying results (from 50s to 2m30s). This is on a workbook with 3 source tables in Excel (hundreds / thousands of lines), and a chain of about 14 queries of varying complexity.

    What exactly does the execution time depend on? For instance, should I expect different results after:
    - running the same query chain with the same data? Is anything reused from cache?
    - changing small parts of the query
    - restarting Excel (saving, closing, reopening)
    - making changes to the source tables in Excel

    I found the posts about the caching model and query execution chain very confusing; even more so when I have measured execution time of 50s before the weekend after making small changes, and now after pasting new data (of similar size) it’s suddenly back to over 2m again.

  6. ———–M code —————–

    let
        ParseJsonLine = (#"input content" as binary) => let
            #"Imported Text" = Table.FromColumns({Lines.FromBinary(#"input content",null,null,1252)}),
            #"Parsed JSON" = Table.TransformColumns(#"Imported Text",{},Json.Document)
        in
            #"Parsed JSON",

        //Expand each JSON content value with encoding as 1252
        ReadAllJsonBlobs = Table.AddColumn(RemoveOtherColumns, "JSONContent", each Json.Document([Content],1252)),

        //Json line format will fail with json parser - rows with error will return jsonline format
        JsonLineFormatTable = Table.SelectRowsWithErrors(ReadAllJsonBlobs, {"JSONContent"}),
        ReadAllJsonLineBlobs = Table.AddColumn(JsonLineFormatTable, "JSONLineContent", each ParseJsonLine([Content])),
        RemoveOtherColumns1 = Table.RemoveColumns(ReadAllJsonLineBlobs,{"Content", "JSONContent"}),
        JsonLineFormatRecords = Table.ExpandTableColumn(RemoveOtherColumns1, "JSONLineContent", {"Column1"}, {"records"}),

        //Json format will be parsed by json parser - rows without error will return json format
        CurrentJsonFormatTable = Table.RemoveRowsWithErrors(ReadAllJsonBlobs, {"JSONContent"}),
        ExpandJSON = Table.ExpandRecordColumn(CurrentJsonFormatTable, "JSONContent", {"records"}, {"records"}),
        ExpandRecords = Table.ExpandListColumn(ExpandJSON, "records"),
        CurrentJsonFormatRecords = Table.RemoveColumns(ExpandRecords,{"Content"}),

        // Combining both current format and jsonline format records
        CombinedTable = Table.Combine({CurrentJsonFormatRecords, JsonLineFormatRecords}),

    ——————————-

    The above query is to parse JSON or JSON Lines data from blobs. Basically I have two parts: one to parse JSON and the other to parse JSON Lines. I first parse with the JSON methods and, if that fails, I parse with my custom parser (the ParseJsonLine function). I can't get rid of the double parsing as there is no way I can distinguish the blob data format. I split the table into two parts: SelectRowsWithErrors (to parse JSON Lines) and RemoveRowsWithErrors (to parse JSON data).

    Initially I used Table.Combine({CurrentJsonFormatRecords, JsonLineFormatRecords}) but I noticed a massive performance hit. From the Power BI UI itself, it was evident that it was trying to load multiple times.

    I noticed a line in your book on Power Query saying that Table.RemoveRowsWithErrors prevents query folding.
    So when I changed the query to Table.Combine({JsonLineFormatRecords, CurrentJsonFormatRecords}), I did not see this performance issue, as the Table.RemoveRowsWithErrors step comes later.

    I have three questions:
    1. RemoveRowsWithErrors prevents query folding. Will SelectRowsWithErrors also prevent query folding?
    2. When I tried both versions of the code (the two orderings of Table.Combine), I did not see any change in the query timing (I used the query timing code mentioned in this blog). So how come, during refresh, there was double loading for one ordering of Table.Combine?
    3. How do we monitor performance for blob data? We have GBs of data and we see performance hits and timeouts in the Power BI service. Any suggestions for profiling blob data?

    1. I don’t know any of the answers to your questions, I’m afraid – but if you do find out, I would be interested to know! The trouble is that the Power Query engine is a black box…

  7. This is an old post/thread, but I love the timing technique and wanted to add a potentially useful extension to it.

    I’ve been looking for a way to pinpoint which steps in a multi-step query are taking the most time and wanted to get your feedback on the following.

    This query lists files in a folder then adds a column and filters for a specific filetype + date created. Finally, it appends the full Source list to the filtered list, with the expectation that the longest execution times will be the first and last steps in the query. It could be any multi-step query, but processing files in a folder is a good example that anyone can try out.

    The Timings step builds a table that times each step in the query, while the Result step allows switching between the actual output from the query or the query execution timing table.

    let
        Source = Folder.Files(fnGetParameter("%onedrive%")),
        #"Added FileName" = Table.AddColumn(Source, "FileName", each [Folder Path] & [Name]),
        #"Filtered .xlsm after 12/12/21" = Table.SelectRows(#"Added FileName", each ([Extension] = ".xlsm") and [Date created] > #datetime(2021, 12, 12, 0, 0, 0)),
        #"Appended unfiltered file listing" = Table.Combine({#"Filtered .xlsm after 12/12/21", #"Added FileName"}),
        Timings = Table.AddColumn(#table({"Query", "StartTime", "NumberOfRows", "EndTime"}, {
            {"Source", DateTime.LocalNow(), Table.RowCount(Source), DateTime.LocalNow()},
            {"Added FileName", DateTime.LocalNow(), Table.RowCount(#"Added FileName"), DateTime.LocalNow()},
            {"Filtered .xlsm after 12/12/21", DateTime.LocalNow(), Table.RowCount(#"Filtered .xlsm after 12/12/21"), DateTime.LocalNow()},
            {"Appended unfiltered file listing", DateTime.LocalNow(), Table.RowCount(#"Appended unfiltered file listing"), DateTime.LocalNow()}
            }), "Duration", each [EndTime] - [StartTime]),
        Result = Timings
    in
        Result

    Looking at the timings, I get roughly 3 seconds for the first three steps (Source, AddColumn, Filter) and roughly 6 seconds for the fourth step (Append).

    More or less as expected (allowing for some variation due to other CPU usage and/or buffering/caching by PQ).

    I’d be interested in feedback about using this technique to pinpoint the query steps that take the most time.

  8. I am trying to combine metadata and the information in this post to record execution time.

    This is my logic

    let
        StartTime = DateTime.LocalNow(),
        Source = if StartTime<>null then FileMapping else null, //FileMapping being the first step

        LoadFiles = Table.AddColumn(
            Source, …

        // THE REST OF THE CODE

        #"Expanded PeriodLevels" = … // LAST STEP

        EndTime = DateTime.LocalNow(),
        DurationTime = EndTime-StartTime,
        Finish = #"Expanded PeriodLevels" meta [Start=StartTime, Finish=EndTime, Duration=DurationTime, Status="OK"]
    in
        Finish

    TWO ISSUES

    1) I still get StartTime and EndTime to be basically the same
    2) Every time I run a query to get the metadata out … results change.

    My final goal is to have a table with queries and last successful execution time… I tried also VBA but no success.

    Thanks a lot if anyone has suggestions

    Ludovico

    1. I used Table.Buffer, with no 2nd argument, just before calculating the end time and that worked for me. Table.RowCount did not work.

  9. In the Source expression, I prefer to use “if StartTime = null then null else”. This way, all of the code that is required for the timing is bunched up into 1 place and it can all be on its own line.

    Good post. I am currently using this technique to add a "Refresh Duration" column to all of my tables, so that I can analyze the refresh time from within the Data Model with basically no work after the initial setup. I am also using a parameter to toggle the calculation of these columns on and off, because I am using Table.Buffer to force most of the query to evaluate before the end time is evaluated, and Table.Buffer can slow things down in some cases. Lastly, I add the "Refresh Duration" column as part of an "Add Refresh Columns" function, which also adds a "Last Refreshed" column showing the date and time when the table was last refreshed; this "Last Refreshed" column is also used in measures and visuals to track how each table is doing. I use the "Add Refresh Columns" function on every table that is loaded to the Data Model.
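
For what it’s worth, here is a rough sketch of what a function along the lines of the "Add Refresh Columns" function described in the comment above might look like; the function name, parameters and implementation details are my guesses rather than the commenter’s actual code:

[sourcecode language="text"]
//A guess at an "Add Refresh Columns" style function: it takes a table and
//the query's start time, buffers the table to force it to be evaluated,
//then adds "Last Refreshed" and "Refresh Duration" columns
let
    AddRefreshColumns = (inputTable as table, startTime as datetime) as table =>
        let
            //Force the whole table to be evaluated before the end time is taken
            Buffered = Table.Buffer(inputTable),
            EndTime = if Buffered<>null then DateTime.LocalNow() else null,
            AddLastRefreshed = Table.AddColumn(Buffered, "Last Refreshed",
                each EndTime, type datetime),
            AddDuration = Table.AddColumn(AddLastRefreshed, "Refresh Duration",
                each EndTime - startTime, type duration)
        in
            AddDuration
in
    AddRefreshColumns
[/sourcecode]

You could then call something like AddRefreshColumns(#"Your Final Step", StartTime) as the last step of each query, where StartTime is captured at the top of the query as in the template earlier in this post.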
