Timing Power Query Queries

If you’re interested in performance-tuning a Power Query query, you’re going to need a reliable way of telling how long your query takes to run. It’s not as easy as it seems to adapt your query to do this though, and that’s because of the way the M formula language works. This post does show you how to find how long a query takes to run, but I have to admit that I’m not 100% satisfied with the approach and I suspect there are other, better ways of achieving the same thing that do not involve any M code. However this post does highlight a lot of interesting points about Power Query and M so I think it’s worth reading anyway…

M language background

In my last post, showing how to create a Power Query function that waits a given number of seconds, I used the following query to demonstrate the function:

[sourcecode language="text" padlinenumbers="true"]
let
    //Find current time
    One = DateTime.LocalNow(),
    //Define Wait function
    Wait = (seconds as number, action as function) =>
        if (List.Count(
            List.Generate(
                () => DateTimeZone.LocalNow() + #duration(0,0,0,seconds),
                (x) => DateTimeZone.LocalNow() < x,
                (x) => x)
            ) = 0)
        then null else action(),
    //Call Wait function to wait 5 seconds
    //then return the current time again
    Two = Wait(5,DateTime.LocalNow),
    //Display start time, finish time, and duration
    Output = "Start Time: " & DateTime.ToText(One) &
        " Finish Time: " & DateTime.ToText(Two) &
        " Difference: " & Duration.ToText(Two-One)
in
    Output
[/sourcecode]

 

The output of this seems very straightforward:

[Image: the query output, showing the start time, a finish time five seconds later, and a difference of 5 seconds]

However, consider the output when you change the last step in the query to be:

[sourcecode language="text"]
Output = "Finish Time: " & DateTime.ToText(Two) &
    " Start Time: " & DateTime.ToText(One) &
    " Difference: " & Duration.ToText(Two-One)
[/sourcecode]

 

The only difference is that the finish time is displayed before the start time, and yet the output is now:

[Image: the query output, showing identical start and finish times and a difference of 0 seconds]

The start time and the finish time are now the same, and the duration is 0. However the query does still take the same amount of time to run. What’s going on here?

Looking at the M code for a query, you’d be forgiven for thinking that the first step in a query is evaluated first, the second step second, and so on down to the last step. This is not the case though. From page 10 of the Language Reference document (which can be downloaded here):

List and record member expressions (as well as let expressions, introduced further below) are evaluated using lazy evaluation, which means that they are evaluated only as needed.

A Power Query query is usually a single let expression, and within a let expression steps are evaluated as and when they are needed; if a step’s output is not needed anywhere else then it may not be evaluated at all. Therefore, in the second example above, even though the step called One is the first step in the query, it is evaluated after the step called Two because that is the order that the values are needed in the expression calculated for the step Output.
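
To illustrate that last point, here’s a minimal sketch (my own, not part of the original example): the step called Unused is never referenced by the Output step, so under lazy evaluation it should never be evaluated and the error it would raise should never surface.

[sourcecode language="text"]
let
    //This step is never referenced anywhere else in the query, so lazy
    //evaluation means it should never be evaluated and this error
    //should never be raised
    Unused = error "this step was evaluated after all",
    Output = "The query returned successfully"
in
    Output
[/sourcecode]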

Another important point to understand is that (from page 30 of the Language Reference):

Once a value has been calculated, it is immutable, meaning it can no longer be changed. This simplifies the model for evaluating an expression and makes it easier to reason about the result since it is not possible to change a value once it has been used to evaluate a subsequent part of the expression.

In our example, this means that once One and Two have been evaluated for the first time, the values they return will not change however many times they are referenced later in the query.

The final thing to point out is that I’m using the DateTime.LocalNow() function here to get the system date and time at the point that the expression is evaluated. If you use DateTime.FixedLocalNow() it will return the same value every time you call it in the same query – which is obviously not going to be much use for our purposes.
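
To see the difference between the two functions, here’s a small illustrative sketch (reusing the Wait function from the previous post; this isn’t in the sample workbook). It calls DateTime.FixedLocalNow() twice, two seconds apart, and the difference between the two values should come out as zero:

[sourcecode language="text"]
let
    //Wait function from the previous post
    Wait = (seconds as number, action as function) =>
        if (List.Count(
            List.Generate(
                () => DateTimeZone.LocalNow() + #duration(0,0,0,seconds),
                (x) => DateTimeZone.LocalNow() < x,
                (x) => x)
            ) = 0)
        then null else action(),
    //First call to DateTime.FixedLocalNow()
    FixedOne = DateTime.FixedLocalNow(),
    //Second call, made at least two seconds later; checking FixedOne
    //forces it to be evaluated before the wait starts
    FixedTwo = if FixedOne <> null then Wait(2, DateTime.FixedLocalNow) else null,
    //Both values should be identical, so the difference should be zero
    Output = "First: " & DateTime.ToText(FixedOne) &
        " Second: " & DateTime.ToText(FixedTwo) &
        " Difference: " & Duration.ToText(FixedTwo - FixedOne)
in
    Output
[/sourcecode]

Swap DateTime.FixedLocalNow for DateTime.LocalNow in the FixedOne and FixedTwo steps and you should see a difference of around two seconds instead.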

How to time your queries

Now that we know all this, we can alter a query that does something useful so we can see how long it takes to execute. Getting status updates from Facebook is sufficiently slow for us to be able to measure a duration, and here’s an example query that shows how to measure how long the request takes:

[sourcecode language="text" padlinenumbers="true"]
let
    //Get the current time, which will be the query start time
    StartTime = DateTime.LocalNow(),
    //Make sure StartTime is evaluated just before we request data from Facebook
    Source = if StartTime<>null
        then
            Facebook.Graph("https://graph.facebook.com/me/statuses")
        else
            null,
    //Find the number of rows returned
    NumberOfRows = Number.ToText(Table.RowCount(Source)),
    //Get the current time, which will be the query end time
    EndTime = DateTime.LocalNow(),
    //Make sure PQ evaluates all expressions in the right order:
    //first, get the number of rows, which ensures that
    //the Source expression is evaluated, and in turn StartTime is evaluated;
    //second, EndTime is evaluated as part of the duration calculation
    Output = "Query returned " & NumberOfRows
        & " rows and took "
        & Duration.ToText(EndTime - StartTime)
in
    Output
[/sourcecode]

Here’s an example of the output of the query:

[Image: the query output, showing the number of rows returned and how long the query took]

How does this work?

  • The Output step has the expression that returns the query output
  • The first value evaluated in the Output step is the NumberOfRows step, which finds the number of rows returned by Facebook and in turn forces the evaluation of the Source step. If we didn’t include an expression in the output that was in some way derived from data loaded from Facebook, Power Query would not bother to evaluate the Source step.
  • The Source step gets data from Facebook, but before that happens there is an if expression that checks to see whether StartTime is null or not. Of course this always returns true, but the point here is that by making this check we are forcing StartTime to be evaluated.
  • The second value evaluated in Output is the duration. The duration is the EndTime (evaluated for the first time at this point) minus the StartTime (which has already been evaluated, remember), so we get the amount of time elapsed between the start of the request to Facebook and the end of the query.

With all of the other stuff stripped out, here’s the basic template I’ve used:

[sourcecode language="text"]
let
    StartTime = DateTime.LocalNow(),
    Source = if StartTime<>null
        then
            //do whatever we were going to do in our first step
        else
            null,

    //insert all other steps here

    NumberOfRows = Number.ToText(Table.RowCount(WhateverThePreviousStepIs)),
    EndTime = DateTime.LocalNow(),
    Output = "Query returned " & NumberOfRows
        & " rows and took "
        & Duration.ToText(EndTime - StartTime)
in
    Output
[/sourcecode]

I’m making a pretty big assumption here, though, and that is that finding the number of rows in the NumberOfRows step will force the evaluation of all previous steps, which may not be the case! You might have to think carefully about how you write an expression that forces evaluation of all of your steps.
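
One way round this, suggested in a couple of the comments below, is Table.Buffer(): buffering the output of the last "real" step loads the whole table into memory, which should force all of the steps it depends on to be evaluated before the end time is captured. Here’s a rough sketch of that variation on the template; the #table step is just a dummy stand-in for whatever your query really does, and bear in mind that Table.Buffer itself adds some overhead, so the measured duration includes the cost of buffering:

[sourcecode language="text"]
let
    StartTime = DateTime.LocalNow(),
    //Replace this dummy table with whatever your first step really does;
    //checking StartTime forces it to be evaluated before the source is read
    Source = if StartTime<>null
        then #table({"A"},{{1},{2},{3}})
        else null,

    //insert all other steps here

    //Table.Buffer loads the whole of the previous step's table into memory,
    //forcing all the steps it depends on to be evaluated
    Buffered = Table.Buffer(Source),
    //Checking Buffered before getting the end time makes sure the buffering
    //has finished before DateTime.LocalNow() is called again
    EndTime = if Buffered<>null then DateTime.LocalNow() else null,
    Output = "Query returned " & Number.ToText(Table.RowCount(Buffered))
        & " rows and took "
        & Duration.ToText(EndTime - StartTime)
in
    Output
[/sourcecode]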

This is all very complicated (indeed, probably too complicated) so I’m also investigating what the Power Query trace file can tell you about things like query execution times. Hopefully I’ll be able to blog about that in a few weeks.

You can download an Excel workbook with some sample Power Query queries here.

29 thoughts on "Timing Power Query Queries"

    1. Mr. Webb
      Could you please help me?

      I could not add to my query as shown below;

      let
          ArchivedData = (page as number) =>
              let
                  Source = Web.Page(Web.Contents("http://uzmanpara.milliyet.com.tr/borsa/gecmis-kapanislar/?Pagenum=" & Number.ToText(page) & "&tip=Hisse&gun=20&ay=2&yil=2015&Harf=-1")),
                  Data0 = Source{0}[Data],
                  #"Changed Type" = Table.TransformColumnTypes(Data0,{{"Menkul Adı", type text}})
              in
                  #"Changed Type",
          #"Invoked FunctionArchivedData" = ArchivedData(5)
      in
          #"Invoked FunctionArchivedData"

      Best regards

      1. Can you tell me what error you get? This query worked for me when I pasted it into Power Query, although I did have to change the double quotation marks in your query back to the straight " character after pasting it from the web.

      2. Mr Webb

        The error is: "Waiting for uzmanpara.milliyet.com.tr"
        I have waited a very long time but got no result.
        I think the web site blocks me due to the large number of requests in a short time.

  1. Hey Chris,

    I think that DateTime.FixedLocalNow() is the correct function for StartTime, since it is immutable for multiple calls within the script. DateTime.LocalNow() is mutable, and changes every time it is called within the script. After running the Facebook script with DateTime.LocalNow(), the timing was sub-one second. Using DateTime.FixedLocalNow(), the timing was between 1 and 2 seconds. I measured 2 seconds on my stopwatch, suggesting the real time was between 1 and 2 seconds.

    Could you do similar timing checks to see if you get similar results?

    1. Hi Colin, it’s difficult to know for sure, but I don’t see how using DateTime.FixedLocalNow() for just the start time would make any difference if it is only being called once anyway. When I test using the Facebook API I get a lot of variation in the response time, which means I can’t compare the two functions. There could also be all kinds of other stuff happening in Power Query and Excel after the query has finished executing that makes it even harder to measure. This is one for the dev team, I think.

  2. Hi Chris,

    After posting my last message, it occurred to me that I had a better way of testing the timing assumptions. I used a prime number generator (a tail-recursive version of the one that Curt posted on the TechNet forum). On my desktop, it takes about 15 sec to generate 70K prime numbers (stopwatch timing). Using your timing technique, I recorded around 14.3 sec. This number was roughly the same for StartTime set to either DateTime.LocalNow() or DateTime.FixedLocalNow(). That being said, I missed the mark in the following two key areas:

    1) You are absolutely correct in stating that StartTime is immutable after being set, period.

    2) You are also correct about the behaviour of DateTime.FixedLocalNow(). When both Start and End times are set to DateTime.FixedLocalNow(), the difference is zero, even though the function is being called at different times. So what is the single fixed time based on? The result would indicate that it’s set to the current date & time when first called during the query execution (when StartTime is called in this case). All subsequent calls to DateTime.FixedLocalNow() (EndTime in this case) use the first set value; it’s not recalculated at all until the query is refreshed. I completely misinterpreted the function’s description, although it now makes sense after re-reading it. 🙂

    Lastly, although I very much like your technique, I am somewhat concerned by the variations I get in milliseconds. The prime number generator is not influenced by any external source, so it’s not clear why successive refreshes should vary much at all.

  3. Ehren’s response is a revelation. I’ve confirmed that you can set StartTime to DateTime.FixedLocalNow(), and you don’t have to explicitly call it in the Source step! This is an exception to the normal workings of M.

  4. Hi Chris,
    In addition to the query time information, what is most needed is the data table itself. Could you please show me how to load the table onto the spreadsheet and add the query time as a footnote or report title? Thanks.

  5. Will definitely give this a shot at some point in the near future.

    Have also been reading about buffered tables and the Power Query execution model. So far I’ve been timing the execution time of my entire chain with a stopwatch and I have been seeing some pretty varying results (from 50s to 2m30s). This is on a workbook with 3 source tables in Excel (hundreds / thousands of lines), and a chain of about 14 queries of varying complexity.

    What exactly does the execution time depend on? For instance, should I expect different results after:
    - running the same query chain with the same data? Is anything reused from cache?
    - changing small parts of the query
    - restarting Excel (saving, closing, reopening)
    - making changes to the source tables in Excel

    I found the posts about the caching model and query execution chain very confusing; even more so when I have measured execution time of 50s before the weekend after making small changes, and now after pasting new data (of similar size) it’s suddenly back to over 2m again.

  6. ———–M code —————–

    let
        ParseJsonLine = (#"input content" as binary) => let
            #"Imported Text" = Table.FromColumns({Lines.FromBinary(#"input content",null,null,1252)}),
            #"Parsed JSON" = Table.TransformColumns(#"Imported Text",{},Json.Document)
        in
            #"Parsed JSON",

        //Expand each JSON content value with encoding as 1252
        ReadAllJsonBlobs = Table.AddColumn(RemoveOtherColumns, "JSONContent", each Json.Document([Content],1252)),

        //Json line format will fail with json parser - rows with error will return jsonline format
        JsonLineFormatTable = Table.SelectRowsWithErrors(ReadAllJsonBlobs, {"JSONContent"}),
        ReadAllJsonLineBlobs = Table.AddColumn(JsonLineFormatTable, "JSONLineContent", each ParseJsonLine([Content])),
        RemoveOtherColumns1 = Table.RemoveColumns(ReadAllJsonLineBlobs,{"Content", "JSONContent"}),
        JsonLineFormatRecords = Table.ExpandTableColumn(RemoveOtherColumns1, "JSONLineContent", {"Column1"}, {"records"}),

        //Json format will be parsed by json parser - rows without error will return json format
        CurrentJsonFormatTable = Table.RemoveRowsWithErrors(ReadAllJsonBlobs, {"JSONContent"}),
        ExpandJSON = Table.ExpandRecordColumn(CurrentJsonFormatTable, "JSONContent", {"records"}, {"records"}),
        ExpandRecords = Table.ExpandListColumn(ExpandJSON, "records"),
        CurrentJsonFormatRecords = Table.RemoveColumns(ExpandRecords,{"Content"}),

        // Combining both current format and jsonline format records
        CombinedTable = Table.Combine({CurrentJsonFormatRecords, JsonLineFormatRecords}),

    ——————————-

    The above query is to parse JSON or JSON Lines data from blobs. Basically I have two parts: one to parse JSON and the other to parse JSON Lines. I first parse with the JSON methods and, if that fails, I parse with my custom parser (the ParseJsonLine function). I can't get rid of the double parsing as there is no way I can distinguish the blob data format. I split the table into two parts: SelectRowsWithErrors (to parse JSON Lines) and RemoveRowsWithErrors (to parse JSON data).

    Initially I used Table.Combine({CurrentJsonFormatRecords, JsonLineFormatRecords}) but I noticed a massive performance hit. From the Power BI UI itself, it was evident that it was trying to load multiple times.

    I noticed a line in your book on Power Query saying that Table.RemoveRowsWithErrors prevents query folding.
    So when I changed the query to Table.Combine({JsonLineFormatRecords, CurrentJsonFormatRecords}), I did not see this performance issue, as the Table.RemoveRowsWithErrors step comes later.

    I have three questions:
    1. RemoveRowsWithErrors prevents query folding. Will SelectRowsWithErrors also prevent query folding?
    2. When I tried both versions of the code (the two orderings of Table.Combine), I did not see any change in the query timing (I used the query timing code mentioned in this blog). So how come, during refresh, there was double loading for one ordering of Table.Combine?
    3. How do we monitor performance for blob data? We have GBs of data and we see performance hits and timeouts in the Power BI service. Any suggestions for profiling blob data?

    1. I don’t know any of the answers to your questions, I’m afraid – but if you do find out, I would be interested to know! The trouble is that the Power Query engine is a black box…

  7. This is an old post/thread, but I love the timing technique and wanted to add a potentially useful extension to it.

    I’ve been looking for a way to pinpoint which steps in a multi-step query are taking the most time and wanted to get your feedback on the following.

    This query lists files in a folder then adds a column and filters for a specific filetype + date created. Finally, it appends the full Source list to the filtered list, with the expectation that the longest execution times will be the first and last steps in the query. It could be any multi-step query, but processing files in a folder is a good example that anyone can try out.

    The Timings step builds a table that times each step in the query, while the Result step allows switching between the actual output from the query or the query execution timing table.

    let
        Source = Folder.Files(fnGetParameter("%onedrive%")),
        #"Added FileName" = Table.AddColumn(Source, "FileName", each [Folder Path] & [Name]),
        #"Filtered .xlsm after 12/12/21" = Table.SelectRows(#"Added FileName", each ([Extension] = ".xlsm") and [Date created] > #datetime(2021, 12, 12, 0, 0, 0)),
        #"Appended unfiltered file listing" = Table.Combine({#"Filtered .xlsm after 12/12/21", #"Added FileName"}),
        Timings = Table.AddColumn(#table({"Query", "StartTime", "NumberOfRows", "EndTime"}, {
            {"Source", DateTime.LocalNow(), Table.RowCount(Source), DateTime.LocalNow()},
            {"Added FileName", DateTime.LocalNow(), Table.RowCount(#"Added FileName"), DateTime.LocalNow()},
            {"Filtered .xlsm after 12/12/21", DateTime.LocalNow(), Table.RowCount(#"Filtered .xlsm after 12/12/21"), DateTime.LocalNow()},
            {"Appended unfiltered file listing", DateTime.LocalNow(), Table.RowCount(#"Appended unfiltered file listing"), DateTime.LocalNow()}
            }), "Duration", each [EndTime] - [StartTime]),
        Result = Timings
    in
        Result

    Looking at the timings, I get roughly 3 seconds for the first three steps (Source, AddColumn, Filter) and roughly 6 seconds for the fourth step (Append).

    More or less as expected (allowing for some variation due to other CPU usage and/or buffering/caching by PQ).

    I’d be interested in feedback about using this technique to pinpoint the query steps that take the most time.

  8. I am trying to combine metadata and the information in this post to record execution time.

    This is my logic

    let
        StartTime = DateTime.LocalNow(),
        Source = if StartTime<>null then FileMapping else null, //FileMapping being the first step

        LoadFiles = Table.AddColumn(
            Source, …

        // THE REST OF THE CODE

        #"Expanded PeriodLevels" = … // LAST STEP

        EndTime = DateTime.LocalNow(),
        DurationTime = EndTime-StartTime,
        Finish = #"Expanded PeriodLevels" meta [Start=StartTime, Finish=EndTime, Duration=DurationTime, Status="OK"]
    in
        Finish

    TWO ISSUES

    1) I still get StartTime and EndTime to be basically the same
    2) Every time I run a query to get the metadata out … results change.

    My final goal is to have a table with queries and last successful execution time… I tried also VBA but no success.

    Thanks a lot if anyone has suggestions

    Ludovico

    1. I used Table.Buffer, with no 2nd argument, just before calculating the end time and that worked for me. Table.RowCount did not work.

  9. In the Source expression, I prefer to use “if StartTime = null then null else”. This way, all of the code that is required for the timing is bunched up into 1 place and it can all be on its own line.

    Good post. I am currently using this technique to add a "Refresh Duration" column to all of my tables, so that I can analyze the refresh time from within the Data Model with basically no work after the initial setup. I am also using a parameter to toggle the calculation of these columns on and off, because I am using Table.Buffer to force most of the query to evaluate before the end time is evaluated, and Table.Buffer can slow things down in some cases. Lastly, I add the "Refresh Duration" column as part of an "Add Refresh Columns" function, which also adds a "Last Refreshed" column showing the date and time when the table was last refreshed; this "Last Refreshed" column is also used in measures and visuals to track how each table is doing. I use the "Add Refresh Columns" function on every table that is loaded to the Data Model.
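
For what it’s worth, here is a rough sketch of what a function along the lines of the "Add Refresh Columns" function described in the comment above might look like; the function name, parameters and implementation details are my guesses rather than the commenter’s actual code:

[sourcecode language="text"]
//A guess at an "Add Refresh Columns" style function: it takes a table and
//the query's start time, buffers the table to force it to be evaluated,
//then adds "Last Refreshed" and "Refresh Duration" columns
let
    AddRefreshColumns = (inputTable as table, startTime as datetime) as table =>
        let
            //Force the whole table to be evaluated before the end time is taken
            Buffered = Table.Buffer(inputTable),
            EndTime = if Buffered<>null then DateTime.LocalNow() else null,
            AddLastRefreshed = Table.AddColumn(Buffered, "Last Refreshed",
                each EndTime, type datetime),
            AddDuration = Table.AddColumn(AddLastRefreshed, "Refresh Duration",
                each EndTime - startTime, type duration)
        in
            AddDuration
in
    AddRefreshColumns
[/sourcecode]

You could then call something like AddRefreshColumns(#"Your Final Step", StartTime) as the last step of each query, where StartTime is captured at the top of the query as in the template earlier in this post.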
