Timing Power Query Queries

If you’re interested in performance-tuning a Power Query query, you’re going to need a reliable way of telling how long your query takes to run. It’s not as easy as it seems to adapt your query to do this though, and that’s because of the way the M formula language works. This post does show you how to find how long a query takes to run, but I have to admit that I’m not 100% satisfied with the approach and I suspect there are other, better ways of achieving the same thing that do not involve any M code. However this post does highlight a lot of interesting points about Power Query and M so I think it’s worth reading anyway…

M language background

In my last post, showing how to create a Power Query function that waits a given number of seconds, I used the following query to demonstrate the function:

let
    //Find current time
    One = DateTime.LocalNow(),
    //Define Wait function
    Wait = (seconds as number, action as function) => 
            if (List.Count(
             List.Generate(
              () => DateTimeZone.LocalNow() + #duration(0,0,0,seconds), 
              (x) => DateTimeZone.LocalNow() < x, 
              (x) => x)
              ) = 0) 
               then null else action(),
    //Call Wait function to wait 5 seconds
    //then return the current time again
    Two = Wait(5,DateTime.LocalNow),
    //Display start time, finish time, and duration
    Output = "Start Time: " & DateTime.ToText(One) & 
                   " Finish Time: " & DateTime.ToText(Two) & 
                   " Difference: " & Duration.ToText(Two-One)
in
    Output

The output of this seems very straightforward:

[Screenshot: query output showing the start time, a finish time five seconds later, and a difference of five seconds]

However, consider the output when you change the last step in the query to be:

    Output = "Finish Time: " & DateTime.ToText(Two) & 
                   " Start Time: " & DateTime.ToText(One) & 
                   " Difference: " & Duration.ToText(Two-One)


The only difference is that the finish time is displayed before the start time, and yet the output is now:

[Screenshot: query output showing identical start and finish times and a difference of zero]

The start time and the finish time are now the same, and the duration is 0. However the query does still take the same amount of time to run. What’s going on here?

Looking at the M code for a query, you’d be forgiven for thinking that the first step in a query is evaluated first, the second step is evaluated second, and so on through to the last step. This is not the case though. From page 10 of the Language Reference document (which can be downloaded here):

List and record member expressions (as well as let expressions, introduced further below) are evaluated using lazy evaluation, which means that they are evaluated only as needed.

A Power Query query is usually a single let expression, and within a let expression steps are evaluated as and when they are needed; if a step’s output is not needed anywhere else then it may not be evaluated at all. Therefore, in the second example above, even though the step called One is the first step in the query, it is evaluated after the step called Two because that is the order in which their values are needed by the expression in the step Output. The sketch below shows the “never evaluated” behaviour in its simplest form.
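
Here is a minimal sketch of lazy evaluation (the step names are my own): the NeverUsed step raises an error if it is ever evaluated, but because nothing after the in clause depends on it, the query simply returns 2 and no error is raised.

let
    //this step raises an error if it is ever evaluated...
    NeverUsed = error "this error is never raised",
    //...but only this step is needed to produce the query output
    Answer = 1 + 1
in
    Answer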

Another important point to understand is that (from page 30 of the Language Reference):

Once a value has been calculated, it is immutable, meaning it can no longer be changed. This simplifies the model for evaluating an expression and makes it easier to reason about the result since it is not possible to change a value once it has been used to evaluate a subsequent part of the expression.

In our example, this means that once One and Two have been evaluated for the first time, the values they return will not change however many times they are referenced afterwards.
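
To make this concrete, here’s a minimal sketch (again, the step names are mine): the step Now is referenced twice, and because its value is immutable once it has been calculated, both references see exactly the same value and the query always returns TRUE.

let
    Now = DateTime.LocalNow(),
    //both of these steps see the same, fixed value of Now
    First = DateTime.ToText(Now),
    Second = DateTime.ToText(Now)
in
    First = Second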

The final thing to point out is that I’m using the DateTime.LocalNow() function here to get the system date and time at the point that the expression is evaluated. If you use DateTime.FixedLocalNow() instead, it will return the same value every time it is called within the same query execution – which is obviously not going to be much use for our purposes.
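
The following sketch illustrates the difference; the Work step is just an arbitrary way of making some real time pass between the two calls. With DateTime.FixedLocalNow() the duration below is always zero; swap in DateTime.LocalNow() and you should see a small non-zero duration instead.

let
    First = DateTime.FixedLocalNow(),
    //do some arbitrary work so that real time passes
    Work = List.Count(List.Numbers(1, 10000000)),
    //the if forces Work to be evaluated before the second call
    Second = if Work > 0 then DateTime.FixedLocalNow() else null,
    Output = Duration.ToText(Second - First)
in
    Output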

How to time your queries

Now that we know all this, we can alter a query that does something useful so that we can see how long it takes to execute. Getting status updates from Facebook is sufficiently slow for us to be able to measure a duration, and here’s an example query that shows how to measure how long the request takes:

let
    //Get the current time, which will be the query start time
    StartTime = DateTime.LocalNow(),
    //Make sure StartTime is evaluated just before we request data from Facebook
    Source = if StartTime <> null
             then Facebook.Graph("https://graph.facebook.com/me/statuses")
             else null,
    //Find the number of rows returned
    NumberOfRows = Number.ToText(Table.RowCount(Source)),
    //Get the current time, which will be the query end time
    EndTime = DateTime.LocalNow(),
    //Make sure PQ evaluates all expressions in the right order:
    //first, NumberOfRows is evaluated, which ensures that
    //the Source expression is evaluated, and in turn that StartTime is evaluated;
    //second, EndTime is evaluated as part of the duration calculation
    Output = "Query returned " & NumberOfRows 
             & " rows and took " 
             & Duration.ToText(EndTime - StartTime)
in
    Output

Here’s an example of the output of the query:

[Screenshot: query output showing the number of rows returned and the elapsed duration]

How does this work?

  • The Output step has the expression that returns the query output
  • The first value evaluated in the Output step is the NumberOfRows step, which finds the number of rows returned by Facebook and in turn forces the evaluation of the Source step. If we didn’t include an expression in the output that was in some way derived from data loaded from Facebook, Power Query would not bother to evaluate the Source step.
  • The Source step gets data from Facebook, but before that happens there is an if expression that checks whether StartTime is null or not. Of course this check always succeeds, but the point here is that making it forces StartTime to be evaluated.
  • The second value evaluated in Output is the duration. The duration is the EndTime (evaluated for the first time at this point) minus the StartTime (which has already been evaluated, remember), so we get the amount of time elapsed between the start of the request to Facebook and the end of the query.

With all of the other stuff stripped out, here’s the basic template I’ve used:

let
    StartTime = DateTime.LocalNow(),
    Source = if StartTime <> null
             then
                 //do whatever we were going to do in our first step
             else
                 null,

    //insert all other steps here

    NumberOfRows = Number.ToText(Table.RowCount(WhateverThePreviousStepIs)),
    EndTime = DateTime.LocalNow(),
    Output = "Query returned " & NumberOfRows 
             & " rows and took " 
             & Duration.ToText(EndTime - StartTime)
in
    Output

I’m making a pretty big assumption here, though: that finding the number of rows in the NumberOfRows step will force the evaluation of all previous steps, which may not be the case! You might have to think carefully about how you write an expression that forces evaluation of all of your steps; one possible approach is sketched below.
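
One possible way of reducing that risk – and this is a sketch based on my own assumptions, not something I can promise works with every data source – is to wrap the last real step in Table.Buffer(), which should force the whole table to be loaded into memory rather than just whatever is needed for the row count. The CSV source and file path below are made up purely for illustration:

let
    StartTime = DateTime.LocalNow(),
    Source = if StartTime <> null
             then Csv.Document(File.Contents("C:\SomeFolder\SomeFile.csv"))
             else null,

    //insert all other steps here

    //Table.Buffer should force the whole of the previous step
    //to be evaluated, not just the part needed for the row count
    Buffered = Table.Buffer(Source),
    NumberOfRows = Number.ToText(Table.RowCount(Buffered)),
    EndTime = DateTime.LocalNow(),
    Output = "Query returned " & NumberOfRows
             & " rows and took "
             & Duration.ToText(EndTime - StartTime)
in
    Output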

This is all very complicated (indeed, probably too complicated) so I’m also investigating what the Power Query trace file can tell you about things like query execution times. Hopefully I’ll be able to blog about that in a few weeks.

You can download an Excel workbook with some sample Power Query queries here.

18 thoughts on “Timing Power Query Queries”

    • Mr. Webb
      Could you please help me?

      I could not add this to my query, shown below:

      let
          ArchivedData = (page as number) =>
              let
                  Source = Web.Page(Web.Contents("http://uzmanpara.milliyet.com.tr/borsa/gecmis-kapanislar/?Pagenum=" & Number.ToText(page) & "&tip=Hisse&gun=20&ay=2&yil=2015&Harf=-1")),
                  Data0 = Source{0}[Data],
                  #"Changed Type" = Table.TransformColumnTypes(Data0, {{"Menkul Adı", type text}})
              in
                  #"Changed Type",
          #"Invoked FunctionArchivedData" = ArchivedData(5)
      in
          #"Invoked FunctionArchivedData"

      Best regards

  1. Hey Chris,

    I think that DateTime.FixedLocalNow() is the correct function for StartTime, since it is immutable for multiple calls within the script. DateTime.LocalNow() is mutable, and changes every time it is called within the script. After running the Facebook script with DateTime.LocalNow(), the timing was sub-one second. Using DateTime.FixedLocalNow(), the timing was between 1 and 2 seconds. I measured 2 seconds on my stopwatch, suggesting the real time was between one and two seconds.

    Could you run similar timing checks to see if you get similar results?

    • Hi Colin, it’s difficult to know for sure, but I don’t see how using DateTime.FixedLocalNow() for just the start time would make any difference if it was only being called once anyway. When I test using the Facebook API I get a lot of variation in the response time, which means I can’t compare the two functions. There could also be all kinds of other stuff happening in Power Query and Excel after the query has finished executing that makes it even harder to measure. This is one for the dev team, I think.

  2. Hi Chris,

    After posting my last message, it occurred to me that I had a better way of testing the timing assumptions. I used a prime number generator (a tail-recursive version of the one that Curt posted on the TechNet forum). On my desktop, it takes about 15 sec to generate 70K prime numbers (stopwatch timing). Using your timing technique, I recorded around 14.3 sec. This number was roughly the same for StartTime set to either DateTime.LocalNow() or DateTime.FixedLocalNow(). That being said, I missed the mark in the following two key areas:

    1) You are absolutely correct in stating that StartTime is immutable after being set, period.

    2) You are also correct about the behaviour of DateTime.FixedLocalNow(). When both Start and End times are set to DateTime.FixedLocalNow(), the difference is zero, even though the function is being called at different times. So what is the single fixed time based on? The result would indicate that it’s set to the current date & time when the function is first called during query execution (when StartTime is evaluated in this case). All subsequent calls to DateTime.FixedLocalNow() (EndTime in this case) use that first value – it’s not recalculated at all until the query is refreshed. I completely misinterpreted the function’s description, although it now makes sense after re-reading it. 🙂

    Lastly, although I very much like your technique, I am somewhat concerned by the variations I get in milliseconds. The prime number generator is not influenced by any external source, so it’s not clear why successive refreshes should vary much at all.

  3. Ehren’s response is a revelation. I’ve confirmed that you can set StartTime to DateTime.FixedLocalNow(), and you don’t have to explicitly call it in the Source step! This is an exception to the normal workings of M.

  4. Hi Chris,
    In addition to the query time, what I need most is the data table itself. Could you please show me how to load the table onto the spreadsheet with the query time added as a footnote or report title? Thanks.

  5. Will definitely give this a shot at some point in the near future.

    I’ve also been reading about buffered tables and the Power Query execution model. So far I’ve been timing the execution of my entire chain with a stopwatch and I have been seeing some pretty varied results (from 50s to 2m30s). This is on a workbook with 3 source tables in Excel (hundreds to thousands of lines), and a chain of about 14 queries of varying complexity.

    What exactly does the execution time depend on? For instance, should I expect different results after:
    - running the same query chain with the same data? Is anything reused from a cache?
    - changing small parts of the query?
    - restarting Excel (saving, closing, reopening)?
    - making changes to the source tables in Excel?

    I found the posts about the caching model and query execution chain very confusing; even more so when I measured an execution time of 50s before the weekend after making small changes, and now, after pasting in new data of a similar size, it’s suddenly back to over 2m again.
