Timing Power Query Queries

If you’re interested in performance-tuning a Power Query query, you’re going to need a reliable way of telling how long your query takes to run. It’s not as easy as it seems to adapt your query to do this though, and that’s because of the way the M formula language works. This post does show you how to find how long a query takes to run, but I have to admit that I’m not 100% satisfied with the approach and I suspect there are other, better ways of achieving the same thing that do not involve any M code. However this post does highlight a lot of interesting points about Power Query and M so I think it’s worth reading anyway…

M language background

In my last post, showing how to create a Power Query function that waits a given number of seconds, I used the following query to demonstrate the function:

```let
//Find current time
One = DateTime.LocalNow(),
//Define Wait function
Wait = (seconds as number, action as function) =>
if (List.Count(
List.Generate(
() => DateTimeZone.LocalNow() + #duration(0,0,0,seconds),
(x) => DateTimeZone.LocalNow() < x,
(x) => x)
) = 0)
then null else action(),
//Call Wait function to wait 5 seconds
//then return the current time again
Two = Wait(5,DateTime.LocalNow),
//Display start time, finish time, and duration
Output = "Start Time: " & DateTime.ToText(One) &
" Finish Time: " & DateTime.ToText(Two) &
" Difference: " & Duration.ToText(Two-One)
in
Output
```

The output of this seems very straightforward:

However, consider the output when you change the last step in the query to be:

```    Output = "Finish Time: " & DateTime.ToText(Two) &
" Start Time: " & DateTime.ToText(One) &
" Difference: " & Duration.ToText(Two-One)
```

The only difference is that the finish time is displayed before the start time, and yet the output is now:

The start time and the finish time are now the same, and the duration is 0. However the query does still take the same amount of time to run. What’s going on here?

Looking at the M code for a query, you’d be forgiven for thinking that the first step in a query is evaluated first, the second step is evaluated second, the last step is evaluated last, and so on. This is not the case though. From page 10 of the Language Reference document (which can be downloaded here):

List and record member expressions (as well as let expressions, introduced further below) are

evaluated using lazy evaluation, which means that they are evaluated only as needed.

A Power Query query is usually a single let expression, and within a let expression steps are evaluated as and when they are needed; if a step’s output is not needed anywhere else then it may not be evaluated at all. Therefore, in the second example above, even though the step called One is the first step in the query, it is evaluated after the step called Two because that is the order that the values are needed in the expression calculated for the step Output.

Another important point to understand is that (from page 30 of the Language Reference):

Once a value has been calculated, it is immutable, meaning it can no longer be changed.

This simplifies the model for evaluating an expression and makes it easier to reason about the

result since it is not possible to change a value once it has been used to evaluate a subsequent

part of the expression.

In our example, this means that once StartTime and EndTime have been evaluated for the first time, the value they return will not change on subsequent calls.

The final thing to point out is that I’m using the DateTime.LocalNow() function here to get the system date and time at the point that the expression is evaluated. If you use DateTime.FixedLocalNow() it will return the same value every time you call it in the same query – which is obviously not going to be much use for our purposes.

How to time your queries

Now we know all this we can alter a query that does something useful so we can see how long it takes to execute. Getting status updates from Facebook is sufficiently slow for us to be able to measure a duration, and here’s an example query that shows how to measure how long the request takes:

```let
//Get the current time, which will be the query start time
StartTime = DateTime.LocalNow(),
//Make sure StartTime is evaluated just before we request data from Facebook
Source = if StartTime<>null
then
else
null,
//Find the number of rows returned
NumberOfRows = Number.ToText(Table.RowCount(Source)),
//Get the current time, which will be the query end time
EndTime = DateTime.LocalNow(),
//Make sure PQ evvaluates all expressions in the right order:
//first, get the Number of rows, which ensure that
//the Source expression is evaluated, and in turn StartTime is evaluated
//second, the EndTime is evaluated as part of the duration calculation
Output = "Query returned " & NumberOfRows
& " rows and took "
& Duration.ToText(EndTime - StartTime)
in
Output
```

Here’s an example of the output of the query:

How does this work?

• The Output step has the expression that returns the query output
• The first value evaluated in the Output step is the NumberOfRows step, which finds the number of rows returned by Facebook in turn forces the evaluation of the Source step. If we didn’t include an expression in the output that was in some way derived from data loaded from Facebook, Power Query would not bother to evaluate the Source step.
• The Source step gets data from Facebook, but before that happens there is an if expression that checks to see whether StartTime is null or not. Of course this always returns true, but the point here is that by making this check we are forcing StartTime to be evaluated.
• The second value evaluated in Output is the duration. The duration is the EndTime (evaluated for the first time at this point) minus the StartTime (which has already been evaluated, remember), so we get the amount of time elapsed between the start of the request to Facebook and the end of the query.

With all of the other stuff stripped out, here’s the basic template I’ve used:

```let
StartTime = DateTime.LocalNow(),
Source = if StartTime<>null
then
//do whatever we were going to do in our first step
else
null,

//insert all other steps here

NumberOfRows = Number.ToText(Table.RowCount(WhateverThePreviousStepIs)),
EndTime = DateTime.LocalNow(),
Output = "Query returned " & NumberOfRows
& " rows and took "
& Duration.ToText(EndTime - StartTime)
in
Output
```

I’m making a pretty big assumption here, though, and that is that finding the number of rows in the NumberOfRows step will force the evaluation of all previous steps, which may not be the case! You might have to think carefully about how you write an expression that forces evaluation of all of your steps.

This is all very complicated (indeed, probably too complicated) so I’m also investigating what the Power Query trace file can tell you about things like query execution times. Hopefully I’ll be able to blog about that in a few weeks.

You can download an Excel workbook with some sample Power Query queries here.

24 responses

• Sedat Onat says:

Mr. Webb

I could not add to my query as shown below;

let
ArchivedData = (page as number) =>
let
Source = Web.Page(Web.Contents(“http://uzmanpara.milliyet.com.tr/borsa/gecmis-kapanislar/?Pagenum=” & Number.ToText(page) & “&tip=Hisse&gun=20&ay=2&yil=2015&Harf=-1″)),
Data0 = Source{0}[Data],
#”Changed Type” = Table.TransformColumnTypes(Data0,{{“Menkul Adı”, type text}})
in
#”Changed Type”,
#”Invoked FunctionArchivedData” = ArchivedData(5)
in
#”Invoked FunctionArchivedData”

Best regards

• Can you tell me what error you get? This query worked for me when I pasted it into Power Query, although I did have to change the double quotation marks from your query to the ” character after pasting it from the web.

• Mr Webb

Error is; “Waiting for uzmanpara.milliyet.com.tr”
I have waited very long time but no result
I think web site blocks me due to large number of requests in a short time

1. Yes I’ve seen that but unfortunately couldn’t add to my query
Could not figure out how to do

2. Colin Banfield says:

Hey Chris,

I think that Date.FixedLocalNow() is the correct function for StartTime, since it is immutable for multiple calls within the script. Date.LocalNow() is mutable, and changes every time it is called within the script. After running the Facebook script with Date.LocalNow(), the timing was sub-one second. Using Date.FixedLocalNow(), the timing was between 1 and 2 seconds. I measured 2 seconds on my stopwatch, suggesting the real time was between 1 and two seconds.

Could you do a similar timing checks to see if you get similar results?

• Hi Colin, it’s difficult to know for sure but I don’t see how using Date.FixedLocalNow() just the start time would make any difference if it was only being called once anyway? When I test I using the Facebook API I get a lot of variation in the response time, which means I can’t compare the two functions. There could also be all kinds of other stuff happening in Power Query and Excel after the query has finished executing that makes it even harder to measure. This is one for the dev team, I think.

3. Colin Banfield says:

Hi Chris,

After posting my last message, it occurred to me that I had a better way of testing the timing assumptions. I used a prime number generator (a tail-recursive version of the one that Curt posted on the TechNet forum). On my desktop, it takes about 15 sec to generate 70K prime numbers (stopwatch timing). Using your timing technique, I recorded around 14.3 sec. This number was roughly the same for StartTime set to either DateTime.LocalNow() or DateTime.FixedLocalNow(). That being said, I missed the mark in the following two key areas:

1) You are absolutely correct in stating that StartTime is immutable after being set, period.

2) You are also correct about the behaviour of DateTime.FixedLocalNow(). When both Start and End times are set to DateTime.FixedLocalNow(), the difference is zero, even though the function is being called at different times. So what is the single fixed time based on? The result would indicate that it’s set to the current date & time when first called during the query execution (when StartTime is called in this case). All subsequent calls to DateTime.FixedLocalNow() (EndTime in this case) uses the first set value – it’s not recalculated at all until the query is refreshed. I completely misinterpreted the function’s description, although it now makes sense after re-reading it. 🙂

Lastly, although I very much like your technique, I am somewhat concerned by the variations I get in milliseconds. The prime number generator is not influenced by any external source, so it’s not clear why successive refreshes should vary much at all.

4. Colin Banfield says:

Ehren’s response is a revelation. I’ve confirmed that you can set StartTime to DateTime.FixedLocalNow(), and you don’t have to explicitly call it in the Source step! This is an exception to the normal workings of M.

5. Julian says:

Hi Chris,
In addition to the information of query time, what the most required is the data table itself. Therefore, could you please show me how to load the table onto the spreadsheet by adding the query time as a footnote or report title? Thanks.

• Unfortunately that’s not an easy thing to do – because a query can only return a single table, you cannot return the query time separately.

6. Menno says:

Will definitely give this a shot at some point in the near future.

Have also been reading about buffered tables and the powerquery execution model. So far I’ve been timing execution time of my entire chain with a stopwatch and I have been seeing some pretty varying results (from 50s to 2m30s). This is on a workbook with 3 source tables in Excel (hundreds / thousands of lines), and a chain of about 14 queries of varying complexity.

On what exactly depends the execution time? For instance, should I expect different results after:
-running the same query chain with the same data? is anything reused from cache?
-changing small parts of the query
-restarting excel (saving, closing, reopening)
-making changes to the source tables in excel

I found the posts about the caching model and query execution chain very confusing; even more so when I have measured execution time of 50s before the weekend after making small changes, and now after pasting new data (of similar size) it’s suddenly back to over 2m again.

7. Menno says:
8. v55 says:

———–M code —————–

let
ParseJsonLine = (#”input content” as binary) => let
#”Imported Text” = Table.FromColumns({Lines.FromBinary(#”input content”,null,null,1252)}),
#”Parsed JSON” = Table.TransformColumns(#”Imported Text”,{},Json.Document)
in
#”Parsed JSON”,

//Expand each JSON content value with encoding as 1252
ReadAllJsonBlobs = Table.AddColumn(RemoveOtherColumns , “JSONContent”, each Json.Document([Content],1252)),

//Json line format will fail with json parser – rows with error will return jsonline format
JsonLineFormatTable = Table.SelectRowsWithErrors(ReadAllJsonBlobs, {“JSONContent”}),
RemoveOtherColumns1 = Table.RemoveColumns(ReadAllJsonLineBlobs,{“Content”, “JSONContent”}),
JsonLineFormatRecords = Table.ExpandTableColumn(RemoveOtherColumns1, “JSONLineContent”, {“Column1”}, {“records”}),

//Json format will be parsed by json parser – rows without error will return json format
CurrentJsonFormatTable = Table.RemoveRowsWithErrors(ReadAllJsonBlobs, {“JSONContent”}),
ExpandJSON = Table.ExpandRecordColumn(CurrentJsonFormatTable, “JSONContent”, {“records”}, {“records”}),
ExpandRecords = Table.ExpandListColumn(ExpandJSON , “records”),
CurrentJsonFormatRecords = Table.RemoveColumns(ExpandRecords,{“Content”}),

// Combining both current format and jsonline format records
CombinedTable = Table.Combine({CurrentJsonFormatRecords, JsonLineFormatRecords }),

——————————-

The above query is to parse json or json line data from blobs. Basically I have two parts: one to parse json and other to parse jsonline. I am first parsing with json methods and if it fails I will parse with my custom parser (parseJsonLine method). I cant get rid of double parsing as there is no way I can distinguish the blob data format. I split the table into two parts – SlectRowswithErrors (to parse jsonline) and RemoveRowsWithErrors (to parse json data).

Initially I used Table.Combine({CurrentJsonFormatRecords, JsonLineFormatRecords }) but I noticed a massive performance hit. From powerBI UI itself, I could see it was evident it was trying to load multiple times.

I noticed this line from your book on powerquery that Table.RemoveRowsWithErrors prevents query folding.
So when I changed the query to Table.Combine({JsonLineFormatRecords , CurrentJsonFormatRecords}), I did not see this performance issue as this line Table.RemoveRowsWithErrors comes late.

I have three questions:
1. RemoveRowsWithErrors prevent query folding. Will SelectRowsWithErrors also prevent query folding?
2. When I tried with both the code (two reorder of table.combine), I did not see any change in query timing (I used the query timing code mentioned in this blog). Then how come during refresh, there was double loading for one order of table.combine?
3. How do we monitor performance for blob data? We have GBs of data and we see performance hit and timeout happens in powerBI service. Any suggestions for profiling blob data?

• I don’t know any of the answers to your questions, I’m afraid – but if you do find out, I would be interested to know! The trouble is that the Power Query engine is a black box…

This site uses Akismet to reduce spam. Learn how your comment data is processed.