This post is really just a quick follow-on from my post earlier this week on using Process Monitor to troubleshoot Power Query performance issues with file-based data sources, which I suggest you read before carrying on. I realised, after playing around with Process Monitor some more, that the ReadFile operation actually tells you how much data is being read from a file when a Power Query query is running. For example, here’s a sample of some of the ReadFile operations captured while running the unoptimised version of the query I talked about in my last post:
Since Process Monitor can export captured events to a CSV file, it’s pretty easy to load the events into Power BI, filter the events down to only the ReadFile operations, parse the Detail column to extract the Offset values (which I’m sure you can work out how to do if you’re reading a post like this), and then draw a graph showing how much data gets read from a file when a query is run. Here’s what the graph looks like for the unoptimised version of the query from my previous blog post, with relative time on the x axis and the amount of data read in bytes on the y axis:
In that post I noted that there were six reads of the file – and while that’s clear from the graph above, it’s also possible to see that the first read does not read the whole contents of the file while the next five do (the file is 149MB). So maybe I was right that there is one complete read of the file for each row in the output query? What is that first, partial read for, I wonder?