If you’ve been following some of my recent posts about improving Power BI refresh performance by partitioning tables you will have seen a lot of screenshots that look like the one below:
It’s a visualisation from a report created by my colleague Phil Seamark (as detailed in this blog post) showing how long all the partitions in a dataset take to refresh. If you look at these visualisations you’ll probably ask the same question I did: why does the first partition always start before the others?
It turns out this is because when a table is refreshed, the first thing that has to happen is that a certain amount of data is read so the type of encoding (Value or Hash) used for each column is determined. In most cases tables only contain one partition so it’s not obvious that this is happening, but when a table has more than one partition this happens only for the first partition – which explains why the first partition seems to start before the others. You can’t avoid it happening but you can reduce the impact a little by using encoding hints (see here and here for more details): this process can be skipped for columns that have a Hash encoding hint, or which the engine knows in advance have to use Hash encoding, although it cannot be skipped for columns that have a Value encoding hint. What’s more the Execute SQL event for the first partition will have to complete before the Execute SQL events for all the other partitions can start.
[Thanks to Akshai Mirchandani for the information in this post]