Do you have a a large dataset in Power BI Premium or Premium Per User? Do you have more than six tables that take a significant amount of time to refresh? If so, you may be able to speed up the performance of your dataset’s refresh by increasing the number of tables that are refreshed in parallel, using a feature that was released in August 2022 but which you may have missed.
Some of you may be thinking: haven’t you blogged about refresh parallelism before? Yes: last year I wrote a post about setting the maxParallelism property when refreshing through TMSL scripts, a few months ago I also showed how you could change the same property when refreshing using the Enhanced Refresh API. That was all fairly complex though and what I’m going to show you here is, in contrast, very easy to implement.
To illustrate this I created an Import dataset containing nine tables, each of which loaded data from different CSV files. Each table contained a couple of million rows and took 20-30 seconds to refresh. I then published to a PPU workspace in the Power BI Service and used the technique described in Phil Seamark’s “Visualise your Power BI refresh” blog post to capture what happened during refresh. The refresh took 44 seconds as as you can see from this screenshot of Phil’s report, only six of the tables were refreshed in parallel at any one time:
[Ignore the y axis on this graph – I have no idea what’s going on with it]
This is as you would expect: the default number of objects that can be refreshed in parallel in Premium or Premium Per User is 6. This is also the maximum number of objects that can be refreshed in parallel in Shared capacity.
I then went to my original .pbix file, opened the Options dialog, went to the Current File/Data Load pane and changed the new “Parallel loading of tables” setting on my dataset from Default to Custom and entered the value 9:
This setting sets the maxParallelism property in the refresh commands generated by Power BI in the Power BI Service; it is fully documented here. Rerunning the refresh again in the Power BI Service I found that the overall time taken was down to 29 seconds and that all the tables were refreshed in parallel:
Here’s the Refresh command from the Profiler trace showing maxParallelism set to 9:
In summary: refresh performance got1/3 faster by making this simple change, so if you’re using Power BI Premium and not already doing advanced things with TMSL scripts or the Enhanced Refresh API, this is definitely something you should experiment with. As I said last year you can try setting this setting up to a value of 30 but there’s no guarantee you’ll get the amount of parallelism you ask for (it depends on the Premium capacity SKU and a few other factors). What’s more, setting this property too high could result in slower refresh because it might result in Power BI overloading your data source with queries.
[Thanks to Akshai Mirchandani for answering my questions about this subject]