Fabric Dataflows Gen2 And Concurrent Evaluation, Part 1

Did you know that if your Fabric Dataflow Gen2 contains several queries, you can control how many of them are evaluated in parallel when the dataflow refreshes? In this series I’ll look at how you can do this and how it may result in better performance, at least in some cases.

Let’s start with the basics. I created a Dataflow Gen2 with ten queries, each of which returned a table of one row and one column after a delay of one minute. I used the #table function to generate the table without connecting to a data source, code from this post to add the delay, and the trick in this post to make sure the delay was only applied when the dataflow refreshed. The output of each query was loaded to a Fabric Warehouse.

I then refreshed the dataflow using the default settings and found that the refresh took 2 minutes 9 seconds by looking in Recent Runs; each individual query took somewhere between 1 minute 10 seconds and 1 minute 30 seconds, which matches the 10-30 second overhead on query execution that I normally see when tuning dataflows. I refreshed it two more times and the durations were 1 minute 32 seconds and 3 minutes 38 seconds. This all suggests that the amount of concurrency was variable, and that the dataflow was sometimes able to evaluate all the queries in parallel and sometimes only able to evaluate some of them in parallel.

I then opened the dataflow, went to the Options dialog and the Scale pane, checked the “Limit number of concurrent evaluations” box and set the slider to 1.

I then refreshed the dataflow again and this time the overall refresh took 12 minutes 29 seconds, which is consistent with ten query evaluations that took 1 minute 10 seconds to 1 minute 30 seconds running one after the other.
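The arithmetic behind that claim is quick to check. The sketch below uses only the figures quoted above and assumes the ten queries ran strictly one after the other with no gap between them; the variable names are mine.

```python
# Back-of-envelope check: ten queries evaluated one at a time.
QUERY_COUNT = 10
FASTEST_QUERY_S = 1 * 60 + 10       # 1 minute 10 seconds
SLOWEST_QUERY_S = 1 * 60 + 30       # 1 minute 30 seconds
OBSERVED_REFRESH_S = 12 * 60 + 29   # 12 minutes 29 seconds


def fmt(seconds):
    """Format a whole number of seconds as m:ss."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


# If every query were as fast, or as slow, as the extremes seen earlier.
lower_bound_s = QUERY_COUNT * FASTEST_QUERY_S
upper_bound_s = QUERY_COUNT * SLOWEST_QUERY_S

print("Sequential lower bound:", fmt(lower_bound_s))
print("Sequential upper bound:", fmt(upper_bound_s))
print("Observed within bounds:",
      lower_bound_s <= OBSERVED_REFRESH_S <= upper_bound_s)
```

The observed 12 minutes 29 seconds works out at about 75 seconds per query, which sits inside the range of individual query durations seen in the earlier refreshes.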

Finally, I went back into the dataflow and set the concurrency slider to its maximum value of 250.

I then refreshed the dataflow again and this time the refresh took 1 minute 33 seconds, suggesting all the queries were evaluated in parallel.
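A toy model makes it easier to see why the two ends of the slider give such different results. The Python sketch below is not a description of how Fabric actually schedules evaluations: it assumes a simple greedy scheduler in which each query starts as soon as one of a fixed number of slots is free, and it uses a hypothetical duration of 80 seconds per query, the midpoint of the range I saw. The function name and parameters are made up for illustration.

```python
import heapq


def estimated_refresh_seconds(query_durations, concurrency_limit):
    """Estimate total refresh time when at most `concurrency_limit`
    queries run at once and each query starts as soon as a slot is free.
    Toy model only; not how Fabric is known to schedule evaluations."""
    if concurrency_limit < 1:
        raise ValueError("concurrency_limit must be at least 1")
    slots = min(concurrency_limit, len(query_durations))
    if slots == 0:
        return 0
    # Each heap entry is the time at which one slot next becomes free.
    slot_free_at = [0] * slots
    heapq.heapify(slot_free_at)
    for duration in query_durations:
        start = heapq.heappop(slot_free_at)  # earliest available slot
        heapq.heappush(slot_free_at, start + duration)
    # The refresh ends when the last slot finishes its final query.
    return max(slot_free_at)


# Hypothetical durations: ten queries of 80 seconds each.
durations = [80] * 10

for limit in (1, 2, 5, 10, 250):
    print(limit, estimated_refresh_seconds(durations, limit))
```

With a limit of 1 the model gives 800 seconds, in the same region as the 12 minutes 29 seconds measured above, and with a limit of 10 or more it gives 80 seconds, a little under the 1 minute 33 seconds measured. The model ignores any fixed overhead for starting the refresh, so it should only be read as a rough guide.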

So from all this we can see that, as you might expect, increasing the amount of concurrency improved dataflow refresh performance. There’s more to say about concurrency though and, as we shall see later in this series, more concurrency isn’t always better.
