To finish off my series of posts on concurrent evaluation in Fabric Dataflows Gen2 (see part 1 and part 2) I decided to do some more realistic tests to see how much parallelism I could get. To do this I uploaded 244 identical Excel files containing almost 542000 rows of data each to a SharePoint document library. Excel files are probably the worst-performing file format for dataflows (see here for some tests that show this), while SharePoint is probably the worst-performing place to store data for a dataflow and also has a reputation for throttling applications that make too many requests.
As a baseline I created a Dataflow Gen2 that combined all the data from all the Excel files in the document library and loaded it into a single table in a Fabric Warehouse; I configured this dataflow up so it did not use Partitioned Compute.

Looking at the Recent Runs data for this dataflow I could see that the query that combined the data from all the Excel files took on average 28 minutes 8 seconds to run.
I then created a copy of this dataflow and modified it so that it did use Partitioned Compute. I set the column containing the Excel filename as the partition key which meant that, in theory, the dataflow could read the data from each of the files in parallel. I then refreshed the dataflow with different maximum concurrency settings (see the previous post in this series for how to do this). Here are the results:
| Maximum Concurrency | Average Refresh Duration (Minutes:Seconds) |
| 4 | 18:37 |
| 8 | 9:42 |
| 16 | 5:24 |
| 250 | 5:21 |
| No limit set | 5:16 |
A few interesting things to note:
- Refreshes with a maximum concurrency of 4 were significantly faster than the baseline although nowhere near four times faster, possibly because of the overhead introduced by Partitioned Compute, but
- Refreshes with a maximum concurrency of 8 were around twice as fast as refreshes with a maximum concurrency of 4, and refreshes with a maximum concurrency of 16 were almost four times as fast
- There was no performance gain achieved by setting a maximum concurrency of more than 16
- Setting a limit of more than 16 resulted in the same performance as not setting a limit at all
One last thing to mention is that one of the refreshes when no limit was set failed after two minutes. It was difficult to tell why it failed but after downloading the logs and looking through them it seems that SharePoint was returning a 429 Too Many Requests error, which makes sense given that the dataflow was trying to read data from a lot of Excel files at the same time. Given that no data source, especially SharePoint, can be expected to handle an infinite number of parallel requests for data from a dataflow then this is a good example of where too much parallelism can be a bad thing and cause errors, and where it’s important to limit the amount of parallelism inside the dataflow.
You mentioned setting up Partitioned Compute for Excel files stored in a SharePoint document library. However, the official Fabric documentation for Partitioned Compute (https://learn.microsoft.com/en-us/fabric/data-factory/dataflow-gen2-partitioned-compute) explicitly states:
“Only connectors for Azure Data Lake Storage Gen2, Folder, and Azure Blob Storage emit the correct script to use partitioned compute. The connectors for SharePoint and Fabric Lakehouse do not support it today.”
How did you achieve that?
I edited the M code manually in the way described in the docs. It’s fairly easy to do.