One of the most confusing things about troubleshooting Power BI refresh problems is the way Power BI will sometimes try running a refresh again after it has failed. It means that refreshes seem to take a lot longer than you would expect and also that more queries are run against your data source than you would expect. I don’t have all the answers but I thought it would be useful to highlight a few scenarios where it does happen.
A while ago I wrote a series of posts on dataset refresh timeouts which started out with this post on the fact that no refresh in the Power BI portal can last more than 2 hours (in Shared/Pro capacity) or 5 hours (in Premium), as documented here. While writing that post I built a dataset that take more than 5 hours to refresh; back then, and testing it again now, I found that it seemed to take a lot longer than 5 hours to fail.
What happened? I kicked off the refresh at 07:33 UTC one morning. With Log Analytics enabled on the workspace I could run the following KQL query to see what happened five hours later:
PowerBIDatasetsWorkspace | where TimeGenerated>datetime(2023-05-05 12:32:00) and TimeGenerated<datetime(2023-05-05 12:35:00) | where OperationName in ('CommandBegin', 'CommandEnd', 'Error', 'ProgressReportBegin', 'ProgressReportEnd', 'ProgressReportError') | where OperationDetailName<>'ExtAuth' | project TimeGenerated, OperationName, OperationDetailName, DurationMs, CpuTimeMs, EventText, Status, StatusCode | order by TimeGenerated asc
Sure enough, around 12:34 UTC I could see errors happening for the existing refresh, then (on line 22 in the screenshot below) a CommandEnd event corresponding to the refresh that failed and which had run for five hours. After that (on line 25 in the screenshot below) there was a CommandBegin event for a new refresh on the same dataset:
Interestingly, even though I could see from Log Analytics that the refresh had restarted and I could also see the refresh “spinny” spinning in the workspace, the Refresh History dialog was already showing a timeout:
[I’m told that we have plans to improve this and provide more details on what’s happening in the UI in the future]
The same thing happened again with the next refresh failing at 17:34 UTC that day, five hours later, again at 23:36 UTC, again at 03:46 UTC the next day; there was then a delay of one hour when the refresh started again, at 04:35 UTC and the final failure came one hour after that at 05:37 UTC. Lots of failures and lots of retries.
I happened to capture a similar refresh retry in a Profiler trace recently. I scheduled a refresh for 23:00 UTC on a busy Premium capacity and saw the following:
This trace shows:
- No activity before 23:05, which I think shows how busy the capacity was at the time
- The first attempted refresh kicked off at 23:05:10 (the first highlighted pair of Command Begin/End events in the screenshot) and failed immediately.
- The error message associated with this failure was “You’ve exceeded the capacity limit for dataset refreshes. Try again when fewer datasets are being processed”, an error documented here. This was caused by reaching the limit for the number of concurrent datasets that can be refreshed on a single capacity (as documented in the Model Refresh Parallelism column of the table here).
- The dataset refresh kicked off again at 23:06:10 and finished successfully after 42 seconds (the second highlighted pair of Command Begin/End events in the screenshot above).
So two scenarios where Power BI retried a refresh after an initial failure. As I said, though, not all refresh failures result in retries. For example, when I tried refreshing the dataset in this post which fails with a timeout on the SQL Server side, I could see that Power BI only ran the refresh command once:
I don’t know when the Power BI Service does and doesn’t retry a refresh after a failure but I’m trying to find out; I’ll blog again if I do get more information.
While you don’t have any control over retry behaviour if you schedule a refresh in the Power BI Service or kick off a manual refresh from the portal, you do get more control if you refresh using the Enhanced Refresh API and set the retryCount parameter. Indeed the docs for the Enhanced Refresh API have some extra detail about how retryCount and the five hour timeout interact here, mentioning that even if you keep retrying a refresh it will always fail after 24 hours and that the total time for a refresh includes the time taken for all failures before a success. This, for me, is one more reason to use the Enhanced Refresh API for refreshing enterprise-level datasets.