Power BI Automatic Refresh Retries

One of the most confusing things about troubleshooting Power BI refresh problems is the way Power BI will sometimes retry a refresh after it has failed. As a result, refreshes can seem to take a lot longer than you expect, and more queries are run against your data source than you expect. I don’t have all the answers but I thought it would be useful to highlight a few scenarios where this happens.

A while ago I wrote a series of posts on dataset refresh timeouts, which started out with this post on the fact that no refresh in the Power BI portal can last more than 2 hours (in Shared/Pro capacity) or 5 hours (in Premium), as documented here. While writing that post I built a dataset that takes more than 5 hours to refresh; both back then and when testing it again now, I found that it seemed to take a lot longer than 5 hours to fail.

What happened? I kicked off the refresh at 07:33 UTC one morning. With Log Analytics enabled on the workspace I could run the following KQL query to see what happened five hours later:

PowerBIDatasetsWorkspace
| where TimeGenerated>datetime(2023-05-05 12:32:00) and TimeGenerated<datetime(2023-05-05 12:35:00)
| where OperationName in ('CommandBegin', 'CommandEnd', 'Error', 'ProgressReportBegin', 'ProgressReportEnd', 'ProgressReportError')
| where OperationDetailName<>'ExtAuth'
| project TimeGenerated, OperationName, OperationDetailName, DurationMs, CpuTimeMs, EventText, Status, StatusCode
| order by TimeGenerated asc

Sure enough, around 12:34 UTC I could see errors happening for the existing refresh, then (on line 22 in the screenshot below) a CommandEnd event corresponding to the failed refresh, which had run for five hours. After that (on line 25 in the screenshot below) there was a CommandBegin event for a new refresh on the same dataset:

Interestingly, even though I could see from Log Analytics that the refresh had restarted and I could also see the refresh “spinny” spinning in the workspace, the Refresh History dialog was already showing a timeout:

[I’m told that we have plans to improve this and provide more details on what’s happening in the UI in the future]

The same thing happened again: the next refresh failed five hours later at 17:34 UTC that day, then again at 23:36 UTC, and again at 03:46 UTC the next day. There was then a delay of one hour before the refresh started again at 04:35 UTC, and the final failure came one hour after that at 05:37 UTC. Lots of failures and lots of retries.
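To put numbers on all those retries, here is a short Python sketch that tallies the gaps between the timestamps above (all taken from the Log Analytics output) and the total elapsed time across every attempt:

```python
from datetime import datetime

# Timestamps from the Log Analytics output above (all UTC)
start = datetime(2023, 5, 5, 7, 33)      # initial kick-off
failures = [
    datetime(2023, 5, 5, 12, 34),
    datetime(2023, 5, 5, 17, 34),
    datetime(2023, 5, 5, 23, 36),
    datetime(2023, 5, 6, 3, 46),
    datetime(2023, 5, 6, 5, 37),
]

# Gap from the kick-off (or the previous failure) to each failure;
# the last gap also includes the ~49-minute delay before the 04:35 restart
points = [start] + failures
gaps = [str(b - a) for a, b in zip(points, points[1:])]

# Total elapsed time across all attempts
total = failures[-1] - start
```

The total comes to just over 22 hours for a dataset whose single-attempt timeout is five hours, which shows how much retries can inflate the apparent duration of a refresh.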

I happened to capture a similar refresh retry in a Profiler trace recently. I scheduled a refresh for 23:00 UTC on a busy Premium capacity and saw the following:

This trace shows:

  • No activity before 23:05, which I think shows how busy the capacity was at the time
  • The first attempted refresh kicked off at 23:05:10 (the first highlighted pair of Command Begin/End events in the screenshot) and failed immediately.
  • The error message associated with this failure was “You’ve exceeded the capacity limit for dataset refreshes. Try again when fewer datasets are being processed”, an error documented here. It was caused by hitting the limit on the number of datasets that can be refreshed concurrently on a single capacity (as documented in the Model Refresh Parallelism column of the table here).
  • The dataset refresh kicked off again at 23:06:10 and finished successfully after 42 seconds (the second highlighted pair of Command Begin/End events in the screenshot above).

So those are two scenarios where Power BI retried a refresh after an initial failure. As I said, though, not all refresh failures result in retries: for example, when I tried refreshing the dataset from this post, which fails with a timeout on the SQL Server side, I could see that Power BI only ran the refresh command once:

I don’t know exactly when the Power BI Service does and doesn’t retry a refresh after a failure, but I’m trying to find out; I’ll blog again if I get more information.

While you don’t have any control over retry behaviour if you schedule a refresh in the Power BI Service or kick off a manual refresh from the portal, you do get more control if you refresh using the Enhanced Refresh API and set the retryCount parameter. Indeed, the docs for the Enhanced Refresh API have some extra detail here on how retryCount and the five-hour timeout interact: even if you keep retrying a refresh it will always fail after 24 hours, and the total time for a refresh includes the time taken by all the failed attempts before a success. This, for me, is one more reason to use the Enhanced Refresh API for refreshing enterprise-level datasets.
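To make that concrete, here is a minimal Python sketch of an Enhanced Refresh API call that sets retryCount. The endpoint shape, the retryCount property and the "full" refresh type come from the Enhanced Refresh API docs; the group/dataset IDs and the access token are placeholders, and error handling is omitted:

```python
import json
import urllib.request

def build_refresh_request(group_id: str, dataset_id: str, retry_count: int = 1):
    """Build the URL and request body for an Enhanced Refresh API call.

    retry_count is the number of times the service will retry the
    refresh on failure (subject to the 24-hour overall limit).
    """
    url = (
        "https://api.powerbi.com/v1.0/myorg/"
        f"groups/{group_id}/datasets/{dataset_id}/refreshes"
    )
    body = {
        "type": "full",             # refresh type
        "retryCount": retry_count,  # retries the service performs on failure
    }
    return url, body

def start_refresh(url: str, body: dict, access_token: str):
    # POSTing with a valid Azure AD access token starts the refresh
    # asynchronously; a successful call returns 202 Accepted
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    return urllib.request.urlopen(req)
```

With this approach the retry behaviour is explicit in your own code rather than left to whatever the portal decides to do.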

7 thoughts on “Power BI Automatic Refresh Retries”

  1. When you recommend it as “one more reason to use the Enhanced Refresh API for refreshing enterprise-level datasets”, what service do you use to manage refreshes? Something like Azure Synapse/Data Factory pipelines, or other options?

    Happy about this! [I’m told that we have plans to improve this and provide more details on what’s happening in the UI in the future]

    Great post, thanks for sharing.

  2. Nice post! Another terrible issue regarding refreshes is with Dataflows. If, for some reason, a gateway is down and many dataflows have refresh errors, then after the gateway comes back up all the dataflows will try to refresh at the same time (and on each failure they will trigger the refresh several more times too), causing a breakdown on the gateway and then having this problem again and again and again.

  3. The Power BI web UI’s “Refresh history” pop-up seems to have been enhanced recently. If a refresh failed, the Details section now lists the re-tries, and the time taken for each.

    Given these can be a waste of time and chew up capacity, it would be good if we could set the retryCount through the web UI.

  4. Thanks for this post. Was there ever a follow up?

    I think one reason why it is so hard to investigate retries is that there is a huge double standard going on. Failures which happen because of a problem in the customer’s environment or resources are treated differently. Those are the ones that Microsoft will allow to bubble up and stick right in your face!

    However, when the failures are happening because of bugs or reliability issues in the Power BI service (SaaS), then Microsoft will try to obscure them as much as possible. It would be fine if they could make the problems go away without unintended side-effects. But those side-effects arise regularly and cause a lot of problems (i.e. they feel at liberty to launch the same mashup over and over, and hammer our resources via their gateway!)

    Currently I have a case underway which involves duplicate mashups that are launching for unknowable reasons. We were able to find an exception but it is meaningless and not actionable.

    >> Spooling failed with error: The operation failed due to an explicit cancellation. Exception: System.Threading.Tasks.TaskCanceledException: A task was canceled. at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.PowerBI.DataMovement.Pipeline.Dataflow.TDFHelpers MoveNext

    At the end of the day, I think the implicit retries can cause more problems than they solve, and I don’t know why Microsoft won’t allow us to configure this as a customizable preference.

    Another story – we have noticed Microsoft taking the most liberties in their “azure managed vnet gateway”. That thing has always been pretty buggy; and so they sprinkle lots of retry logic all over the place. Unfortunately one consequence is on our Azure bill for our SQL Server resources. I suppose Microsoft might allow me to forward that Azure bill from the SQL team to the Power BI team. But probably not.

  5. These automatic retries must be made optional. It is our money (in CUs) that is being wasted and there is nothing we can do about it. I can’t even.
