For a while now I’ve had an idea stuck in my head: wouldn’t it be cool to build a Power BI solution where a user could enter data into an Excel workbook and then, as soon as they had done so, they could see their new data in a Power BI report? It would be really useful for planning/budgeting applications and what-if analysis. I had hoped that a DirectQuery model using the CData Excel custom connector (mentioned here) might work but the performance wasn’t good enough; using Flow with the Power BI REST API (see Part 1 of this series for details on how to get this set up) gets me closer to my goal, even if there’s still one major problem with the approach. Here’s how…
First of all, I have two Excel files saved in a OneDrive For Business folder:
The contents of each Excel file is more or less the same, a table containing budget data for four quarters:
These are the files that my hypothetical users will be editing. I also have a very basic Power BI dataset that imports the data from these two files, and a report to display the data:
Now for the Flow. For the trigger I use a OneDrive For Business “When a file is modified” that fires when a file in the above folder is changed:
I had to turn on Concurrency Control for the trigger and set the Degree of Parallelism property to 1, to make sure only one instance of the Flow ran at any one time – there’s no point trying to refresh the dataset if another instance of the Flow is already running.
Next I have an action from the Power BI custom connector created in my last post. It calls the Power BI API’s Get Refresh History In Group endpoint and gets details of the last refresh of the dataset:
This returns a JSON value containing the details of the last refresh; the JSON goes to a Parse JSON action so later on I can get the Status of the last refresh. This action requires a schema, which I generated from a sample payload given in the Power BI REST API documentation. I had to delete endTime from the list of required properties in the schema because it won’t be returned if the most recent refresh is currently in progress:
Next there’s an Apply to each to loop over each refresh returned by the API; since only the most recent refresh is returned I know this will only ever execute once.
Inside the Apply to each I have a Condition that looks at the Status property of the most recent refresh and checks whether the Status is Completed:
If the Status is indeed Completed, then the Flow calls the Refresh Dataset In Group endpoint to refresh the dataset:
If the Status of the last refresh is not Completed (either because it failed for some reason, or because there is already a refresh in progress), there is a delay of one minute and the Flow then starts itself again:
As I’ve said before, I’m not an expert on Flow but this seems to work reliably with the limited testing I’ve done on it. Here it is in action:
So what is the problem with this approach? It’s that if you are using Power BI Pro you can only refresh a dataset eight times a day and if you are using Power BI Premium you can only refresh a dataset forty-eight times a day. There have been rumours that this forty-eight times a day refresh limit for Premium has been lifted, or might be lifted soon, or doesn’t apply when you refresh via the API, but I specifically tested this and got the following error message: “Invalid dataset refresh request. Number of refresh requests in last 24 hours exceeded limit” (interestingly this didn’t appear as a failed refresh in the refresh history of the dataset). Since it’s reasonable to assume that any Excel file will be updated more than forty-eight times per day in a busy period I don’t think this approach is usable in the real world; all the same I’m happy with this as a proof-of-concept showing what might be possible if the refresh limit is lifted in the future though.