When you tune a Power BI report or dataset refresh your first job is to make it run faster – users hate slow reports and late data. However, if you are using Power BI Premium you also need to think about reducing the amount of CPU used by a query or refresh as well, and in this post I’ll explain why.
First of all, what are the metrics I’m talking about? For query performance tuning, if you connect SQL Server Profiler to Power BI Desktop or the Power BI Service (as described here for example) and look at the Query End event you can see all the DAX queries executed when a report is rendered. The TextData column gives the query text, the Duration column shows the amount of time the query took to run in milliseconds and the CPUTime column is a representation of the amount of CPU resources used by the query:
The Command End event gives you similar information for refreshes:
The same data is available in Log Analytics, albeit with slightly different column names: the EventText holds the query text, DurationMs has the duration and CpuTimeMs has the CPU time:
As I said, it’s the duration that, traditionally, you would have wanted to reduce. In fact, in the past, CPU Time didn’t even return accurate values so it wasn’t useful – although that changed recently.
So why is CPU Time so important? In Power BI Premium gen1 you have a fixed set of resources that all your queries and refreshes compete for; if you try to use more CPU than you have available then performance suffers. Power BI Premium gen2 works differently (as explained in detail here and here): you no longer have a fixed set of resources available, but if you use more CPU than your capacity allows you will either pay extra (if you enable autoscale) or get throttled. The Premium Capacity Metrics app gives you a lot of information about CPU usage across your Premium capacities. Power BI Premium Per User is based on the Power BI Premium gen2 platform and some of the same considerations exist for it as for Premium gen2, although they aren’t fully documented or implemented yet and I can’t go into details here.
As a result it’s important for the overall cost and performance of a Premium capacity that all the queries and refreshes that run on it use as little CPU as possible. One last point: while Duration and CPU Time are linked the relationship is not necessarily linear. For example you could tune a Storage Engine-heavy DAX query and achieve a big reduction in CPU Time but a smaller reduction in Duration because the Storage Engine can do a lot of work in parallel across multiple cores; on the other hand tuning a Formula Engine-heavy DAX query might result in an equal reduction in both Duration and CPU Time. Similarly because the overall amount of time taken to refresh a dataset is often determined by the amount of time taken to refresh the biggest table, you might be able to significantly reduce the CPU Time for a refresh without changing the duration much. The CPU used by background operations like refresh is smoothed over a 24 hour period in Premium gen2 so there is no need to try to slow down a refresh to avoid a spike in CPU usage that might lead to throttling or autoscaling, but it is still true that reducing the amount of CPU used by a refresh at any given point in time will mean more resources are available for any queries that are running on the same node at the same time.
[Thanks to Akshai Mirchandani and Giri Nair for the information in this post]