Why You Should Optimise Your Power BI Premium Reports And Refreshes For CPU Time As Well As Duration

When you tune a Power BI report or dataset refresh your first job is to make it run faster – users hate slow reports and late data. However, if you are using Power BI Premium you also need to think about reducing the amount of CPU used by a query or refresh as well, and in this post I’ll explain why.

First of all, what are the metrics I’m talking about? For query performance tuning, if you connect SQL Server Profiler to Power BI Desktop or the Power BI Service (as described here for example) and look at the Query End event you can see all the DAX queries executed when a report is rendered. The TextData column gives the query text, the Duration column shows the amount of time the query took to run in milliseconds and the CPUTime column is a representation of the amount of CPU resources used by the query:

The Command End event gives you similar information for refreshes:

The same data is available in Log Analytics, albeit with slightly different column names: the EventText holds the query text, DurationMs has the duration and CpuTimeMs has the CPU time:

As I said, it’s the duration that, traditionally, you would have wanted to reduce. In fact, in the past, CPU Time didn’t even return accurate values so it wasn’t useful – although that changed recently.

So why is CPU Time so important? In Power BI Premium gen1 you have a fixed set of resources that all your queries and refreshes compete for; if you try to use more CPU than you have available then performance suffers. Power BI Premium gen2 works differently (as explained in detail here and here): you no longer have a fixed set of resources available, but if you use more CPU than your capacity allows you will either pay extra (if you enable autoscale) or get throttled. The Premium Capacity Metrics app gives you a lot of information about CPU usage across your Premium capacities. Power BI Premium Per User is based on the Power BI Premium gen2 platform and some of the same considerations exist for it as for Premium gen2, although they aren’t fully documented or implemented yet and I can’t go into details here.

As a result it’s important for the overall cost and performance of a Premium capacity that all the queries and refreshes that run on it use as little CPU as possible. One last point: while Duration and CPU Time are linked the relationship is not necessarily linear. For example you could tune a Storage Engine-heavy DAX query and achieve a big reduction in CPU Time but a smaller reduction in Duration because the Storage Engine can do a lot of work in parallel across multiple cores; on the other hand tuning a Formula Engine-heavy DAX query might result in an equal reduction in both Duration and CPU Time. Similarly because the overall amount of time taken to refresh a dataset is often determined by the amount of time taken to refresh the biggest table, you might be able to significantly reduce the CPU Time for a refresh without changing the duration much. The CPU used by background operations like refresh is smoothed over a 24 hour period in Premium gen2 so there is no need to try to slow down a refresh to avoid a spike in CPU usage that might lead to throttling or autoscaling, but it is still true that reducing the amount of CPU used by a refresh at any given point in time will mean more resources are available for any queries that are running on the same node at the same time.

[Thanks to Akshai Mirchandani and Giri Nair for the information in this post]

Preserving Data Types With SQL Queries In Power Query And Power BI

My post earlier this year on enabling query folding when using SQL queries as a data source in Power Query provoked a lot of interest. This post adds one more useful detail: how to preserve the original data types of the columns in your query when using this technique with SQL Server-related sources.

Consider the DimDate table in the AdventureWorksDW2017 sample database for SQL Server:

Notice that the FullDateAlternateKey column has the data type Date.

If you connect to this table in the normal way, by selecting it in the Navigation pane when you connect to your SQL Server instance, the M code for your Power Query query will look something like this:

let
  Source = Sql.Databases("localhost"), 
  AdventureWorksDW2017 = Source
    {[Name = "AdventureWorksDW2017"]}
    [Data], 
  dbo_DimDate = AdventureWorksDW2017
    {[Schema = "dbo", Item = "DimDate"]}
    [Data]
in
  dbo_DimDate

Unsurprisingly, the FullDateAlternateKey column in the Power Query query also has a data type of Date, as indicated by the calendar icon on the left side of the column header in the Power Query Editor:

However, if you use Value.NativeQuery to run a SQL query to get the same data and set EnableFolding=true, like so:

let
  Source = Sql.Databases("localhost"), 
  AdventureWorksDW2017 = Source
    {[Name = "AdventureWorksDW2017"]}
    [Data], 
  Q = Value.NativeQuery(
    AdventureWorksDW2017, 
    "Select * From DimDate", 
    null, 
    [EnableFolding = true]
  )
in
  Q

…you’ll see that the FullDateAlternateKey column comes through as a DateTime type instead:

The same thing would happen with a column of type Time too, ie it would come through as a DateTime.

If you want the types in the output of Value.NativeQuery to match the types in the output of the first Power Query query above there’s an extra option you need to add: PreserveTypes=true.

let
  Source = Sql.Databases("localhost"), 
  AdventureWorksDW2017 = Source
    {[Name = "AdventureWorksDW2017"]}
    [Data], 
  Q = Value.NativeQuery(
    AdventureWorksDW2017, 
    "Select * From DimDate", 
    null, 
    [
      PreserveTypes = true, 
      EnableFolding = true
    ]
  )
in
  Q

In the output of this query, FullDateAlternateKey has the data type Date again:

This option is only available for the SQL Server connector (and connectors related to it) at the time of writing.

[Thanks to Curt Hagenlocher for this information]

Matching DAX Queries To Individual Visuals In A Published Power BI Report

The integration between Azure Log Analytics and Power BI opens up a lot of new possibilities for Power BI administrators. It’s still in preview (there’s a lot more cool stuff still to come) but the Analysis Services events that are available at the time of writing are still very useful: they give you pretty much everything you had in Profiler plus some new stuff too. In this post I’ll show you how you can use this data to work out which visual in a published report generated a particular DAX query.

In Log Analytics you can get information about all the queries run against a dataset using a simple KQL query like this one (I haven’t included an explicit time filter in this query, although you would want to do this in real life):

PowerBIDatasetsWorkspace | 
where ArtifactName == "InsertWorkspaceNameHere" 
and OperationName == 'QueryEnd' 

The QueryEnd event is raised every time a query finishes running – it’s an event you may already be familiar with from Profiler, xEvents or the Log Analytics integration with AAS – and it tells you useful things like the time the query finished, the duration, the text of the DAX query, the user running the query and so on. However what is new for Power BI is the contents ApplicationContext column. My colleague Kasper blogged about how you can use this column to get the DatasetId and ReportId when using AAS here; it now also contains a GUID that identifies the exact visual on a report that generated the DAX query.

Taking a closer look at this column shows that it contains a JSON document:

Here’s a more sophisticated KQL query that parses that JSON document, extracts the contents into separate columns and returns other columns you might need for troubleshooting like the DAX query text and query duration:

PowerBIDatasetsWorkspace | 
where ArtifactName == "InsertWorkspaceNameHere" 
and OperationName == 'QueryEnd' |
extend a = todynamic(ApplicationContext)|
extend VisualId = a.Sources[0].VisualId, ReportId = a.Sources[0].ReportId, 
DatasetName = ArtifactName, DAXQuery = EventText |
project TimeGenerated, WorkspaceName, DatasetName, ReportId, VisualId, DurationMs, DAXQuery |
order by TimeGenerated desc 

The ID of the dataset isn’t that interesting because you can get it, and the name of the dataset, in other columns. The ReportId is useful because it tells you which report generated the query and there are a number of ways you can find out which report this relates to. For example, you can just open the report in the browser and look at the url: the Report ID is the GUID between “/reports/” and “/ReportSection” as detailed here.

How can you work out which visual the VisualId relates to though? It’s not obvious, but it is possible. First you have to open your report in Power BI Desktop, open Performance Analyzer, refresh the report page and export the data from Performance Analyzer as a JSON file as detailed here. The data you need is hidden in this JSON file in the Visual Container Lifecycle event; here’s the M code for a Power Query query to get it:

let
  Source = Json.Document(
    File.Contents(
      "C:\InsertFileNameHere.json"
    )
  ), 
  events = Source[events], 
  #"Converted to Table" = Table.FromList(
    events, 
    Splitter.SplitByNothing(), 
    null, 
    null, 
    ExtraValues.Error
  ), 
  #"Expanded Column1"
    = Table.ExpandRecordColumn(
    #"Converted to Table", 
    "Column1", 
    {
      "name", 
      "component", 
      "start", 
      "id", 
      "metrics", 
      "end", 
      "parentId"
    }, 
    {
      "name", 
      "component", 
      "start", 
      "id", 
      "metrics", 
      "end", 
      "parentId"
    }
  ), 
  #"Filtered Rows" = Table.SelectRows(
    #"Expanded Column1", 
    each (
      [name]
        = "Visual Container Lifecycle"
    )
  ), 
  #"Expanded metrics"
    = Table.ExpandRecordColumn(
    #"Filtered Rows", 
    "metrics", 
    {
      "status", 
      "visualTitle", 
      "visualId", 
      "visualType", 
      "initialLoad"
    }, 
    {
      "status", 
      "visualTitle", 
      "visualId", 
      "visualType", 
      "initialLoad"
    }
  ), 
  #"Removed Other Columns"
    = Table.SelectColumns(
    #"Expanded metrics", 
    {
      "visualTitle", 
      "visualId", 
      "visualType"
    }
  )
in
  #"Removed Other Columns"

Here’s an example of the output of this Power Query query:

This shows that, on the report in question, there were three visuals: a card visual called “My Card”, a column chart called “My Column Chart” and a table called “My Table”. It also shows the IDs of these visuals which you can match up with the VisualId values in the ApplicationContext column in Log Analytics.

The last problem is matching these names up with the actual visuals in the report and this will only be easy if the report designer has followed the best practice of giving each visual a meaningful name; you can now do this by double-clicking on the visual in the Selection pane or by editing the Title Text property as Reid Havens shows here.

%d bloggers like this: