Query Pending Event In Power BI Performance Analyzer

A quick note for anyone like me who spends too much time looking at the JSON exports from Performance Analyzer in Power BI Desktop: you may have noticed an event called Query Pending that isn’t (as yet) documented in the Word doc that explains the format of these JSON files.

It turns out that it’s not that interesting – it’s an event that has been added as part of an effort to make sure there are events to cover the whole of the query lifecycle. After the DAX queries for each visual in your report are generated they are added to a queue before they are executed. In some cases there could be several queries in the queue waiting to be executed, in which case they are said to be “pending”, and the Query Pending event tells you how long a query is in this pending state.

I haven’t seen a duration of longer than a couple of milliseconds for this event though, so you probably don’t need to worry much about it. If you ever do see a long Query Pending event please leave a comment – I’m curious to know what the cause might be.

[Thanks to John Vulner and Jon Ludwig for this information]

Capturing SQL Queries Generated By A Power BI DirectQuery Dataset

If you’re using DirectQuery mode for one or more tables in your Power BI dataset, the chances are that you will want to see the SQL (or whatever query language your DirectQuery data source uses) that is generated by Power BI when your report is run. If you can view the queries that are run in the tooling for the data source itself, for example, using Extended Events or SQL Server Profiler for SQL Server, then great – but you may not have permissions to do this. The good news is that you can capture the SQL queries in Power BI Desktop too, even though it’s not always obvious how to do so.

For some data sources, like SQL Server, Performance Analyzer will give you the SQL queries generated. All you need to do is go to the View tab in the main Power BI Desktop window, click on the Performance Analyzer button to display the Performance Analyzer pane, click on Start Recording and then Refresh Visuals, find the event corresponding to the visual whose queries you want to view, expand it and then click on the “Copy query” link:

This will copy the DAX query generated by your visual to the clipboard; in the case of SQL Server DirectQuery sources you’ll also get the SQL query generated for that DAX query.

However, this method does not work for all DirectQuery data sources; for the rest you’ll need to use the Query Diagnostics functionality in the Power Query Editor. You need to open the Power Query Editor window, go to the Tools tab on the ribbon, click on the Start Diagnostics button, go back to the main Power BI window, refresh your visuals (you can use the Refresh visuals button in Performance Analyzer for this again) and then go back to the Power Query Editor and click the Stop Diagnostics button. When you do this several new Power Query queries containing diagnostics data will appear. Go to the one whose name starts with “Diagnostics_Detailed” and somewhere in there – where exactly depends on the data source – you’ll find the query generated. For example, for a Snowflake data source you’ll see the SQL generated somewhere in the Data Source Query column:

For an Azure Data Explorer DirectQuery data source the KQL query will be in one of the Record values in the Additional Info column:

One thing to watch out for is that you may also see what look like SQL Server TSQL queries, even when you’re not using a data source that can be queried with TSQL. Here’s one from the Azure Data Explorer example above:

You can ignore these queries: they’re not useful, although they do give you an interesting insight into how DirectQuery mode works behind the scenes.
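
Rather than scrolling through the detailed diagnostics output looking for the generated queries by hand, you can also filter it down with a little M. Here’s a minimal sketch, assuming your detailed diagnostics query is called Diagnostics_Detailed and that – as in the Snowflake example above – the generated queries appear in the Data Source Query column:

let
  // Reference the detailed diagnostics query created by Query Diagnostics.
  // "Diagnostics_Detailed" is a placeholder – use the actual name of the
  // query that appears in your Power Query Editor.
  Source = #"Diagnostics_Detailed",
  // Keep only the rows where a query was sent to the data source
  DataSourceQueries = Table.SelectRows(
    Source,
    each [Data Source Query] <> null and [Data Source Query] <> ""
  ),
  // Reduce the output to just the generated queries
  Result = Table.SelectColumns(
    DataSourceQueries,
    {"Data Source Query"},
    MissingField.Ignore
  )
in
  Result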

Measuring DirectQuery Performance In Power BI

If you have a slow DirectQuery report in Power BI one of the first questions you need to ask is how long the SQL queries that Power BI generates take to run. This is a more complicated question to answer than you might think, though, and in this post I’ll explain why.

I happen to have access to some of the famous New York taxi data in a Snowflake database, and in there is a table with trip data that has 173 million rows that I have built a Power BI dataset from. The data and the database used are not really important here though – what is important is that it’s DirectQuery and a large-ish amount of data. Here’s a report page with a single table visual on it, showing passenger count aggregated by the hack license field:

It’s slow, but how slow? Here’s what Performance Analyzer shows when I refresh the table:

The DAX query takes 5.4 seconds but the Direct Query time is only 3.3 seconds – and the numbers don’t seem to add up. Here’s what Profiler captures for the same refresh shown in Performance Analyzer:

This shows there’s a gap of 2 seconds between the DirectQuery End event and the Query End event. What if I paste the DAX query into DAX Studio? Here’s what the Server Timings tab shows:

This is a different query execution from the two examples above (which both show data for the same execution), which explains why the numbers are slightly different here – but again there seems to be an extra second of activity, and DAX Studio suggests that it’s in the Formula Engine.

So what is going on? The answer lies in understanding what the DirectQuery End Profiler event actually measures: it’s the amount of time between the Analysis Services engine handing a query over to the Power Query engine and the Analysis Services engine receiving the first row in the resultset back, including the time taken for the Power Query engine to fold the query.

Therefore if it takes a long time to get all the rows in the resultset then that could explain what’s going on here. Unfortunately there’s no way of knowing from Profiler events how long this takes – but there is another way. Going back to Performance Analyzer, if you export the data from it to JSON (by clicking the Export button) and load it into Power Query, you can see more detail about a DirectQuery query execution. Here’s the data from the first execution above:

[There’s a very good paper documenting what’s in the Performance Analyzer JSON file here]

Looking at the record in the metrics column for the Execute Direct Query event you can see the same 3.2 second duration shown above in Profiler. Notice that there are two other metrics here as well: RowsRead, which is the total number of rows returned in the resultset; and DataReadDuration, which is the amount of time taken to read these rows after the first row has been received, plus the time taken by some other Analysis Services engine operations such as encoding column values, joining with unpushed semijoins, projecting aggregations such as Average, and saving the resultset to the in-memory cache. In this case the SQL query has returned 43191 rows and this takes 1.95 seconds – which explains the gap between the end of the Execute Direct Query event and the end of the query.
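
If you want to pull these metrics out of the exported JSON yourself, here’s a minimal M sketch. The file path is a placeholder, and the “Execute Direct Query” event name is an assumption based on what’s shown above, so check it against your own export:

let
  // Load the JSON file exported from Performance Analyzer
  Source = Json.Document(
    File.Contents("C:\InsertFileNameHere.json")
  ),
  events = Source[events],
  #"Converted to Table" = Table.FromList(
    events,
    Splitter.SplitByNothing(),
    null,
    null,
    ExtraValues.Error
  ),
  #"Expanded Events" = Table.ExpandRecordColumn(
    #"Converted to Table",
    "Column1",
    {"name", "start", "end", "metrics"},
    {"name", "start", "end", "metrics"}
  ),
  // Keep only the DirectQuery execution events
  #"DirectQuery Events" = Table.SelectRows(
    #"Expanded Events",
    each [name] = "Execute Direct Query"
  ),
  // Expand the metrics record to get the row count and data read duration
  #"Expanded Metrics" = Table.ExpandRecordColumn(
    #"DirectQuery Events",
    "metrics",
    {"RowsRead", "DataReadDuration"},
    {"RowsRead", "DataReadDuration"}
  )
in
  #"Expanded Metrics"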

One last question: why is this SQL query returning so many rows when the DAX query only asks for the top 502 rows?

The reason is that, at the time of writing at least, the Analysis Services engine can only push a top(n) operation down to a DirectQuery SQL query in very simple scenarios where there are no measures and no aggregation involved – and in this case we’re summing up values. As a result, if you’re using DirectQuery mode and have a visual like this that can potentially display a large number of rows and includes a measure or aggregated values, you may end up with slow performance.

[Thanks to Jeffrey Wang for providing the information in this post]

Using Small Multiples In Power BI To Improve Report Performance

While the long-awaited small multiples feature that previewed in the December 2020 release is an obvious boost to Power BI’s data visualisation capabilities, did you know that you can use it to improve report performance too?

Earlier this year I wrote blog posts showing how you can improve report performance by showing the same amount of data in fewer visuals (for example by replacing several cards with a single table) and how the number of visuals on a page affects report performance even if they aren’t displaying any data; several other people have written similar posts too. Small multiples are just another way you can replace several visuals with a single visual that displays the same data.

To illustrate, consider the following report with five separate line chart visuals on it that are identical apart from the fact that there is a different filter set on each one:

Here’s what Performance Analyzer shows when the page is refreshed:

In this case everything is fairly quick, but notice that each DAX query takes 10-12ms and by the time we have reached the “Count Of Sales by Date for Terraced” visual the total render duration has reached 710ms.

Now, here’s the same data in a single line chart visual using small multiples:

It’s the same data and the same charts, but look at what Performance Analyzer shows now:

There are two things to point out:

  1. There is only one DAX query which, at 12ms, performs about the same as each of the five DAX queries in the previous version of the report. In this case, by requesting all the data in a single query rather than five separate queries, Power BI has been able to optimise how it retrieves the data it needs. This doesn’t mean that, from a DAX point of view, the small multiples version of the report is five times faster than the original, because Power BI will have run the five queries in parallel in that version, but in general you will see some improvement in overall performance from this consolidation of queries and in some cases this can be quite significant.
  2. While the sum of the visual display durations for each of the separate visuals is basically the same as the visual display duration for the small multiples visual – which makes sense because they display the same data in the same way – the total duration of the small multiples visual is 486ms, compared to 710ms for the “Count Of Sales by Date for Terraced” visual in the original version, so there has been a definite overall improvement in rendering time. In fact, Performance Analyzer doesn’t really give you an accurate way of measuring the overall time taken to render a report page. A much better technique is the one I blogged about here, and it suggests the overall performance saving from using small multiples is almost 500ms.

In conclusion, then, if you have any groups of visuals on your reports that can be replaced by a single small multiples visual then I recommend that you do so – you may see an improvement in performance as a result. Remember also that it’s still early days for small multiples: the more features that are added to it, and the more visuals that support it, the more opportunities you will have to consolidate visuals.

Matching DAX Queries To Individual Visuals In A Published Power BI Report

The integration between Azure Log Analytics and Power BI opens up a lot of new possibilities for Power BI administrators. It’s still in preview (there’s a lot more cool stuff still to come) but the Analysis Services events that are available at the time of writing are still very useful: they give you pretty much everything you had in Profiler plus some new stuff too. In this post I’ll show you how you can use this data to work out which visual in a published report generated a particular DAX query.

In Log Analytics you can get information about all the queries run against a dataset using a simple KQL query like this one (I haven’t included an explicit time filter in this query, although you would want to do this in real life):

PowerBIDatasetsWorkspace | 
where ArtifactName == "InsertWorkspaceNameHere" 
and OperationName == 'QueryEnd' 

The QueryEnd event is raised every time a query finishes running – it’s an event you may already be familiar with from Profiler, xEvents or the Log Analytics integration with AAS – and it tells you useful things like the time the query finished, the duration, the text of the DAX query, the user running the query and so on. However, what is new for Power BI is the contents of the ApplicationContext column. My colleague Kasper blogged about how you can use this column to get the DatasetId and ReportId when using AAS here; it now also contains an ID that identifies the exact visual on a report that generated the DAX query.

Taking a closer look at this column shows that it contains a JSON document, which looks something like this (the IDs below are just placeholders):
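
{
  "DatasetId": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "Sources": [
    {
      "ReportId": "11111111-2222-3333-4444-555555555555",
      "VisualId": "1a2b3c4d5e6f7a8b9c0d"
    }
  ]
}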

Here’s a more sophisticated KQL query that parses that JSON document, extracts the contents into separate columns and returns other columns you might need for troubleshooting like the DAX query text and query duration:

PowerBIDatasetsWorkspace | 
where ArtifactName == "InsertWorkspaceNameHere" 
and OperationName == 'QueryEnd' |
extend a = todynamic(ApplicationContext)|
extend VisualId = a.Sources[0].VisualId, ReportId = a.Sources[0].ReportId, 
DatasetName = ArtifactName, DAXQuery = EventText |
project TimeGenerated, WorkspaceName, DatasetName, ReportId, VisualId, DurationMs, DAXQuery |
order by TimeGenerated desc 

The ID of the dataset isn’t that interesting because you can get it, and the name of the dataset, in other columns. The ReportId is useful because it tells you which report generated the query and there are a number of ways you can find out which report this relates to. For example, you can just open the report in the browser and look at the url: the Report ID is the GUID between “/reports/” and “/ReportSection” as detailed here.
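
For example, a report URL looks something like this (the values in angle brackets are placeholders):

https://app.powerbi.com/groups/<WorkspaceId>/reports/<ReportId>/ReportSection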

How can you work out which visual the VisualId relates to though? It’s not obvious, but it is possible. First you have to open your report in Power BI Desktop, open Performance Analyzer, refresh the report page and export the data from Performance Analyzer as a JSON file as detailed here. The data you need is hidden in this JSON file in the Visual Container Lifecycle event; here’s the M code for a Power Query query to get it:

let
  Source = Json.Document(
    File.Contents(
      "C:\InsertFileNameHere.json"
    )
  ), 
  events = Source[events], 
  #"Converted to Table" = Table.FromList(
    events, 
    Splitter.SplitByNothing(), 
    null, 
    null, 
    ExtraValues.Error
  ), 
  #"Expanded Column1"
    = Table.ExpandRecordColumn(
    #"Converted to Table", 
    "Column1", 
    {
      "name", 
      "component", 
      "start", 
      "id", 
      "metrics", 
      "end", 
      "parentId"
    }, 
    {
      "name", 
      "component", 
      "start", 
      "id", 
      "metrics", 
      "end", 
      "parentId"
    }
  ), 
  #"Filtered Rows" = Table.SelectRows(
    #"Expanded Column1", 
    each (
      [name]
        = "Visual Container Lifecycle"
    )
  ), 
  #"Expanded metrics"
    = Table.ExpandRecordColumn(
    #"Filtered Rows", 
    "metrics", 
    {
      "status", 
      "visualTitle", 
      "visualId", 
      "visualType", 
      "initialLoad"
    }, 
    {
      "status", 
      "visualTitle", 
      "visualId", 
      "visualType", 
      "initialLoad"
    }
  ), 
  #"Removed Other Columns"
    = Table.SelectColumns(
    #"Expanded metrics", 
    {
      "visualTitle", 
      "visualId", 
      "visualType"
    }
  )
in
  #"Removed Other Columns"

The output of this Power Query query will look something like this (the IDs and visual type names below are just placeholders):
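
visualTitle         visualId                visualType
My Card             1a2b3c4d5e6f7a8b9c0d    card
My Column Chart     2b3c4d5e6f7a8b9c0d1e    columnChart
My Table            3c4d5e6f7a8b9c0d1e2f    tableEx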

This shows that, on the report in question, there were three visuals: a card visual called “My Card”, a column chart called “My Column Chart” and a table called “My Table”. It also shows the IDs of these visuals which you can match up with the VisualId values in the ApplicationContext column in Log Analytics.

The last problem is matching these names up with the actual visuals in the report and this will only be easy if the report designer has followed the best practice of giving each visual a meaningful name; you can now do this by double-clicking on the visual in the Selection pane or by editing the Title Text property as Reid Havens shows here.

Using Power BI Dynamic M Parameters In DAX Queries

As I’m sure you’ve guessed by now I’m a big fan of dynamic M parameters in Power BI. They’re easy to use in Power BI Desktop but what if you want to use them in your own DAX queries? Documentation for this is coming soon, but in the meantime I thought it would be useful to show the additions to DAX query syntax to support them – something you can see for yourself if you take a look at the DAX queries generated by Power BI Desktop using Performance Analyzer.

Here’s an example of a query generated by Power BI Desktop where there are three dynamic M parameters defined: DateParameter, TextParameter and NumericParameter.

DEFINE
  MPARAMETER DateParameter = 
    DATE(2020, 1, 1)

  MPARAMETER TextParameter = 
    "January"

  MPARAMETER NumericParameter = 
    1

  VAR __DS0FilterTable = 
    TREATAS({DATE(2020, 1, 1)}, 'ParamValues'[DateP])

  VAR __DS0FilterTable2 = 
    TREATAS({"January"}, 'ParamValues'[MonthNameP])

  VAR __DS0FilterTable3 = 
    TREATAS({1}, 'ParamValues'[MonthNoP])

  VAR __DS0Core = 
    CALCULATETABLE(
      DISTINCT('DATE'[DATE]),
      KEEPFILTERS(__DS0FilterTable),
      KEEPFILTERS(__DS0FilterTable2),
      KEEPFILTERS(__DS0FilterTable3)
    )

  VAR __DS0PrimaryWindowed = 
    TOPN(501, __DS0Core, 'DATE'[DATE], 1)

EVALUATE
  __DS0PrimaryWindowed

ORDER BY
  'DATE'[DATE]

The dynamic M parameters are set using a DEFINE statement and the new MParameter keyword. The name of the parameter here is the same as the name of the parameter defined in the Power Query Editor; one thing to point out is that if your M parameter name contains a space, it should be surrounded by single quotes.

If you have enabled multi-select on your dynamic M parameter, you will need to pass a table of values to it in your DAX query. Here’s an example of how to do this with a table constructor:

DEFINE
  MPARAMETER 'MonthParameter' = 
    {"January",
      "February"}

I’ll admit I haven’t tried this yet, but you should be able to do some really cool stuff with this in a paginated report connected to a DirectQuery dataset if it contains dynamic M parameters. Maybe in a future post…

Getting The IDs Of All Visuals In A Power BI Report Page Using The Power BI Embedded Analytics Playground

Log Analytics contains information on the dataset, report and visual that are associated with a DAX query but that information is in the form of IDs rather than names. Getting the IDs for specific datasets and reports is fairly straightforward – you can get them from URLs in the Power BI Portal – and as I wrote here, it’s possible to get a list of IDs and names for the visuals in a report from the JSON file you get when you export from Performance Analyzer in Power BI Desktop. However, my colleague Rui Romano recently showed me a different way to get the same information using the Power BI Embedded Analytics Playground, which may be an easier option in some cases.

The Power BI Embedded Analytics Playground (more details here and here) is a site where developers can learn how to use Power BI’s APIs for embedding reports and dashboards in their own applications. “But Chris!”, I hear you cry, “I’m not using Power BI Embedded!” – don’t worry, this is all about embedding not Power BI Embedded (yes, there’s a difference) so it works with the regular Power BI Service. “I’m not a developer though!”, you add – don’t worry, neither am I and you don’t need to understand any code to do what I’m going to show you.

The first thing you need to do is go to:

https://playground.powerbi.com/en-us/dev-sandbox

When you go there you’ll see the following prompt:

Choose “Select report” under “Use my own Power BI report” and select the report whose visuals you want to get the IDs for.

At this point a page will open with a code editor at the top and your report shown at the bottom. Before you continue you will also need to open the Console pane in your browser’s developer tools. If you’re using Microsoft Edge you can learn how to do this here; if you’re using Chrome you can learn how to do this here.

You should see something like this:

Now you can start to generate some code by dragging and dropping items from the left-hand pane to the code pane in the top-centre of the screen. There are two things you will need to do here: first, generate code to get the report to display the right page and, second, generate code to get all the visual IDs.

The easiest way to set the page is to expand the Navigation node in the left-hand pane and drag the “Page – Set active” item onto the bottom of the code in the code pane. You should then change the page index in the code to select the page you want; the index is zero-based, so to get the first page you set it to 0, for the second page you set it to 1 and so on.

Next, underneath that code, drag the “Get visuals” item and then click the Run button:

Finally, in the Console pane on the right-hand side of the screen, you’ll see that a line has been added which you can expand; when you do so, you’ll see the ID, visual type and title of all the visuals on the page:

There’s still a fair amount of manual work to do but it’s a fairly easy process. I’m also sure there’s a developer out there who could write a script to paste into the code window that a) loops through all the pages in the report and b) returns the IDs, names and titles in a friendlier format. I’m very impressed with how easy the Embedded Analytics Playground makes all this, though, even for a non-developer like me.

Testing Performance Of Power BI Reports In The Browser, Part 3

In part 1 of this series I talked about why you might want to test the performance of your Power BI report in the browser; in part 2 I showed how you could test report performance in the browser; and in this blog post I’ll walk through an example of a report that is slow and show how to find out what the problem is.

Let’s say you are building reports for a chain of fast food restaurants and you have just created a new one showing sales of your products. It consists of one page with one rather smart-looking column chart visual on it:

Performance Analyzer in Power BI Desktop shows that the report is nice and fast:

…so you go ahead and publish. You view the report after publishing and it still seems fast. Then the complaints start coming in: the report is slow!?! It seems to be users who are viewing the report on their phone who are having the most problems. So, following the instructions in my last post, you open up Chrome DevTools and run an audit using a simulated slow 4G connection:

You get the following results:

They look pretty awful – 30.5 seconds for the report to render! Scrolling down on the audit results you also see the likely cause:

A large network payload, with the number one culprit a large jpg? So next you go to the Network tab and do a hard refresh of the report, disabling the cache and simulating a fast 3G connection:

Here’s what the report render looks like:

As you can see, it not only takes over 30 seconds to render the report, but worst of all the column chart is only visible right at the end. The waterfall shows something is being downloaded that is 1.2MB in size and that this takes 7.23 seconds:

Hovering over this request displays a tooltip that gives the full filename which, again, is a jpg file. The only image on the report is the background image used in the column chart and it turns out it is 1.2MB in size:

So it’s the background image used in the column chart that is the main problem! You remove the background image from the visual:

…and as expected, when you re-run the audit using the same settings as before the report renders seven seconds faster, taking 23.4 seconds:

Still not great, I know, but remember this is worst-case performance: not only does the audit simulate a slow network and slow hardware, but it also simulates a cold cache – something your users will rarely encounter.

Hopefully this fairly simple example shows how useful the tools and techniques shown in the first two parts of this series are for troubleshooting certain types of report performance problem. It also goes to show how important it is to make sure any images you use in your report are as small as possible – something that, in my experience, many Power BI developers don’t always do.

Connecting SQL Server Profiler To Power BI Premium

Back in December, when I wrote a series of posts on testing the performance of Power BI reports in the browser, I mentioned that it was important to test in the browser because some aspects of the performance of a report may be different there compared to in Power BI Desktop. Following on from this, if you’re testing the performance of a report in the browser you are also going to want to take a closer look at the DAX queries generated by your report, even if it is just to check that what you see there is the same as what you see in Performance Analyzer in Power BI Desktop. If your report uses a Live Connection to Analysis Services this is easy to do using SQL Server Profiler, Azure Analysis Services’ diagnostic logging feature or XEvents. If you’re using a dataset stored in Power BI we have a range of options for monitoring what’s going on, including Usage Metrics and the Premium Capacity Metrics Apps, and of course there’s also DAX Studio – but for an old-school guy like me, connecting to a Power BI Premium workspace using SQL Server Profiler is a great way to get detailed information about what’s going on when queries run. In this blog post I’ll show you how to connect Profiler to Power BI Premium.

First you’ll need to install the latest version of SQL Server Management Studio, which includes SQL Server Profiler – you can get it here. If you have an older version installed you’ll probably need to upgrade. And before anyone leaves a comment about Profiler being deprecated, let me point you to the note on that page, which makes it clear that the deprecation does not apply to Analysis Services workloads.

The feature that allows you to connect Profiler to a Power BI Premium workspace is XMLA Endpoints: it’s in preview right now but basically this allows you to connect any tool that works with Azure Analysis Services up to Power BI Premium. SQL Server Profiler wants to connect to an instance of Analysis Services; XMLA Endpoints mean that you can connect it to a dataset in a Power BI Premium workspace but for this to happen you need to know the url for Profiler to connect to. You can find this by going to your workspace, clicking on Settings:

and then going to the Premium tab and copying the Workspace Connection string:

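The Workspace Connection string will look something like this, with your own workspace name at the end:

powerbi://api.powerbi.com/v1.0/myorg/YourWorkspaceName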

You can then open up Profiler, go to the File menu and select New Trace and a connection dialog will appear:

In this dialog:

  • Set the Server type to Analysis Services
  • In Server name paste the Workspace Connection that you copied from Power BI earlier
  • In Authentication select Azure Active Directory – Universal with MFA and enter your username

Next, click the Options button, go to the Connection Properties tab and, in the Connect to database dropdown, select <Browse server…>:

Click Yes on the dialog that appears and then choose the name of the dataset in your workspace that you want to connect to in the Connect to database dropdown. If you don’t do this you’ll get errors later on.

Next you’ll see the Properties dialog:

The default template is Blank, which means no events are selected to monitor; to select events go to the Events Selection tab:

…select the events you want, and click Run to start tracing.

Which events should you choose? That’s a big topic and not one that I have time to go into here, but the Query End event is perhaps the one I look at most – it’s fired every time a query finishes executing and gives you a lot of important information such as the start time, end time and duration of the query in milliseconds. Books such as “The Definitive Guide To DAX” have a lot of information on using Profiler with Analysis Services and a lot of that information is relevant to Power BI Premium too. In future blog posts I dare say I’ll show you some interesting things you can do using Profiler and Power BI too…

Make Your Power BI Report Run Faster By Showing The Same Data In Fewer Visuals

Do you have a Power BI report that has a row of card visuals on it, something like this?

It’s a very common thing to do, but if your report is too slow to render you may find that a design like this is part of the problem. Now look at the following report which contains a single matrix that has been configured to look as un-matrix-like as possible but shows the same data:

This is likely to perform faster, but why?

Before we go any further, I don’t want you to go and change your reports if you’re not going to get any benefit from doing so. Use Performance Analyzer (as shown here) to determine which visuals on your report are the cause of slow performance – there’s no point redesigning visuals that are fast anyway.

As a general rule, the more visuals you put on a report page the slower it’s going to get. It’s logical if you think about it: the more visuals there are, the more queries have to be run against your dataset and the more work Power BI has to do to render the report. I know there is a tendency to try to pack as much information onto a page as possible, and this often happens when someone else has designed the report you’re trying to build, but you should always try to resist this. Splitting a single large page into multiple smaller pages, using slicers or filters to reduce the amount of data shown at any one time and avoiding gigantic Excel-like tables are all good ideas.

Once you’ve done that you need to see whether you can reduce the number of visuals needed to display the same amount of information: in the example above, the four separate cards show the same data as the single matrix. In part this is because the cards generate four separate DAX queries against the dataset to get the data they need whereas the matrix only generates one DAX query. There is a certain overhead to running a DAX query, so reducing the number of DAX queries needed to get the same amount of data is a good thing. More importantly, in this example Power BI can get the four values required for the single DAX query generated by the matrix much more efficiently than it can in the four separate DAX queries needed by the cards. This is because of something called “DAX Fusion”, which my colleague Phil Seamark blogged about in great detail here, and this can make a significant difference to performance (Marco Russo’s post on why Analyze In Excel reports may be slower than Power BI reports here also contains some useful information on DAX Fusion).

This is only a simple example; if you want to see a really sophisticated demonstration of replacing several visuals with a matrix I suggest you watch this video by another colleague of mine, Miguel Myers. And of course it doesn’t just apply to cards: any time you can replace multiple visuals with a single visual (remember to be careful using custom visuals though) you’re likely to gain some performance.
