Power BI, Parallelism And Dependencies Between SQL Queries In DirectQuery Mode

This is going to sound strange, but one of the things I like about tuning Power BI DirectQuery semantic models is that their generally slower performance, and the fact that you can see the SQL queries generated to get data, make it much easier to understand some of the innermost workings of the Power BI engine. For example, this week I was trying to tune a DAX query on a DirectQuery model using DAX Studio and the Server Timings showed me something like this:

As I described here, Power BI can send SQL queries in parallel in DirectQuery mode, and you can see from the Timeline column that there is some parallelism happening here – the last two SQL queries generated by the DAX query run at the same time – but everything has to wait for that first SQL query to complete. Why? Can this be tuned?

Here’s the scenario that produced the query above. I have a DirectQuery semantic model built from the ContosoDW SQL Server sample database:

There are three base measures defined:

Distinct Customers = DISTINCTCOUNT(FactOnlineSales[CustomerKey])

January Customers =
CALCULATE([Distinct Customers],
    KEEPFILTERS('DimDate'[CalendarMonthLabel] = "January"))

Monday Customers =
CALCULATE([Distinct Customers],
    KEEPFILTERS('DimDate'[CalendarDayOfWeekLabel] = "Monday"))

Note that these measures are written specifically to prevent fusion from taking place: each measure generates a separate SQL query. Here’s what DAX Studio’s Server Timings shows for the DAX query generated for the table shown above:

As you can see, the three SQL queries generated by the DAX query are run in parallel.

Now consider the following measure:

IF Test = IF([Distinct Customers]>3000, [January Customers], [Monday Customers])

Here’s what this measure returns:

If you run the query generated for this visual in DAX Studio, Server Timings shows what I showed in the first screenshot in this post:

The last two substantial SQL queries, on lines 4 and 5, can only run when the first SQL query, on line 1, has finished. The details of the SQL queries tell you more about what’s going on here. The first SQL query, on line 1, just gets the values for the [Distinct Customers] measure for all rows in the table:

The WHERE clause of the SQL query on line 4:

…and line 5:

…show that these last two queries only get the values for the [January Customers] and [Monday Customers] measures for the rows where the [IF Test] measure needs to display them. And this explains why the first SQL query has to finish before these last two SQL queries can be run: the WHERE clauses of these last two SQL queries are constructed using the results returned by the first SQL query.
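To make the dependency concrete, here’s a sketch of what these queries might look like. The grouping column used in the table visual isn’t shown above, so purely for illustration I’m assuming it groups by a hypothetical DimProduct[BrandName] column – it’s the shape of the queries that matters here, not the exact column names:

-- Sketch of the first SQL query: [Distinct Customers] for every row in the table
SELECT [t1].[BrandName],
       COUNT(DISTINCT [t0].[CustomerKey]) AS [a0]
FROM [dbo].[FactOnlineSales] AS [t0]
INNER JOIN [dbo].[DimProduct] AS [t1]
    ON [t0].[ProductKey] = [t1].[ProductKey]
GROUP BY [t1].[BrandName];

-- Sketch of the SQL query for [January Customers]: it can only be generated once the
-- first query has returned, because the IN list in the WHERE clause is built from the
-- rows where the first query's [Distinct Customers] value was greater than 3000
SELECT [t1].[BrandName],
       COUNT(DISTINCT [t0].[CustomerKey]) AS [a0]
FROM [dbo].[FactOnlineSales] AS [t0]
INNER JOIN [dbo].[DimProduct] AS [t1]
    ON [t0].[ProductKey] = [t1].[ProductKey]
INNER JOIN [dbo].[DimDate] AS [t2]
    ON [t0].[DateKey] = [t2].[DateKey]
WHERE [t2].[CalendarMonthLabel] = N'January'
  AND [t1].[BrandName] IN (N'Contoso', N'Fabrikam')  -- illustrative values taken from the first query's results
GROUP BY [t1].[BrandName];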

There is another way of evaluating the IF condition in the [IF Test] measure. Instead of “strict” evaluation, where the engine only gets the value of [January Customers] for the rows in the table where [Distinct Customers] is greater than 3000 and only gets the value of [Monday Customers] for the remaining rows, it can get the values of [January Customers] and [Monday Customers] for all rows in the table and then throw away the values it doesn’t need. This is “eager” evaluation and, as you would expect, Marco and Alberto have a great article explaining strict and eager evaluation here that is worth reading. Power BI can decide to use either strict or eager evaluation with the IF function depending on which one it thinks will be more efficient, but you can force the use of eager evaluation by using the IF.EAGER DAX function instead of IF:

IF EAGER Test =
IF.EAGER([Distinct Customers]>3000, [January Customers], [Monday Customers])

Here’s what Server Timings shows for the DAX query that uses IF.EAGER:

As you can see, the use of IF.EAGER means that the three substantial SQL queries generated by Power BI for this DAX query can now be run in parallel because there are no dependencies between them: they get the values of [Distinct Customers], [January Customers] and [Monday Customers] for all rows in the table. However, even though these three SQL queries are now run in parallel, this doesn’t result in any performance benefit here because it looks like the three queries are slower as a result of all being run at the same time. Power BI has made the right call to use strict evaluation with the IF function in this case, but if you see it using strict evaluation I think it’s worth experimenting with IF.EAGER to see if it performs better – especially in DirectQuery mode, where Power BI knows less about the performance characteristics of the database you’re using as your data source.

[Thanks to Phil Seamark for helping me understand this behaviour]

Diagnosing Power BI DirectQuery Performance Problems Caused By SQL Queries That Return Large Resultsets

One very common cause of Power BI performance problems is having a table with a large number of rows on your report. It’s a problem I wrote about here, and while I used an Import mode model for my example in that post I also mentioned that this can be an even bigger problem in DirectQuery mode: while the DAX query for the table visual might have a TOPN filter that asks for 502 rows, the query going back to the DirectQuery data source (usually a SQL query) may not have that filter applied and could return a much larger number of rows, which could then take a long time for Power BI to read. I wrote about this in more detail here and showed how you can diagnose the problem in Performance Analyzer by looking at the Execute DirectQuery event and the ActualQueryDuration, RowsRead and DataReadDuration metrics. But now that I have a custom visual to display Performance Analyzer export data, what does this look like? Also, what can Execution Metrics tell us?
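As a reminder, the DAX query behind a table visual usually has a shape something like the sketch below (the table, column and measure names here are hypothetical, not taken from the model used in this post); the TOPN(502, …) filter caps the number of rows the visual displays, but the SQL query generated for the inner SUMMARIZECOLUMNS may still return far more rows than that:

// Illustrative sketch only: the typical shape of a table visual's DAX query
EVALUATE
    TOPN (
        502,
        SUMMARIZECOLUMNS (
            'DimCustomer'[CustomerName],   -- hypothetical grouping column
            "Sales Amount", [Sales Amount] -- hypothetical measure
        ),
        'DimCustomer'[CustomerName], 1
    )
ORDER BY 'DimCustomer'[CustomerName]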

Using a simple DirectQuery model built from the ContosoDW SQL Server sample:

…I built a report with a table visual whose DAX query triggered a single SQL query that returned 475038 rows:

Here’s what a Profiler trace that included the Execution Metrics event showed me:

Some things to notice here:

  • The DAX query takes 3.5 seconds, as seen in the Duration column for the Query End event and the durationMs Execution Metric
  • The DirectQuery End event has a Duration of 2 seconds, leaving a gap of 1.5 seconds that needs to be explained
  • This Duration of 2 seconds for the DirectQuery End event matches the externalQueryExecutionTimeMs Execution Metric, which is 2.054 seconds, but the docs only say that this is the “Total time spent on executing all external datasource queries during the request”, which is a bit vague
  • The actual explanation for the gap comes from the directQueryIterationTimeMs Execution Metric which is 1.1 seconds, although this is still 0.4 seconds short of the 1.5 second gap mentioned above
  • The directQueryTotalRows Execution Metric shows that 475038 rows were returned by the SQL query
  • Execution Metrics provide an aggregated summary of metrics at the DAX query level; in this case there is only one SQL query generated, but if (as is often the case) there were more than one, it would be hard to know what each SQL query was contributing to the problem (the sketch below pulls the key metrics for this example together)
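Pulled together, a simplified sketch of the relevant part of the Execution Metrics JSON for this query looks something like this (values rounded to match the numbers above, other metrics omitted):

{
"durationMs": 3500,
"externalQueryExecutionTimeMs": 2054,
"directQueryIterationTimeMs": 1100,
"directQueryRequestCount": 1,
"directQueryTotalRows": 475038
}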

Here’s what my custom visual showed with data from Performance Analyzer for the same DAX query:

Now here’s the same visual with the tooltip for the Execute Direct Query event shown, which displays some of the metrics associated with that event:

This shows something very similar to what the Execution Metrics event in Profiler showed:

  • The Execute DAX Query event has a duration of 3.5 seconds
  • The Execute Direct Query event has a duration of 2.1 seconds – meaning that again there is a gap to be explained, a gap where no activity seems to be taking place in the visual (clearly visible in the first of the two screenshots immediately above – the tooltip obscures this gap)
  • As mentioned in my older blog post, this gap is explained by the DataReadDuration metric (documented here) from the Execute Direct Query event – which, as shown in the tooltip in the screenshot above, is 1.1 seconds
  • The amount of time it takes to read all the data from a large resultset can only be measured from the client (ie Power BI) side – a lot of customers I work with measure SQL query performance on the server side and see fast queries, but a fast query that returns a large number of rows that all need to be sent to Power BI can of course be slow
  • The ActualQueryDuration metric, also shown in the tooltip, gives you the amount of time it took to get the first row back from the SQL query
  • Unlike the Execution Metrics Profiler event, this DataReadDuration metric is available for each SQL query generated by a DAX query, which means you can tell exactly which SQL query/queries are causing problems

What can we learn from this? Apart from the fact that table visuals with vertical scrollbars can be a really bad thing, any time you have a DirectQuery model that generates SQL queries that return a very large number of rows, you could be paying a very heavy price to read all those rows – especially if you are getting close to the Max Intermediate Row Set Count limit, which is set to 1 million rows by default. What can you do about this? Apart from redesigning your report, I blogged about a technique here where aggregations can help for scenarios involving degenerate dimensions; using the new calendar-based time intelligence feature can also help to reduce the number of rows returned by SQL queries, as I described here. In general you’ll have to try to tune the DAX in your measures and your model to see what you can do to optimise the SQL queries Power BI generates so they return fewer rows.

Diagnosing Power BI DirectQuery Connection Limit Problems With Performance Analyzer

To kick off my series on diagnosing Power BI performance problems with Performance Analyzer in the browser (which I introduced last week with my post on vibe-coding a custom visual to visualise Performance Analyzer data), I want to revisit a subject I blogged about two years ago: how hitting the limit on the maximum number of connections to a DirectQuery data source can lead to queries queuing for an available connection and performance problems. In my original post on this topic I showed how you can use the Execution Metrics event in Profiler/Log Analytics/Workspace Monitoring to see when this queuing happens. In this post I will show how you can do exactly the same thing with Performance Analyzer.

Here’s the semantic model I used in my previous post: it has three tables in DirectQuery mode connected to SQL Server. Each table consists of a single row and column and is bound to a SQL query that takes 10 seconds to run (using the TSQL custom function I blogged about here).
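If you want to recreate this setup yourself, here’s a sketch of one way to do it – not necessarily the exact function from the post linked above – using a T-SQL scalar UDF that busy-waits for a given number of seconds:

-- Sketch: a scalar UDF that wastes roughly @Seconds of time. WAITFOR isn't allowed
-- inside a function, so this busy-waits on the clock instead.
CREATE FUNCTION dbo.fnWasteTime (@Seconds INT)
RETURNS INT
AS
BEGIN
    DECLARE @Start DATETIME2 = SYSDATETIME();
    DECLARE @Dummy BIGINT = 0;
    WHILE DATEDIFF(SECOND, @Start, SYSDATETIME()) < @Seconds
        SET @Dummy = @Dummy + 1;
    RETURN 1;
END;
GO

-- Each DirectQuery table can then be bound to a native query like this, which
-- returns a single row and column and takes about 10 seconds to run:
SELECT dbo.fnWasteTime(10) AS TableAValue;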

Here’s the report connected to this model, containing three cards, each of which displays the single value returned by one of these three tables. As you would expect, the DAX query associated with each of these card visuals takes 10 seconds to run when run in isolation.

With the Max Connections Per Data Source property set to the default value of 10:

…I ran the report in the browser with Performance Analyzer running. Here’s what I saw in the Performance Analyzer pane:

No surprises: the DirectQuery timings are all around 10 seconds. I exported the Performance Analyzer data and loaded it into my custom visual. The events for the three card visuals were all very similar:

I then set the Max Connections Per Data Source property on the semantic model to 1, so there was only one connection available back to SQL Server, and reran the report with Performance Analyzer running. Here’s what Performance Analyzer showed in the browser this time:

The fact that the DirectQuery activity for Table C took 13 seconds, the DirectQuery activity for Table B took 24 seconds and the DirectQuery activity for Table A took 35 seconds suggests that there’s some queuing happening but there’s nothing here that tells you that for sure. But exporting the data from Performance Analyzer and loading it into my visual showed the following for Table C:

Table B:

And Table A:

Note how for Table C the bar for the Get Source Connection event is very small, but for Table B it’s around 12 seconds and for Table A it’s around 24 seconds. This tells you exactly what the problem was: queuing for a connection.

As I said, you can get the same information from the Execution Metrics event but installing Profiler or capturing this data with Log Analytics or Workspace Monitoring isn’t always an option; this is a lot more convenient.

A Look At The Impact Of Calendar Based Time Intelligence On Power BI DirectQuery Performance

Calendar-based time intelligence (see here for the announcement and here for Marco and Alberto’s more in-depth article) is at least the second-most exciting thing to happen in DAX in the last few months: it makes many types of time intelligence calculation much easier to implement. But as far as I know only Reid Havens, in this video, has mentioned the performance impact of using this new feature, and that was for Import mode. So I wondered: do these benefits also apply to DirectQuery mode? The answer is, on balance, yes – but it’s not clear-cut.

To illustrate what I mean, I built a simple DirectQuery model against the Adventure Works DW sample database in SQL Server:

This model used the old “Mark as date table” time intelligence.

Here are the definitions of some of the measures:

Sales Amount =
SUM ( 'Internet Sales'[SalesAmount] )

YTD Sales Amount =
CALCULATE ( [Sales Amount], DATESYTD ( 'Date'[FullDateAlternateKey] ) )

PY YTD Sales Amount =
CALCULATE (
    [YTD Sales Amount],
    SAMEPERIODLASTYEAR ( 'Date'[FullDateAlternateKey] )
)

The Sales Amount measure returns the sum of the values in the SalesAmount column; the YTD Sales Amount measure finds the year-to-date sum of Sales Amount; and the PY YTD Sales Amount measure finds the value of YTD Sales Amount for the same period in the previous year.

I then created a matrix visual showing the PY YTD Sales Amount measure with EnglishProductName from the Product dimension on columns and CalendarYear and EnglishMonthName from the Date dimension on rows:

I copied the DAX query for this visual from Performance Analyzer, pasted it into DAX Studio and then ran it on a cold cache with Server Timings enabled. Here’s what Server Timings showed:

A total duration of 1.9 seconds and 5 SE queries doesn’t look too bad. But here are the Execution Metrics for this query with some important metrics highlighted:

{ 
"timeStart": "2025-11-29T17:17:16.461Z", 
"timeEnd": "2025-11-29T17:17:18.350Z",  
"durationMs": 1890, 
"datasourceConnectionThrottleTimeMs": 0, 
"directQueryConnectionTimeMs": 24, 
"directQueryIterationTimeMs": 166, 
"directQueryTotalTimeMs": 1681, 
"externalQueryExecutionTimeMs": 1493, 
"queryProcessingCpuTimeMs": 16, 
"totalCpuTimeMs": 828, 
"executionDelayMs": 3,  
"approximatePeakMemConsumptionKB": 20977,  
"directQueryTimeoutMs": 3599000, 
"tabularConnectionTimeoutMs": 3600000,  
"commandType": "Statement", 
"queryDialect": 3, 
"queryResultRows": 1613, 
"directQueryRequestCount": 5, 
"directQueryTotalRows": 33756 
}

The important thing to notice is that while the DAX query returns 1613 rows (see the queryResultRows metric) the SQL queries generated for that DAX query return 33756 rows between them (see the directQueryTotalRows metric). Why the big difference? This is because to do the year-to-date calculation using the old time intelligence functionality, Power BI has to run a query at the date granularity, which explains why there are so many more rows returned by the SQL queries. For example here’s a snippet of the last SQL query generated:
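The real query is longer than this, but a sketch of its shape looks something like the following (column names from AdventureWorksDW, joins simplified and illustrative) – note the GROUP BY at the date grain:

-- Sketch only: grouping at the date grain means one row per product per date comes back
SELECT [t1].[FullDateAlternateKey],
       [t2].[EnglishProductName],
       SUM([t0].[SalesAmount]) AS [a0]
FROM [dbo].[FactInternetSales] AS [t0]
INNER JOIN [dbo].[DimDate] AS [t1]
    ON [t0].[OrderDateKey] = [t1].[DateKey]
INNER JOIN [dbo].[DimProduct] AS [t2]
    ON [t0].[ProductKey] = [t2].[ProductKey]
GROUP BY [t1].[FullDateAlternateKey], [t2].[EnglishProductName];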

Yuck. What’s more, bringing this number of rows from the source can be time-consuming and even after these rows have made it to Power BI, they need to be iterated over (see the directQueryIterationTimeMs metric of 166ms) and aggregated up to get the final result of the calculation. This requires memory (see the approximatePeakMemConsumptionKB metric of 20977KB) and CPU (see the totalCpuTimeMs metric of 828ms) as well as adding to the overall duration of the DAX query.

I then created a copy of this model and set up a calendar using the new calendar-based time intelligence feature like so:

I then modified the measures above to use this new calendar:

Sales Amount =
SUM ( 'Internet Sales'[SalesAmount] )

YTD Sales Amount =
CALCULATE ( [Sales Amount], DATESYTD ( 'Gregorian' ) )

PY YTD Sales Amount =
CALCULATE ( [YTD Sales Amount], SAMEPERIODLASTYEAR ( 'Gregorian' ) )

I then reran the same DAX query from my matrix visual in DAX Studio for this model. Here are the Server Timings:

The good news is that the query is now much faster: 0.5 seconds instead of 1.9 seconds. But there are more SE queries! I’m told this is because some fusion optimisations (this presentation by Phil Seamark is an excellent introduction to this subject) haven’t yet been implemented for the new calendar-based time intelligence functionality, which means more SQL queries are generated than you might expect. Indeed some of the SQL queries run are identical. And since there is a limit on the number of connections that Power BI can use to run SQL queries in DirectQuery mode, and since you can run into performance problems when you hit those limits (see here for more details), more SQL queries can be a bad thing – especially when there are many visuals on a page or a lot of concurrent users using the same semantic model.

However there is more good news if you look closely. Here are the Execution Metrics for this second run:

{ 
"timeStart": "2025-11-29T17:34:56.223Z", 
"timeEnd": "2025-11-29T17:34:56.754Z",  
"durationMs": 531, 
"datasourceConnectionThrottleTimeMs": 0, 
"directQueryConnectionTimeMs": 41, 
"directQueryIterationTimeMs": 38, 
"directQueryTotalTimeMs": 465, 
"externalQueryExecutionTimeMs": 410, 
"queryProcessingCpuTimeMs": 16, 
"totalCpuTimeMs": 141, 
"executionDelayMs": 0,  
"approximatePeakMemConsumptionKB": 3812,  
"directQueryTimeoutMs": 3600000, 
"tabularConnectionTimeoutMs": 3600000,  
"commandType": "Statement", 
"queryDialect": 3, 
"queryResultRows": 1613, 
"directQueryRequestCount": 11, 
"directQueryTotalRows": 3369 
}

Even though there are more SQL queries now the total number of rows returned by them is much less: the directQueryTotalRows metric is only 3369, so about 10% of what it was before. Why? Because instead of having to go down to the date granularity to do the calculations, the new calendar-based time intelligence functionality allows Power BI to do the calculation at the month granularity. Here’s a snippet of one of the SQL queries generated that shows this:
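Again this is a sketch of the shape of the query rather than the exact SQL generated, but the important difference is that the grouping is now at the month grain:

-- Sketch only: grouping at the month grain means far fewer rows are returned
SELECT [t1].[CalendarYear],
       [t1].[EnglishMonthName],
       [t2].[EnglishProductName],
       SUM([t0].[SalesAmount]) AS [a0]
FROM [dbo].[FactInternetSales] AS [t0]
INNER JOIN [dbo].[DimDate] AS [t1]
    ON [t0].[OrderDateKey] = [t1].[DateKey]
INNER JOIN [dbo].[DimProduct] AS [t2]
    ON [t0].[ProductKey] = [t2].[ProductKey]
GROUP BY [t1].[CalendarYear], [t1].[EnglishMonthName], [t2].[EnglishProductName];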

This in turn means that directQueryIterationTimeMs (now only 38ms), totalCpuTimeMs (now only 141ms) and approximatePeakMemConsumptionKB (now only 3812KB) are all much less than before. Also, this could mean you’re less likely to run into the Max Intermediate Row Set Count limit on the maximum number of rows that a DirectQuery SQL query can return and it opens up more opportunities to use aggregations to improve performance.

As a result, if you’re running into query performance, CU or memory-related problems in DirectQuery mode, you should experiment with using the new calendar-based time intelligence feature to see if it can help even if it results in more SQL queries being generated. Hopefully when those fusion optimisations are implemented in the future the benefits will be even greater.

Finally, it’s also worth mentioning that using Visual Calculations or Window functions (as discussed here) has very similar benefits when tuning DirectQuery mode, so you should check them out too and consider using them in combination with calendar-based time intelligence.

Snowflake Query Tags In Power BI And Workspace Monitoring

Since I wrote about the Snowflake query tags generated by Power BI earlier this year, one important piece of functionality has been added: an OperationID that allows you to link a SQL query in Snowflake to events in Workspace Monitoring and the Capacity Metrics app. Let’s see some examples.

I created a DirectQuery semantic model connected to Snowflake in a workspace with Workspace Monitoring enabled and ran a report. Looking in the Snowflake monitoring page I clicked on a SQL query that I knew had been generated by Power BI and found the query tag:

Here are the contents of the query tag:

{"PowerQuery":true,"Host":"PBI_SemanticModel_MWC","HostContext":"PowerBIPremium-DirectQuery","ActivityId":"377da3e0-900c-474f-aca2-c6bb6cd4d5a6"}

The ActivityId in the query tag is the OperationId of the query, and as you’ll know if you have read my recent posts, you can use the OperationId to get more details on the DAX query that generated this SQL query in Workspace Monitoring. Here’s the KQL I used to query Workspace Monitoring:

SemanticModelLogs
| where Timestamp > ago(1hr)
| where OperationId == "377da3e0-900c-474f-aca2-c6bb6cd4d5a6"
| project Timestamp, OperationName, EventText, DurationMs

And here are the results, showing the events associated with this DAX query:

Once you’ve found the events associated with a DAX query in Workspace Monitoring you can then get all kinds of other useful information, such as the IDs of the report and visual that generated the DAX query which are found in the ApplicationContext column in the SemanticModelLogs table.
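For example, here’s a sketch of how you might extract those IDs with KQL; the exact structure of the ApplicationContext JSON is assumed here to be a Sources array containing ReportId and VisualId properties, so check what it looks like in your own logs:

SemanticModelLogs
| where Timestamp > ago(1hr)
| where OperationId == "377da3e0-900c-474f-aca2-c6bb6cd4d5a6"
| extend Context = parse_json(ApplicationContext)
// assumed JSON shape: {"Sources":[{"ReportId":"...","VisualId":"..."}]}
| extend ReportId = tostring(Context.Sources[0].ReportId),
         VisualId = tostring(Context.Sources[0].VisualId)
| project Timestamp, OperationName, ReportId, VisualId, DurationMs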

Here’s more information about this DAX query from the new Timepoint Detail (Preview) page in the Capacity Metrics App, with the OperationId column showing the same ID from the query tag (see here for more details on this feature of the Capacity Metrics App):

This trick also works for Import mode refreshes. Here’s the query tag from Snowflake for a SQL query generated by Power BI during the refresh of an Import mode semantic model:

{"PowerQuery":true,"Host":"PBI_SemanticModel_MWC","HostContext":"PowerBIPremium-Import","ActivityId":"8f552c8e-4f7c-4376-b663-198f7f310d09"}

Again, you can use this ID to query Workspace Monitoring to get all the activity associated with the refresh operation:

SemanticModelLogs
| where Timestamp > ago(1hr)
| where OperationId == "8f552c8e-4f7c-4376-b663-198f7f310d09"
| project Timestamp, OperationName, EventText, DurationMs

And again you can find the OperationId for the refresh in the Timepoint Detail (Preview) page of the Capacity Metrics App (you can search for an ID in the Operation ID slicer on this page too):

This is going to make it a lot easier to troubleshoot slow DirectQuery reports or Import mode refreshes when you’re using Snowflake as a source.

[Thanks to my colleague Thierry Houy for this information]

Performance Implications Of Different Ways Of Fully Expanding A Power BI Matrix Visual

If you have a Power BI report with a matrix visual on it, it’s quite likely that you’ll want all the levels in the matrix to be fully expanded by default. But did you know that the way you expand all the levels could have performance implications, especially if you’re using DirectQuery mode? Here’s an example.

I have a DirectQuery semantic model built on top of some of the tables from the SQL Server AdventureWorksDW sample database (apologies for the poor naming):

There are four DAX measures defined on it:

Sales Amount = SUM(FactInternetSales[SalesAmount])

Monday Sales = CALCULATE([Sales Amount], 'DimDate'[EnglishDayNameOfWeek]="Monday")

January Sales = CALCULATE([Sales Amount], 'DimDate'[EnglishMonthName]="January")

Class H Sales = CALCULATE([Sales Amount], 'DimProduct'[Class]="H")

I wrote these measures specifically to exacerbate the problem I’m going to show (by reducing the amount of fusion that is possible) but they are pretty normal, reasonable measures that you might find in any semantic model.

Now let’s say you add a matrix visual to a report page, put these four measures onto the columns axis of the matrix, and drop the CalendarYear column (from the DimDate table), the Color column and the Style column (both from the DimProduct table) onto the rows axis of the matrix. At this point it looks like this:

…but what you want to do now is show all the styles and colours too.

One way to do it – not the most efficient way, but some people like me just love to click – is to expand every year and colour individually:

It doesn’t take too long to expand everything and after all you only need to do it once, right? But let’s take the DAX query generated for this visual and paste it into DAX Studio with Server Timings turned on and see what we can see:

There are 14 separate Storage Engine queries – which result in 14 separate SQL queries being sent to SQL Server. The first two Storage Engine/SQL queries get a list of which years and colours have been drilled down on, and then there are (4 measures) * (3 levels of granularity) = 12 other Storage Engine queries to get the data shown in the visual. The overall duration of 230ms here is very low but in the real world the SQL queries could be a lot slower, making the DAX query very slow.

The default limits on the number of SQL queries that a DAX query can run in parallel have a big impact on overall performance here, as you can see; even though you can increase those limits you may then hit the maximum number of connections that can be opened to a DirectQuery source, and even though you can increase that limit too if you’re running on a capacity, there are hard limits here. If Power BI needs to open new connections to the data source in order to run these SQL queries, that can also slow things down because there can sometimes be a noticeable wait when connections are opened. Reducing the number of Storage Engine queries generated by a DAX query is very important when tuning DirectQuery models; the effect is going to be a lot less noticeable on an Import or Direct Lake semantic model but it could still cause problems.

There’s good news though. If you expand the levels in your matrix in a different (and to be honest, much more convenient) way using the “Expand all down one level in the hierarchy” button on the visual header or the “Expand to next level” option on the right-click menu for the rows like so:

…then you get the same result but with a much more efficient DAX query. Here’s what Server Timings shows for the DAX query generated for the fully expanded matrix now:

This time there are only four Storage Engine/SQL queries, one for each measure, and the overall duration is just 50ms. Even though, as you can see from the screenshot, only three Storage Engine/SQL queries can run in parallel and the fourth has to wait for the first query to finish before it can run, that’s less of an issue given the smaller number of queries. I won’t bother showing the DAX for the two versions of the matrix but it’s clear when you look at them that the second one is more efficient, because it knows it can expand everything on rows rather than just what has been clicked. Of course, this type of optimisation is only possible if you are fully expanding your matrix.

Current Status Of Snowflake Query Tags In Power BI

Since the November 2024 Power BI release blog post announced that queries sent to Snowflake by Power BI include a query tag, I’ve had a lot of questions from people who couldn’t see this happening or wanted to know what the query tags contained, so in this post I thought I would outline the current status.

The query tagging feature for the Power BI Snowflake connector actually didn’t get released in November 2024 and even now, in April 2025, it’s only available for DirectQuery connections and Import mode refreshes that use the V1.0 connector (the V2.0 connector will support query tags soon). Here’s an example of what a query tag looks like for a SQL query generated by Power BI from a DirectQuery semantic model:

{"PowerQuery":true,"Host":"PBI_SemanticModel_MWC","HostContext":"PowerBIPremium-DirectQuery"}

And here’s what a query tag looks like for a SQL query generated for a semantic model refresh:

{"PowerQuery":true,"Host":"PBI_SemanticModel_MWC","HostContext":"PowerBIPremium-Import"}

At the time of writing only SQL queries sent from the Power BI Service contain query tags, not those sent from Power BI Desktop. Also, there is no way to customise the contents and, unlike SQL queries sent to SQL Server-related sources, there is no information on the report or visual that generated the SQL query. In the future some of these limitations may go away.

Another Way To Get The SQL Generated By Power BI DirectQuery Mode

If you’re performance tuning a DirectQuery mode semantic model in Power BI, one of the first things you’ll want to do is look at the SQL that Power BI is generating. That’s easy if you have permissions to monitor your source database but if you don’t, it can be quite difficult to do so from Power BI. I explained the options for getting the SQL generated in DirectQuery mode and why it’s so complicated in a presentation here, but I’ve recently found a new way of doing this in Power BI Desktop (but not the Service) that works for some M-based connectors, for example Snowflake.

The trick is to use Profiler and the M Data Provider Events/Execute Source Query event. To demonstrate this I created a DirectQuery semantic model in Power BI Desktop using Snowflake as a source. I then connected SQL Server Profiler to Power BI Desktop (the easiest way to do this is to install it as an External Tool as described here, although you can connect by finding the diagnostics port as described here) and selected the following events:

I then added a card visual to my report that contained a simple SUM measure.

Here are the events generated in Profiler:

As you would expect there’s a single DirectQuery Begin/End event pair generated because a single SQL query is sent to Snowflake by the card visual. However, the TextData column shows the following SQL – which is definitely not Snowflake SQL:

SELECT SUM([t0].[fare_amount])
 AS [a0]
FROM [NYCTAXIDATA] AS [t0] 

This is because, for some data sources, Power BI emits TSQL and the connector then converts this to the SQL dialect actually used by the source database.

This is where the Execute Source Query event comes in. The TextData column for this event shows the actual SQL sent to Snowflake for this query, which is:

select sum("fare_amount") as "C1"
from "NYCTAXIDB"."PUBLIC"."NYCTAXIDATA"

Interestingly this event is generated after the Query End event for the whole query, but you can associate this event with its parent DAX query by matching the value in the RequestId column with the RequestId for the other events generated by the query.

Monitoring The Opening Of DirectQuery Connections In Power BI

In the past I have blogged about how important the number of connections from a Power BI DirectQuery semantic model to a data source is for performance. It’s also true that in some cases opening a connection, or some of the operations associated with opening a connection, can be very slow. As a result it can be useful to see when your semantic model opens connections to a data source, and you can do this with Log Analytics.

To show this I published a DirectQuery semantic model to a workspace with Log Analytics enabled and opened a report with one table visual on it. Because the model and report had only just been published I could be sure there were no existing connections back to the data source used by the DirectQuery model. I then ran the following KQL query:

PowerBIDatasetsWorkspace
| where TimeGenerated>ago(1hr)
| where OperationName in ('Notification', 'QueryBegin', 'QueryEnd', 'DirectQueryBegin', 'DirectQueryEnd')
| project TimeGenerated, OperationName, OperationDetailName, EventText, DurationMs
| order by TimeGenerated asc

Here’s the output:

Some things to note:

  • I have highlighted in red the events of type Notification and subtype OpenedConnection. These events are fired when the semantic model opens connections back to the data source. There are three of these events so three connections are being opened.
  • There is one DAX query generated for the single visual on the report page. As you would expect there is a single pair of QueryBegin/QueryEnd events.
  • After the QueryBegin event there are two DirectQueryBegin/DirectQueryEnd events, which mean that for this DAX query Power BI ran two SQL queries against the source.
  • Immediately before these two DirectQueryBegin/DirectQueryEnd pairs, and after the QueryBegin event, there are two Notification events which indicate that two connections are being opened to run these queries.
  • Before the DAX query runs, another connection is opened and there is a DirectQueryBegin/DirectQueryEnd pair that is not associated with a DAX query that I have highlighted in blue. The EventText column contains M code rather than SQL code, and a look at this M code shows that it uses the DirectQueryCapabilities.From function which is marked for “internal use only” in the docs. The DirectQueryEnd event from this pair has a duration of 3 seconds which shows that in this case it’s quite expensive. This query is the Power Query engine understanding the DirectQuery capabilities of the source and one of the side effects of this is that it needs to read metadata from the source, which can be quite slow sometimes. Doing things like disabling relationship columns in the connector and, for some sources like Snowflake, connecting with an identity or with a security role that can only see the tables or views used by the model (as mentioned here), can make this a lot faster.

For some reason the Notification events associated with opening connections do not appear in Profiler traces run on models in the Service, although they do appear if you run a trace on a model in Power BI Desktop. This is why I’ve used Log Analytics here.

Power BI pools connections once they are opened and keeps them alive for a certain amount of time, and it also caches data source metadata after it has retrieved it, so you may not see the events I have highlighted here every time you run a report.

[Thanks to Curt Hagenlocher and Akshai Mirchandani for some of the information in this post]

Power BI DirectQuery, Gateways And SQL Queries That Return Lots Of Rows

Recently I was working with a customer using DirectQuery mode where all traffic to the data source had to go through an on-premises data gateway for security reasons. They noticed that report performance got worse when traffic went through the gateway, and this was particularly true when Power BI generated SQL queries that returned hundreds of thousands of rows. Looking in the gateway logs we found that spooling (see here and here for more details on spooling, and here for information on monitoring spooling in the logs) was taking place, so I advised the customer to set the StreamBeforeRequestCompletes property on the gateway to true. This eliminated the delays from spooling on the gateway (I’m told it helps performance inside the model too) and resulted in some substantial improvements in report performance. As a result, if you’re using DirectQuery and a gateway, you should experiment with the StreamBeforeRequestCompletes property to see if it can help you too.
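This property is set in the gateway’s configuration file (Microsoft.PowerBI.DataMovement.Pipeline.GatewayCore.dll.config in the gateway installation folder) and the gateway service needs to be restarted after changing it. The snippet below is a sketch of what the relevant setting looks like – check the current gateway documentation for the exact details:

<setting name="StreamBeforeRequestCompletes" serializeAs="String">
    <value>True</value>
</setting>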

That said, if you’re using DirectQuery mode you should try to avoid situations where Power BI generates SQL queries that return a large number of rows, such as table visuals with scrollbars. Regardless of whether you’re using a gateway or not, moving large amounts of data from your data source to your semantic model can be slow – even if your data source tells you the SQL query itself is fast. You can use Performance Analyzer in Power BI Desktop (see here) to find out how many rows the SQL queries Power BI generates return and how long it takes to read that data.