Snowflake Query Tags In Power BI And Workspace Monitoring

Since I wrote about the Snowflake query tags generated by Power BI earlier this year, one important piece of functionality has been added: an OperationID that allows you to link a SQL query in Snowflake to events in Workspace Monitoring and the Capacity Metrics app. Let’s see some examples.

I created a DirectQuery semantic model connected to Snowflake in a workspace with Workspace Monitoring enabled and ran a report. Looking in the Snowflake monitoring page I clicked on a SQL query that I knew had been generated by Power BI and found the query tag:

Here are the contents of the query tag:

{"PowerQuery":true,"Host":"PBI_SemanticModel_MWC","HostContext":"PowerBIPremium-DirectQuery","ActivityId":"377da3e0-900c-474f-aca2-c6bb6cd4d5a6"}

The ActivityId in the query tag is the OperationId of the query, and as you’ll know if you have read my recent posts, you can use the OperationId to get more details on the DAX query that generated this SQL query in Workspace Monitoring. Here’s the KQL I used to query Workspace Monitoring:

SemanticModelLogs
| where Timestamp > ago(1hr)
| where OperationId == "377da3e0-900c-474f-aca2-c6bb6cd4d5a6"
| project Timestamp, OperationName, EventText, DurationMs

And here are the results, showing the events associated with this DAX query:

Once you’ve found the events associated with a DAX query in Workspace Monitoring you can then get all kinds of other useful information, such as the IDs of the report and visual that generated the DAX query which are found in the ApplicationContext column in the SemanticModelLogs table.

Here’s more information about this DAX query from the new Timepoint Detail (Preview) page in the Capacity Metrics App, with the OperationId column showing the same ID from the query tag (see here for more details on this feature of the Capacity Metrics App):

This trick also works for Import mode refreshes. Here’s the query tag from Snowflake for a SQL query generated by Power BI during the refresh of an Import mode semantic model:

{"PowerQuery":true,"Host":"PBI_SemanticModel_MWC","HostContext":"PowerBIPremium-Import","ActivityId":"8f552c8e-4f7c-4376-b663-198f7f310d09"}

Again, you can use this ID to query Workspace Monitoring to get all the activity associated with the refresh operation:

SemanticModelLogs
| where Timestamp > ago(1hr)
| where OperationId == "8f552c8e-4f7c-4376-b663-198f7f310d09"
| project Timestamp, OperationName, EventText, DurationMs

And again you can find the OperationId for the refresh in the Timepoint Detail (Preview) page of the Capacity Metrics App (you can search for an ID in the Operation ID slicer on this page too):

This is going to make it a lot easier to troubleshoot slow DirectQuery reports or Import mode refreshes when you’re using Snowflake as a source.

[Thanks to my colleague Thierry Houy for this information]

Current Status Of Snowflake Query Tags In Power BI

Since the November 2024 Power BI release blog post announced that queries sent to Snowflake by Power BI include a query tag I’ve had a lot of questions from people who couldn’t see this happening or wanted to know what the query tags contained, so in this blog I thought I would outline the current status.

The query tagging feature for the Power BI Snowflake connector actually didn’t get released in November 2024 and even now, in April 2025, it’s only available for DirectQuery connections and Import mode refreshes that use the V1.0 connector (the V2.0 connector will support query tags soon). Here’s an example of what a query tag looks like for a SQL query generated by Power BI from a DirectQuery semantic model:

{"PowerQuery":true,"Host":"PBI_SemanticModel_MWC","HostContext":"PowerBIPremium-DirectQuery"}

And here’s what a query tag for a SQL query generated for a semantic model refresh:

{"PowerQuery":true,"Host":"PBI_SemanticModel_MWC","HostContext":"PowerBIPremium-Import"}

At the time of writing only SQL queries sent from the Power BI Service contain query tags, not those sent from Power BI Desktop. Also there is no way to customise the contents and unlike SQL queries sent to SQL Server-related sources there is no information on the report or visual that generated the SQL query. In the future some of these limitations may go away.

Another Way To Get The SQL Generated By Power BI DirectQuery Mode

If you’re performance tuning a DirectQuery mode semantic model in Power BI, one of the first things you’ll want to do is look at the SQL that Power BI is generating. That’s easy if you have permissions to monitor your source database but if you don’t, it can be quite difficult to do so from Power BI. I explained the options for getting the SQL generated in DirectQuery mode and why it’s so complicated in a presentation here, but I’ve recently found a new way of doing this in Power BI Desktop (but not the Service) that works for some M-based connectors, for example Snowflake.

The trick is to use Profiler and the M Data Provider Events/Execute Source Query event. To demonstrate this I created a DirectQuery semantic model in Power BI Desktop using Snowflake as a source. I then connected SQL Server Profiler to Power BI Desktop (the easiest way to do this is to install it as an External Tool as described here, although you can connect by finding the diagnostics port as described here) and selected the following events:

I then added a card visual to my report that contained a simple SUM measure.

Here are the events generated in Profiler:

As you would expect there’s a single DirectQuery Begin/End event pair generated because a single SQL query is sent to Snowflake by the card visual. However, the TextData column shows the following SQL – which is definitely not Snowflake SQL:

SELECT SUM([t0].[fare_amount])
 AS [a0]
FROM [NYCTAXIDATA] AS [t0] 

This is because for some data sources, Power BI emits TSQL and the connector used then converts this to the actual dialect of SQL used by the source database.

This is where the Execute Source Query event comes in. The TextData column for this event shows the actual SQL sent to Snowflake for this query, which is:

select sum("fare_amount") as "C1"
from "NYCTAXIDB"."PUBLIC"."NYCTAXIDATA"

Interestingly this event is generated after the Query End event for the whole query, but you can associate this event with its parent DAX query by matching the value in the RequestId column with the RequestId for the other events generated by the query.

Multiple Connections To The Same Data Source In The Power BI Service With Shareable Cloud Connections

A few weeks ago an important new feature for managing connections to data sources in the Power BI Service was released: Shareable Cloud Connections. You can read the blog post announcing them here. I won’t describe their functionality because the post already does that perfectly well; I want to focus on one thing in particular that is important for anyone using Power BI with Snowflake (and, I believe BigQuery and probably several other non-Microsoft sources): Shareable Cloud Connections allow you to have multiple connections to the same data source in the Power BI Service, each using different credentials.

Some of you are going to read that last sentence and get very excited. Many of you will probably be surprised that Power BI didn’t already support this. To understand what’s going on here you first have to understand what Power BI considers a “data source”. The answer can be found on this page of the Power Query SDK docs:

The M engine identifies a data source using a combination of its Kind and Path […]

The Kind value comes from the Data Source Kind definition.

The Path value is derived from the required parameters of your data source function. Optional parameters aren’t factored into the data source path identifier.

In the case of the Snowflake connector, the “Kind” of the connector is Snowflake and the “Path” is the determined by the two required parameters in the Snowflake connector, namely the Server and the Warehouse:

Before Shareable Cloud Connections, unless you used a gateway, you could only use one connection with one set of credentials for each data source used in the Power BI Service. This meant, for Snowflake, you could only use one set of credentials for all datasets that connected to the same Server and Warehouse, which led to a variety of problems like this one where different credentials were needed for different Snowflake databases or like this one where one user would publish a dataset and enter credentials that worked for them and then a second user would publish another dataset, enter different credentials for the same Server/Warehouse combination and break refresh for the first dataset. With most other popular connectors these issues were rarer because their Paths are more specific and aligned to how you’d want to use different credentials.

As I said, Shareable Clould Connections solve all this by allowing the creation of multiple named connections to the same source, each of which can use different credentials. As a result I strongly recommend everyone using Snowflake with Power BI to create new Shareable Clould Connections and use them in the Power BI Service.

ApproximateDistinctCount DAX Function Now Works On More DirectQuery Sources

Some good news for those of you using DirectQuery mode in Power BI: the ApproximateDistinctCount DAX function, which returns an estimate of the number of the distinct values in a column and which can be a lot faster than a true distinct count as returned by the DistinctCount function, is now available to use with BigQuery, Databricks and Snowflake sources. It only worked with Azure SQL DB and Synapse before; RedShift is coming soon. You can use it in exactly the same way that you would with the DistinctCount function except that it only works in DirectQuery mode.

For example, I have a Power BI DirectQuery dataset that uses the New York Taxi data in Snowflake as its source. With the following two DAX measures defined on the Trip table:

Approximate Distinct Medallions = APPROXIMATEDISTINCTCOUNT('TRIP'[MEDALLIONID])
Distinct Medallions = DISTINCTCOUNT('TRIP'[MEDALLIONID])

…I can build a table visual that compares the output of the two measures (as you can see, the difference isn’t that big):

…and see that the ApproximateDistinctCount DAX function is translated to the APPROX_COUNT_DISTINCT function in Snowflake SQL:

select { fn convert(count(distinct("MEDALLIONID")), SQL_DOUBLE) } + { fn convert(max("C1"), SQL_DOUBLE) } as "C1",
    approx_count_distinct("MEDALLIONID") as "C2"
from 
(
    select "DATEID",
        "MEDALLIONID",
        "HACKNEYLICENSEID",
        "PICKUPTIMEID",
        "DROPOFFTIMEID",
        "PICKUPGEOGRAPHYID",
        "DROPOFFGEOGRAPHYID",
        "PICKUPLATITUDE",
        "PICKUPLONGITUDE",
        "PICKUPLATLONG",
        "DROPOFFLATITUDE",
        "DROPOFFLONGITUDE",
        "DROPOFFLATLONG",
        "PASSENGERCOUNT",
        "TRIPDURATIONSECONDS",
        "TRIPDISTANCEMILES",
        "PAYMENTTYPE",
        "FAREAMOUNT",
        "SURCHARGEAMOUNT",
        "TAXAMOUNT",
        "TIPAMOUNT",
        "TOLLSAMOUNT",
        "TOTALAMOUNT",
        case
            when "MEDALLIONID" is null
            then CAST(1 as INTEGER)
            else CAST(0 as INTEGER)
        end as "C1"
    from "NYCDATA_DB"."NYCTAXIDATA"."TRIP"
) as "ITBL"

Distinct counts are often the slowest type of measure but in my experience report consumers are very unwilling to accept seeing “near enough” numbers in their reports rather than numbers that are 100% accurate, even if the approximate distinct counts are much faster. I heard someone suggest using field parameters to allow report consumers to switch between showing fast approximate distinct counts for exploration and accurate distinct counts when they really need them, and I think this is a great compromise.

Bonus links: if you need to do an approximate distinct count in Import mode, Phil Seamark shows how to do this here; Phil also wrote a great post on building aggregations for distinct counts (read it carefully – this is a really powerful technique!) here.

Power BI DirectQuery Best Practices For Snowflake And Other Databases

Recently I collaborated with a number of people from Microsoft and Snowflake to write a blog post on best practices for using Power BI in DirectQuery mode on Snowflake. You can read it here:

https://medium.com/@jerellegainey/best-practices-for-using-power-bi-in-directquery-mode-with-snowflake-bfd1312ca7ab

It builds on what is already in the Power BI guidance documentation for DirectQuery to add some advice specific to Snowflake. It also has a few other tips that are generally applicable to all DirectQuery sources and which aren’t in the guidance docs (yet), such as the importance of setting the Maximum Connections Per Data Source property (which I also blogged about recently here) and the fact you can increase this to 30 in a Premium capacity, as well as the importance of always testing DirectQuery performance in the Power BI Service rather than in Power BI Desktop. As a result it’s worth reading if you are thinking of using Power BI in DirectQuery mode with Synapse, BigQuery or any other source.

If you are considering using DirectQuery on a project I have one last piece of advice: think very carefully about why you need to use DirectQuery and not Import mode. Many people don’t and end up in trouble – in my opinion Import mode should be the default choice simply because it will almost always perform better than DirectQuery mode, whatever back-end database you’re using.

Build Scalable BI Solutions Using Power BI and Snowflake

On the Power BI team we want our customers to be successful whichever data source they’re using – not just the Microsoft ones. Recently I had the pleasure of recording a webinar on the subject of best practices for using Power BI with Snowflake, along with Craig Collier from Snowflake and Chris Holliday from Visual BI (a partner that specialises in Power BI/Snowflake solutions). You can watch it here:

https://info.microsoft.com/ww-landing-build-scalable-BI-solutions-using-power-BI-and-snowflake.html

There’s a summary of what we talk about on the Snowflake blog:

https://www.snowflake.com/blog/maximizing-power-bi-with-snowflake/

We’re continuously improving the performance and functionality of all our connectors, and Snowflake is no exception: we recently announced that in early 2021 we’ll have support for Snowflake roles and the ability to use your own SQL queries in DirectQuery mode. Power BI support for Azure Service Tags (on the roadmap here) will also be very useful for Snowflake customers.