Power BI · Shareable Cloud Connections · Snowflake

Multiple Connections To The Same Data Source In The Power BI Service With Shareable Cloud Connections

A few weeks ago an important new feature for managing connections to data sources in the Power BI Service was released: Shareable Cloud Connections. You can read the blog post announcing them here. I won’t describe their functionality because the post already does that perfectly well; I want to focus on one thing in particular that is important for anyone using Power BI with Snowflake (and, I believe BigQuery and probably several other non-Microsoft sources): Shareable Cloud Connections allow you to have multiple connections to the same data source in the Power BI Service, each using different credentials.

Some of you are going to read that last sentence and get very excited. Many of you will probably be surprised that Power BI didn’t already support this. To understand what’s going on here you first have to understand what Power BI considers a “data source”. The answer can be found on this page of the Power Query SDK docs:

The M engine identifies a data source using a combination of its Kind and Path […]

The Kind value comes from the Data Source Kind definition.

The Path value is derived from the required parameters of your data source function. Optional parameters aren’t factored into the data source path identifier.

In the case of the Snowflake connector, the “Kind” of the connector is Snowflake and the “Path” is the determined by the two required parameters in the Snowflake connector, namely the Server and the Warehouse:

Before Shareable Cloud Connections, unless you used a gateway, you could only use one connection with one set of credentials for each data source used in the Power BI Service. This meant, for Snowflake, you could only use one set of credentials for all datasets that connected to the same Server and Warehouse, which led to a variety of problems like this one where different credentials were needed for different Snowflake databases or like this one where one user would publish a dataset and enter credentials that worked for them and then a second user would publish another dataset, enter different credentials for the same Server/Warehouse combination and break refresh for the first dataset. With most other popular connectors these issues were rarer because their Paths are more specific and aligned to how you’d want to use different credentials.

As I said, Shareable Clould Connections solve all this by allowing the creation of multiple named connections to the same source, each of which can use different credentials. As a result I strongly recommend everyone using Snowflake with Power BI to create new Shareable Clould Connections and use them in the Power BI Service.

BigQuery · Databricks · DirectQuery · Google · Performance Tuning · Power BI · Snowflake

ApproximateDistinctCount DAX Function Now Works On More DirectQuery Sources

Some good news for those of you using DirectQuery mode in Power BI: the ApproximateDistinctCount DAX function, which returns an estimate of the number of the distinct values in a column and which can be a lot faster than a true distinct count as returned by the DistinctCount function, is now available to use with BigQuery, Databricks and Snowflake sources. It only worked with Azure SQL DB and Synapse before; RedShift is coming soon. You can use it in exactly the same way that you would with the DistinctCount function except that it only works in DirectQuery mode.

For example, I have a Power BI DirectQuery dataset that uses the New York Taxi data in Snowflake as its source. With the following two DAX measures defined on the Trip table:

Approximate Distinct Medallions = APPROXIMATEDISTINCTCOUNT('TRIP'[MEDALLIONID])
Distinct Medallions = DISTINCTCOUNT('TRIP'[MEDALLIONID])

…I can build a table visual that compares the output of the two measures (as you can see, the difference isn’t that big):

…and see that the ApproximateDistinctCount DAX function is translated to the APPROX_COUNT_DISTINCT function in Snowflake SQL:

select { fn convert(count(distinct("MEDALLIONID")), SQL_DOUBLE) } + { fn convert(max("C1"), SQL_DOUBLE) } as "C1",
    approx_count_distinct("MEDALLIONID") as "C2"
from 
(
    select "DATEID",
        "MEDALLIONID",
        "HACKNEYLICENSEID",
        "PICKUPTIMEID",
        "DROPOFFTIMEID",
        "PICKUPGEOGRAPHYID",
        "DROPOFFGEOGRAPHYID",
        "PICKUPLATITUDE",
        "PICKUPLONGITUDE",
        "PICKUPLATLONG",
        "DROPOFFLATITUDE",
        "DROPOFFLONGITUDE",
        "DROPOFFLATLONG",
        "PASSENGERCOUNT",
        "TRIPDURATIONSECONDS",
        "TRIPDISTANCEMILES",
        "PAYMENTTYPE",
        "FAREAMOUNT",
        "SURCHARGEAMOUNT",
        "TAXAMOUNT",
        "TIPAMOUNT",
        "TOLLSAMOUNT",
        "TOTALAMOUNT",
        case
            when "MEDALLIONID" is null
            then CAST(1 as INTEGER)
            else CAST(0 as INTEGER)
        end as "C1"
    from "NYCDATA_DB"."NYCTAXIDATA"."TRIP"
) as "ITBL"

Distinct counts are often the slowest type of measure but in my experience report consumers are very unwilling to accept seeing “near enough” numbers in their reports rather than numbers that are 100% accurate, even if the approximate distinct counts are much faster. I heard someone suggest using field parameters to allow report consumers to switch between showing fast approximate distinct counts for exploration and accurate distinct counts when they really need them, and I think this is a great compromise.

Bonus links: if you need to do an approximate distinct count in Import mode, Phil Seamark shows how to do this here; Phil also wrote a great post on building aggregations for distinct counts (read it carefully – this is a really powerful technique!) here.

DirectQuery · Power BI · Snowflake

Power BI DirectQuery Best Practices For Snowflake And Other Databases

Recently I collaborated with a number of people from Microsoft and Snowflake to write a blog post on best practices for using Power BI in DirectQuery mode on Snowflake. You can read it here:

https://medium.com/@jerellegainey/best-practices-for-using-power-bi-in-directquery-mode-with-snowflake-bfd1312ca7ab

It builds on what is already in the Power BI guidance documentation for DirectQuery to add some advice specific to Snowflake. It also has a few other tips that are generally applicable to all DirectQuery sources and which aren’t in the guidance docs (yet), such as the importance of setting the Maximum Connections Per Data Source property (which I also blogged about recently here) and the fact you can increase this to 30 in a Premium capacity, as well as the importance of always testing DirectQuery performance in the Power BI Service rather than in Power BI Desktop. As a result it’s worth reading if you are thinking of using Power BI in DirectQuery mode with Synapse, BigQuery or any other source.

If you are considering using DirectQuery on a project I have one last piece of advice: think very carefully about why you need to use DirectQuery and not Import mode. Many people don’t and end up in trouble – in my opinion Import mode should be the default choice simply because it will almost always perform better than DirectQuery mode, whatever back-end database you’re using.

Power BI · Snowflake

Build Scalable BI Solutions Using Power BI and Snowflake

On the Power BI team we want our customers to be successful whichever data source they’re using – not just the Microsoft ones. Recently I had the pleasure of recording a webinar on the subject of best practices for using Power BI with Snowflake, along with Craig Collier from Snowflake and Chris Holliday from Visual BI (a partner that specialises in Power BI/Snowflake solutions). You can watch it here:

https://info.microsoft.com/ww-landing-build-scalable-BI-solutions-using-power-BI-and-snowflake.html

There’s a summary of what we talk about on the Snowflake blog:

https://www.snowflake.com/blog/maximizing-power-bi-with-snowflake/

We’re continuously improving the performance and functionality of all our connectors, and Snowflake is no exception: we recently announced that in early 2021 we’ll have support for Snowflake roles and the ability to use your own SQL queries in DirectQuery mode. Power BI support for Azure Service Tags (on the roadmap here) will also be very useful for Snowflake customers.