ODBC Data Sources, The SqlCapabilities Option And Power Query/Power BI Data Refresh Performance

A few weeks ago I received a great tip from Simon Nuss about a Power Query feature relating to ODBC data sources that I hadn’t seen before (although it’s in Power BI Desktop and Excel 365, so it can’t be that new) and which is potentially important for the performance of data refresh and of the Power Query Editor’s preview pane. It’s only visible if you click “Advanced options” when creating your ODBC data source:

image

It’s the “Supported row reduction clauses” option:

image

What does it do? Well, first of all let’s see what happens if you don’t set it. Let’s say you have an ODBC data source that uses the Microsoft ODBC Driver for SQL Server to connect to the Adventure Works DW database. If you create a new query that uses this data source to get the contents of the DimDate table it works pretty much as you’d expect; here’s the M code:

let
    Source =
        Odbc.DataSource(
            "dsn=Test SQL",
            [HierarchicalNavigation=true]
        ),
    #"Adventure Works DW_Database" =
        Source{
            [Name="Adventure Works DW",Kind="Database"]
            }[Data],
    dbo_Schema =
        #"Adventure Works DW_Database"{
            [Name="dbo",Kind="Schema"]
            }[Data],
    DimDate_Table =
        dbo_Schema{[Name="DimDate",Kind="Table"]}[Data]
in
    DimDate_Table

image

However, if you look in Profiler when the contents of the Preview window is refreshed, you’ll see that the Power Query engine is running a query that gets the entire contents of the table:

image

Obviously, if you are connecting to a big table this could make the Power Query Editor slow. What’s more, as this blog post shows, the Power Query Editor only actually wants to display 1000 rows and the fact that the query is requesting all the rows from the table suggests that even basic query folding is not happening on this ODBC data source.

Now let’s see what happens if you do select something from the “Supported row reduction clauses” dropdown. As the screenshot above shows, there are four values in the dropdown that relate to the form of SQL supported by your ODBC source to only get the first N rows from a table. If you don’t know which one to choose you can click the Detect button and it will select one for you. In this case I selected TOP:

image

Having done this, when you connect to the same DimDate table, you’ll see that now the Power Query Editor is only requesting the top 1000 rows from the table:

image

Also, if you look at the M code for the query, the Odbc.DataSource function has the as-yet undocumented SqlCapabilities option set in its second parameter,

Odbc.DataSource(
 "dsn=Test SQL",
 [HierarchicalNavigation=true,
 SqlCapabilities=[LimitClauseKind=LimitClauseKind.Top]
 ]
)

Actually it’s not quite undocumented because it is mentioned here as something that is only available if you’re using Odbc.DataSource in a custom connector; obviously things have changed.

Setting this option can make a big difference to the responsiveness of the Power Query Editor when you’re working with large tables and/or slow data sources – I saw this myself this week when working with one of my customers. It’s also going to be important if you use the Keep Rows/Keep Top Rows button on the ribbon or the Table.FirstN() M function; I suspect there are other, more obscure, reasons why it might speed refresh up even if you aren’t filtering the rows of the table but I’m not 100% sure.

One last thing to mention is that there is another undocumented function which is what, I think, the Detect button in the screenshots above uses to find out how to set the SqlCapabilities option. It’s called Odbc.InferOptions and it can be used like so:

Odbc.InferOptions("dsn=Test SQL")

This function returns a record containing a field called LimitClauseKind that tells you the type of limit clause that is supported:

image

image

Here’s a list of the possible LimitClauseKind values I found using #shared:

image

3 responses

  1. Interesting finding with possibly additional benefits

    Performance effect on Refresh of PowerPivot models ?
    Does this mean, that when I’m refreshing a PowerPivot model based on PowerQueries,
    that it still loads the data twice, but in the first kind of preview run, it applies the SqlCapabilities settings?

    This would already result in a quite good improvement. Still the double refresh in Excel is still a miracle
    and occurs on very simple examples, even all privacy settings have been turned off.

    Odbc.DataSource vs Odbc.Query
    Is there maybe also a possibility to set similar options to Odbc.Query?
    Theoretically PowerQuery could wrap the SQL statement in a subselect and apply a top N on SELECT * FROM…

    Cheers
    Michael

    • It should have an impact on performance of refresh of Power Pivot models, yes. I don’t think it works with Odbc.Query though – as soon as you specify your own query against any data source, query folding stops happening.

  2. Is it possible to change data type and filter data while importing using SQL Statement? I have an ODBC data source where one of the columns is time-stamp data. Currently it is being imported as text. I would like to first convert it into date-time and Format and apply a filter in the SQL statement to import data from a particular date range.

    Thanks. I appreciate your help

Leave a Reply to Michael Brönnimann Cancel reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: