ODBC Data Sources, The SqlCapabilities Option And Power Query/Power BI Data Refresh Performance

A few weeks ago I received a great tip from Simon Nuss about a Power Query feature relating to ODBC data sources that I hadn’t seen before (although it’s in Power BI Desktop and Excel 365, so it can’t be that new) and which is potentially important for the performance of data refresh and of the Power Query Editor’s preview pane. It’s only visible if you click “Advanced options” when creating your ODBC data source:

image

It’s the “Supported row reduction clauses” option:

image

What does it do? Well, first of all let’s see what happens if you don’t set it. Let’s say you have an ODBC data source that uses the Microsoft ODBC Driver for SQL Server to connect to the Adventure Works DW database. If you create a new query that uses this data source to get the contents of the DimDate table it works pretty much as you’d expect; here’s the M code:

let
Source =
Odbc.DataSource(
“dsn=Test SQL”,
[HierarchicalNavigation=true]
),
#”Adventure Works DW_Database” =
Source{
[Name=”Adventure Works DW”,Kind=”Database”]
}[Data],
dbo_Schema =
#”Adventure Works DW_Database”{
[Name=”dbo”,Kind=”Schema”]
}[Data],
DimDate_Table =
dbo_Schema{[Name=”DimDate”,Kind=”Table”]}[Data]
in
DimDate_Table

image

However, if you look in Profiler when the contents of the Preview window is refreshed, you’ll see that the Power Query engine is running a query that gets the entire contents of the table:

image

Obviously, if you are connecting to a big table this could make the Power Query Editor slow. What’s more, as this blog post shows, the Power Query Editor only actually wants to display 1000 rows and the fact that the query is requesting all the rows from the table suggests that even basic query folding is not happening on this ODBC data source.

Now let’s see what happens if you do select something from the “Supported row reduction clauses” dropdown. As the screenshot above shows, there are four values in the dropdown that relate to the form of SQL supported by your ODBC source to only get the first N rows from a table. If you don’t know which one to choose you can click the Detect button and it will select one for you. In this case I selected TOP:

image

Having done this, when you connect to the same DimDate table, you’ll see that now the Power Query Editor is only requesting the top 1000 rows from the table:

image

Also, if you look at the M code for the query, the Odbc.DataSource function has the as-yet undocumented SqlCapabilities option set in its second parameter,

Odbc.DataSource(
“dsn=Test SQL”,
[HierarchicalNavigation=true,
SqlCapabilities=[LimitClauseKind=LimitClauseKind.Top]
]
)

Actually it’s not quite undocumented because it is mentioned here as something that is only available if you’re using Odbc.DataSource in a custom connector; obviously things have changed.

Setting this option can make a big difference to the responsiveness of the Power Query Editor when you’re working with large tables and/or slow data sources – I saw this myself this week when working with one of my customers. It’s also going to be important if you use the Keep Rows/Keep Top Rows button on the ribbon or the Table.FirstN() M function; I suspect there are other, more obscure, reasons why it might speed refresh up even if you aren’t filtering the rows of the table but I’m not 100% sure.

One last thing to mention is that there is another undocumented function which is what, I think, the Detect button in the screenshots above uses to find out how to set the SqlCapabilities option. It’s called Odbc.InferOptions and it can be used like so:

Odbc.InferOptions(“dsn=Test SQL”)

This function returns a record containing a field called LimitClauseKind that tells you the type of limit clause that is supported:

image

image

Here’s a list of the possible LimitClauseKind values I found using #shared:

image

10 thoughts on “ODBC Data Sources, The SqlCapabilities Option And Power Query/Power BI Data Refresh Performance

  1. Interesting finding with possibly additional benefits

    Performance effect on Refresh of PowerPivot models ?
    Does this mean, that when I’m refreshing a PowerPivot model based on PowerQueries,
    that it still loads the data twice, but in the first kind of preview run, it applies the SqlCapabilities settings?

    This would already result in a quite good improvement. Still the double refresh in Excel is still a miracle
    and occurs on very simple examples, even all privacy settings have been turned off.

    Odbc.DataSource vs Odbc.Query
    Is there maybe also a possibility to set similar options to Odbc.Query?
    Theoretically PowerQuery could wrap the SQL statement in a subselect and apply a top N on SELECT * FROM…

    Cheers
    Michael

    1. It should have an impact on performance of refresh of Power Pivot models, yes. I don’t think it works with Odbc.Query though – as soon as you specify your own query against any data source, query folding stops happening.

  2. Is it possible to change data type and filter data while importing using SQL Statement? I have an ODBC data source where one of the columns is time-stamp data. Currently it is being imported as text. I would like to first convert it into date-time and Format and apply a filter in the SQL statement to import data from a particular date range.

    Thanks. I appreciate your help

  3. can this setting be applied after the data query is already defined to improve the refresh in existing reports or would they have to be rebuilt?

  4. We are trying to apply this setting for an existing query. The query is going via the DevArt MySQL ODBC driver. For some reason the setting does not persist: we change it to “LIMIT” and it resets back to “(None)” after reopening the Source settings.

    Not sure if it’s a Power BI Desktop bug or something in the ODBC driver. We are on the latest releases of both Power BI Desktop and the ODBC driver. Has anyone run into any similar issues with this driver or any other driver?

  5. Thank you for the very helpful article. There is no documention or help on any other site except yours on this topic.

    I tried the snipped from the article to have an ODBC-connected table in DirectQuery mode. Sadly, it still stores the table (finalTable below) in Import mode. Help from anyone is much appreciated!

    let
    Source= Odbc.DataSource(“dsn=localhost SQL Server”,
    [HierarchicalNavigation=true,
    SqlCapabilities=[SupportsDerivedTable = true]
    ]
    ),
    pdbname = Source{[Name=”dbname”,Kind=”Database”]}[Data],
    dbo_Schema = pdbname{[Name=”dbo”,Kind=”Schema”]}[Data],
    #”Filtered Rows” = Table.SelectRows(dbo_Schema, each [Name] = “tablename”),
    finalTable = #”Filtered Rows”{[Name=”tablename”,Kind=”Table”]}[Data]
    in
    finalTable

Leave a Reply to David BrandCancel reply