Excel.Workbook() And The delayTypes Option In Power Query/Power BI

A while ago I found myself wondering – as I often do about this kind of thing – about what the undocumented third parameter of the Excel.Workbook() M function (called delayTypes) actually does. I found a forums post from Guy Hunkin of the Excel Power Query team in 2017 here, which gives the following summary:

Originally, Excel.Workbook used to read the entire workbook data to accurately assign types to the columns. This was EXTREMELY slow on large workbooks.

Having this in mind, we added this flag to delay the behavior. When set to “true”, we don’t infer any column types. Instead, the UI uses its normal inference algorithm to generate a Table.TransformColumnTypes step with the inferred types and if it inferred wrongly, the user can update the transformation explicitly.

I also posted a question on the Power Query MSDN forum and got some more details from Colin Banfield, one of the many M experts who hang out there, as well as doing some investigations of my own. This post summarises what I found out.

First of all, a simple demonstration of what it does. Consider the following Excel table:

image

Connecting to this table in the December 2018 release of Power BI Desktop generates the following M query:

let
    Source = 
    Excel.Workbook(
        File.Contents(
            "C:\DelayTypesTest.xlsx"
            ), 
            null, 
            true
            ),
    SourceData_Table = 
    Source{[Item="SourceData",Kind="Table"]}[Data],
    #"Changed Type" = 
    Table.TransformColumnTypes(
        SourceData_Table,
        {
            {"TextColumn", type text}, 
            {"IntegerColumn", Int64.Type}, 
            {"DecimalColumn", type number}, 
            {"DateColumn", type date}
            }
            )
in
    #"Changed Type"

Notice that the delayTypes option is now automatically set to true and that there is a “Changed Type” step. The output of this query, as seen in the Power Query Editor, is what you would expect:

image

Compare this with the output of the following query which has delayTypes set to false and has no “Changed Type” step:

let
    Source = 
    Excel.Workbook(
        File.Contents(
            "C:\DelayTypesTest.xlsx"
            ), 
            null, 
            false
            ),
    SourceData_Table = 
    Source{[Item="SourceData",Kind="Table"]}[Data]
in
    SourceData_Table

image

There are two things to notice:

  • With delayTypes set to false no “Changed Type” step is necessary, the data types on the columns are set by Excel.Workbook(). With delayTypes set to true all the columns returned by Excel.Workbook() have their data type set to Any and a subsequent “Changed Type” step is necessary to set data types to something useful.
  • With delayTypes set to true the IntegerColumn and the DecimalColumn columns have different data types assigned to them by the “ChangedType” step: IntegerColumn becomes Int64.Type whereas DecimalColumn becomes type number. With delayTypes set to false both columns are set to number. So it looks like Excel.Workbook with delayTypes set to false will only convert columns to primitive data types.

The second question is: if it is, primarily, a performance optimisation how much of an impact does it have? I built four queries that read all the data from an 80MB xlsx file to test the performance of all of the variations of delayTypes true/false and having a “Changed Type” step present or not (just in case there was an overhead to having a “Changed Type” step). Here are the results – the main timings are in minutes and seconds, and in brackets is the length of the “Evaluating” phase of query execution:

  delayTypes=false delayTypes=true
no “Changed Type” step 2:08
(1:21)
1:06
(0:00)
“Changed Type” present 2:19
(1:30)
1:08
(0:00)

I wouldn’t read too much into these exact timings because in my experience timings for the same query can fluctuate quite a lot, but one thing is clear: setting delayTypes to true results in much better performance. Indeed with delayTypes set to false nearly half the time was spent in the “Evaluating” phase whereas with delayTypes set to true there was no “Evaluating” phase at all. As a result I can’t think of a reason not to set the delayTypes option to true.

It’s worth noting that delayTypes is set to false by default if you don’t specify this option, so be aware of this if you are writing your own M code.

2 responses

  1. Ohhh man! I bet “Automatically Detect Column Types” will make it default. I’m going to check right now—this could be a game changer!

  2. Yes. I have noticed the time lag using false setting to delay type. I did not think twice about switching to true when I realized that it was much faster that way. But thank you for giving me an opportunity to revisit and understand what that really means. I bet that a number of updates that they have rolled out for the enhancement and improvements to the original code of accessing the data may have improved the speed of ‘Excel.Workbook’ which initially identified as a ‘time-consuming’ process. I guess the development team needs to regroup and reassess all the existing data accessing function to change the default value for parameters.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: