Some interesting new Power Query functionality was released in the December release of Power BI Desktop (it’s also in the build of Excel I’m running) that didn’t get announced in the blog post but which will be useful for anyone using Excel as a data source. It is fully documented but I thought I’d blog about it anyway to raise awareness.
It’s a new option on the Excel.Workbook function called InferSheetDimensions. Here’s what the docs for the Excel.Workbook function say:
Can be null or a logical (true/false) value indicating whether the area of a worksheet that contains data should be inferred by reading the worksheet itself, rather than by reading the dimensions metadata from the file. This can be useful in cases where the dimensions metadata is incorrect. Note that this option is only supported for Open XML Excel files, not for legacy Excel files. Default: false.
There’s also a much more detailed explanation on this page:
…in the section “Missing or incomplete Excel data” on how to troubleshoot issues where Power Query isn’t loading all the data from your worksheet.
Here’s a quick summary of the problem. Open XML format Excel workbooks (ie xlsx and xlsm format files) have some metadata (the “dimensions metadata” referred to in the docs) in them that acts as a kind of bounding box describing the range of cells in each worksheet that contain values. By default, Power Query uses this metadata when it gets data from an Excel worksheet – although not when it gets data from an Excel table or named range. Most of the time this metadata is reliable but in some cases, for example where an xlsx file has been generated by third-party software that has “export to Excel” functionality, it contains inaccurate information. This doesn’t affect Excel, which still loads the workbook, but it means that Power Query may not load all the data present in a worksheet. In these cases, setting InferSheetDimensions=true in Excel.Workbook causes Power Query to ignore the dimensions metadata and load all the data from the worksheet.
The docs also point out that if the dimension metadata returns a range that much larger than the actual populated range of cells in the worksheet, this can cause performance problems when loading data. While this can be fixed by manually editing the workbook as shown here, that’s not much help if you’re dealing with Excel files generated by buggy third-party software. Since Excel is already one of the slowest data sources you can use with Power Query, I guess this is another reason to use text formats like CSV with Power Query instead.