Handling Added Or Missing Columns In Power Query

Chris Webb Power Query February 26, 2015January 18, 2020 3 Minutes

A recent conversation in the comments of this blog post brought up the subject of how to handle columns that have either been removed from or added to a data source in Power Query. Anyone who has worked with csv files knows that they have a nasty habit of changing format even when they aren’t supposed to, and added or removed columns can cause all kinds of problems downstream.

Ken Puls (whose excellent blog you are probably already reading if you’re interested in Power Query) pointed out that it’s very easy to protect yourself against new columns in your data source. When creating a query, select all the columns that you want and then right-click and select Remove Other Columns:

This means that if any new columns are added to your data source in the future, they won’t appear in the output of your query. In the M code the Table.SelectColumns() function is used to do this.

Dealing with missing columns is a little bit more complicated. In order to find out whether a column is missing, first of all you’ll need a list of columns that should be present in your query. You can of course store these tables in a table in Excel and enter the column names manually, or you can do this in M fairly easily by creating a query that connects to your data source and using the Table.ColumnNames() function something like this:

[sourcecode language=”text” padlinenumbers=”true”]
let
//Connect to CSV file
Source = Csv.Document(
File.Contents(
"C:\Users\Chris\Documents\Power Query demos\SampleData.csv"
),null,",",null,1252),
//Use first row as headers
FirstRowAsHeader = Table.PromoteHeaders(Source),
//Get a list of column names
GetColumns = Table.ColumnNames(FirstRowAsHeader),
//Turn this list into a table
MakeATable = Table.FromList(
GetColumns,
Splitter.SplitByNothing(),
null,
null,
ExtraValues.Error),
//Rename this table’s sole column
RenamedColumns = Table.RenameColumns(
MakeATable ,
{{"Column1", "ColumnName"}})
in
RenamedColumns
[/sourcecode]

Given a csv file that looks like this:

…the query above returns the following table of column names:

You can then store the output of this query in an Excel table for future reference – just remember not to refresh the query!

Having done that, you can then look at the columns returned by your data source and compare them with the columns you are expecting by using the techniques shown in this post. For example, here’s a query that reads a list of column names from an Excel table and compares them with the columns returned from a csv file:

[sourcecode language=”text”]
let
//Connect to Excel table containing expected column names
ExcelSource = Excel.CurrentWorkbook(){[Name="GetColumnNames"]}[Content],
//Get list of expected columns
ExpectedColumns = Table.Column(ExcelSource, "ColumnName"),
//Connect to CSV file
CSVSource = Csv.Document(
File.Contents(
"C:\Users\Chris\Documents\Power Query demos\SampleData.csv"
),null,",",null,1252),
//Use first row as headers
FirstRowAsHeader = Table.PromoteHeaders(CSVSource),
//Get a list of column names in csv
CSVColumns = Table.ColumnNames(FirstRowAsHeader),
//Find missing columns
MissingColumns = List.Difference(ExpectedColumns, CSVColumns),
//Find added columns
AddedColumns = List.Difference(CSVColumns, ExpectedColumns),
//Report what has changed
OutputMissing = if List.Count(MissingColumns)=0 then
"No columns missing" else
"Missing columns: " & Text.Combine(MissingColumns, ","),
OutputAdded = if List.Count(AddedColumns)=0 then
"No columns added" else
"Added columns: " & Text.Combine(AddedColumns, ","),
Output = OutputMissing & " " & OutputAdded
in
Output
[/sourcecode]

Given a csv file that looks like this:

…and an Excel table like the one above containing the three column names Month, Product and Sales, the output of this query is:

It would be very easy to convert this query to a function that you could use to check the columns expected by multiple queries, and also to adapt the output to your own needs. Also, in certain scenarios (such as when you’re importing data from SQL Server) you might also want to check the data types used by the columns; I’ll leave that for another blog post though. In any case, data types aren’t so much of an issue with CSV files because it’s Power Query that imposes the types on the columns within a query, and any type conversion issues can be dealt with by Power Query’s error handling functionality (see Gerhard Brueckl’s post on this topic, for example).

You can download a workbook containing the two queries from this post here.

Published by Chris Webb

My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel. View all posts by Chris Webb

Published February 26, 2015January 18, 2020

20 thoughts on “Handling Added Or Missing Columns In Power Query”

Curt Hagenlocher says:

March 5, 2015 at 11:26 pm

If you have a set of columns (with associated ordering) that you want to impose on a table, you could do something like this:

Table.SetColumns = (table as table, columns as list) =>
let
names = Table.ColumnNames(table),
addedNames = List.RemoveItems(columns, names),
tmp = Text.NewGuid(),
added = Table.AddColumn(table, tmp, each []),
expanded = Table.ExpandRecordColumn(added, tmp, addedNames),
result = if List.IsEmpty(addedNames) then table else expanded,
reordered = Table.SelectColumns(result, columns)
in
reordered

Loading...

Reply
1. Curt Hagenlocher says:
  
  March 5, 2015 at 11:27 pm
  
  My kingdom for blogs (and Yammer!) that preserve spaces in code samples… 🙁
  
  Loading...
  
  Reply
2. Chris Webb says:
  
  March 8, 2015 at 5:44 pm
  
  Very useful, thanks Curt!
  
  Loading...
  
  Reply
Hank Johnson says:

April 24, 2015 at 3:53 pm

Hi Chris, I am just beginning to discover the Power Query possibilities but already got stuck up with some syntax… do You think You could take a look at what is a likely simple problem? stated here: http://www.excelguru.ca/forums/showthread.php?4385-Changing-the-individual-values-in-a-column-based-on-conditions-in-Power-Query
It would be of great help! Thanks, Hank.

Loading...

Reply
Pingback: The Table.Schema() Function In Power BI/M – Chris Webb's BI Blog
Pingback: The Table.Schema() Function In Power BI/M - SQL Server - SQL Server - Toad World
Llewellyn says:

September 14, 2016 at 2:57 pm

Hey guys. I have a bit of a problem and I hope you can help. So I’ve used the ‘Each Try’ function to drill down into data capture errors from a database imported from smartsheets. It works beautifully when run on my desktop version, but as soon as I publish it, the dashboards show blanks. I have tried everything and it still does not work published. Any help will be appreciated.

Loading...

Reply
1. Chris Webb says:
  
  September 14, 2016 at 4:10 pm
  
  When you say you are publishing your report, does that mean you’re using Power BI rather than Power Query/Excel?
  
  Loading...
  
  Reply
  1. llewholts123 says:
    
    September 14, 2016 at 5:28 pm
    
    That is correct. Development was done in Power BI Desktop and now I want to publish it to Power BI online. The query is a Power Query….query written in M.
    
    Loading...
  2. Chris Webb says:
    
    September 14, 2016 at 7:53 pm
    
    You may be running into the same problem I described here https://blog.crossjoin.co.uk/2016/08/23/web-contents-m-functions-and-dataset-refresh-errors-in-power-bi/ which is not restricted to Web.Contents() but any M query that uses a function
    
    Loading...
  3. Llewellyn says:
    
    September 14, 2016 at 9:04 pm
    
    While mine does not necessarily get stuck at a web-contents function, it could be related. Everything seems to be working fine, until I add columns with the ‘each try’ functions, after which I unpack the records with errors and what those errors are about, below is a sample of the code to do error reporting on badly or incorrectly entered date and numeric values:
    
    let
    Source = Smartsheet.Tables(),
    #”2126513840646020″ = Source{[Key=”2126513840646020″]}[Data],
    #”20494008248196″ = #”2126513840646020″{[Key=”20494008248196″]}[Data],
    #”Changed Type” = Table.TransformColumnTypes(#”2126513840646020″,{{“Actual Time of Arrival “”ATA”””, type date}, {“Original Document Date Received”, type date}, {“D/O Release Date”, type date}, {“Customs Entry Date”, type date}, {“Cleared Date”, type date}, {“Cycle Time Days”, Int64.Type}, {“Invoice/PO Value”, type number}, {“Demurrage & Storage”, type number}, {“Duties Paid”, type number}, {“Broker_Name”, type text}, {“Docs & OGA Missing”, type text}, {“After Decleration Defect Code”, type text}, {“Pre-Entry Defect Code”, type text}, {“Currency Code of Duties Paid”, type text}, {“Currency Code of Demurrage & Storage”, type text}, {“Currency Code of Invoice/PO Value”, type text}, {“Shipping Mode”, type text}, {“Sheet Name”, type text}, {“MAWB# / HAWB# / Courier# / Bill of Lading#”, type text}, {“PO#, ID#, CI#, Job#, PR#”, type text}, {“Gensuite ID”, type text}, {“Primary”, type text}, {“RowNumber”, Int64.Type}, {“Expected Time of Arrival “”ETA”””, type date}}),
    #”Kept Errors” = Table.SelectRowsWithErrors(#”Changed Type”),
    #”Added Custom” = Table.AddColumn(#”Kept Errors”, “ETA Error”, each try[#”Expected Time of Arrival “”ETA”””]),
    #”Added Custom1″ = Table.AddColumn(#”Added Custom”, “ATA Error”, each try[#”Actual Time of Arrival “”ATA”””]),
    #”Added Custom2″ = Table.AddColumn(#”Added Custom1″, “ODR Error”, each try[Original Document Date Received]),
    #”Added Custom3″ = Table.AddColumn(#”Added Custom2″, “Do Release Error”, each try[#”D/O Release Date”]),
    #”Added Custom4″ = Table.AddColumn(#”Added Custom3″, “Custom Entry Error”, each try[Customs Entry Date]),
    #”Added Custom5″ = Table.AddColumn(#”Added Custom4″, “Cleared Error”, each try[Cleared Date]),
    #”Added Custom6″ = Table.AddColumn(#”Added Custom5″, “Invoice/PO Error”, each try[#”Invoice/PO Value”]),
    #”Added Custom7″ = Table.AddColumn(#”Added Custom6″, “Demurrage & Storage Error”, each try[#”Demurrage & Storage”]),
    #”Added Custom8″ = Table.AddColumn(#”Added Custom7″, “Duties Paid Error”, each try[Duties Paid]),
    #”Removed Columns” = Table.RemoveColumns(#”Added Custom8″,{“PO#, ID#, CI#, Job#, PR#”, “MAWB# / HAWB# / Courier# / Bill of Lading#”, “Shipping Mode”, “Expected Time of Arrival “”ETA”””, “Actual Time of Arrival “”ATA”””, “Original Document Date Received”, “D/O Release Date”, “Customs Entry Date”, “Cleared Date”, “Cycle Time Days”, “Invoice/PO Value”, “Currency Code of Invoice/PO Value”, “Demurrage & Storage”, “Currency Code of Demurrage & Storage”, “Duties Paid”, “Currency Code of Duties Paid”, “Pre-Entry Defect Code”, “After Decleration Defect Code”, “Docs & OGA Missing”}),
    #”Expanded ETA Error” = Table.ExpandRecordColumn(#”Removed Columns”, “ETA Error”, {“HasError”, “Value”, “Error”}, {“ETA.HasError”, “ETA.Value”, “ETA.Error”}),
    #”Expanded ETA.Error” = Table.ExpandRecordColumn(#”Expanded ETA Error”, “ETA.Error”, {“Reason”, “Message”, “Detail”}, {“ETA.Error.Reason”, “ETA.Error.Message”, “ETA.Error.Detail”}),
    #”Expanded ATA Error” = Table.ExpandRecordColumn(#”Expanded ETA.Error”, “ATA Error”, {“HasError”, “Value”}, {“ATA Error.HasError”, “ATA Error.Value”}),
    
    More of the same unpacking for the other columns continue….
    
    Do you think the problem has to be with the way the data is called? Or perhaps something else? As long as I can get a table which shows me which date / numeric values were incorrectly entered, I’ll be a happy man!!
    
    Thanks for your help, much appreciated.
    
    Loading...
  4. Chris Webb says:
    
    September 14, 2016 at 9:15 pm
    
    This is strange – it’s not something I’ve seen before. Since “each” is actually shorthand for declaring a function, it wouldn’t surprise me if this was falling foul of the same issues that I talked about regarding functions and static analysis in that post I linked to.
    
    Loading...
Matt G. says:

October 14, 2016 at 7:45 pm

Have you ever seen query “refuse” to show a column? I have a csv file with 50 columns but I can’t get the query to see the last 4 columns. If I import the csv directly into excel, I see all 50 columns.

Loading...

Reply
1. Chris Webb says:
  
  October 14, 2016 at 11:04 pm
  
  No, I haven’t – when you load the data (either to the worksheet or the data model) are the columns still missing?
  
  Loading...
  
  Reply
  1. Matt G. says:
    
    October 15, 2016 at 12:58 am
    
    Yes, after I close the query editor and load, the columns are missing. If I create a new query, the columns show up. It’s very strange.
    
    Loading...
  2. Matt G. says:
    
    October 17, 2016 at 3:04 pm
    
    I found it! Under advanced editor, I had to manually change the column value in this section: [Delimiter=”,”, Columns=46, Encoding=1200, QuoteStyle=QuoteStyle.None])
    
    Loading...
Pingback: Chris Webb's BI Blog: The Table.Schema() Function In Power BI/M Chris Webb's BI Blog
Johan Arndrup says:

June 10, 2020 at 9:36 am

I’ve had problems importing all columns from Excel files, but this works every time 🙂

let
Source= Odbc.Query(“dsn=Excel Files;dbq=C:\FOLDER\ExcelFile.xlsx;defaultdir=C:\FOLDER;driverid=1046;maxbuffersize=2048;pagetimeout=5”, “SELECT * FROM `C:\FOLDER\ExcelFile.xlsx`.`’SheetName$’`as Sheet”)
in
Source

Loading...

Reply
ali says:

March 15, 2022 at 9:32 pm

hey chris i cannt seem to figure out why all the column ina table i pulled from sql server not showing up in power BI

Loading...

Reply
Anonymous says:

March 8, 2024 at 7:12 am

Thank you for this blog post! It helped me solve the issue for using the AzureStorage.DataLake connector in PBI/Power Query, to import Parquet files in a folder hierarchy with varying existence of added columns.

Loading...

Reply