Handling Added Or Missing Columns In Power Query

A recent conversation in the comments of this blog post brought up the subject of how to handle columns that have either been removed from or added to a data source in Power Query. Anyone who has worked with csv files knows that they have a nasty habit of changing format even when they aren’t supposed to, and added or removed columns can cause all kinds of problems downstream.

Ken Puls (whose excellent blog you are probably already reading if you’re interested in Power Query) pointed out that it’s very easy to protect yourself  against new columns in your data source. When creating a query, select all the columns that you want and then right-click and select Remove Other Columns:

image

This means that if any new columns are added to your data source in the future, they won’t appear in the output of your query. In the M code the Table.SelectColumns() function is used to do this.

Dealing with missing columns is a little bit more complicated. In order to find out whether a column is missing, first of all you’ll need a list of columns that should be present in your query. You can of course store these tables in a table in Excel and enter the column names manually, or you can do this in M fairly easily by creating a query that connects to your data source and using the Table.ColumnNames() function something like this:

let
    //Connect to CSV file
    Source = Csv.Document(
                      File.Contents(
                       "C:\Users\Chris\Documents\Power Query demos\SampleData.csv"
                      ),null,",",null,1252),
    //Use first row as headers
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    //Get a list of column names
    GetColumns = Table.ColumnNames(FirstRowAsHeader),
    //Turn this list into a table
    MakeATable = Table.FromList(
                              GetColumns, 
                              Splitter.SplitByNothing(), 
                              null, 
                              null, 
                              ExtraValues.Error),
    //Rename this table's sole column
    RenamedColumns = Table.RenameColumns(
                                         MakeATable ,
                                         {{"Column1", "ColumnName"}})
in
    RenamedColumns 

Given a csv file that looks like this:

image

…the query above returns the following table of column names:

image

You can then store the output of this query in an Excel table for future reference – just remember not to refresh the query!

Having done that, you can then look at the columns returned by your data source and compare them with the columns you are expecting by using the techniques shown in this post. For example, here’s a query that reads a list of column names from an Excel table and compares them with the columns returned from a csv file:

let
    //Connect to Excel table containing expected column names
    ExcelSource = Excel.CurrentWorkbook(){[Name="GetColumnNames"]}[Content],
    //Get list of expected columns
    ExpectedColumns = Table.Column(ExcelSource, "ColumnName"),
    //Connect to CSV file
    CSVSource = Csv.Document(
                      File.Contents(
                       "C:\Users\Chris\Documents\Power Query demos\SampleData.csv"
                      ),null,",",null,1252),
    //Use first row as headers
    FirstRowAsHeader = Table.PromoteHeaders(CSVSource),
    //Get a list of column names in csv
    CSVColumns = Table.ColumnNames(FirstRowAsHeader),
    //Find missing columns
    MissingColumns = List.Difference(ExpectedColumns, CSVColumns),
    //Find added columns
    AddedColumns = List.Difference(CSVColumns, ExpectedColumns),
    //Report what has changed
    OutputMissing = if List.Count(MissingColumns)=0 then
                     "No columns missing" else
                     "Missing columns: " & Text.Combine(MissingColumns, ","),
    OutputAdded = if List.Count(AddedColumns)=0 then
                     "No columns added" else
                     "Added columns: " & Text.Combine(AddedColumns, ","),
    Output = OutputMissing & "   " & OutputAdded
in
    Output

Given a csv file that looks like this:

image

…and an Excel table like the one above containing the three column names Month, Product and Sales, the output of this query is:

image

It would be very easy to convert this query to a function that you could use to check the columns expected by multiple queries, and also to adapt the output to your own needs. Also, in certain scenarios (such as when you’re importing data from SQL Server) you might also want to check the data types used by the columns; I’ll leave that for another blog post though. In any case, data types aren’t so much of an issue with CSV files because it’s Power Query that imposes the types on the columns within a query, and any type conversion issues can be dealt with by Power Query’s error handling functionality (see Gerhard Brueckl’s post on this topic, for example).

You can download a workbook containing the two queries from this post here.

16 thoughts on “Handling Added Or Missing Columns In Power Query

  1. If you have a set of columns (with associated ordering) that you want to impose on a table, you could do something like this:

    Table.SetColumns = (table as table, columns as list) =>
    let
    names = Table.ColumnNames(table),
    addedNames = List.RemoveItems(columns, names),
    tmp = Text.NewGuid(),
    added = Table.AddColumn(table, tmp, each []),
    expanded = Table.ExpandRecordColumn(added, tmp, addedNames),
    result = if List.IsEmpty(addedNames) then table else expanded,
    reordered = Table.SelectColumns(result, columns)
    in
    reordered

  2. Hey guys. I have a bit of a problem and I hope you can help. So I’ve used the ‘Each Try’ function to drill down into data capture errors from a database imported from smartsheets. It works beautifully when run on my desktop version, but as soon as I publish it, the dashboards show blanks. I have tried everything and it still does not work published. Any help will be appreciated.

      • That is correct. Development was done in Power BI Desktop and now I want to publish it to Power BI online. The query is a Power Query….query written in M.

      • While mine does not necessarily get stuck at a web-contents function, it could be related. Everything seems to be working fine, until I add columns with the ‘each try’ functions, after which I unpack the records with errors and what those errors are about, below is a sample of the code to do error reporting on badly or incorrectly entered date and numeric values:

        let
        Source = Smartsheet.Tables(),
        #”2126513840646020″ = Source{[Key=”2126513840646020″]}[Data],
        #”20494008248196″ = #”2126513840646020″{[Key=”20494008248196″]}[Data],
        #”Changed Type” = Table.TransformColumnTypes(#”2126513840646020″,{{“Actual Time of Arrival “”ATA”””, type date}, {“Original Document Date Received”, type date}, {“D/O Release Date”, type date}, {“Customs Entry Date”, type date}, {“Cleared Date”, type date}, {“Cycle Time Days”, Int64.Type}, {“Invoice/PO Value”, type number}, {“Demurrage & Storage”, type number}, {“Duties Paid”, type number}, {“Broker_Name”, type text}, {“Docs & OGA Missing”, type text}, {“After Decleration Defect Code”, type text}, {“Pre-Entry Defect Code”, type text}, {“Currency Code of Duties Paid”, type text}, {“Currency Code of Demurrage & Storage”, type text}, {“Currency Code of Invoice/PO Value”, type text}, {“Shipping Mode”, type text}, {“Sheet Name”, type text}, {“MAWB# / HAWB# / Courier# / Bill of Lading#”, type text}, {“PO#, ID#, CI#, Job#, PR#”, type text}, {“Gensuite ID”, type text}, {“Primary”, type text}, {“RowNumber”, Int64.Type}, {“Expected Time of Arrival “”ETA”””, type date}}),
        #”Kept Errors” = Table.SelectRowsWithErrors(#”Changed Type”),
        #”Added Custom” = Table.AddColumn(#”Kept Errors”, “ETA Error”, each try[#”Expected Time of Arrival “”ETA”””]),
        #”Added Custom1″ = Table.AddColumn(#”Added Custom”, “ATA Error”, each try[#”Actual Time of Arrival “”ATA”””]),
        #”Added Custom2″ = Table.AddColumn(#”Added Custom1″, “ODR Error”, each try[Original Document Date Received]),
        #”Added Custom3″ = Table.AddColumn(#”Added Custom2″, “Do Release Error”, each try[#”D/O Release Date”]),
        #”Added Custom4″ = Table.AddColumn(#”Added Custom3″, “Custom Entry Error”, each try[Customs Entry Date]),
        #”Added Custom5″ = Table.AddColumn(#”Added Custom4″, “Cleared Error”, each try[Cleared Date]),
        #”Added Custom6″ = Table.AddColumn(#”Added Custom5″, “Invoice/PO Error”, each try[#”Invoice/PO Value”]),
        #”Added Custom7″ = Table.AddColumn(#”Added Custom6″, “Demurrage & Storage Error”, each try[#”Demurrage & Storage”]),
        #”Added Custom8″ = Table.AddColumn(#”Added Custom7″, “Duties Paid Error”, each try[Duties Paid]),
        #”Removed Columns” = Table.RemoveColumns(#”Added Custom8″,{“PO#, ID#, CI#, Job#, PR#”, “MAWB# / HAWB# / Courier# / Bill of Lading#”, “Shipping Mode”, “Expected Time of Arrival “”ETA”””, “Actual Time of Arrival “”ATA”””, “Original Document Date Received”, “D/O Release Date”, “Customs Entry Date”, “Cleared Date”, “Cycle Time Days”, “Invoice/PO Value”, “Currency Code of Invoice/PO Value”, “Demurrage & Storage”, “Currency Code of Demurrage & Storage”, “Duties Paid”, “Currency Code of Duties Paid”, “Pre-Entry Defect Code”, “After Decleration Defect Code”, “Docs & OGA Missing”}),
        #”Expanded ETA Error” = Table.ExpandRecordColumn(#”Removed Columns”, “ETA Error”, {“HasError”, “Value”, “Error”}, {“ETA.HasError”, “ETA.Value”, “ETA.Error”}),
        #”Expanded ETA.Error” = Table.ExpandRecordColumn(#”Expanded ETA Error”, “ETA.Error”, {“Reason”, “Message”, “Detail”}, {“ETA.Error.Reason”, “ETA.Error.Message”, “ETA.Error.Detail”}),
        #”Expanded ATA Error” = Table.ExpandRecordColumn(#”Expanded ETA.Error”, “ATA Error”, {“HasError”, “Value”}, {“ATA Error.HasError”, “ATA Error.Value”}),

        More of the same unpacking for the other columns continue….

        Do you think the problem has to be with the way the data is called? Or perhaps something else? As long as I can get a table which shows me which date / numeric values were incorrectly entered, I’ll be a happy man!!

        Thanks for your help, much appreciated.

      • This is strange – it’s not something I’ve seen before. Since “each” is actually shorthand for declaring a function, it wouldn’t surprise me if this was falling foul of the same issues that I talked about regarding functions and static analysis in that post I linked to.

  3. Have you ever seen query “refuse” to show a column? I have a csv file with 50 columns but I can’t get the query to see the last 4 columns. If I import the csv directly into excel, I see all 50 columns.

      • Yes, after I close the query editor and load, the columns are missing. If I create a new query, the columns show up. It’s very strange.

      • I found it! Under advanced editor, I had to manually change the column value in this section: [Delimiter=”,”, Columns=46, Encoding=1200, QuoteStyle=QuoteStyle.None])

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s