Comparing Columns In Power Query

A few weeks ago I wrote a post about comparing the contents of entire tables in Power Query, and a question came up in the comments about how you might go about comparing values in columns rather than whole tables. Of course this prompted me to investigate how different types of comparison might be done – and here’s the blog post with the results of the investigation.

Consider the following two single-column tables in an Excel worksheet:

image

Which items are present in one column and not in the other? Which are present in both? The easiest way to answer these questions is to take each table and turn it into a List object (using Table.ToList() ); once you’ve done that you’ll find there are loads of really useful functions for this type of thing. Here’s a query that compares the values in each column:

let

    Source = Excel.CurrentWorkbook(){[Name="Source"]}[Content],

    Target = Excel.CurrentWorkbook(){[Name="Target"]}[Content],

    SourceList = Table.ToList(Source),

    TargetList = Table.ToList(Target),

    InSourceNotTarget = List.Difference(SourceList, TargetList),

    InTargetNotSource = List.Difference(TargetList, SourceList),

    InTargetAndSource = List.Intersect({SourceList, TargetList}),

    CombineWithComma = Combiner.CombineTextByDelimiter(", "),

    ResultsTable = Table.FromRows(

             {

         {"In Source but not in Target", CombineWithComma(InSourceNotTarget)},

             {"In Target but not in Source", CombineWithComma(InTargetNotSource)},

             {"In both Target and Source", CombineWithComma(InTargetAndSource)}

             },

             {"Comparison Type", "ListResult"}

             )

 

in

    ResultsTable

 

Here’s the output:

image

Fairly self-explanatory, I think. List.Difference() finds the items that are in one list and not another: List.Intersect() finds items that are in both. In fact it’s probably more interesting to look at how I’ve generated the output. Table.FromRows() returns a manually constructed table. The CombineWithComma step uses Combine.CombineTextByDelimiter() to return a function that turns all of the items in a list into a single, comma-delimited piece of text, and I then use that function inside each row of the table I’m returning to get a readable version of what List.Difference() and List.Intersect() return.

Rather than looking at the distinct values in each column, though, you might want to do a row-by-row comparison. Here’s another query that does that:

let

    Source = Excel.CurrentWorkbook(){[Name="Source"]}[Content],

    Target = Excel.CurrentWorkbook(){[Name="Target"]}[Content],

    SourceList = Table.ToList(Source),

    TargetList = Table.ToList(Target),

    PositionList = List.Positions(SourceList),

    RowComparison = List.Transform(

                      PositionList, 

                      each 

                       if 

                        SourceList{_}=TargetList{_} 

                       then 

                        {_+1, "No Change",  SourceList{_} } 

                       else 

                        {_+1, "Change", "Source: " & SourceList{_} & 

                        ", Target: " & TargetList{_} }),

    ResultsTable = Table.FromRows(RowComparison, {"Row Number", "Changed?", "Comparison"})

 

in

    ResultsTable

Here’s the output:

image

Again I’m turning each table into a list, and then I’m using List.Positions() to generate a list of integer values from 0 to 9 representing the index of each item in the source list, then using List.Transform() to iterate over each item in this list and compare the values at the given index in the source and target list.

Frankly, an even easier way of doing this might have been to import both tables in separate queries, add an index column to both of them using the Insert Index Column button, then join the two tables together on the index column using the Merge button and then finally create some custom columns to do the comparison. This is certainly how any end-user would do it, but the resulting code is a bit less elegant I didn’t learn anything interesting about M from doing it that way. I’ve left the example in the demo workbook, which you can download here.

PS Even if you have voted for me already in the Power BI competition, please vote for me again (you can vote once every 24 hours)! Here’s the link:

CLICK HERE TO VOTE FOR ME IN THE POWER BI COMPETITION! EVEN IF YOU’VE VOTED ALREADY YOU CAN DO SO AGAIN!

17 thoughts on “Comparing Columns In Power Query

  1. Hi Chris,
    Your example is amazing and I have tried to repro it with 2 multi-column tables. I’m not able to specify in my formula ToList, the 2 columns I want to compare (or maybe it’s in the source).
    Any idea to help me ?

      • Sure, here it is
        let
        Source = Excel.CurrentWorkbook(){[Name=”Table_Employee”]}[Content],
        Target = Excel.CurrentWorkbook(){[Name=”Table_BML”]}[Content],
        SourceList = Table.ToList(Source),
        TargetList = Table.ToList(Target),
        InSourceNotTarget = List.Difference(SourceList, TargetList),
        InTargetNotSource = List.Difference(TargetList, SourceList),
        InTargetAndSource = List.Intersect({SourceList, TargetList}),
        CombineWithComma = Combiner.CombineTextByDelimiter(“, “),
        ResultsTable = Table.FromRows(
        {
        {“In Source but not in Target”, CombineWithComma(InSourceNotTarget)},
        {“In Target but not in Source”, CombineWithComma(InTargetNotSource)},
        {“In both Target and Source”, CombineWithComma(InTargetAndSource)}
        },
        {“Comparison Type”, “ListResult”}
        )

        in
        ResultsTable

        I want to compare 2 columns with phone number. In the first table the column name is ‘Phone Number’ and in the second table it’s ‘Caller’

        By the way, what is your Wp Add-in name to log with Facebook before to post a comment. It’s pretty cool :)

      • I think the problem is that you are using Table.ToList to convert your whole table to a list. What you need to do is Table.Column(Source, “Phone Number”) to get a list that contains only the values from the column in the table you want.

  2. Not sure what Frederic ended up doing to compare his phone number columns within two tables, but I was able to get Table.Column to work this way:

    Source = Excel.CurrentWorkbook(){[Name=”MySourceTable”]}[Content],
    Target = Excel.CurrentWorkbook(){[Name=”MyTargetTable”]}[Content],
    SourceList = Table.Column(Source, “phone_number”),
    TargetList = Table.Column(Target, “phone_number”),

    • I sorted out Frederic’s problem offline – he was getting errors because his phone numbers were (unsurprisingly) numeric, and this broke the code I was using to construct the comma-delimited list for the output.

  3. Hi Chris, I am trying to use power query for ETL in my school project and found your articles very helpful, Thanks! I am stuck at a particular problem and not sure how to solve it. In my source data, I have a column with amounts. I have to rank them as lowest as Rank 1. This has to be done grouped by area code. I want to do this by adding a custom column in power query and applying the above logic to get the values in it. Do you think this is possible? And if yes, how? Would appreciate any help.

  4. Hi Chris, thank you for your post. I am starting to use Power Query. This is very cool.
    I try to compare 2 list of employees to define the new employees (ON), the employees that left (OFF) by using your approach. I think my set up of data is missing something to make the formula working.
    Could you please help?
    Cheers mate,
    GC

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s