Comparing Columns In Power Query

A few weeks ago I wrote a post about comparing the contents of entire tables in Power Query, and a question came up in the comments about how you might go about comparing values in columns rather than whole tables. Of course this prompted me to investigate how different types of comparison might be done – and here’s the blog post with the results of the investigation.

Consider the following two single-column tables in an Excel worksheet:

image

Which items are present in one column and not in the other? Which are present in both? The easiest way to answer these questions is to take each table and turn it into a List object (using Table.ToList() ); once you’ve done that you’ll find there are loads of really useful functions for this type of thing. Here’s a query that compares the values in each column:

let

    Source = Excel.CurrentWorkbook(){[Name="Source"]}[Content],

    Target = Excel.CurrentWorkbook(){[Name="Target"]}[Content],

    SourceList = Table.ToList(Source),

    TargetList = Table.ToList(Target),

    InSourceNotTarget = List.Difference(SourceList, TargetList),

    InTargetNotSource = List.Difference(TargetList, SourceList),

    InTargetAndSource = List.Intersect({SourceList, TargetList}),

    CombineWithComma = Combiner.CombineTextByDelimiter(", "),

    ResultsTable = Table.FromRows(

             {

         {"In Source but not in Target", CombineWithComma(InSourceNotTarget)},

             {"In Target but not in Source", CombineWithComma(InTargetNotSource)},

             {"In both Target and Source", CombineWithComma(InTargetAndSource)}

             },

             {"Comparison Type", "ListResult"}

             )

 

in

    ResultsTable

 

Here’s the output:

Fairly self-explanatory, I think. List.Difference() finds the items that are in one list and not another: List.Intersect() finds items that are in both. In fact it’s probably more interesting to look at how I’ve generated the output. Table.FromRows() returns a manually constructed table. The CombineWithComma step uses Combine.CombineTextByDelimiter() to return a function that turns all of the items in a list into a single, comma-delimited piece of text, and I then use that function inside each row of the table I’m returning to get a readable version of what List.Difference() and List.Intersect() return.

Rather than looking at the distinct values in each column, though, you might want to do a row-by-row comparison. Here’s another query that does that:

let

    Source = Excel.CurrentWorkbook(){[Name="Source"]}[Content],

    Target = Excel.CurrentWorkbook(){[Name="Target"]}[Content],

    SourceList = Table.ToList(Source),

    TargetList = Table.ToList(Target),

    PositionList = List.Positions(SourceList),

    RowComparison = List.Transform(

                      PositionList, 

                      each 

                       if 

                        SourceList{_}=TargetList{_} 

                       then 

                        {_+1, "No Change",  SourceList{_} } 

                       else 

                        {_+1, "Change", "Source: " & SourceList{_} & 

                        ", Target: " & TargetList{_} }),

    ResultsTable = Table.FromRows(RowComparison, {"Row Number", "Changed?", "Comparison"})

 

in

    ResultsTable

Here’s the output:

Again I’m turning each table into a list, and then I’m using List.Positions() to generate a list of integer values from 0 to 9 representing the index of each item in the source list, then using List.Transform() to iterate over each item in this list and compare the values at the given index in the source and target list.

Frankly, an even easier way of doing this might have been to import both tables in separate queries, add an index column to both of them using the Insert Index Column button, then join the two tables together on the index column using the Merge button and then finally create some custom columns to do the comparison. This is certainly how any end-user would do it, but the resulting code is a bit less elegant I didn’t learn anything interesting about M from doing it that way. I’ve left the example in the demo workbook, which you can download here.

PS Even if you have voted for me already in the Power BI competition, please vote for me again (you can vote once every 24 hours)! Here’s the link:

CLICK HERE TO VOTE FOR ME IN THE POWER BI COMPETITION! EVEN IF YOU’VE VOTED ALREADY YOU CAN DO SO AGAIN!

30 thoughts on “Comparing Columns In Power Query

  1. The more I see of the power of these PowerBI tools, the more disappointed I am that that the functionality is not accessible through VBA.

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      I know, it would be really useful to have this. I guess it’s on the “to-do” list, though probably quite low down.

  2. Hi Chris,
    Your example is amazing and I have tried to repro it with 2 multi-column tables. I’m not able to specify in my formula ToList, the 2 columns I want to compare (or maybe it’s in the source).
    Any idea to help me ?

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Can you post the code for your query?

      1. Sure, here it is
        let
        Source = Excel.CurrentWorkbook(){[Name=”Table_Employee”]}[Content],
        Target = Excel.CurrentWorkbook(){[Name=”Table_BML”]}[Content],
        SourceList = Table.ToList(Source),
        TargetList = Table.ToList(Target),
        InSourceNotTarget = List.Difference(SourceList, TargetList),
        InTargetNotSource = List.Difference(TargetList, SourceList),
        InTargetAndSource = List.Intersect({SourceList, TargetList}),
        CombineWithComma = Combiner.CombineTextByDelimiter(“, “),
        ResultsTable = Table.FromRows(
        {
        {“In Source but not in Target”, CombineWithComma(InSourceNotTarget)},
        {“In Target but not in Source”, CombineWithComma(InTargetNotSource)},
        {“In both Target and Source”, CombineWithComma(InTargetAndSource)}
        },
        {“Comparison Type”, “ListResult”}
        )

        in
        ResultsTable

        I want to compare 2 columns with phone number. In the first table the column name is ‘Phone Number’ and in the second table it’s ‘Caller’

        By the way, what is your Wp Add-in name to log with Facebook before to post a comment. It’s pretty cool 🙂

      2. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
        Chris Webb says:

        I think the problem is that you are using Table.ToList to convert your whole table to a list. What you need to do is Table.Column(Source, “Phone Number”) to get a list that contains only the values from the column in the table you want.

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Maybe it’s easier if you email it to me – you can find my contact details at http://www.crossjoin.co.uk

  3. Not sure what Frederic ended up doing to compare his phone number columns within two tables, but I was able to get Table.Column to work this way:

    Source = Excel.CurrentWorkbook(){[Name=”MySourceTable”]}[Content],
    Target = Excel.CurrentWorkbook(){[Name=”MyTargetTable”]}[Content],
    SourceList = Table.Column(Source, “phone_number”),
    TargetList = Table.Column(Target, “phone_number”),

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      I sorted out Frederic’s problem offline – he was getting errors because his phone numbers were (unsurprisingly) numeric, and this broke the code I was using to construct the comma-delimited list for the output.

  4. Hi Chris, I am trying to use power query for ETL in my school project and found your articles very helpful, Thanks! I am stuck at a particular problem and not sure how to solve it. In my source data, I have a column with amounts. I have to rank them as lowest as Rank 1. This has to be done grouped by area code. I want to do this by adding a custom column in power query and applying the above logic to get the values in it. Do you think this is possible? And if yes, how? Would appreciate any help.

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Yes, this sounds possible. It might be easier if you send me a workbook with your source data in and I can show you how to write the code. You can find my contact details at http://www.crossjoin.co.uk

  5. Hi Chris, thank you for your post. I am starting to use Power Query. This is very cool.
    I try to compare 2 list of employees to define the new employees (ON), the employees that left (OFF) by using your approach. I think my set up of data is missing something to make the formula working.
    Could you please help?
    Cheers mate,
    GC

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Can you provide more details about what’s going wrong? Maybe a simple example of the data, and the code you’re using?

  6. How can we get value of a cell in a column after applying some filter on any other column in the same row?
    Also is there any built in function to calculate running total and if not then how can we calculate it

  7. Thank you for your great post!
    May I ask a question? in your code:
    CombineWithComma = Combiner.CombineTextByDelimiter(“, “),

    “CombineWithComma” is a variable , but at code :
    {“In Source but not in Target”, CombineWithComma(InSourceNotTarget)},

    “CombineWithComma” is used as a function with a parameter “InSourceNotTarget”,
    why we can use like this?

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      On this line:
      CombineWithComma = Combiner.CombineTextByDelimiter(“, “),
      I’m declaring a variable called CombineWithComma, and giving it the value that Combiner.CombineTextByDelimiter(“, “) returns – and that value is a value of type function. So that means, when I say:
      CombineWithComma(InSourceNotTarget)
      …I’m calling the function originally returned by Combiner.CombineTextByDelimiter(“, “).

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Can you give me an example of what you want to do here?

  8. Hi Chris,

    Great Read! Im just wondering if this could be done with comparing rows?
    Basically i have list of attempt code in at least 6 columns (for now), and i need to have this compared to all other code per line to check how many combination have been used 6 times, 5 times, etc.

    sample data:
    Att1 Att2 Att3 Att4 Att5 Att6
    A B C D E F
    A B C D E H
    A K L O Q S

    result: line 1: equals 2 on 5 duplicates (line 1 and line 2); 0 on other duplicates
    line 2: same as line
    line 3: 3 on 1 duplicate (counting line 1, 2 and 3); 0 on other duplicates

    is this a possibility?

    another option Im thinking, is create the possible 6, 5, etc combinations and count against each other. the problem is similar as i need to merge the columns of the combination, then count per row.
    also, the one i was able to do using unpivoting, grouping by and sum resulted in a very slow calculation and memory error.

    Thanks for any feed back.

  9. Hi Chris, I’m getting an error when trying to download the demo file (although I did notice the original post was 2014, so some time ago!). I haven’t been able to find anything similar elsewhere though, so am keen to to look at the row-by-row comparison options. I’m not sure if this is what Luke meant when he said ‘file is not working’ but I suspect it may be.

Leave a Reply to SgCancel reply