Join Conditions In Power Query, Part 1

In last week’s post I showed how to create a simple LIKE function in Power Query which could be used in a calculated column. This week I’m going to show you how you can use this function in a condition when joining two tables together.

Consider the following two tables in an Excel worksheet:

image

Let’s say you want to join the Patterns table to the Phrases table, but only return the rows for each pattern where the Like() function returns true. Power Query has good support for different types of joins in the Table.Join() and Table.NestedJoin() functions but it isn’t immediately obvious how to handle join conditions such as this one.

Here’s the M code for a query that shows you how to do this:

let

    //Load Patterns table

    PatternsTable = Excel.CurrentWorkbook(){[Name="Patterns"]}[Content],

    //Load Phrases table

    PhrasesTable = Excel.CurrentWorkbook(){[Name="Phrases"]}[Content],

    //Add custom column to Patterns that returns the Phrases table for each row

    InsertedCustom = Table.AddColumn(PatternsTable, "AllPhrases", each PhrasesTable),

    //Expand the new column

    ExpandedTable = Table.ExpandTableColumn(InsertedCustom, "AllPhrases", {"Phrases"}, {"Phrases"}),

    //Filter the expanded table using the Like() function

    Custom1 = Table.SelectRows(ExpandedTable, each Like([Phrases],[Patterns]))

in

    Custom1

 

What this query does is the following:

  • Gets the data from both the Patterns table and the Phrases table. I’ve done this as two steps in the same query for simplicity; you might want to create two separate queries to do this.
  • On the Patterns table, add a new custom column that returns the entire Phrases table as a value. The Insert Custom Column dialog and its output will look like this:

  • Expands the new custom column so that every row of the Phrases table is displayed against every row of the Patterns table – in effect, it does a cross join between the two tables. Interestingly there is no option to do a cross join using the Table.Join() and Table.NestedJoin() functions.

  • Finally, it uses the Like() function in Table.SelectRows() as follows:

    Table.SelectRows(ExpandedTable, each Like([Phrases],[Patterns]))

    …to filter the table from the previous step so that you only get the rows back where the Like() function returns true and the pattern matches the phrase:

So… we have a solution to our problem (and you can download the example workbook here). However, be warned: for large tables this approach may not perform well! In part two I’ll show you another example of a complex join condition and show you how different approaches can yield very different performance.

22 thoughts on “Join Conditions In Power Query, Part 1

  1. Hi Chris, I didn’t know where to ask this since I couldn’t find any blogpost with similar subject, I have table where I want to replace certain text in one of the column with with another text. I know I can do this one by one by using Table.ReplaceValue() but is there a way I can do this more efficiently using another table which contains two columns, first contains the TextToReplace and 2nd Contains the Replacement text. I have about 50 texts to replace. I’ve been pulling hair trying to accomplish this using function that iterate but haven’t been successful so far. let me know if this is possible.

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      I’m sure it is possible, but I need to think about it! If I can do it I will write it up as a blog post.

    2. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Actually, I’ve just solved the problem so if you’d like me to send you the workbook then drop me an email. You can find my contact details at http://www.crossjoin.co.uk

  2. Chris,
    Regarding your statement above:
    “Interestingly there is no option to do a cross join using the Table.Join() and Table.NestedJoin() functions.”

    Unless I’m mistaken, it seems you can leave the “key” arguments for both tables in the functions as empty lists “{}” in order to perform a cross join. For the NestedJoin, it seems the function will only add the join column if the 2nd table uses the Table.PrefixColumns function to prefix the column names, even if they’re not the same to begin with, which seems odd.

    Hope that helps.

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Thanks Logan, that is very useful – I’ll check it out!

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      Yes, absolutely, you can join data from any data source to data from any other data source in Power Query before the result is loaded into Excel.

      1. Thanks so much- That sounds great. Once can then translate columns from say a very large text file, using an Excel sheet with the mapping, before drawing it into Excel. Which would save a lot of space.

  3. ricardodiazjimenez – World – I'm a person of big ambitions, but takes care of small details. As an entrepreneur I have learned to value people that surrounds me and invest in their wealth. Living in a big chaotic city has been an inspiration for ideas to make working from home a viable option. From some years, until now, I have discovered that sharing my knowledge is a way to feel good and at the same time, receive more. I have invested in a headhunting company specialized in information technology, an elearning content production company and an outsourcing administrative services company.
    ricardodiazjimenez says:

    Chris Hi, thanks for the article! Just one question, this line throws me an error: Table.SelectRows(ExpandedTable, each Like([Phrases],[Patterns]))
    The Like function changed or was deprecated?

  4. Hello Chris,
    I’m looking for solution for situation bellow.
    I have table with column with values. Want to duplicate this row same times as in column with value and increase value in another column
    As example:
    Initial table:
    Row 1 | 35 000 | 10 000 | 4
    Row 2 | 18 000 | 10 000 | 2
    Row 3 | 23 000 | 10 000 | 3
    Row 4 | 8 000 | 10 000 | 1
    Row 5 | 19 000 | 10 000 | 2

    Result Table
    Row 1 | 10 000 | 10 000 | 1
    Row 2 | 10 000 | 10 000 | 2
    Row 3 | 10 000 | 10 000 | 3
    Row 4 | 5 000 | 10 000 | 4
    Row 5 | 10 000 | 10 000 | 1
    Row 6 | 8 000 | 10 000 | 2
    Row 7 | 10 000 | 10 000 | 1
    Row 8 | 10 000 | 10 000 | 2
    Row 9 | 3 000 | 10 000 | 3
    Row 10 | 8 000 | 10 000 | 1
    Row 11 | 10 000 | 10 000 | 1
    Row 12 | 9 0000 | 10 000 | 2

    Do you think that will be possible to do that?

    Thank you in advance!
    Todor Todorov

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      It will be possible, but it will need some M and it won’t be easy. I’ll have to think about it…

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      What do you mean by “separate values” exactly?

  5. Hi Chris,

    I am trying to achieve conditional cross joint in M / Power Query.

    I have a Tasks table with following data:

    Task Name Start End
    A 01/08/2018 03/08/2018
    B 07/08/2018 17/08/2018
    C 21/08/2018 31/08/2018

    I have a Calendar table that spans 01/01/2018 to 31/12/2018.

    I wanted to get one line per day for a given task between Start and End Date.

    e.g.

    Calendar Date Task Name Start End
    01/08/2018 A 01/08/2018 03/08/2018
    02/08/2018 A 01/08/2018 03/08/2018
    03/08/2018 A 01/08/2018 03/08/2018

    I can do this in DAX by creating a new table with:

    Table =

    FILTER(

    CROSSJOIN(‘Project Calendar’, ‘Project Tasks’),

    ‘Project Tasks'[Start] = ‘Project Calendar'[Date]

    )

    How can I achieve the same using M / Power Query.

    I tried using Merge Queries but since I can only select one date field for Tasks table when merging with Calendar table, it does not give the desired results.

    Any help will be greatly appreciated.

    1. Chris Webb – My name is Chris Webb, and I work on the Fabric CAT team at Microsoft. I blog about Power BI, Power Query, SQL Server Analysis Services, Azure Analysis Services and Excel.
      Chris Webb says:

      My suggestion is not to do this as a Merge, but instead create a custom column in M that contains a nested table value, where each row contains a table that has one row for each date from the start date to the end date. You could then expand the column and get the number of rows you need.

Leave a Reply to Chris WebbCancel reply