Using List.Generate() To Make Multiple Replacements Of Words In Text In Power Query

Recently I had a request for help from someone who wanted to do the following in Power Query: take a piece of text and then, using a table, search for all of the occurrences of the words in one column of the table in the text and replace those words with those in the other column. So, for example, given these two tables in Excel:

image

You want to take the table on the left and for each piece of text replace the words in the ‘Word To Replace’ column of the right-hand table with those in the ‘Replace With’ column of the right-hand table. The output would therefore be:

image

An interesting challenge in itself, and one I solved first of all using a recursive function. Here’s some code showing how I did it:

let

    //Get table of word replacements

    Replacements = Excel.CurrentWorkbook(){[Name="Replacements"]}[Content],

    //Get table containing text to change

    TextToChange = Excel.CurrentWorkbook(){[Name="Text"]}[Content],

    //Get a list of all words to replace

    WordsToReplace = Table.Column(Replacements, "Word To Replace"),

    //Get a list of all words to replace with

    WordsToReplaceWith = Table.Column(Replacements, "Replace With"),

    //Recursive function to do the replacement

    ReplacementFunction = (InputText, Position)=> 

    let 

     //Use Text.Replace to do each replace

     ReplaceText = Text.Replace(

            InputText, 

            WordsToReplace{Position}, 

            WordsToReplaceWith{Position})

    in

     //If we have reached the end of the list of replacements

     if Position=List.Count(WordsToReplace)-1 

      then 

      //return the output of the query

      ReplaceText 

      else 

      //call the function again

      @ReplacementFunction(ReplaceText, Position+1),

    //Add a calculated column to call the function on every row in the table

    //containing text to change

    Output = Table.AddColumn(TextToChange, "Changed Text", each ReplacementFunction([Text], 0))

    

in

    Output

 

It does the job, but… after thinking about this some more, I wondered if there was a better way. A lot of my recent Power Query blog posts have used recursive functions, but are they a Good Thing? So I asked on the forum, and as usual the nice people on the Power Query dev team answered very promptly (that’s one of the things I like about the Power Query dev team – they engage with their users). Recursive functions are indeed something that should be avoided if there is an alternative, and in this case List.Generate() can be used instead. Here’s how:

let

    //Get table of word replacements

    Replacements = Excel.CurrentWorkbook(){[Name="Replacements"]}[Content],

    //Get table containing text to change

    TextToChange = Excel.CurrentWorkbook(){[Name="Text"]}[Content],

    //Get list of words to replace

    WordsToReplace = Table.Column(Replacements, "Word To Replace"),

    //Get list of words to replace them with

    WordsToReplaceWith = Table.Column(Replacements, "Replace With"),

    //A non-recursive function to do the replacements

    ReplacementFunction = (InputText)=> 

     let

       //Use List.Generate() to do the replacements

       DoReplacement = List.Generate(

                          ()=> [Counter=0, MyText=InputText], 

                          each [Counter]<=List.Count(WordsToReplaceWith), 

                          each [Counter=[Counter]+1, 

                                MyText=Text.Replace(

                                         [MyText], 

                                         WordsToReplace{[Counter]}, 

                                         WordsToReplaceWith{[Counter]})], 

                          each [MyText]),

       //Return the last item in the list that

       //List.Generate() returns

       GetLastValue = List.Last(DoReplacement)

     in

      GetLastValue,

    //Add a calculated column to call the function on every row in the table

    //containing the text to change

    Output = Table.AddColumn(TextToChange, "Changed Text", each ReplacementFunction([Text]))

in

    Output

 

List.Generate() is a very powerful function indeed, albeit one that took me a while to understand properly. It’s a bit like a FOR loop even if it’s a function that returns a list. Here’s what each of the parameters I’m passing to the function in the example above do:

  •  ()=> [Counter=0, MyText=InputText] returns a function that itself returns a record (a record is a bit like a table with just one row in it). The record contains two fields: Counter, which has the value 0, and MyText which is given the value of the text where the values are to be replaced. This record is the initial value that List.Generate() will modify at each iteration.
  • each [Counter]<=List.Count(WordsToReplaceWith) returns a function too. An each expression is a quick way of declaring a function that takes one, unnamed parameter, and in this case the value that will be passed to this parameter is a record of the same structure as the one declared in the previous bullet. The expression [Counter] gets the value of the Counter field from that record. The function returns a boolean value, true when the value in the [Counter] field of the record is less than or equal to the number of items in the list of words to replace. List.Generate() returns a list, and while this function returns true it will keep on iterating and adding new items to the list it returns.
  • each [Counter=[Counter]+1, MyText=Text.Replace([MyText], WordsToReplace{[Counter]}, WordsToReplaceWith{[Counter]})] returns yet another function, once again declared using an each expression. The function here takes the record from the current iteration and returns the record to be used at the next iteration: a record where the value of the Counter field is increased by one, and where the value of the MyText field has one word replaced. The word that gets replaced in MyText is the word in the (zero-based) row number given by Counter in the ‘Word To Replace’ column; this word is replaced by the word in the row number given by Counter in the ‘Replace With’ column.
  • each [MyText] returns a very simple function, one that returns the value from the MyText field of the record from the current iteration. It’s the value that this function returns that is added to the list returned by List.Generate() at every iteration.

To illustrate this, here’s a simplified example showing how List.Generate() works in this case:

let

    WordsToReplace = {"cat", "dog", "mat"},

    WordsToReplaceWith = {"fish", "snake", "ground"},

    Demo = List.Generate(

                          ()=> [Counter=0, MyText="the cat and the dog sat on the mat"], 

                          each [Counter]<=List.Count(WordsToReplaceWith), 

                          each [Counter=[Counter]+1, 

                                MyText=Text.Replace(

                                         [MyText], 

                                         WordsToReplace{[Counter]}, 

                                         WordsToReplaceWith{[Counter]})], 

                          each [MyText])

 

in

    Demo

 

The output of this query is the list:

image

This list can be written as (with the words changed at each iteration highlighted):

{“the cat and the dog sat on the mat”, “the fish and the dog sat on the mat”,  “the fish and the snake sat on the mat”, “the fish and the snake sat on the ground”}

So, another useful function to know about. I’m slowly getting to grips with all this functional programming!

You can download the sample workbook here.

23 thoughts on “Using List.Generate() To Make Multiple Replacements Of Words In Text In Power Query

  1. another brilliant way to do these replacements.. however so far I’ve used the recursive method in many workbook and saw no issue…is there specific problem with it?… what about doing replace only when entire cell matches and not just sub-string without using Table.Transform() to replace existing column entirely.

    • Recursion can be a beast – although this looks like recursion in a neatly controlled wrapper Iwas usually doing basically the same thing with recursion except usually escaping at the end state rather than returning the whole tree that got me there like this function does. Great option though it definately seems better controlled than some of the recursive implementations I’ve seen and created.

  2. Hi Chris, Many thanks for another awesome blog (& your book too). I was just wondering if there is a limit to the number of records that can be loaded into Excel when using List.Generate? Following the download example I created a new workbook but with the source (text) table containing multiple columns and about 50,000 rows. I created a replacements table with about 10 replacement rows. Each of the 50,000 “text” rows has no more than 2 replacements. In the Power Query window it all works perfectly and makes the replacements for all 50,000 rows but when I try and load to worksheet (Excel 2010, 32 bit) it displays “100 rows loaded” in the “Workbook Queries” pane with the processing icon looping around but it never finishes. Only 100 rows appear in the excel worksheet, the 101st row displays “…” in each of the columns. Maximum cache is 4,096 MB & currently used cache is only 23.7 MB. Any ideas what might be preventing it from loading all rows?

  3. Hi Chris!

    I came to a fact that this function is much slower than just a few replacement functions from UI…

    It really crashes my performance. =(

    Do you know any other alternatives for multiple replacements?

      • A little late to the party, and I’m not sure if it would help with performance (the List.Buffer trick mentioned in other comments probably does more for that), but the code can be made simpler by automatically traversing the list of replacements with List.Accumulate:

        let
        Replacements = List.Buffer(Table.ToRows(Excel.CurrentWorkbook(){[Name=”Replacements”]}[Content])),
        TextToChange = Excel.CurrentWorkbook(){[Name=”Text”]}[Content],
        ReplacementFunction = each List.Accumulate(Replacements, _, (t, r) => Text.Replace(t, r{0}, r{1})),
        Output = Table.AddColumn(TextToChange, “Changed Text”, each ReplacementFunction([Text]))
        in
        Output

  4. […] Post of Chris Webb inspired me on creation of alternative way to make replacement in table column using separate table with list of Old & New values. Say, more scalable solution. Because List.Generate, used by Chris, and cell by cell replacement is very slow method. Another Chris’s post helped me with it. The idea is to use recursion, go line by line of Replacement table and apply Table.ReplaceValue function. […]

  5. This function is awesome – thank you so much for sharing!
    Can now do recursive calculations with a couple of hundred-k rows (simple math operations). But key to this is to replace “Table.Column… with List.Buffer… (thanks to Bill Szysz for turning this into the light!)

    This example with text replacements does 20k rows in 2 mins:

    old: WordsToReplace = Table.Column(Replacements, “Word To Replace”),
    new: WordsToReplace = List.Buffer(Replacements[Word To Replace]),

    (same with WordsToReplaceWith)

    Thank you so much!

  6. Imke,

    Were your 20K rows unique, and/or was your replacement list very long? I tried 20K rows, but copied the rows in the original table multiple times. Also, I used the original replacement list. In Excel 2016, it took under 1 sec to run the query. Both the recursion and the generate versions were about equal in execution time. Recursion will never be a problem unless you build up a large stack, and even then, you can make the recursion iterative, thereby using no stack space at all.

    Since Replacements[Word To Replace] is merely a shortcut for Table.Column(….), it should make no difference at all to the execution time. List.Buffer (for both lists) makes ALL the difference.

  7. cbanfield,
    sorry, just recognized your comment/question now.

    Yes my replacement list was very long (5k) – had sth like real translations in mind when trying this out
    (actually, was only trying to point out the positive effect of List.Buffer here).

  8. Hi Chris,

    I took your code above and modify for use on my case. However, it generate error.
    No sure if you have any clues of where lies the issue with my code below. Appreciate your help.

    let
    LookupSearchTable = Excel.CurrentWorkbook(){[Name=”Special_IPP_Programs”]}[Content],
    TableToBeModified = Excel.CurrentWorkbook(){[Name=”PowerQueryLoad_ForChecking”]}[Content],
    TextToSearch = Table.Column (LookupSearchTable,”Special IPP Program Keywords”),
    IPP_Program_Classification_1 = Table.Column (LookupSearchTable,”Special IPP Program”),
    IPP_MID_Only = Table.SelectRows (TableToBeModified,each[cc_MID_Classification]=”IPP Installment Plan”),
    IPP_AutoClassify_Function = (InputText)=>
    let
    SearchKeyWordAndClassify = List.Generate(
    ()=> [Counter=0,MyText=InputText,SearchResult=false,SearchResultText=null],
    each [Counter]<=List.Count(TextToSearch),
    each [Counter=[Counter]+1,
    SearchResult = Text.Contains(
    [MyText],
    TextToSearch{[Counter]})],
    each
    if [SearchResult]=true then [SearchResultText]=IPP_Program_Classification_1 else [SearchResultText]=null),
    GetPositiveSearch = List.First(List.Select(SearchKeyWordAndClassify,each [SearchResultText] null))
    in
    GetPositiveSearch,
    Output = Table.AddColumn (IPP_MID_Only,”IPP Classification”,each IPP_AutoClassify_Function([Program Description]))
    in
    Output

    • Hi, Tommy…

      Troubleshooting your code is much more difficult without a clear statement of what the code is supposed to do, along with what error it produces. I also had to reverse engineer the structure of your input tables from the code. Please keep these things in mind when asking for help.

      That said, I believe your code is supposed to select the rows from PowerQueryLoad_ForChecking where cc_MID_Classification = “IPP Installment Plan” and add an “IPP Classification” column with the value of “Special IPP Program” from the first row of Special_IPP_Programs whose “Special IPP Program Keywords” value is contained in the “Program Description”.

      If this is correct, then the following code should do the job. The first and last few lines are the same as your code. Rather than using List.Generate to build a new list with the results of checking against each row of the lookup table, we just filter the lookup table directly for those rows that match, then select the first match – with a default provided to avoid an Error when there is no match – and return the desired value.

      let
      LookupSearchTable = Excel.CurrentWorkbook(){[Name=”Special_IPP_Programs”]}[Content],
      TableToBeModified = Excel.CurrentWorkbook(){[Name=”PowerQueryLoad_ForChecking”]}[Content],
      IPP_MID_Only = Table.SelectRows(TableToBeModified, each [cc_MID_Classification] = “IPP Installment Plan”),
      IPP_AutoClassify_Function = (desc) =>
      let
      Default = [Special IPP Program=”Not Found”],
      Match = each Text.Contains(desc, [Special IPP Program Keywords])
      in
      Table.First(Table.SelectRows(LookupSearchTable, Match), Default)[Special IPP Program],
      Output = Table.AddColumn(IPP_MID_Only, “IPP Classification”, each IPP_AutoClassify_Function([Program Description]))
      in
      Output

      • Hi,

        My apology for the blunt request and extra effort put in to r-engineer my request.
        Yes. You are right. The object of the code is exactly what you elaborate:
        ” to select the rows from PowerQueryLoad_ForChecking where cc_MID_Classification = “IPP Installment Plan” and add an “IPP Classification” column with the value of “Special IPP Program” from the first row of Special_IPP_Programs whose “Special IPP Program Keywords” value is contained in the “Program Description”. ”

        I tried the code. however error prompted : “[Expression.Error] The field ‘Special IPP Program Keywords’ of the record wasn’t found.”

      • Hi, Tommy…

        The only place that field name appears in the code is in the “Match” function, which takes a record argument and returns whether the named field’s value is contained in the current “Program Description”.

        The “Match” function is used in just one place, a Table.SelectRows call which applies it to each row of LookupSearchTable and returns only those rows for which it returns true. Click on the step that defines LookupSearchTable. If you don’t see a field with that exact name (case and whitespace matter!), then you need to either fix the table or the code.

        If it looks like the field is there, with the right name, try applying an operation on it, like changing its type. That will generate a line of code with what Power Query thinks the name is so you can copy and paste it into the Match function, then get rid of the temporary step.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s