Using List.Generate() To Make Multiple Replacements Of Words In Text In Power Query

Recently I had a request for help from someone who wanted to do the following in Power Query: take a piece of text and then, using a table, search for all of the occurrences of the words in one column of the table in the text and replace those words with those in the other column. So, for example, given these two tables in Excel:

image

You want to take the table on the left and for each piece of text replace the words in the ‘Word To Replace’ column of the right-hand table with those in the ‘Replace With’ column of the right-hand table. The output would therefore be:

image

An interesting challenge in itself, and one I solved first of all using a recursive function. Here’s some code showing how I did it:

let

    //Get table of word replacements

    Replacements = Excel.CurrentWorkbook(){[Name="Replacements"]}[Content],

    //Get table containing text to change

    TextToChange = Excel.CurrentWorkbook(){[Name="Text"]}[Content],

    //Get a list of all words to replace

    WordsToReplace = Table.Column(Replacements, "Word To Replace"),

    //Get a list of all words to replace with

    WordsToReplaceWith = Table.Column(Replacements, "Replace With"),

    //Recursive function to do the replacement

    ReplacementFunction = (InputText, Position)=> 

    let 

     //Use Text.Replace to do each replace

     ReplaceText = Text.Replace(

            InputText, 

            WordsToReplace{Position}, 

            WordsToReplaceWith{Position})

    in

     //If we have reached the end of the list of replacements

     if Position=List.Count(WordsToReplace)-1 

      then 

      //return the output of the query

      ReplaceText 

      else 

      //call the function again

      @ReplacementFunction(ReplaceText, Position+1),

    //Add a calculated column to call the function on every row in the table

    //containing text to change

    Output = Table.AddColumn(TextToChange, "Changed Text", each ReplacementFunction([Text], 0))

    

in

    Output

 

It does the job, but… after thinking about this some more, I wondered if there was a better way. A lot of my recent Power Query blog posts have used recursive functions, but are they a Good Thing? So I asked on the forum, and as usual the nice people on the Power Query dev team answered very promptly (that’s one of the things I like about the Power Query dev team – they engage with their users). Recursive functions are indeed something that should be avoided if there is an alternative, and in this case List.Generate() can be used instead. Here’s how:

let

    //Get table of word replacements

    Replacements = Excel.CurrentWorkbook(){[Name="Replacements"]}[Content],

    //Get table containing text to change

    TextToChange = Excel.CurrentWorkbook(){[Name="Text"]}[Content],

    //Get list of words to replace

    WordsToReplace = Table.Column(Replacements, "Word To Replace"),

    //Get list of words to replace them with

    WordsToReplaceWith = Table.Column(Replacements, "Replace With"),

    //A non-recursive function to do the replacements

    ReplacementFunction = (InputText)=> 

     let

       //Use List.Generate() to do the replacements

       DoReplacement = List.Generate(

                          ()=> [Counter=0, MyText=InputText], 

                          each [Counter]<=List.Count(WordsToReplaceWith), 

                          each [Counter=[Counter]+1, 

                                MyText=Text.Replace(

                                         [MyText], 

                                         WordsToReplace{[Counter]}, 

                                         WordsToReplaceWith{[Counter]})], 

                          each [MyText]),

       //Return the last item in the list that

       //List.Generate() returns

       GetLastValue = List.Last(DoReplacement)

     in

      GetLastValue,

    //Add a calculated column to call the function on every row in the table

    //containing the text to change

    Output = Table.AddColumn(TextToChange, "Changed Text", each ReplacementFunction([Text]))

in

    Output

 

List.Generate() is a very powerful function indeed, albeit one that took me a while to understand properly. It’s a bit like a FOR loop even if it’s a function that returns a list. Here’s what each of the parameters I’m passing to the function in the example above do:

  •  ()=> [Counter=0, MyText=InputText] returns a function that itself returns a record (a record is a bit like a table with just one row in it). The record contains two fields: Counter, which has the value 0, and MyText which is given the value of the text where the values are to be replaced. This record is the initial value that List.Generate() will modify at each iteration.
  • each [Counter]<=List.Count(WordsToReplaceWith) returns a function too. An each expression is a quick way of declaring a function that takes one, unnamed parameter, and in this case the value that will be passed to this parameter is a record of the same structure as the one declared in the previous bullet. The expression [Counter] gets the value of the Counter field from that record. The function returns a boolean value, true when the value in the [Counter] field of the record is less than or equal to the number of items in the list of words to replace. List.Generate() returns a list, and while this function returns true it will keep on iterating and adding new items to the list it returns.
  • each [Counter=[Counter]+1, MyText=Text.Replace([MyText], WordsToReplace{[Counter]}, WordsToReplaceWith{[Counter]})] returns yet another function, once again declared using an each expression. The function here takes the record from the current iteration and returns the record to be used at the next iteration: a record where the value of the Counter field is increased by one, and where the value of the MyText field has one word replaced. The word that gets replaced in MyText is the word in the (zero-based) row number given by Counter in the ‘Word To Replace’ column; this word is replaced by the word in the row number given by Counter in the ‘Replace With’ column.
  • each [MyText] returns a very simple function, one that returns the value from the MyText field of the record from the current iteration. It’s the value that this function returns that is added to the list returned by List.Generate() at every iteration.

To illustrate this, here’s a simplified example showing how List.Generate() works in this case:

let

    WordsToReplace = {"cat", "dog", "mat"},

    WordsToReplaceWith = {"fish", "snake", "ground"},

    Demo = List.Generate(

                          ()=> [Counter=0, MyText="the cat and the dog sat on the mat"], 

                          each [Counter]<=List.Count(WordsToReplaceWith), 

                          each [Counter=[Counter]+1, 

                                MyText=Text.Replace(

                                         [MyText], 

                                         WordsToReplace{[Counter]}, 

                                         WordsToReplaceWith{[Counter]})], 

                          each [MyText])

 

in

    Demo

 

The output of this query is the list:

image

This list can be written as (with the words changed at each iteration highlighted):

{“the cat and the dog sat on the mat”, “the fish and the dog sat on the mat”,  “the fish and the snake sat on the mat”, “the fish and the snake sat on the ground”}

So, another useful function to know about. I’m slowly getting to grips with all this functional programming!

You can download the sample workbook here.

47 thoughts on “Using List.Generate() To Make Multiple Replacements Of Words In Text In Power Query

  1. another brilliant way to do these replacements.. however so far I’ve used the recursive method in many workbook and saw no issue…is there specific problem with it?… what about doing replace only when entire cell matches and not just sub-string without using Table.Transform() to replace existing column entirely.

    1. Recursion can be a beast – although this looks like recursion in a neatly controlled wrapper Iwas usually doing basically the same thing with recursion except usually escaping at the end state rather than returning the whole tree that got me there like this function does. Great option though it definately seems better controlled than some of the recursive implementations I’ve seen and created.

  2. Hi Chris, Many thanks for another awesome blog (& your book too). I was just wondering if there is a limit to the number of records that can be loaded into Excel when using List.Generate? Following the download example I created a new workbook but with the source (text) table containing multiple columns and about 50,000 rows. I created a replacements table with about 10 replacement rows. Each of the 50,000 “text” rows has no more than 2 replacements. In the Power Query window it all works perfectly and makes the replacements for all 50,000 rows but when I try and load to worksheet (Excel 2010, 32 bit) it displays “100 rows loaded” in the “Workbook Queries” pane with the processing icon looping around but it never finishes. Only 100 rows appear in the excel worksheet, the 101st row displays “…” in each of the columns. Maximum cache is 4,096 MB & currently used cache is only 23.7 MB. Any ideas what might be preventing it from loading all rows?

    1. I had the same issue but the memory used was 2GB and memory for caching was 4GB. After pressing “Clear cache” on “Query options” -> “Data load” I refreshed the query and the xls worksheet contained more than 100 rows.

  3. Hi Chris!

    I came to a fact that this function is much slower than just a few replacement functions from UI…

    It really crashes my performance. =(

    Do you know any other alternatives for multiple replacements?

      1. A little late to the party, and I’m not sure if it would help with performance (the List.Buffer trick mentioned in other comments probably does more for that), but the code can be made simpler by automatically traversing the list of replacements with List.Accumulate:

        let
        Replacements = List.Buffer(Table.ToRows(Excel.CurrentWorkbook(){[Name=”Replacements”]}[Content])),
        TextToChange = Excel.CurrentWorkbook(){[Name=”Text”]}[Content],
        ReplacementFunction = each List.Accumulate(Replacements, _, (t, r) => Text.Replace(t, r{0}, r{1})),
        Output = Table.AddColumn(TextToChange, “Changed Text”, each ReplacementFunction([Text]))
        in
        Output

  4. This function is awesome – thank you so much for sharing!
    Can now do recursive calculations with a couple of hundred-k rows (simple math operations). But key to this is to replace “Table.Column… with List.Buffer… (thanks to Bill Szysz for turning this into the light!)

    This example with text replacements does 20k rows in 2 mins:

    old: WordsToReplace = Table.Column(Replacements, “Word To Replace”),
    new: WordsToReplace = List.Buffer(Replacements[Word To Replace]),

    (same with WordsToReplaceWith)

    Thank you so much!

  5. Imke,

    Were your 20K rows unique, and/or was your replacement list very long? I tried 20K rows, but copied the rows in the original table multiple times. Also, I used the original replacement list. In Excel 2016, it took under 1 sec to run the query. Both the recursion and the generate versions were about equal in execution time. Recursion will never be a problem unless you build up a large stack, and even then, you can make the recursion iterative, thereby using no stack space at all.

    Since Replacements[Word To Replace] is merely a shortcut for Table.Column(….), it should make no difference at all to the execution time. List.Buffer (for both lists) makes ALL the difference.

  6. cbanfield,
    sorry, just recognized your comment/question now.

    Yes my replacement list was very long (5k) – had sth like real translations in mind when trying this out
    (actually, was only trying to point out the positive effect of List.Buffer here).

  7. Hi Chris,

    I took your code above and modify for use on my case. However, it generate error.
    No sure if you have any clues of where lies the issue with my code below. Appreciate your help.

    let
    LookupSearchTable = Excel.CurrentWorkbook(){[Name=”Special_IPP_Programs”]}[Content],
    TableToBeModified = Excel.CurrentWorkbook(){[Name=”PowerQueryLoad_ForChecking”]}[Content],
    TextToSearch = Table.Column (LookupSearchTable,”Special IPP Program Keywords”),
    IPP_Program_Classification_1 = Table.Column (LookupSearchTable,”Special IPP Program”),
    IPP_MID_Only = Table.SelectRows (TableToBeModified,each[cc_MID_Classification]=”IPP Installment Plan”),
    IPP_AutoClassify_Function = (InputText)=>
    let
    SearchKeyWordAndClassify = List.Generate(
    ()=> [Counter=0,MyText=InputText,SearchResult=false,SearchResultText=null],
    each [Counter]<=List.Count(TextToSearch),
    each [Counter=[Counter]+1,
    SearchResult = Text.Contains(
    [MyText],
    TextToSearch{[Counter]})],
    each
    if [SearchResult]=true then [SearchResultText]=IPP_Program_Classification_1 else [SearchResultText]=null),
    GetPositiveSearch = List.First(List.Select(SearchKeyWordAndClassify,each [SearchResultText] null))
    in
    GetPositiveSearch,
    Output = Table.AddColumn (IPP_MID_Only,”IPP Classification”,each IPP_AutoClassify_Function([Program Description]))
    in
    Output

    1. Hi, Tommy…

      Troubleshooting your code is much more difficult without a clear statement of what the code is supposed to do, along with what error it produces. I also had to reverse engineer the structure of your input tables from the code. Please keep these things in mind when asking for help.

      That said, I believe your code is supposed to select the rows from PowerQueryLoad_ForChecking where cc_MID_Classification = “IPP Installment Plan” and add an “IPP Classification” column with the value of “Special IPP Program” from the first row of Special_IPP_Programs whose “Special IPP Program Keywords” value is contained in the “Program Description”.

      If this is correct, then the following code should do the job. The first and last few lines are the same as your code. Rather than using List.Generate to build a new list with the results of checking against each row of the lookup table, we just filter the lookup table directly for those rows that match, then select the first match – with a default provided to avoid an Error when there is no match – and return the desired value.

      let
      LookupSearchTable = Excel.CurrentWorkbook(){[Name=”Special_IPP_Programs”]}[Content],
      TableToBeModified = Excel.CurrentWorkbook(){[Name=”PowerQueryLoad_ForChecking”]}[Content],
      IPP_MID_Only = Table.SelectRows(TableToBeModified, each [cc_MID_Classification] = “IPP Installment Plan”),
      IPP_AutoClassify_Function = (desc) =>
      let
      Default = [Special IPP Program=”Not Found”],
      Match = each Text.Contains(desc, [Special IPP Program Keywords])
      in
      Table.First(Table.SelectRows(LookupSearchTable, Match), Default)[Special IPP Program],
      Output = Table.AddColumn(IPP_MID_Only, “IPP Classification”, each IPP_AutoClassify_Function([Program Description]))
      in
      Output

      1. Hi,

        My apology for the blunt request and extra effort put in to r-engineer my request.
        Yes. You are right. The object of the code is exactly what you elaborate:
        ” to select the rows from PowerQueryLoad_ForChecking where cc_MID_Classification = “IPP Installment Plan” and add an “IPP Classification” column with the value of “Special IPP Program” from the first row of Special_IPP_Programs whose “Special IPP Program Keywords” value is contained in the “Program Description”. ”

        I tried the code. however error prompted : “[Expression.Error] The field ‘Special IPP Program Keywords’ of the record wasn’t found.”

      2. Hi, Tommy…

        The only place that field name appears in the code is in the “Match” function, which takes a record argument and returns whether the named field’s value is contained in the current “Program Description”.

        The “Match” function is used in just one place, a Table.SelectRows call which applies it to each row of LookupSearchTable and returns only those rows for which it returns true. Click on the step that defines LookupSearchTable. If you don’t see a field with that exact name (case and whitespace matter!), then you need to either fix the table or the code.

        If it looks like the field is there, with the right name, try applying an operation on it, like changing its type. That will generate a line of code with what Power Query thinks the name is so you can copy and paste it into the Match function, then get rid of the temporary step.

  8. Thanks for a great demo. I had started down the road of a recursive function call but discovered your demo of List.Generate here which is brilliant and hopefully more performant!

    The List.Generate implementation I show below ‘works’, but I have a question about how I have implemented.

    The function (with arguments φ,M) starts with two values, φ=φ0 and M=0. Then I call the transformation function calculating new values of both until N-N0-M

    let
    φ_in = inputlist{0},
    M = inputlist{1},
    φ_working = (N-N0-M)/(a*F0) + φ_in,

    Ma = (1 + n + (5/4)*n2 + (5/4)*n3) * (φ_working -φ0),
    Mb = (3*n + 3*n*n + (21/8)*n3) * Number.Sin(φ_working -φ0) * Number.Cos(φ_working +φ0),
    Mc = ((15/8)*n2 + (15/8)*n3) * Number.Sin(2*(φ_working -φ0)) * Number.Cos(2*(φ_working+φ0)),
    Md = (35/24)*n3 * Number.Sin(3*(φ_working-φ0)) * Number.Cos(3*(φ_working +φ0)),
    M_working = b * F0 * (Ma – Mb + Mc – Md), // meridional arc

    Return = { φ_working,M_working}
    in
    Return,

    DoFunction = List.Generate(()=>[InputList={φ0,0}],each (N-N0-[InputList]{1})>= 0.00001,each [InputList=calcfunc({[InputList]{0},[InputList]{1}})], each [InputList]),

    φ_out= List.Last(DoFunction){0},
    M_out= List.Last(DoFunction){1},

    1. The function (with arguments φ,M) starts with two values, φ=φ0 and M=0. Then I call the transformation function calculating new values of both until N-N0-M < 0.0001.

      I decided to use a list as an argument for the function because I am passing in two values, they are both being transformed and both being passed back in as the new arguments for the subsequent function call. It seems a little verbose, just wondering if someone can let me know if they can see a simpler way of expressing the function with two separate arguments (φ,M) and calling with List.Generate? Or was using a list the right approach?

      I'm posting here because it was this post that inspired me to tackle List.Generate. It seemed a great way of implementing a sort of do-while loop using M !

      1. This hasn’t gone well… sorry. A line of code got chopped at the top of my code above, just before the first “let”

        calcfunc = (inputlist) as list =>

  9. If you have several column headers, and you want to remove the first 4 characters from the column header’s name, is there a script for this in Power Query, so that you don’t have to manually remove the characters from each individual column?

    1. Hi Lenzy, Table.RenameColumns may work for you here. Try this.

      let

      //Define some example table
      Source = Table.FromColumns({{0..10},{20..30},{30..40},{50..60}}, “ABCDColumn1″,”WXYZColumn2″,”Column3″,”Column4”}),

      //Extract the names of all of the columns of your table into a list
      ExtractColumnNames = Table.ColumnNames(Source),

      //Convert list to table so we can transform individual column names
      ColumnNamesTable = Table.FromList(ExtractColumnNames, Splitter.SplitByNothing(), null, null, ExtraValues.Error),

      //Add a new column that, based on some “if then else” logic, removes the first four characters from only those old column names you require
      AddedCustom = Table.AddColumn(ColumnNamesTable, “NewColumnNames”, each if Text.Length([Column1]) > 7 then Text.Range([Column1],4) else [Column1]),

      //Create a list of rename pairs i.e. {Old name, New Name}
      RenamePairs = Table.ToRows(AddedCustom),

      //Use Table.RenameColumns
      RenamedColumns = Table.RenameColumns(Source,RenamePairs)
      in
      RenamedColumns

  10. Any thoughts on using this as HTML parser?

    There is probably a better way to do this. Also I haven’t figured out a way to replace tags with attributes. (maybe keeping some for things like a href)

    (TextString as any)=>

    let
    WordsToReplace = {“”, “”, “”, “”,” “, “"”,”_x000D_”,”>”,”<“,””,””,””,””,””},
    WordsToReplaceWith = {“”,”#(lf)”,””,””,” “,”‘”,”#(cr)”,”>”,” [Counter=0, MyText=Text.From(TextString)],
    each [Counter]<=List.Count(WordsToReplaceWith),
    each [Counter=[Counter]+1,
    MyText=Text.Replace(
    [MyText],
    WordsToReplace{[Counter]},
    WordsToReplaceWith{[Counter]})],
    each [MyText])
    #"Kept Last Items" = List.LastN(ResultText, 1),
    CleanText = #"Kept Last Items"{0}
    in
    CleanText

  11. Hi, I think you should change that

    each [Counter]<=List.Count(WordsToReplaceWith)

    to
    each [Counter] [Counter=0, MyText=”117″],
    each [Counter]<=List.Count(WordsToReplaceWith),
    each [Counter=[Counter]+1,
    MyText=Text.Replace(
    [MyText],
    WordsToReplace{[Counter]},
    WordsToReplaceWith{[Counter]})],
    each [MyText])
    in
    Demo

    Then try again without the equal sign.

    1. I can’t change my last comment, I was trying to say that you should remove the equal sign from the second argument of List.Generate:

      each [Counter] [Counter=0, MyText=”117″],
      each [Counter]<=List.Count(WordsToReplaceWith),
      each [Counter=[Counter]+1,
      MyText=Text.Replace(
      [MyText],
      WordsToReplace{[Counter]},
      WordsToReplaceWith{[Counter]})],
      each [MyText])

      in
      Demo

      Then try it again without that equal sign.

      1. There’s also some bug with this comment system, it’s removing part of my comment, maybe there were too many spaces within my code. Lets try again:

        let
        WordsToReplace = {“117”, “217”},
        WordsToReplaceWith = {“117/217”, “117/217″},
        Demo = List.Generate(
        ()=> [Counter=0, MyText=”117”],
        each [Counter]<=List.Count(WordsToReplaceWith),
        each [Counter=[Counter]+1,
        MyText=Text.Replace(
        [MyText],
        WordsToReplace{[Counter]},
        WordsToReplaceWith{[Counter]})],
        each [MyText])

        in
        Demo

  12. Using List.Accumulate() function is another way

    let
    Text= Excel.CurrentWorkbook(){[Name=”Text”]}[Content],
    Replacements=Excel.CurrentWorkbook(){[Name=”Replacements”]}[Content],
    OutPut=Table.AddColumn(
    Text,”Changed Text”,
    each List.Accumulate(
    List.Zip({Replacements[Word To Replace],Replacements[Replace With]}),
    [Text],(x,y)=>Text.Replace(x,y{0},y{1})
    )
    )
    in
    OutPut

  13. Hi Chris. Looking forward to your upcoming presentation to the Wellington PowerBI User Group that I co-run with Phil Seamark.

    Question: Do you think List.Accumulate or List.Generate can handle the states of multiple input and output tables? Or a clever construction of it?

    I’m trying to reconcile data from two different entities regarding movement of shipping containers, by doing a bulletproof fuzzy match on dates. My hard-coded approach involves doing an inner join on the two data sources, removing matches (but keeping a record of them), and then doing subsequent inner joins on the remaining unmatched records. This requires 3 + 3X queries, where X is the amount of days I want to ultimately widen my matching to encompass. At the end of the process I have a table giving me the matches (with a ‘Date Offset’ column telling me how many days apart they were) and two tables of unmatched records.

    Doing this recursively would radically reduce my code base. I’ll post a link to a related forum question in a moment that outlines the challenge…I’d love to hear your thoughts as to whether you believe this is possible.

    1. It *might* work, but without working through the problem I can’t say for sure; I generally try to avoid recursion (though I wouldn’t call List.Accumulate or List.Generate proper recursion) in M because the performance can be bad. All I can say is to try it and see…

      1. Damn…my post at the community site got marked as Spam for some reason, and it’s no longer visible. Hopefully they will put it back up soon. It’s an interesting question – really it’s about finding the “nearest neighbor” across a large dataset. My existing approach – i.e. where I don’t use recursion – also raises other interesting questions around where exactly to use Table.Buffer to speed things up.

  14. Well, I’m definitely late to the party. A great thing about the find/replace functionality in both Excel and VBA is the ability to use wildcards. I’m not seeing that with M. Am I missing something? In VBA, I use a find replace vlookup with an asterisk wildcard to help clean junk from data at the end of downloaded banking transactions. In your example, for me it would be great if you could have “the cat sat on the mat” replaced with just “the cat” by having the cat* as the find text and a blank as the replacement. Thank you.

Leave a Reply to Fun with List.Generate() – Tips and producing a Date Range – Excel and Power BICancel reply