Implementing A Basic LIKE/Wildcard Search Function In Power Query

Last week someone asked me whether it was possible to do the equivalent of a SQL LIKE filter in Power Query. Unfortunately there isn’t a function to do this in the standard library but, as always, it is possible to write some M code to do this. Here’s what I came up while I was waiting around at the stables during my daughter’s horse-riding lesson. At the moment it only supports the % wildcard character; also I can’t guarantee that it’s the most efficient implementation or indeed 100% bug-free, but it seems to work fine as far as I can see…

let

    Like = (Phrase as text, Pattern as text) => 

let

    //Split pattern up into a list using % as a delimiter

    PatternList = Text.Split(Pattern, "%"),

    //if the first character in the pattern is %

    //then the first item in the list is an empty string

    StartsWithWc = (List.First(PatternList)=""),

    //if the last character in the pattern is %

    //then the last item in the list is an empty string

    EndsWithWc = (List.Last(PatternList)=""),

    //if the first character is not %

    //then we have to match the first string in the pattern

    //with the opening characters of the phrase

    StartsTest = if (StartsWithWc=false) 

       then Text.StartsWith(Phrase, List.First(PatternList)) 

       else true,

    //if the last item is not %

    //then we have to match the final string in the pattern

    //with the final characters of the phrase

    EndsText = if (EndsWithWc=false) 

       then Text.EndsWith(Phrase, List.Last(PatternList)) 

       else true,

    //now we also need to check that each string in the pattern appears 

    //in the correct order in the phrase

    //and to do this we need to declare a function PhraseFind

    PhraseFind = (Phrase as text, SearchString as list) =>

    let

     //does the first string in the pattern appear in the phrase?

     StringPos = Text.PositionOf(Phrase, SearchString{0}, Occurrence.First),

     PhraseFindOutput = 

                 if

                 //if string not find then return false 

                 (StringPos=-1) 

                 then false 

                 else if

                 //we have found the string in the pattern, and

                 //if this is the last string in the pattern, return true

                 List.Count(SearchString)=1

                 then true

                 else

                 //if it isn't the last string in the pattern

                 //test the next string in the pattern by removing

                 //the first string from the pattern list

                 //and all text up to and including the string we have found in the phrase

                 (true and

                 @PhraseFind(

                 Text.RemoveRange(Phrase, 0, StringPos + Text.Length(SearchString{0})),

                 List.RemoveRange(SearchString, 0, 1)))

     in

      PhraseFindOutput,

    //return true if we have passed all tests    

    Output = StartsTest and EndsText and PhraseFind(Phrase, PatternList) 

in

    Output

in

    Like

 

Using the following test data:

image

I can run the following query:

let

    Source = Excel.CurrentWorkbook(){[Name="Phrases"]}[Content],

    ChangedType = Table.TransformColumnTypes(Source,{{"Phrases", type text}}),

    InsertedCustom = Table.AddColumn(ChangedType, "Test", each Like([Phrases],"%cat%sat%mat%"))

in

    InsertedCustom

 

And get this output:

image

You can download the sample workbook here.

I know the Power Query team have been asked for this several times already, but it would be really useful if we could package up functions like this and make it easy to share them publicly with other Power Query users…

23 responses

  1. Pingback: Dew Drop – May 28, 2014 (#1785) | Morning Dew

  2. Pingback: Join Conditions In Power Query, Part 1 | Chris Webb's BI Blog

  3. Pingback: Join Conditions In Power Query, Part 1 - SQL Server - SQL Server - Toad World

  4. Pingback: daily 05/26/2015 | Cshonea's Blog

  5. Nice solution, but is it possible for your “Like = (Phrase as text, Pattern as text) =>” function to use two different data sources for the inputs? This seems like a partial solution to the SQL like function.

  6. Hi Chris, really excellent read.

    I’m testing a variation of this, but am hitting a wall. I’m trying to take a table of terms to see if any of them match anywhere in the text strings of a column in Power Query. Using your Like Query, I can match to one term, but not multiple terms (and only exact match to the string, not broad matching to any part of the string).

    Have you ever tried to solve this approach?

    The pertinent code working (for single term exact, but not broad matching): Table.AddColumn(#”Renamed Columns”, “Brand”, each Like([Phrase],”%car%”))

    And not working for multiple phrases (and also not broad matching, of course): Table.AddColumn(#”Renamed Columns”, “Brand”, each Like([Phrase],”%car%red car%”))

    Really love the blog!

    C

  7. Chris – Please show where the argument for PhraseFile function SearchString is obtained? I can’t see it. Thanks.

  8. Pingback: Power Query - Filter with wildcards - Excel Off The Grid

  9. Hi Chris, Excellent Article,
    I would to know in 2020, if the Power Query team have been package up functions like this and make it easy to share them publicly with other Power Query users

    Thanks

  10. How would you pass a list of phrases to the function, check each of them in order, and then exit after the first match so that it doesn’t keep evaluating?

  11. Hello Chris Webb! This was working fine for quite while – also in dataflows. But out of the sudden it stops working in Dataflow. Do you have any updates on the script to make it again work in dataflow? Or at least any hint which piece might be worth to be debugged?

    • Can you post your code? It still works fine for me with the latest PQ installed.

      If you use the version I posted in the link from 2019 above, you can step through the steps – I broke it up into a query and then had PQ generate a function from it for this purpose.

  12. Can you post your code? It still works fine for me with the latest PQ installed.

    If you use the version I posted in the link from 2019 above, you can step through the steps – I broke it up into a query and then had PQ generate a function from it for this purpose.

  13. In my search for wildcard matching in Power Query, I stumbled on your great blog and this article in particular. Inspired by your SQL-Like solution for %, I took the liberty of creating my own non-recursive version for the (equivalent) wildcard *.

    (string as text, pattern as text, optional Comparer as function) =>
    let
    CompareDefault = Comparer ?? Comparer.Ordinal,

    patList = Text.Split(pattern,”*”),
    patCount = List.Count(patList),

    // patCount=1, the pattern contains no *
    // patCount=2, the pattern has exactly one *, so the pattern has a head and tale (“[head]*[tale]”)
    // patCount>2, the pattern has more then one *, so it also contains one or more middle parts (“[head]*[middleparts]*[tale]”)

    loop=List.Generate(
    ()=>[patIndex = 0,
    patPart = patList{patIndex},
    strRest = string,
    strPos = // if the pattern contains no *, a succesful comparison is all we need
    // otherwise the string must start with the head of the pattern
    if patCount=1 then -Number.Abs(CompareDefault(strRest, patPart))
    else if Text.StartsWith(strRest,patPart,CompareDefault) then 0 else -1,
    match = strPos>=0,
    continue = true],
    each [continue],
    each if not [match] or [patIndex]+1>=patCount then
    // Stop continuing because the previous match failed or the pattern scan is completed.
    // Note that this ‘extra generation’ is needed to ensure that the final match result is included in the list
    [continue = false]
    else
    [patIndex = [patIndex]+1,
    patPart = patList{patIndex},
    strRest = Text.RemoveRange([strRest],0,[strPos]+Text.Length([patPart])),
    strPos = // if at some middle part in the pattern, we look for it in the string
    // otherwise we are at the tale of the pattern so we check if it matches the end of the string
    if patIndex=0,
    continue = true]
    ),

    result=List.Last(loop)[match]
    in
    result

    It passes all the “simple testcases” (i.e. not including a ?) listed at http://developforperformance.com/MatchingWildcards_AnImprovedAlgorithmForBigData.html,
    including the edge case with an empty pattern which your version seems to handle incorrectly (e.g. Like(“abc”, “”) returns true?). Also it avoids testing the pattern twice (start and end) in case it contains no % at all.

    Meanwhile, I successfully created a M language version of an algorithm (“wild_match_iter”) discussed at Wildcard Matching Methods (dogankurt.com), which supports both wildcards * and ?. It works just fine, but was quite a challenge for me, not being that familiar with the M language or functional programming in general. I am still working on getting a better performance.

    • Copy&Paste messed up things a bit, hope this better:

      (string as text, pattern as text, optional Comparer as function) =>
      let
      CompareDefault = Comparer ?? Comparer.Ordinal,

      patList = Text.Split(pattern,”*”),
      patCount = List.Count(patList),

      // patCount=1, the pattern contains no *
      // patCount=2, the pattern has exactly one *, so the pattern has a head and tale (“[head]*[tale]”)
      // patCount>2, the pattern has more then one *, so it also contains one or more middle parts (“[head]*[middleparts]*[tale]”)

      loop=List.Generate(
      ()=>[patIndex = 0,
      patPart = patList{patIndex},
      strRest = string,
      strPos = // if the pattern contains no *, a succesful comparison is all we need
      // otherwise the string must start with the head of the pattern
      if patCount=1 then -Number.Abs(CompareDefault(strRest, patPart))
      else if Text.StartsWith(strRest,patPart,CompareDefault) then 0 else -1,
      match = strPos>=0,
      continue = true],
      each [continue],
      each if not [match] or [patIndex]+1>=patCount then
      // Stop continuing because the previous match failed or the pattern scan is completed.
      // Note that this ‘extra generation’ is needed to ensure that the final match result is included in the list
      [continue = false]
      else
      [patIndex = [patIndex]+1,
      patPart = patList{patIndex},
      strRest = Text.RemoveRange([strRest],0,[strPos]+Text.Length([patPart])),
      strPos = // if at some middle part in the pattern, we look for it in the string
      // otherwise we are at the tale of the pattern so we check if it matches the end of the string
      if patIndex=0,
      continue = true]
      ),

      result=List.Last(loop)[match]
      in
      result

  14. Again something went wrong, try again …:

    (string as text, pattern as text, optional Comparer as function) =>
    let
    CompareDefault = Comparer ?? Comparer.Ordinal,

    patList = Text.Split(pattern,”*”),
    patCount = List.Count(patList),

    // patCount=1, the pattern contains no *
    // patCount=2, the pattern has exactly one *, so the pattern has a head and tale (“[head]*[tale]”)
    // patCount>2, the pattern has more then one *, so it also contains one or more middle parts (“[head]*[middleparts]*[tale]”)

    loop=List.Generate(
    ()=>[patIndex = 0,
    patPart = patList{patIndex},
    strRest = string,
    strPos = // if the pattern contains no *, a succesful comparison is all we need
    // otherwise the string must start with the head of the pattern
    if patCount=1 then -Number.Abs(CompareDefault(strRest, patPart))
    else if Text.StartsWith(strRest,patPart,CompareDefault) then 0 else -1,
    match = strPos>=0,
    continue = true],
    each [continue],
    each if not [match] or [patIndex]+1>=patCount then
    // Stop continuing because the previous match failed or the pattern scan is completed.
    // Note that this ‘extra generation’ is needed to ensure that the final match result is included in the list
    [continue = false]
    else
    [patIndex = [patIndex]+1,
    patPart = patList{patIndex},
    strRest = Text.RemoveRange([strRest],0,[strPos]+Text.Length([patPart])),
    strPos = // if at some middle part in the pattern, we look for it in the string
    // otherwise we are at the tale of the pattern so we check if it matches the end of the string
    if patIndex=0,
    continue = true]
    ),

    result=List.Last(loop)[match]
    in
    result

    • Really sorry, but am unable to copy and paste the code correctly. Dont understand what I’m doing wrong. Maybe comeone can help me out?

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: