Last week someone asked me whether it was possible to do the equivalent of a SQL LIKE filter in Power Query. Unfortunately there isn’t a function to do this in the standard library but, as always, it is possible to write some M code to do this. Here’s what I came up while I was waiting around at the stables during my daughter’s horse-riding lesson. At the moment it only supports the % wildcard character; also I can’t guarantee that it’s the most efficient implementation or indeed 100% bug-free, but it seems to work fine as far as I can see…
let
Like = (Phrase as text, Pattern as text) =>
let
//Split pattern up into a list using % as a delimiter
PatternList = Text.Split(Pattern, "%"),
//if the first character in the pattern is %
//then the first item in the list is an empty string
StartsWithWc = (List.First(PatternList)=""),
//if the last character in the pattern is %
//then the last item in the list is an empty string
EndsWithWc = (List.Last(PatternList)=""),
//if the first character is not %
//then we have to match the first string in the pattern
//with the opening characters of the phrase
StartsTest = if (StartsWithWc=false)
then Text.StartsWith(Phrase, List.First(PatternList))
else true,
//if the last item is not %
//then we have to match the final string in the pattern
//with the final characters of the phrase
EndsText = if (EndsWithWc=false)
then Text.EndsWith(Phrase, List.Last(PatternList))
else true,
//now we also need to check that each string in the pattern appears
//in the correct order in the phrase
//and to do this we need to declare a function PhraseFind
PhraseFind = (Phrase as text, SearchString as list) =>
let
//does the first string in the pattern appear in the phrase?
StringPos = Text.PositionOf(Phrase, SearchString{0}, Occurrence.First),
PhraseFindOutput =
if
//if string not find then return false
(StringPos=-1)
then false
else if
//we have found the string in the pattern, and
//if this is the last string in the pattern, return true
List.Count(SearchString)=1
then true
else
//if it isn't the last string in the pattern
//test the next string in the pattern by removing
//the first string from the pattern list
//and all text up to and including the string we have found in the phrase
(true and
@PhraseFind(
Text.RemoveRange(Phrase, 0, StringPos + Text.Length(SearchString{0})),
List.RemoveRange(SearchString, 0, 1)))
in
PhraseFindOutput,
//return true if we have passed all tests
Output = StartsTest and EndsText and PhraseFind(Phrase, PatternList)
in
Output
in
Like
Using the following test data:
I can run the following query:
let
Source = Excel.CurrentWorkbook(){[Name="Phrases"]}[Content],
ChangedType = Table.TransformColumnTypes(Source,{{"Phrases", type text}}),
InsertedCustom = Table.AddColumn(ChangedType, "Test", each Like([Phrases],"%cat%sat%mat%"))
in
InsertedCustom
And get this output:
You can download the sample workbook here.
I know the Power Query team have been asked for this several times already, but it would be really useful if we could package up functions like this and make it easy to share them publicly with other Power Query users…
Nice solution, but is it possible for your “Like = (Phrase as text, Pattern as text) =>” function to use two different data sources for the inputs? This seems like a partial solution to the SQL like function.
Yes, absolutely – the parameter values can come from anywhere.
Hi Chris, really excellent read.
I’m testing a variation of this, but am hitting a wall. I’m trying to take a table of terms to see if any of them match anywhere in the text strings of a column in Power Query. Using your Like Query, I can match to one term, but not multiple terms (and only exact match to the string, not broad matching to any part of the string).
Have you ever tried to solve this approach?
The pertinent code working (for single term exact, but not broad matching): Table.AddColumn(#”Renamed Columns”, “Brand”, each Like([Phrase],”%car%”))
And not working for multiple phrases (and also not broad matching, of course): Table.AddColumn(#”Renamed Columns”, “Brand”, each Like([Phrase],”%car%red car%”))
Really love the blog!
C
It sounds like a function using List.Generate(), similar to the one shown here:
https://blog.crossjoin.co.uk/2014/06/25/using-list-generate-to-make-multiple-replacements-of-words-in-text-in-power-query/
Would allow you to loop over a list of multiple terms.
Chris – Please show where the argument for PhraseFile function SearchString is obtained? I can’t see it. Thanks.
I call the PhraseFind function in the Output step, towards the end of the query.
Thanks for this, it’s very nice.
I’ve just hacked this to use a pattern table containing a list of patterns.
I merge and expand my patterns table (cartesian join) to the data table I’m matching. I then run your function passing it the columns containing the phrase and the pattern as parameters.
It’s not even slow…
https://www.dropbox.com/s/53bwjh3s8bjei64/PQLikeFilterV2.xlsx?dl=1
Hi Chris, Excellent Article,
I would to know in 2020, if the Power Query team have been package up functions like this and make it easy to share them publicly with other Power Query users
Thanks
No, there’s no way to do this yet, sorry
How would you pass a list of phrases to the function, check each of them in order, and then exit after the first match so that it doesn’t keep evaluating?
Hello Chris Webb! This was working fine for quite while – also in dataflows. But out of the sudden it stops working in Dataflow. Do you have any updates on the script to make it again work in dataflow? Or at least any hint which piece might be worth to be debugged?
Can you post your code? It still works fine for me with the latest PQ installed.
If you use the version I posted in the link from 2019 above, you can step through the steps – I broke it up into a query and then had PQ generate a function from it for this purpose.
Can you post your code? It still works fine for me with the latest PQ installed.
If you use the version I posted in the link from 2019 above, you can step through the steps – I broke it up into a query and then had PQ generate a function from it for this purpose.
How like function can be implemented in PATHCONTAINS function in dax query
In my search for wildcard matching in Power Query, I stumbled on your great blog and this article in particular. Inspired by your SQL-Like solution for %, I took the liberty of creating my own non-recursive version for the (equivalent) wildcard *.
(string as text, pattern as text, optional Comparer as function) =>
let
CompareDefault = Comparer ?? Comparer.Ordinal,
patList = Text.Split(pattern,”*”),
patCount = List.Count(patList),
// patCount=1, the pattern contains no *
// patCount=2, the pattern has exactly one *, so the pattern has a head and tale (“[head]*[tale]”)
// patCount>2, the pattern has more then one *, so it also contains one or more middle parts (“[head]*[middleparts]*[tale]”)
loop=List.Generate(
()=>[patIndex = 0,
patPart = patList{patIndex},
strRest = string,
strPos = // if the pattern contains no *, a succesful comparison is all we need
// otherwise the string must start with the head of the pattern
if patCount=1 then -Number.Abs(CompareDefault(strRest, patPart))
else if Text.StartsWith(strRest,patPart,CompareDefault) then 0 else -1,
match = strPos>=0,
continue = true],
each [continue],
each if not [match] or [patIndex]+1>=patCount then
// Stop continuing because the previous match failed or the pattern scan is completed.
// Note that this ‘extra generation’ is needed to ensure that the final match result is included in the list
[continue = false]
else
[patIndex = [patIndex]+1,
patPart = patList{patIndex},
strRest = Text.RemoveRange([strRest],0,[strPos]+Text.Length([patPart])),
strPos = // if at some middle part in the pattern, we look for it in the string
// otherwise we are at the tale of the pattern so we check if it matches the end of the string
if patIndex=0,
continue = true]
),
result=List.Last(loop)[match]
in
result
It passes all the “simple testcases” (i.e. not including a ?) listed at http://developforperformance.com/MatchingWildcards_AnImprovedAlgorithmForBigData.html,
including the edge case with an empty pattern which your version seems to handle incorrectly (e.g. Like(“abc”, “”) returns true?). Also it avoids testing the pattern twice (start and end) in case it contains no % at all.
Meanwhile, I successfully created a M language version of an algorithm (“wild_match_iter”) discussed at Wildcard Matching Methods (dogankurt.com), which supports both wildcards * and ?. It works just fine, but was quite a challenge for me, not being that familiar with the M language or functional programming in general. I am still working on getting a better performance.
Copy&Paste messed up things a bit, hope this better:
(string as text, pattern as text, optional Comparer as function) =>
let
CompareDefault = Comparer ?? Comparer.Ordinal,
patList = Text.Split(pattern,”*”),
patCount = List.Count(patList),
// patCount=1, the pattern contains no *
// patCount=2, the pattern has exactly one *, so the pattern has a head and tale (“[head]*[tale]”)
// patCount>2, the pattern has more then one *, so it also contains one or more middle parts (“[head]*[middleparts]*[tale]”)
loop=List.Generate(
()=>[patIndex = 0,
patPart = patList{patIndex},
strRest = string,
strPos = // if the pattern contains no *, a succesful comparison is all we need
// otherwise the string must start with the head of the pattern
if patCount=1 then -Number.Abs(CompareDefault(strRest, patPart))
else if Text.StartsWith(strRest,patPart,CompareDefault) then 0 else -1,
match = strPos>=0,
continue = true],
each [continue],
each if not [match] or [patIndex]+1>=patCount then
// Stop continuing because the previous match failed or the pattern scan is completed.
// Note that this ‘extra generation’ is needed to ensure that the final match result is included in the list
[continue = false]
else
[patIndex = [patIndex]+1,
patPart = patList{patIndex},
strRest = Text.RemoveRange([strRest],0,[strPos]+Text.Length([patPart])),
strPos = // if at some middle part in the pattern, we look for it in the string
// otherwise we are at the tale of the pattern so we check if it matches the end of the string
if patIndex=0,
continue = true]
),
result=List.Last(loop)[match]
in
result
Again something went wrong, try again …:
(string as text, pattern as text, optional Comparer as function) =>
let
CompareDefault = Comparer ?? Comparer.Ordinal,
patList = Text.Split(pattern,”*”),
patCount = List.Count(patList),
// patCount=1, the pattern contains no *
// patCount=2, the pattern has exactly one *, so the pattern has a head and tale (“[head]*[tale]”)
// patCount>2, the pattern has more then one *, so it also contains one or more middle parts (“[head]*[middleparts]*[tale]”)
loop=List.Generate(
()=>[patIndex = 0,
patPart = patList{patIndex},
strRest = string,
strPos = // if the pattern contains no *, a succesful comparison is all we need
// otherwise the string must start with the head of the pattern
if patCount=1 then -Number.Abs(CompareDefault(strRest, patPart))
else if Text.StartsWith(strRest,patPart,CompareDefault) then 0 else -1,
match = strPos>=0,
continue = true],
each [continue],
each if not [match] or [patIndex]+1>=patCount then
// Stop continuing because the previous match failed or the pattern scan is completed.
// Note that this ‘extra generation’ is needed to ensure that the final match result is included in the list
[continue = false]
else
[patIndex = [patIndex]+1,
patPart = patList{patIndex},
strRest = Text.RemoveRange([strRest],0,[strPos]+Text.Length([patPart])),
strPos = // if at some middle part in the pattern, we look for it in the string
// otherwise we are at the tale of the pattern so we check if it matches the end of the string
if patIndex=0,
continue = true]
),
result=List.Last(loop)[match]
in
result
Really sorry, but am unable to copy and paste the code correctly. Dont understand what I’m doing wrong. Maybe comeone can help me out?