Removing Punctuation From Text With The Text.Select M Function In Power BI/Power Query/Excel Get&Transform

A new, as-yet undocumented, M function appeared in the December 2017 release of Power BI Desktop (I assume it will appear in Excel soon): Text.Select. Here’s the documentation from the Query Editor:

TextSelect

It’s very easy to use: the first parameter takes a text value, the second parameter takes either a text value containing a single text value or a list of single characters, and it returns the text from the first parameter minus all characters that are not in the second parameter. For example, the expression:

[sourcecode language=”text” padlinenumbers=”true”]
Text.Select("Hello", "l")
[/sourcecode]

…returns the text value “ll”:

image

…and the expression:

[sourcecode language=”text”]
Text.Select("Hello", {"H","e","o"})
[/sourcecode]

…returns the text value “Heo”:

image

There are a lot of scenarios where Text.Select will be useful, and the one that I immediately thought of was to remove punctuation from text. In one of my earliest M posts on this blog I used Text.Remove to do this while trying to find Shakespeare’s favourite words, but the problem with this approach is that you have to explicitly specify all the characters you want to remove from your text – and there could be a lot of characters that need to be excluded. Text.Select is a much better option here because it allows you to specify the characters you want to keep.

The first step to doing this is to understand how to construct the list of the characters you do want to keep. You can do this very easily in M when declaring a list using the range technique I blogged about here, so you should read that post before carrying on. The following expression returns a list containing all 26 uppercase and lowercase letters in the alphabet plus a space:

[sourcecode language=”text”]
List.Combine({{"A".."Z"},{"a".."z"},{" "}})
[/sourcecode]

image

Of course depending on the scenario or language you’re working with you may want to include other characters, for example apostrophes or letters with accents, too. Here’s a slightly more complex example of how this list can be used with Text.Select:

[sourcecode language=”text”]
let
SourceText = "Hi! Stop, please. What is your name?",
CharsToInclude = List.Combine({{"A".."Z"},{"a".."z"},{" "}}),
RemovePunc = Text.Select(SourceText, CharsToInclude)
in
RemovePunc
[/sourcecode]

The query above takes the text “Hi! Stop, please. What is your name?” and returns the text “Hi Stop please What is your name”.

image

Finally, because I couldn’t read my old M code without cringing a little bit, here’s an updated version of my query that gets the top 100 words from the Complete Works Of Shakespeare (direct from the Project Gutenberg website):

[sourcecode language=”text”]
let
URL = "http://www.gutenberg.org/cache/epub/100/pg100.txt",
Source = Text.FromBinary(Web.Contents(URL)),
Lowercase = Text.Lower(Source),
RemovePunctuation = Text.Select(Lowercase,
List.Combine({{"a".."z"},{" "}})),
SplitText = Splitter.SplitTextByWhitespace(QuoteStyle.None),
SplitIntoWords = SplitText(RemovePunctuation),
RemoveBlanks = List.Select(SplitIntoWords, each _<>" "),
TableFromList = Table.FromColumns({RemoveBlanks},
type table [Word=text]),
FindWordCounts = Table.Group(
TableFromList,
{"Word"},
{{"Count", each Table.RowCount(_), type number}}),
SortedRows = Table.Sort(
FindWordCounts,
{{"Count", Order.Descending}}),
KeptFirstRows = Table.FirstN(SortedRows,100)
in
KeptFirstRows
[/sourcecode]

Here they are as a word cloud (yes I know it’s not good dataviz practice, but it’s for fun):

image

You can download the .pbix file with this example in here.

BONUS FACT: another new M function appeared recently too: Function.From. You can read all about it on this thread on the Power Query forum.

7 thoughts on “Removing Punctuation From Text With The Text.Select M Function In Power BI/Power Query/Excel Get&Transform

  1. Thanks. It sounds very useful, but how do you use Text.Select on text in each row of a table? I get the error “Expression.Error: We cannot convert a value of type Table to type Text.”

Leave a Reply to SteveCancel reply