Removing Punctuation From Text In Power Query

In one of my first posts on Power Query (and still my favourite) I found the top 100 words in the complete works of Shakespeare. As always when you’re learning a new tool, though, I look back at what I wrote then and realise there are better ways of doing things… one of which is removing all of the punctuation from a piece of text.

In the post I’m talking about I used the following expression to remove punctuation:

Text.Remove(RemoveLineBreaks,{"," ,"." ,"?" ,";" ,":" ,";" ,"’","@" ,"#" ,"~" ,"{" ,"[" ,"}" ,"]" ,"(" ,")", "*"})

However, as you can see, it only removes the punctuation could be bothered to hard-code in the list – luckily Shakespeare didn’t use too much exotic punctuation! There is a better way of doing this, though, and it relies on the Character.FromNumber() function which takes a number and returns the equivalent Unicode character; for example, the expression Character.FromNumber(38) returns an ampersand & symbol.

Looking at the list of Unicode characters here it’s easy to identify the ranges of numbers that represent punctuation characters. These ranges can then be represented in M as lists. Remembering that lists of ranges of numbers can be declared easily by using the notation {1..5}, which results in the list of numbers {1,2,3,4,5}, you can write a list of lists containing the numbers representing the Latin punctuation characters in Unicode like so:


and you can use List.Combine() to turn this list of lists into a single list of numbers.  Finally, you can use List.Transform() and Character.FromNumber() to get a list of the Unicode characters for these numbers, and pass that list to Text.Remove(). Here’s a query showing everything:


    //get a list of lists containing the numbers of Unicode punctuation characters

    numberlists = {{0..31},{33..47},{58..64},{91..96},{123..191}},

    //turn this into a single list

    combinedlist = List.Combine(numberlists),

    //get a list of all the punctuation characters that these numbers represent

    punctuationlist = List.Transform(combinedlist, each Character.FromNumber(_)),

    //some text to test this on

    inputtext = "Hello! My name is Chris, and I'm hoping that this *cool* post will help you!",

    //the text with punctuation removed

    outputtext = Text.Remove(inputtext, punctuationlist)




The output of this query is this:


You can download the sample workbook for this post here.

6 responses

  1. Pingback: Dew Drop – August 19, 2014 (#1837) | Morning Dew

  2. Hi Chris,

    Could you have an Excel Table with a list of punction and/or abbreviations (e.g. Ltd, Co, NZ, AU, UK) and feed that into Text.Remove() ?. Then if ever something new comes a long that you want to remove from the input, users can just add it to the table.



  3. Another option could be

    punctuationlist = {Character.FromNumber(0)..Character.FromNumber(33)}However, given the number of ranges involved, the transform function is probably cleaner.

    • Crap, my concatenation with other lists was removed. Let’s try again:

      punctuationlist = {Character.FromNumber(0)..Character.FromNumber(33)}

  4. Nope, must be the ampersand character (“&”)
    messing things up.

    punctuationlist = {Character.FromNumber(0)..Character.FromNumber(33)}

    There would be a concatenation character (ampersand) at the end of each line, or the whole thing could be wrapped in List.Combine, as you have already done.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: