Removing HTML Tags From Text In Power Query/Power BI

Recently I found I needed to remove all the HTML tags from some text in Power Query. I searched and found a great – if complex – function from Marcel Beug here, but I realised that since that post was written the Html.Table M function has been added to the language and that it makes the task very easy. In fact it’s basically the same as the solution I blogged about here for extracting URLs from a web page.

Here’s an example of how to do it:

    Source = 
        "<!DOCTYPE html>
        <h1>A Heading</h1>
        <p>A paragraph</p>
    RemoveTags = Html.Table(Source, {{"text",":root"}}),
    GetText = RemoveTags{0}

Here’s the output of this query:


The important part is the call to Html.Table using the “:root” CSS selector:

Html.Table(Source, {{"text",":root"}})

This returns a table with one column and one row, containing the text; the GetText step in the query above just gets the contents of the only cell in that table.

14 responses

  1. Chris-
    Thank you for this post. Does this or anything else like it address ascii characters and symbols effectively? I’ve inherited text with both tags and lots of commas, quotes, colons, etc. that I’m trying to replace as well. I’m currently using a solution with List.Generate and iterating over lists with codes and symbols to replace, but I’m finding it’s not scaling well.

    Thank you!

  2. Pingback: Removing HTML Tags From Text In Power Query/Power BI |

  3. I’ve been using this for a while and it’s been working great in Power BI Desktop, but we just deployed our first report with this trick into Power BI Server and we ran into a really strange situation – it seems to return blank after a scheduled data refresh. It doesn’t fail or cause the data refresh to fail, but instead the refresh succeeds and then all results are blank.

    Any idea what’s going on here? I was able to create a test to isolate this function and it causes the issue – just deploy it to either the May or September 2019 release of Power BI Server, and verify that the data works – schedule a data refresh, and boom, the results of the function for all rows are blank. The attached example uses an Excel file, though it happened to us with SQL Server data, so the source doesn’t seem to matter.

    Thoughts? I don’t have access to Power BI Web, but given the lack of questions here, I’m guessing that works perfectly 🙂

  4. Thank you very much! this is very helpful.
    I am running into one problem, using your formula – Power BI is showing the value as a link named “Record” – I can click on that link to see the text, but is there a way to just return the text? instead of a link for Record or Table?

  5. If people are interested on how to apply this technique for a table full of html text (or text that contains html tags), you can use the following method (my apologies for the bad identation):

    {“ColumnName”, each Table.FirstValue(HTML.Table(_, {{“text”, “:root”}})}


    This applies the HTML.Table function to each record in the table.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: