Using Text.BetweenDelimiters() To Extract URLs From A Web Page In Power BI/Power Query M

The Add Column By Example functionality that appeared in the April 2017 release of Power BI Desktop is, I have to say, very cool: I’ve used it quite a bit in the last few weeks and it really does work well. One thing I noticed while using it is that three new functions have been added to the M language to support this feature:

  • Text.BetweenDelimiters() which takes a text value and extracts the part that appears between two given delimiter values
  • Text.BeforeDelimiter() which takes a text value and extracts the part that appears before a given delimiter value
  • Text.AfterDelimiter() which takes a text value and extracts the part that appears after a given delimiter value

The functions themselves are quite straightforward and the online documentation has some good examples of how to use them. To save you the click here’s an extra example – the expression:

Text.BetweenDelimiters("Hello *world!??", "*", "!")

…returns the text “world”:


As it happens last week I received an email from a reader who wanted to know if it was possible to extract all the links from the href attributes in the source of a web page using M, and I realised that Text.BetweenDelimiters() would be very useful for doing this. I wrote the following M function to demonstrate:

(SourceURL as text, AttributeDelimiter as text) =>
	//Get HTML source
    Source = Text.FromBinary(Web.Contents(SourceURL)),
	//Function to find each link
    GetLink = (Counter as number) =>
                        CurrentLink = 
			"href=" & AttributeDelimiter, 
                        if CurrentLink="" 
	//Call function
    Output = GetLink(0)

A few things to note:

  • I’m using a combination of Text.FromBinary() and Web.Contents() to get the HTML source for the web page whose links we’re extracting
  • Since HTML allows the use of single and double quotes for attributes, I’ve added a parameter to my function called AttributeDelimiter to allow either to be passed in
  • Text.BetweenDelimiters only extracts one piece of text at a time, but you can specify which occurrence of the start delimiter it uses. I therefore used recursion to extract the contents of every href attribute in the HTML: I declare a function called GetLink, and from within that function I can make a recursive call by putting an @ before the function name as in line 22 above. It would probably be better to use List.Generate() instead of recursion here though.

Assuming the query that returns this function is called GetAllLinks


…then it can be called in a new query like so:


One other thing to point out is how, in order to pass a double quote character to the function as text, since text has itself to be enclosed in double quotes I need to use four double quotes: “”””

The output of this query is a list containing all of the links from the href attributes on the page that are enclosed in double quotes:


I guess this could be taken even further to create a function that crawls a series of web pages and returns the links in all of them, then uses the Force Directed Graph custom visual or better still NodeXL in Excel to show which pages link to each other. I’ll leave that to someone else to do though…!

You can download a pbix file with all of the examples in this post here.

12 responses

  1. Pingback: daily 04/25/2017 | Cshonea's Blog

  2. Hello Chris,

    Do mind explaining how exactly your recursion logic works here. I know you you mentioned that PQ team suggest using List.Generate() instead but it will be still very interesting to know how you got it to work with recursion and why is GetLink(0)


    • GetLink(0) is the first call to the GetLink() function, defined in the previous step. The first call GetLink(0) finds the first link in the text; inside the function, when the function calls itself, you’ll see that the call @GetLink(Counter+1) is to find the link that is after the link for the current call.

  3. Pingback: Using Html.Table() To Extract URLs From A Web Page In Power BI/Power Query M « Chris Webb's BI Blog

  4. Is it possible to iterate through many pages on a website using this method?
    = GetAllLinks(
    {“”,””,””} ,

    or somthing like that?

  5. Pingback: Chris Webb's BI Blog: Using Html.Table() To Extract URLs From A Web Page In Power BI/Power Query M Chris Webb's BI Blog

  6. Hello Chris, I used your function in a personal project and it worked great, now I’m making an entry in my blog talking about it. My question is, can I quote your function?, giving you the credits and linking it to this page?.

  7. Pingback: Power Query: una herramienta potente - Savingl

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: