Last year I blogged about how to use the Text.BetweenDelimiters() function to extract all the links from the href attributes in the source of a web page. The code was reasonably simple, but there’s now an even easier way to solve the same problem using the new Html.Table() function. This function doesn’t seem to be documented online yet, but its built-in documentation, available in the Query Editor, is up to date.
Miguel Escobar also has a great post showing how to use it and the new Web.BrowserContents function here.
Here’s an example M query that extracts all the links that start with the letters “http” from my company homepage:
To explain what’s going on here:
- Web.BrowserContents returns the text of the HTML DOM for the web page
- In the second step, Html.Table takes that text and searches for all <a> elements whose href attribute starts with the letters “http”. I found this CSS selector here.
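
The two steps described above can be sketched roughly as follows. Treat this as a minimal illustration under a few assumptions rather than the exact query: the URL is a placeholder, the column name Link is an arbitrary choice, and the selector a[href^="http"] is the standard CSS “attribute starts with” selector that matches the description.

```powerquery
let
    // Placeholder URL (an assumption): replace with the page you want to scan
    Source = Web.BrowserContents("https://www.example.com/"),
    // One output column called "Link" (an arbitrary name).
    // The CSS selector a[href^="http"] matches <a> elements whose
    // href attribute starts with "http"; the double quotes are doubled
    // because they sit inside an M text literal.
    // The third list item is a function applied to each matched element,
    // here returning the value of its href attribute.
    Links = Html.Table(
        Source,
        {{"Link", "a[href^=""http""]", each [Attributes][href]}}
    )
in
    Links
```

The result should be a one-column table with one row per matching link. Without the third item in the inner list, Html.Table would return the text content of each element instead of its href.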