Using Text.BetweenDelimiters() To Extract URLs From A Web Page In Power BI/Power Query M

The Add Column By Example functionality that appeared in the April 2017 release of Power BI Desktop is, I have to say, very cool: I’ve used it quite a bit in the last few weeks and it really does work well. One thing I noticed while using it is that three new functions have been added to the M language to support this feature:

  • Text.BetweenDelimiters() which takes a text value and extracts the part that appears between two given delimiter values
  • Text.BeforeDelimiter() which takes a text value and extracts the part that appears before a given delimiter value
  • Text.AfterDelimiter() which takes a text value and extracts the part that appears after a given delimiter value

The functions themselves are quite straightforward and the online documentation has some good examples of how to use them. To save you the click here’s an extra example – the expression:

Text.BetweenDelimiters("Hello *world!??", "*", "!")

…returns the text “world”:

image

As it happens last week I received an email from a reader who wanted to know if it was possible to extract all the links from the href attributes in the source of a web page using M, and I realised that Text.BetweenDelimiters() would be very useful for doing this. I wrote the following M function to demonstrate:

(SourceURL as text, AttributeDelimiter as text) =>
let
	//Get HTML source
    Source = Text.FromBinary(Web.Contents(SourceURL)),
	//Function to find each link
    GetLink = (Counter as number) =>
                    let
                        CurrentLink = 
		Text.BetweenDelimiters(
			Source, 
			"href=" & AttributeDelimiter, 
			AttributeDelimiter,
			Counter
		)
                    in
                        if CurrentLink="" 
		then 
			{} 
		else 
			List.Combine({
				{CurrentLink}, 
				@GetLink(Counter+1)}
			),
	//Call function
    Output = GetLink(0)
in
    Output

A few things to note:

  • I’m using a combination of Text.FromBinary() and Web.Contents() to get the HTML source for the web page whose links we’re extracting
  • Since HTML allows the use of single and double quotes for attributes, I’ve added a parameter to my function called AttributeDelimiter to allow either to be passed in
  • Text.BetweenDelimiters only extracts one piece of text at a time, but you can specify which occurrence of the start delimiter it uses. I therefore used recursion to extract the contents of every href attribute in the HTML: I declare a function called GetLink, and from within that function I can make a recursive call by putting an @ before the function name as in line 22 above. It would probably be better to use List.Generate() instead of recursion here though.

Assuming the query that returns this function is called GetAllLinks

image

…then it can be called in a new query like so:

GetAllLinks(
	"https://msdn.microsoft.com/en-us/library/mt798303.aspx", 
	""""
)

One other thing to point out is how, in order to pass a double quote character to the function as text, since text has itself to be enclosed in double quotes I need to use four double quotes: “”””

The output of this query is a list containing all of the links from the href attributes on the page that are enclosed in double quotes:

image

I guess this could be taken even further to create a function that crawls a series of web pages and returns the links in all of them, then uses the Force Directed Graph custom visual or better still NodeXL in Excel to show which pages link to each other. I’ll leave that to someone else to do though…!

You can download a pbix file with all of the examples in this post here.

Power BI, SSAS Multidimensional And Dynamic Format Strings

If you’re building reports in Power BI against SSAS Multidimensional cubes then you may have encountered situations where the formatting on your measures disappears. For example, take a very simple SSAS Multidimensional cube with a single measure called Sales Amount whose FormatString property is set in SSDT to display values with a £ sign:

image

When you build a report using the Table visualisation in Power BI Desktop using this measure, the formatted values are displayed correctly:

image

However, if you add a SCOPE statement to the cube to alter the format string of the measure for certain cells, as in this example which sets the format string for the Sales Amount measure to $ for Bikes:

SCOPE([Measures].[Sales Amount], [Product].[Category].&[1]);
    FORMAT_STRING(THIS)="$0,0.00";
END SCOPE;

…then you’ll find that Power BI displays no formatting at all for the measure:

image

What’s more (and this is a bit strange) if you look at the DAX queries that are generated by Power BI to get data from the cube, they now request a new column to get the format string for the measure even though that format string isn’t used. Since it increases the amount of data returned by the query much larger, this extra column can have a negative impact on query performance if you’re bringing back large amounts of data.

There is no way of avoiding this problem at the moment, unfortunately. If you need to display formatted values in Power BI you will have to create a calculated measure that returns the value of your original measure, set the format string property on that calculated measure appropriately, and use that calculated measure in your Power BI reports instead:

SCOPE([Measures].[Sales Amount], [Product].[Category].&[1]);
    FORMAT_STRING(THIS)="$0,0.00";
END SCOPE;

CREATE MEMBER CURRENTCUBE.[Measures].[Test] AS
[Measures].[Sales Amount],
FORMAT_STRING="£0,0.00";

image

Thanks to Kevin Jourdain for bringing this to my attention and telling me about the workaround, and also to Greg Galloway for confirming the workaround and providing extra details.

UPDATE October 2017: this issue appears to be fixed in the latest release of Power BI https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-october-2017-feature-summary/#analytics

The DAX Unichar() Function And How To Use It In Measures For Data Visualisation

A few weeks ago I was asked whether it was possible to display line breaks in text in a Power BI visualisation. It turns out it isn’t possible – at the moment Power BI always strips line breaks out of text when it gets loaded into the Data Model. However while researching this I came across the DAX Unichar() function, which returns the unicode character associated with an integer value – and which also seems to be completely undocumented for some reason, I guess because it’s new (it isn’t in Excel 2016 DAX yet as far as I can see).

It’s very straightforward to use: for example, the DAX expression UNICHAR(65) returns the character A; see here for a list of unicode characters and their associated codes. You can have a lot of fun with this function in Power BI when you use it to return symbols that in turn can be used to represent data, so I thought I would put together a few examples to show you.

Take the following table which contains scores for restaurants in the range of 0 to 5:

image

The following measure:

Stars = 
REPT(UNICHAR(9733), AVERAGE('Restaurants'[Score])) 
& 
REPT(UNICHAR(9734), 5-AVERAGE('Restaurants'[Score]))

…which uses the Unichar() function to return characters 9733 and 9734, filled and unfilled stars, and the Rept() function to return a string with those characters repeated N times, can be used to create a table like this in Power BI:

image

[I’m sure I read a blog post somewhere that describes this trick with Rept() but I can’t find it anywhere – if you know the one I’m talking about please leave a link in the comments]

Similarly, with the following source data showing the days that customers made a purchase in a week:

image

…you can use the following measure, which returns characters 9635 (a square with a black dot inside) and 9634 (an empty black square), in a matrix to visualise this information:

Purchase Indicator = 
IF(
	COUNTROWS('Purchase Days')>0, 
		UNICHAR(9635), 
		UNICHAR(9634)
) 
& REPT(" ", 5)

image

Finally, an example using the box drawing unicode block to visualise the following date ranges (nb the dates are in dd/mm/yyyy format). Here’s the source data:

image

Here’s the measure, which uses characters 9500, 9472 and 9508 to draw bars:

Employment Range = 
VAR OverallMinimumDate = 
	CALCULATE(
		MIN('Employment Dates'[Start Date]), 
		ALLSELECTED('Employment Dates'))
VAR DaysBeforeStart = 
	MAX('Employment Dates'[Start Date]) - 
	OverallMinimumDate
VAR DaysBetweenStartAndEnd = 
	MAX('Employment Dates'[End Date]) - 
	MAX('Employment Dates'[Start Date])
VAR BarsBetween = 
	IF(DaysBetweenStartAndEnd>1, 
		DaysBetweenStartAndEnd-2, 
		0)
RETURN 
	REPT(" ", DaysBeforeStart) & 
	UNICHAR(9500) & 
	REPT(UNICHAR(9472), BarsBetween) & 
	UNICHAR(9508)

And here’s the output in a table:

image

You can download the Power BI .pbix file with these examples in here.

Is this going to revolutionise how you design reports? No of course not, but I think it could be a useful trick in certain scenarios. If you do come up with other creative ways to use unicode characters in your reports I would be interested to see the results!

BI Survey 2017

Once again it’s time to complete the BI Survey and take part in the largest annual survey of BI users. By taking part you get the chance to win some Amazon gift vouchers, and by promoting it here I get a free copy of the results which always provides some very interesting insights and will be the basis for a blog post this autumn. I wonder what people will say about Power BI this year?

Click here to fill out the survey, which should only take 20 minutes.