Using Text.BetweenDelimiters() To Extract URLs From A Web Page In Power BI/Power Query M

The Add Column By Example functionality that appeared in the April 2017 release of Power BI Desktop is, I have to say, very cool: I’ve used it quite a bit in the last few weeks and it really does work well. One thing I noticed while using it is that three new functions have been added to the M language to support this feature:

  • Text.BetweenDelimiters() which takes a text value and extracts the part that appears between two given delimiter values
  • Text.BeforeDelimiter() which takes a text value and extracts the part that appears before a given delimiter value
  • Text.AfterDelimiter() which takes a text value and extracts the part that appears after a given delimiter value

The functions themselves are quite straightforward and the online documentation has some good examples of how to use them. To save you the click here’s an extra example – the expression:

Text.BetweenDelimiters("Hello *world!??", "*", "!")

…returns the text “world”:

image

As it happens last week I received an email from a reader who wanted to know if it was possible to extract all the links from the href attributes in the source of a web page using M, and I realised that Text.BetweenDelimiters() would be very useful for doing this. I wrote the following M function to demonstrate:

(SourceURL as text, AttributeDelimiter as text) =>
let
	//Get HTML source
    Source = Text.FromBinary(Web.Contents(SourceURL)),
	//Function to find each link
    GetLink = (Counter as number) =>
                    let
                        CurrentLink = 
		Text.BetweenDelimiters(
			Source, 
			"href=" & AttributeDelimiter, 
			AttributeDelimiter,
			Counter
		)
                    in
                        if CurrentLink="" 
		then 
			{} 
		else 
			List.Combine({
				{CurrentLink}, 
				@GetLink(Counter+1)}
			),
	//Call function
    Output = GetLink(0)
in
    Output

A few things to note:

  • I’m using a combination of Text.FromBinary() and Web.Contents() to get the HTML source for the web page whose links we’re extracting
  • Since HTML allows the use of single and double quotes for attributes, I’ve added a parameter to my function called AttributeDelimiter to allow either to be passed in
  • Text.BetweenDelimiters only extracts one piece of text at a time, but you can specify which occurrence of the start delimiter it uses. I therefore used recursion to extract the contents of every href attribute in the HTML: I declare a function called GetLink, and from within that function I can make a recursive call by putting an @ before the function name as in line 22 above. It would probably be better to use List.Generate() instead of recursion here though.

Assuming the query that returns this function is called GetAllLinks

image

…then it can be called in a new query like so:

GetAllLinks(
	"https://msdn.microsoft.com/en-us/library/mt798303.aspx", 
	""""
)

One other thing to point out is how, in order to pass a double quote character to the function as text, since text has itself to be enclosed in double quotes I need to use four double quotes: “”””

The output of this query is a list containing all of the links from the href attributes on the page that are enclosed in double quotes:

image

I guess this could be taken even further to create a function that crawls a series of web pages and returns the links in all of them, then uses the Force Directed Graph custom visual or better still NodeXL in Excel to show which pages link to each other. I’ll leave that to someone else to do though…!

You can download a pbix file with all of the examples in this post here.

Power BI, SSAS Multidimensional And Dynamic Format Strings

If you’re building reports in Power BI against SSAS Multidimensional cubes then you may have encountered situations where the formatting on your measures disappears. For example, take a very simple SSAS Multidimensional cube with a single measure called Sales Amount whose FormatString property is set in SSDT to display values with a £ sign:

image

When you build a report using the Table visualisation in Power BI Desktop using this measure, the formatted values are displayed correctly:

image

However, if you add a SCOPE statement to the cube to alter the format string of the measure for certain cells, as in this example which sets the format string for the Sales Amount measure to $ for Bikes:

SCOPE([Measures].[Sales Amount], [Product].[Category].&[1]);
    FORMAT_STRING(THIS)="$0,0.00";
END SCOPE;

…then you’ll find that Power BI displays no formatting at all for the measure:

image

What’s more (and this is a bit strange) if you look at the DAX queries that are generated by Power BI to get data from the cube, they now request a new column to get the format string for the measure even though that format string isn’t used. Since it increases the amount of data returned by the query much larger, this extra column can have a negative impact on query performance if you’re bringing back large amounts of data.

There is no way of avoiding this problem at the moment, unfortunately. If you need to display formatted values in Power BI you will have to create a calculated measure that returns the value of your original measure, set the format string property on that calculated measure appropriately, and use that calculated measure in your Power BI reports instead:

SCOPE([Measures].[Sales Amount], [Product].[Category].&[1]);
    FORMAT_STRING(THIS)="$0,0.00";
END SCOPE;

CREATE MEMBER CURRENTCUBE.[Measures].[Test] AS 
[Measures].[Sales Amount],
FORMAT_STRING="£0,0.00";

image

Thanks to Kevin Jourdain for bringing this to my attention and telling me about the workaround, and also to Greg Galloway for confirming the workaround and providing extra details.

The DAX Unichar() Function And How To Use It In Measures For Data Visualisation

A few weeks ago I was asked whether it was possible to display line breaks in text in a Power BI visualisation. It turns out it isn’t possible – at the moment Power BI always strips line breaks out of text when it gets loaded into the Data Model. However while researching this I came across the DAX Unichar() function, which returns the unicode character associated with an integer value – and which also seems to be completely undocumented for some reason, I guess because it’s new (it isn’t in Excel 2016 DAX yet as far as I can see).

It’s very straightforward to use: for example, the DAX expression UNICHAR(65) returns the character A; see here for a list of unicode characters and their associated codes. You can have a lot of fun with this function in Power BI when you use it to return symbols that in turn can be used to represent data, so I thought I would put together a few examples to show you.

Take the following table which contains scores for restaurants in the range of 0 to 5:

image

The following measure:

Stars = 
REPT(UNICHAR(9733), AVERAGE('Restaurants'[Score])) 
& 
REPT(UNICHAR(9734), 5-AVERAGE('Restaurants'[Score]))

…which uses the Unichar() function to return characters 9733 and 9734, filled and unfilled stars, and the Rept() function to return a string with those characters repeated N times, can be used to create a table like this in Power BI:

image

[I’m sure I read a blog post somewhere that describes this trick with Rept() but I can’t find it anywhere – if you know the one I’m talking about please leave a link in the comments]

Similarly, with the following source data showing the days that customers made a purchase in a week:

image

…you can use the following measure, which returns characters 9635 (a square with a black dot inside) and 9634 (an empty black square), in a matrix to visualise this information:

Purchase Indicator = 
IF(
	COUNTROWS('Purchase Days')>0, 
		UNICHAR(9635), 
		UNICHAR(9634)
) 
& REPT(" ", 5)

image

Finally, an example using the box drawing unicode block to visualise the following date ranges (nb the dates are in dd/mm/yyyy format). Here’s the source data:

image

Here’s the measure, which uses characters 9500, 9472 and 9508 to draw bars:

Employment Range = 
VAR OverallMinimumDate = 
	CALCULATE(
		MIN('Employment Dates'[Start Date]), 
		ALLSELECTED('Employment Dates'))
VAR DaysBeforeStart = 
	MAX('Employment Dates'[Start Date]) - 
	OverallMinimumDate
VAR DaysBetweenStartAndEnd = 
	MAX('Employment Dates'[End Date]) - 
	MAX('Employment Dates'[Start Date])
VAR BarsBetween = 
	IF(DaysBetweenStartAndEnd>1, 
		DaysBetweenStartAndEnd-2, 
		0)
RETURN 
	REPT(" ", DaysBeforeStart) & 
	UNICHAR(9500) & 
	REPT(UNICHAR(9472), BarsBetween) & 
	UNICHAR(9508)

And here’s the output in a table:

image

You can download the Power BI .pbix file with these examples in here.

Is this going to revolutionise how you design reports? No of course not, but I think it could be a useful trick in certain scenarios. If you do come up with other creative ways to use unicode characters in your reports I would be interested to see the results!

Passing Parameters To SQL Queries With Value.NativeQuery() In Power Query And Power BI

I first came across the Value.NativeQuery() M function about six months ago, but it didn’t do anything useful then so I didn’t blog about it. I checked it again recently though and now it does something very handy indeed: it allows you to pass parameters to SQL queries. Before this, if you wanted to use parameters in your SQL, you had to do some nasty string manipulation in the way Ken Puls shows here. Now, with Value.NativeQuery(), you can handle SQL query parameters properly in M.

Here’s a simple example that shows how it works, passing two parameters to a SQL query on the Adventure Works DW database:

let
    Source = Sql.Database("localhost", "Adventure Works DW"),
    Test = Value.NativeQuery(
            Source, 
            "SELECT * FROM DimDate 
            WHERE EnglishMonthName=@MonthName 
            AND
            EnglishDayNameOfWeek=@DayName", 
            [MonthName="March", DayName="Tuesday"])
in
    Test

image

Some explanation of what’s happening here:

  • The Source step returns a reference to a SQL Server database, and this is passed to the first parameter of Value.NativeQuery().
  • The second parameter of the Value.NativeQuery() function is the SQL query to be executed. It contains two parameters called @MonthName and @DayName.
  • The parameters to the SQL query are passed using a record in the third parameter of Value.NativeQuery(). Note how the named of the fields in the records match the names of the parameters in the SQL query.

It looks like, eventually, this will be the way that any type of ‘native’ query (ie a query that you write and give to Power Query, rather than a query that is generated for you) is run against any kind of data source – instead of the situation we have today where different M functions are needed to run queries against different types of data source. I guess at some point the UI will be updated to use this function. I don’t think it’s ‘finished’ yet either, because it doesn’t work on Analysis Services data sources, although it may work with other relational data sources – I haven’t tested it on anything other than SQL Server and SSAS. There’s also a fourth parameter for Value.NativeQuery() that can be used to pass data source specific options, but I have no idea what these could be and I don’t think there are any supported for SQL Server. It will be interesting to see how it develops over the next few releases.

Referenced Queries And Caching In Power BI And Power Query

Last week, Maxim Zelensky (whose blog is well worth checking out) tweeted about a very interesting answer he had received to a question he posted on the Power Query MSDN forum, on the subject of caching and referenced queries in Power Query. You can read the thread here:

https://social.technet.microsoft.com/Forums/en-US/8d5ee632-fdff-4ba2-b150-bb3591f955fb/queries-evaluation-chain?forum=powerquery

…but since this is such important information – it’s certainly something I’ve wondered about myself – I though I would share Maxim’s question and the response from Ehren of the Power Query dev team here in full so it gets the wider visibility it deserves. I’m very grateful to Maxim for letting me share this and to Ehren for writing such a detailed response.

First, Maxim’s original question:

There are two different scenarios I am working with:

1) Query1 connects to the data source (flat file) and make basic cleaning and transformations.

Then Query2 and Query3 reference to Query1, performing other transformations needed.

Query3 also take some data from Query2.

Query2 and Query3 then exported to sheet, Query1 – connection only.

As far as I can understand, PQ can define refresh chain like this: evaluate Query1, then evaluate Query2, then evaluate Query3 (as it need the results from Query2). The question is: When PQ performs calculation of Query3, will it recalculate Query1? Or, as it was evaluated in the chain before, Query3 will use cached results of previous Query1 calculation (performed when Query2 was evaluated)?

2) I have a set of flat files, and I take data from them with one Query1. Query1 also performs some transformations. Then I have two independent Query2 and Query3, both connected to Query1 performing different transformations of source data. Results of Query2 and Query3 evaluations then used in Query4, which exports its results to the sheets, Query1, Query2 and Query3 – connection only

The second question is: performing "Refresh" on Query4, how much times will be Query1 evaluated – one, two or three ? Or there also will be chain: calculate Q1, caching, then Q2 or Q3, calculate next using cached results of Q1 evaluation, and then – Q4?

3) Is there is a difference with connection to database?

4) Is there any rules of evaluation chain (like each expression/query will be calculated once in the evaluation chain)?

And here’s Ehren’s reply:

There’s a lot involved in answering your question, so let me back up and explain a few things first.

Caching
Power Query (in both Excel and Power BI Desktop) utilizes a "persistent cache", stored on disk, when refreshing queries. But what exactly does that mean?
First, let’s look at what gets cached. The persistent cache does not store the results of your M Queries (Q1, Q2, etc. in your example). Instead, it stores the results of the behind-the-scenes requests sent to data sources.

So if Q1 queries a SQL Server database called "MyServer/MyDatabase" and returns a single unfiltered table called "MyTable", the query sent to the server might be "select [Col1] from [MyTable]". In this case, the persistent cache will now know the result of sending "select [Col1] from [MyTable]" to "MyServer/MyDatabase". If another M query (whether through referencing Q1, or by querying the same table directly) needs the same result, the persistent cache can provide it, and the result won’t have to be fetched a second time from the SQL Server.
"Great," you might say. "So if I’m pulling from a flat file in Q1, and in a few places in Q2 I need to do Table.RowCount(Q1), the file should only be read from disk once, right?" And the answer would be…no. This is because not all data sources are cached. Specifically, the results of calls to File.Contents are not stored in the persistent cache. Why not? Well, the cache is stored on disk, and caching local files (which are already on disk) elsewhere on disk doesn’t really make sense. (Using Table.Buffer in this context may help…see more on Table.Buffer below.)
"Okay", you might say. "But if Q1 is pulling from a SQL table, and in a few places in Q2 I reference Q1, that should hit the persistent cache, right?" Maybe. It depends on how Q2 is using Q1, since doing additional operations on Q1 (such as filtering or merging) might cause the M engine to compute a different SQL query, resulting in the server being hit again.
Next, let’s look at the scope of caching. The scope of caching differs depending on what you’re doing, as well as what tool you’re using.

Previewing
If you’ve opened the Power Query editor in Excel or Power BI Desktop, you might have seen warnings like "This preview may be up to 3 days old". This is because there is a persistent cache used for interactive previewing of query results. As you can imagine from the fact that we have warnings about preview results being days old, this cache is long-lived and is intended to make the experience of working in the editor faster.

Loading to Excel
If you load/refresh three queries in Excel, each of them gets their own persistent cache. So the fact that a SQL result is cached during the load of Q2 won’t benefit the loading of Q3, even if it needs the same result.

Loading to Power BI Desktop
If you load/refresh three queries in PBI Desktop, they all share a single persistent cache. When you refresh multiple times, each refresh operation gets its own cache (shared by all the queries being refreshed at that particular time). This means that if SQL result is cached during the load of Q2, it will still be cached during the loading of Q3 (assuming they’re both being loaded at the same time).

What about Table.Buffer?
Table.Buffer can be useful if you want to store an intermediate result in memory and avoid pulling content from disk, a remote file share, a SQL Server, a website, or any other data source multiple times during an evaluation.  Think of Table.Buffer as, "load this table into memory, and stop folding subsequent operations back to the data source".

However, because buffering happens in memory and is not persisted on disk, buffering during the load of one query does not affect the load of another query. If Q1 is buffered when Q2 is loaded, Q1 will be re-buffered when Q3 is loaded.

And now to answer your question…
Now let’s take a look at your example (Q4 references Q2 and Q3, and Q2 and Q3 both reference Q1).
Since you’re pulling from a flat file, and File.Contents results aren’t cached, the flat file will be read each time Q1 is referenced (twice in Q4, once in Q3, Q2, and Q1). If you buffered the result of Q1, then Q4 would only read the file once. But when Q1, Q2, and Q3 are loaded (even in PBI Desktop), they will still each also read the file.

What about immutability?
You asked about the fact that M values are supposed to be immutable. This is true for the "pure" parts of the language, but breaks down when you introduce external data sources and folding. (In fact, you could think of Table.Buffer as transferring a table from the fuzzy unpredictable world of folding to the immutable world of pure M values.) You can see this in action by doing the following test, using a query called "MyTextFileQuery" that pulls from a local file on disk.

Reads the file five times
= Table.RowCount(MyTextFileQuery) + Table.RowCount(MyTextFileQuery) + Table.RowCount(MyTextFileQuery) + Table.RowCount(MyTextFileQuery) + Table.RowCount(MyTextFileQuery)

Reads the file once
= let rowCount = Table.RowCount(MyTextFileQuery) in rowCount + rowCount + rowCount + rowCount + rowCount

I’ve read this response several times and it’s still sinking in, but clearly there are some important implications here for anyone doing more advanced data loading work in Power Query and Power BI. I’m sure it will be the inspiration for many future blog posts on tuning Power Query query performance.

Nested Variables In DAX

Last week, at the SQL Server Days conference in Belgium, Kasper mentioned in his presentation that it was possible to define variables inside variables in DAX. So, for example, you could define a measure like so:

MyMeasure = 
var Outer1 = 
               var Inner1 = 1
               var Inner2 = 2
               return Inner1 + Inner2
var Outer2 = 3
return Outer1 + Outer2

This measure returns 6 as you might expect:

image

There aren’t any performance benefits to doing this, although of course it helps with code readability and organisation (thanks to Marius for confirming this).

With my newly rekindled love of DAX I thought this was quite interesting. I’m not really sure why though, given that it’s not particularly useful; I think Matt might be right:

image

Using The Invoke Custom Function Button In Power BI

There are a lot of cool new features in the September 2016 update for Power BI, so many in fact that several of the really important changes in the Query Editor are in danger of going unnoticed. In this post I want to walk through how to use the new Invoke Custom Function button in the Query Editor in Power BI and explain why it’s such a useful thing to have.

More advanced Power BI users will know that a lot of data loading patterns involve using custom M functions. In the past implementing these patterns involved learning M both for writing functions and also for invoking those functions. A few months ago Power BI introduced the ability to automatically generate functions from queries that use parameters, without needing to write code, and now with the latest update we can also invoke functions easily by clicking a button. This means that a lot more advanced data loading patterns are now available to users who don’t know any M and there’s even less need for someone like me to open the Advanced Editor window and start writing code.

Let’s take a look at how this works with a very simple example. Say you have a table that contains sales data, with the number of units sold and the price paid:

image

You now want to add a new column to this table that calculates the sales value as Units * Price, and you have a function to do this. Here’s what the M code for that function (called “Calculate Value”) could look like:

(Units as number, Price as number) => Units * Price

image

With a query that returns your sales data and another query that returns the Calculate Value function, you can easily create a new column on the sales data query and invoke the function for each row. Go to the Sales query, go to the Add Column tab on the ribbon, and click Invoke Custom Function:

image

You’ll see the Invoke Custom Function dialog appear. Here you can choose the query that returns the function you want to use and enter the values you want to pass to that functions’ parameters. At the moment you can type in a value or pass values from a column in the table you’re invoking (strangely enough you don’t seem to be able to use Power BI parameters here yet though?):

image

Click OK and the function is invoked for every row in the table:

image

To take a more realistic example, look at this post – this new functionality replaces the step where I create a new Custom Column and invoke the GetSheet1() function that I created.

This is why Power BI is so successful: Microsoft are not only implementing high-profile wow features but also adding the obscure, unsexy features that nonetheless make a real difference to the productivity of advanced users.