Pre-Conference Training Day On Data Loading With Power BI/Excel Get & Transform/Power Query At PASS Summit

I’m pleased to announce that I’m running a pre-conference training day on “Loading and transforming data with Power BI and Power Query” at this year’s PASS Summit in Seattle. For the full agenda and details on how to register, see:
http://www.pass.org/summit/2017/Sessions/Details.aspx?sid=66485

This full-day session will teach you everything you need to know about loading data into Power BI or Excel; it’s specifically focussed on self-service BI scenarios, but if you want to learn M for SSAS 2017 you might find some of what I cover useful too. Topics covered include using the UI to build queries; using parameters and functions; the M language; data privacy settings; and creating custom data connectors. I hope to see you there!

Query Folding And Writing Your Own SQL Queries In Power Query/Power BI/Excel Get & Transform

When you connect to a relational database like SQL Server in Power BI/Power Query/Excel Get & Transform you have two choices about how to get the data you need:

  1. You can choose a table from the database and then either use the Query Editor UI or write some M to get the data you need from that table. For example, you might choose a table that has one row for every product that your company sells and then, using the UI, filter that down to only the products that are red.
  2. You can enter a SQL query that gets the data you need.

Something that you might not realise is that if you choose the second option and then subsequently use the UI to apply even more filtering or transformation, then those subsequent steps will not be able to make use of query folding.

As an example of option (1), imagine you connect to the DimProduct table in the SQL Server Adventure Works DW database like so:

image

image

The following M query is generated by the Query Editor when you filter the table to only return the red products and remove all columns except EnglishProductName. That’s very easy to do so I won’t describe it, but here’s the M:

let
    Source = 
	Sql.Databases("localhost"),
    #"Adventure Works DW" = 
	Source{
		[Name="Adventure Works DW"]
	}[Data],
    dbo_DimProduct = 
	#"Adventure Works DW"{
		[Schema="dbo",Item="DimProduct"]
	}[Data],
    #"Filtered Rows" = 
	Table.SelectRows(
		dbo_DimProduct, 
		each ([Color] = "Red")
	),
    #"Removed Other Columns" = 
	Table.SelectColumns(
		#"Filtered Rows",
		{"EnglishProductName"}
	)
in
    #"Removed Other Columns"

image

Using the View Native Query option, you can find out that the following SQL is generated to get this data:

select [_].[EnglishProductName]
from [dbo].[DimProduct] as [_]
where [_].[Color] = 'Red'

image

image

It’s pretty clear that query folding is taking place for the filter on “red” and for the selection of the required column.

However, if you enter the following SQL query when you first connect to the database:

select * from dimproduct

image

And then, after that, filter the table and remove columns in exactly the same way, you get the following M query:

let
    Source = 
	Sql.Database(
		"localhost", 
		"Adventure Works DW", 
		[Query="select * from dimproduct"]),
    #"Filtered Rows" = 
	Table.SelectRows(
		Source, 
		each ([Color] = "Red")),
    #"Removed Other Columns" = 
	Table.SelectColumns(
		#"Filtered Rows",
		{"EnglishProductName"})
in
    #"Removed Other Columns"

If you now try to use the View Native Query option on either the Removed Other Columns or Filtered Rows steps you’ll find it’s greyed out, indicating query folding is not taking place for those steps:

image

The query you enter is run and then Power BI applies the filter and selects the column itself in the resultset that the SQL query returns.

This obviously has big implications for performance. The lesson here is that if you’re going to write your own SQL query in the Query Editor, you should make sure it does all of the expensive filters and transformations you need because anything else you do in the query will happen outside the database in Power BI or Excel.

Power BI Custom Data Connectors

Amid all the other announcements at Microsoft Build today came one I’ve been waiting for a long time: we can now build our own custom data connectors for Power BI. Here’s the announcement on the Power BI blog:

https://powerbi.microsoft.com/en-us/blog/data-connectors-developer-preview/

…and here are the docs:

https://github.com/Microsoft/DataConnectors

Although it might not sound that exciting, I think this is one of the biggest things that’s happened to Power BI in a long time. Microsoft’s most successful products are always the ones that are platforms for its partners to build their own solutions on top of, and Power BI is no exception. Custom visuals have made an important contribution to the success of Power BI as a whole, allowing third parties to build visuals that Microsoft either didn’t have the resources to build itself or didn’t realise people wanted. Similarly, custom data connectors will only increase the already impressive number of data sources that Power BI can connect to. My guess is that a lot of SaaS and data providers will want to build their own custom data connectors, and that the number of custom data connectors available is going to explode. Why go to all the trouble of trying to build BI and reporting features into your product when it’s quicker and easier to send the data to Power BI and your users get a much better experience as a result?

This doesn’t just affect Power BI though. Since the new Get & Transform functionality for loading data into Excel (see here) is based on the same Power Query technology that is used by Power BI, these custom data connectors will at some point work with Excel too. Although it will be a while before most Excel users upgrade to a version that supports custom data connectors, when that happens the sheer size of the Excel market means that there will be even more demand for custom connectors and this will in turn strengthen the Power BI ecosystem. Don’t forget that Analysis Services Tabular 2017 also now uses Power Query for loading data, so this will benefit corporate BI users too.

From a purely technical point of view today’s announcement included some welcome new capabilities for M coders like me. These include Visual Studio integration (intellisense for M!) and the ability to connect to web services and use OAuth for authentication, something that wasn’t really possible before. It’s also interesting that you can create custom data connectors that enable DirectQuery connections from Power BI, and that you aren’t limited to importing data.

I can’t wait to start using all these features and I’m sure they will be the subject of many future blog posts. Also, if you’re looking to hire someone with a good understanding of M to build a custom data connector, I’m your guy. You can find my contact details at http://www.crossjoin.co.uk/

DAX Functions, DirectQuery And Unrestricted Measures

If you use Power BI to connect to a data source such as SQL Server in DirectQuery mode, you’ll find that by default you cannot use the complete range of DAX functions inside your calculations. For example, here’s a screenshot from Power BI Desktop in DirectQuery mode with a measure where I’m trying to use the RankX() function:

image

Whenever you get an error message like “Function ‘RANKX’ is not supported in DirectQuery mode” you know you’re trying to use a DAX function that isn’t allowed by default in DirectQuery mode.

Why does this limitation exist? Well, as the documentation points out here:

https://powerbi.microsoft.com/en-us/documentation/powerbi-desktop-use-directquery/#limitations-of-directquery

…this is done for performance reasons. In Power BI, every time you interact with a report (by opening it, clicking on a slicer etc) a series of DAX queries is sent back to the Data Model to get the data needed by the report. In DirectQuery mode, all of these DAX queries – including any calculations – are translated into queries that are sent back to the data source. If that data source is SQL Server, then Power BI generates SQL queries. Not everything that you can do in a DAX calculation can be translated back into efficient SQL, so to prevent you from accidentally building calculations that will perform badly in DirectQuery mode Power BI tries to prevent you from using any DAX functions that it thinks it can’t generate efficient SQL for.

You can remove this restriction by clicking File/Options and Settings/Options to open the Options dialog and then going to the DirectQuery pane and selecting “Allow unrestricted measures in DirectQuery mode”:

image

There is no documented list of functions that work in DirectQuery mode (probably because it might change with each release) but there is a way to get one. Using SQL Server Management Studio it is possible to connect to Power BI Desktop and run DMV queries against the Power BI Data Model. If you query the MDSCHEMA_FUNCTIONS DMV like so:

select * from $SYSTEM.MDSCHEMA_FUNCTIONS

You’ll see that it returns a list of all of the DAX and MDX functions understood by Power BI; the last column, called DIRECTQUERY_PUSHABLE contains information used by Power BI to decide whether a calculation is available in DirectQuery mode or not:

image

[At the time of writing this query doesn’t work in DAX Studio for some reason, I think because of the data returned in the PARAMETERINFO column, but if you try to restrict the rows or columns returned by the DMV in any way the DIRECTQUERY_PUSHABLE column only returns empty values. This is why I say you have to use SQL Server Management Studio]

For the DAX functions in the list, if DIRECTQUERY_PUSHABLE is 0 then it is not available in DirectQuery mode by default. If it returns 1 then it is available in measures but not calculated columns; if it returns 3 it is available in both measures and calculated columns (I guess 2 means that it is only available in calculated columns, but there are no functions with this property value right now). However this is not a definitive guide to whether a function is actually ‘pushable’ or not: some functions could be pushable in some scenarios, or with certain combinations of parameters, and not others and the UI may perform additional checks to decide whether a function can be used.

Thanks to Jeffrey Wang and Marius Dumitru for answering my questions about this.

By the way, if you’re reading this, you might also be interested in this post I wrote a while back on the functionality available in the Query Editor in DirectQuery mode:

https://blog.crossjoin.co.uk/2015/11/11/power-bi-directquery-mode-not-just-ssas-directquery-v2-0/

UPDATE – as Marco points out in his comments below, there *is* documentation on which functions are supported in DirectQuery mode now. I just didn’t know about it…

Using Text.BetweenDelimiters() To Extract URLs From A Web Page In Power BI/Power Query M

The Add Column By Example functionality that appeared in the April 2017 release of Power BI Desktop is, I have to say, very cool: I’ve used it quite a bit in the last few weeks and it really does work well. One thing I noticed while using it is that three new functions have been added to the M language to support this feature:

  • Text.BetweenDelimiters() which takes a text value and extracts the part that appears between two given delimiter values
  • Text.BeforeDelimiter() which takes a text value and extracts the part that appears before a given delimiter value
  • Text.AfterDelimiter() which takes a text value and extracts the part that appears after a given delimiter value

The functions themselves are quite straightforward and the online documentation has some good examples of how to use them. To save you the click here’s an extra example – the expression:

Text.BetweenDelimiters("Hello *world!??", "*", "!")

…returns the text “world”:

image

As it happens last week I received an email from a reader who wanted to know if it was possible to extract all the links from the href attributes in the source of a web page using M, and I realised that Text.BetweenDelimiters() would be very useful for doing this. I wrote the following M function to demonstrate:

(SourceURL as text, AttributeDelimiter as text) =>
let
	//Get HTML source
    Source = Text.FromBinary(Web.Contents(SourceURL)),
	//Function to find each link
    GetLink = (Counter as number) =>
                    let
                        CurrentLink = 
		Text.BetweenDelimiters(
			Source, 
			"href=" & AttributeDelimiter, 
			AttributeDelimiter,
			Counter
		)
                    in
                        if CurrentLink="" 
		then 
			{} 
		else 
			List.Combine({
				{CurrentLink}, 
				@GetLink(Counter+1)}
			),
	//Call function
    Output = GetLink(0)
in
    Output

A few things to note:

  • I’m using a combination of Text.FromBinary() and Web.Contents() to get the HTML source for the web page whose links we’re extracting
  • Since HTML allows the use of single and double quotes for attributes, I’ve added a parameter to my function called AttributeDelimiter to allow either to be passed in
  • Text.BetweenDelimiters only extracts one piece of text at a time, but you can specify which occurrence of the start delimiter it uses. I therefore used recursion to extract the contents of every href attribute in the HTML: I declare a function called GetLink, and from within that function I can make a recursive call by putting an @ before the function name as in line 22 above. It would probably be better to use List.Generate() instead of recursion here though.

Assuming the query that returns this function is called GetAllLinks

image

…then it can be called in a new query like so:

GetAllLinks(
	"https://msdn.microsoft.com/en-us/library/mt798303.aspx", 
	""""
)

One other thing to point out is how, in order to pass a double quote character to the function as text, since text has itself to be enclosed in double quotes I need to use four double quotes: “”””

The output of this query is a list containing all of the links from the href attributes on the page that are enclosed in double quotes:

image

I guess this could be taken even further to create a function that crawls a series of web pages and returns the links in all of them, then uses the Force Directed Graph custom visual or better still NodeXL in Excel to show which pages link to each other. I’ll leave that to someone else to do though…!

You can download a pbix file with all of the examples in this post here.

Power BI, SSAS Multidimensional And Dynamic Format Strings

If you’re building reports in Power BI against SSAS Multidimensional cubes then you may have encountered situations where the formatting on your measures disappears. For example, take a very simple SSAS Multidimensional cube with a single measure called Sales Amount whose FormatString property is set in SSDT to display values with a £ sign:

image

When you build a report using the Table visualisation in Power BI Desktop using this measure, the formatted values are displayed correctly:

image

However, if you add a SCOPE statement to the cube to alter the format string of the measure for certain cells, as in this example which sets the format string for the Sales Amount measure to $ for Bikes:

SCOPE([Measures].[Sales Amount], [Product].[Category].&[1]);
    FORMAT_STRING(THIS)="$0,0.00";
END SCOPE;

…then you’ll find that Power BI displays no formatting at all for the measure:

image

What’s more (and this is a bit strange) if you look at the DAX queries that are generated by Power BI to get data from the cube, they now request a new column to get the format string for the measure even though that format string isn’t used. Since it increases the amount of data returned by the query much larger, this extra column can have a negative impact on query performance if you’re bringing back large amounts of data.

There is no way of avoiding this problem at the moment, unfortunately. If you need to display formatted values in Power BI you will have to create a calculated measure that returns the value of your original measure, set the format string property on that calculated measure appropriately, and use that calculated measure in your Power BI reports instead:

SCOPE([Measures].[Sales Amount], [Product].[Category].&[1]);
    FORMAT_STRING(THIS)="$0,0.00";
END SCOPE;

CREATE MEMBER CURRENTCUBE.[Measures].[Test] AS 
[Measures].[Sales Amount],
FORMAT_STRING="£0,0.00";

image

Thanks to Kevin Jourdain for bringing this to my attention and telling me about the workaround, and also to Greg Galloway for confirming the workaround and providing extra details.

The DAX Unichar() Function And How To Use It In Measures For Data Visualisation

A few weeks ago I was asked whether it was possible to display line breaks in text in a Power BI visualisation. It turns out it isn’t possible – at the moment Power BI always strips line breaks out of text when it gets loaded into the Data Model. However while researching this I came across the DAX Unichar() function, which returns the unicode character associated with an integer value – and which also seems to be completely undocumented for some reason, I guess because it’s new (it isn’t in Excel 2016 DAX yet as far as I can see).

It’s very straightforward to use: for example, the DAX expression UNICHAR(65) returns the character A; see here for a list of unicode characters and their associated codes. You can have a lot of fun with this function in Power BI when you use it to return symbols that in turn can be used to represent data, so I thought I would put together a few examples to show you.

Take the following table which contains scores for restaurants in the range of 0 to 5:

image

The following measure:

Stars = 
REPT(UNICHAR(9733), AVERAGE('Restaurants'[Score])) 
& 
REPT(UNICHAR(9734), 5-AVERAGE('Restaurants'[Score]))

…which uses the Unichar() function to return characters 9733 and 9734, filled and unfilled stars, and the Rept() function to return a string with those characters repeated N times, can be used to create a table like this in Power BI:

image

[I’m sure I read a blog post somewhere that describes this trick with Rept() but I can’t find it anywhere – if you know the one I’m talking about please leave a link in the comments]

Similarly, with the following source data showing the days that customers made a purchase in a week:

image

…you can use the following measure, which returns characters 9635 (a square with a black dot inside) and 9634 (an empty black square), in a matrix to visualise this information:

Purchase Indicator = 
IF(
	COUNTROWS('Purchase Days')>0, 
		UNICHAR(9635), 
		UNICHAR(9634)
) 
& REPT(" ", 5)

image

Finally, an example using the box drawing unicode block to visualise the following date ranges (nb the dates are in dd/mm/yyyy format). Here’s the source data:

image

Here’s the measure, which uses characters 9500, 9472 and 9508 to draw bars:

Employment Range = 
VAR OverallMinimumDate = 
	CALCULATE(
		MIN('Employment Dates'[Start Date]), 
		ALLSELECTED('Employment Dates'))
VAR DaysBeforeStart = 
	MAX('Employment Dates'[Start Date]) - 
	OverallMinimumDate
VAR DaysBetweenStartAndEnd = 
	MAX('Employment Dates'[End Date]) - 
	MAX('Employment Dates'[Start Date])
VAR BarsBetween = 
	IF(DaysBetweenStartAndEnd>1, 
		DaysBetweenStartAndEnd-2, 
		0)
RETURN 
	REPT(" ", DaysBeforeStart) & 
	UNICHAR(9500) & 
	REPT(UNICHAR(9472), BarsBetween) & 
	UNICHAR(9508)

And here’s the output in a table:

image

You can download the Power BI .pbix file with these examples in here.

Is this going to revolutionise how you design reports? No of course not, but I think it could be a useful trick in certain scenarios. If you do come up with other creative ways to use unicode characters in your reports I would be interested to see the results!